United States • North America
Remote
Senior
Full Time
Machine Learning · Infrastructure · GPU · Cloud · Distributed Systems · Kubernetes · AI · Blockchain · Cryptocurrency · Senior Engineer
Requirements
- Bachelor’s degree or equivalent in Computer Science or a related field.
- 5+ years of experience building and operating distributed systems or infrastructure in production environments.
- Experience deploying and operating ML/LLM inference workloads on GPU clusters in cloud environments (AWS and/or GCP).
- Deep understanding of high-throughput inference systems, including batching strategies, token throughput optimization, and trade-offs between latency, throughput, and cost.
- Experience with ML serving frameworks such as Triton Inference Server, vLLM, Ray Serve, ONNX Runtime, or HuggingFace Optimum.
- Experience optimizing GPU load, memory efficiency, and performance bottlenecks in production systems.
- Familiarity with distributed inference strategies, including model parallelism and tensor parallelism.
- Experience working with Kubernetes or equivalent orchestration systems in cloud environments.
- Excellent communication skills for technical and non-technical audiences.
- Ability to work autonomously and collaboratively in cross-functional teams.
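The latency/throughput/cost trade-off named in the requirements above can be illustrated with a minimal sketch. All constants here are hypothetical illustration values, not figures from this posting: batching more sequences per decode step amortizes fixed per-step overhead, raising token throughput at the cost of a longer step time (and thus higher per-token latency).

```python
# Minimal sketch of the latency/throughput trade-off in batched LLM decoding.
# FIXED_OVERHEAD_MS and PER_SEQ_COMPUTE_MS are hypothetical constants.

FIXED_OVERHEAD_MS = 5.0    # per-step scheduling / kernel-launch overhead
PER_SEQ_COMPUTE_MS = 0.5   # marginal cost of one extra sequence in the batch

def step_time_ms(batch_size: int) -> float:
    """Time for one decode step with `batch_size` sequences batched together."""
    return FIXED_OVERHEAD_MS + PER_SEQ_COMPUTE_MS * batch_size

def throughput_tokens_per_s(batch_size: int) -> float:
    """Each decode step emits one token per sequence."""
    return batch_size / (step_time_ms(batch_size) / 1000.0)

if __name__ == "__main__":
    for bs in (1, 8, 32, 128):
        print(f"batch={bs:4d}  step={step_time_ms(bs):6.1f} ms  "
              f"throughput={throughput_tokens_per_s(bs):8.1f} tok/s")
```

Under this toy model, throughput keeps rising with batch size while per-step latency grows linearly; real serving systems tune the batch size (and cost per token) around exactly this curve.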
What You'll Do
- Design and operate GPU cluster infrastructure, including orchestration, autoscaling, resource isolation, and workload management.
- Optimize high-throughput inference systems for token throughput, batching efficiency, GPU occupancy, and cost effectiveness.
- Enable distributed inference strategies such as model parallelism and tensor parallelism.
- Implement model optimization and compilation workflows, integrating acceleration stacks such as TensorRT, ONNX Runtime, vLLM, and FlashAttention.
- Schedule workloads across heterogeneous accelerators, managing multiple models, users, and mixed workload types.
- Build observability into ML infrastructure to measure GPU load, memory utilization, batching efficiency, queue depth, and token throughput.
- Collaborate with infrastructure, ML, and product teams to transition models from experimentation to production-grade services.
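The observability responsibilities above can be sketched with a small, self-contained tracker for the kind of serving metrics mentioned (queue depth, batching efficiency, token throughput counters). This is a hypothetical illustration, not the team's actual instrumentation; the `ServingMetrics` class and its fields are assumptions for the sketch.

```python
# Hypothetical sketch of serving-side metrics for a batched inference service:
# queue depth, per-batch fill rate, and cumulative tokens emitted.
from collections import deque
from dataclasses import dataclass, field

@dataclass
class ServingMetrics:
    max_batch_size: int
    queue: deque = field(default_factory=deque)
    tokens_emitted: int = 0
    batches_run: int = 0
    batched_requests: int = 0

    def enqueue(self, request_id: str) -> None:
        """Admit a request into the pending queue."""
        self.queue.append(request_id)

    def queue_depth(self) -> int:
        """Number of requests waiting to be batched."""
        return len(self.queue)

    def run_batch(self, tokens_per_request: int) -> list:
        """Drain up to max_batch_size requests and record per-batch stats."""
        take = min(self.max_batch_size, len(self.queue))
        batch = [self.queue.popleft() for _ in range(take)]
        self.batches_run += 1
        self.batched_requests += len(batch)
        self.tokens_emitted += tokens_per_request * len(batch)
        return batch

    def batching_efficiency(self) -> float:
        """Average batch fill relative to the maximum batch size."""
        if self.batches_run == 0:
            return 0.0
        return self.batched_requests / (self.batches_run * self.max_batch_size)
```

In production these counters would typically be exported to a metrics backend (e.g. Prometheus-style gauges and counters) rather than held in memory, but the quantities tracked are the same ones the bullet list names.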
Nice to Have
- Familiarity with heterogeneous accelerators such as Inferentia.
- CUDA familiarity and experience debugging GPU-related issues.
- Adaptability to changing goals and a fast-paced environment.
Benefits
- Opportunity to work on impactful projects that build a safer financial system worldwide.
- High-velocity, high-ownership team culture with fast shipping and experimentation.
- AI fluency is a baseline expectation and part of the work culture.
- Distributed-first company with hubs in multiple major cities.
