United States • North America
Remote
Senior
Full Time
Machine Learning · Infrastructure · GPU · Cloud · Distributed Systems · Kubernetes · AI · Blockchain · Cryptocurrency · Senior Engineer
Requirements
- Bachelor’s degree or equivalent in Computer Science or a related field.
- 5+ years of experience building and operating distributed systems or infrastructure in production environments.
- Experience deploying and operating ML/LLM inference workloads on GPU clusters in cloud environments (AWS and/or GCP).
- Deep understanding of high-throughput inference systems, including batching strategies, token throughput optimization, and trade-offs between latency, throughput, and cost.
- Experience with ML serving frameworks such as Triton Inference Server, vLLM, Ray Serve, ONNX Runtime, or HuggingFace Optimum.
- Experience optimizing GPU load, memory efficiency, and performance bottlenecks in production systems.
- Familiarity with distributed inference strategies, including model parallelism and tensor parallelism.
- Experience working with Kubernetes or equivalent orchestration systems in cloud environments.
- Excellent communication skills for technical and non-technical audiences.
- Ability to work autonomously and collaboratively in cross-functional teams.
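The latency/throughput/cost trade-off named in the requirements above can be illustrated with a minimal sketch. All constants here are hypothetical illustration values, not figures from this posting: batching more sequences per decode step amortizes fixed per-step overhead, raising token throughput at the cost of a longer step time (and thus higher per-token latency).

```python
# Minimal sketch of the latency/throughput trade-off in batched LLM decoding.
# FIXED_OVERHEAD_MS and PER_SEQ_COMPUTE_MS are hypothetical constants.

FIXED_OVERHEAD_MS = 5.0    # per-step scheduling / kernel-launch overhead
PER_SEQ_COMPUTE_MS = 0.5   # marginal cost of one extra sequence in the batch

def step_time_ms(batch_size: int) -> float:
    """Time for one decode step with `batch_size` sequences batched together."""
    return FIXED_OVERHEAD_MS + PER_SEQ_COMPUTE_MS * batch_size

def throughput_tokens_per_s(batch_size: int) -> float:
    """Each decode step emits one token per sequence."""
    return batch_size / (step_time_ms(batch_size) / 1000.0)

if __name__ == "__main__":
    for bs in (1, 8, 32, 128):
        print(f"batch={bs:4d}  step={step_time_ms(bs):6.1f} ms  "
              f"throughput={throughput_tokens_per_s(bs):8.1f} tok/s")
```

Under this toy model, throughput keeps rising with batch size while per-step latency grows linearly; real serving systems tune the batch size (and cost per token) around exactly this curve.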
What You'll Do
- Design and operate GPU cluster infrastructure, including orchestration, autoscaling, resource isolation, and workload management.
- Optimize high-throughput inference systems for token throughput, batching efficiency, GPU occupancy, and cost effectiveness.
- Enable distributed inference strategies such as model parallelism and tensor parallelism.
- Implement model optimization and compilation workflows, integrating acceleration stacks such as TensorRT, ONNX Runtime, vLLM, and FlashAttention.
- Schedule workloads across heterogeneous accelerators, managing multiple models, users, and mixed workload types.
- Build observability into ML infrastructure to measure GPU load, memory utilization, batching efficiency, queue depth, and token throughput.
- Collaborate with infrastructure, ML, and product teams to transition models from experimentation to production-grade services.
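The observability responsibilities above can be sketched with a small, self-contained tracker for the kind of serving metrics mentioned (queue depth, batching efficiency, token throughput counters). This is a hypothetical illustration, not the team's actual instrumentation; the `ServingMetrics` class and its fields are assumptions for the sketch.

```python
# Hypothetical sketch of serving-side metrics for a batched inference service:
# queue depth, per-batch fill rate, and cumulative tokens emitted.
from collections import deque
from dataclasses import dataclass, field

@dataclass
class ServingMetrics:
    max_batch_size: int
    queue: deque = field(default_factory=deque)
    tokens_emitted: int = 0
    batches_run: int = 0
    batched_requests: int = 0

    def enqueue(self, request_id: str) -> None:
        """Admit a request into the pending queue."""
        self.queue.append(request_id)

    def queue_depth(self) -> int:
        """Number of requests waiting to be batched."""
        return len(self.queue)

    def run_batch(self, tokens_per_request: int) -> list:
        """Drain up to max_batch_size requests and record per-batch stats."""
        take = min(self.max_batch_size, len(self.queue))
        batch = [self.queue.popleft() for _ in range(take)]
        self.batches_run += 1
        self.batched_requests += len(batch)
        self.tokens_emitted += tokens_per_request * len(batch)
        return batch

    def batching_efficiency(self) -> float:
        """Average batch fill relative to the maximum batch size."""
        if self.batches_run == 0:
            return 0.0
        return self.batched_requests / (self.batches_run * self.max_batch_size)
```

In production these counters would typically be exported to a metrics backend (e.g. Prometheus-style gauges and counters) rather than held in memory, but the quantities tracked are the same ones the bullet list names.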
Nice to Have
- Familiarity with heterogeneous accelerators such as Inferentia.
- CUDA familiarity and experience debugging GPU-related issues.
- Adaptability to changing goals and a fast-paced environment.
Benefits
- Opportunity to work on impactful projects that build a safer financial system worldwide.
- High-velocity, high-ownership team culture with fast shipping and experimentation.
- AI fluency is a baseline expectation and part of the work culture.
- Distributed-first company with hubs in multiple major cities.
