
    Machine Learning Infrastructure Engineer

    TRM Labs
    United States, North America
    Remote
    Senior
    Full Time
    15 days ago
    Machine Learning, Infrastructure, GPU, Cloud, Distributed Systems, Kubernetes, AI, Blockchain, Cryptocurrency, Senior Engineer

    Requirements

    • Bachelor’s degree or equivalent in Computer Science or related field.
    • 5+ years of experience building and operating distributed systems or infrastructure in production environments.
    • Experience deploying and operating ML/LLM inference workloads on GPU clusters in cloud environments (AWS and/or GCP).
    • Deep understanding of high-throughput inference systems including batching strategies, token throughput optimization, and trade-offs between latency, throughput, and cost.
    • Experience with ML serving frameworks such as Triton Inference Server, vLLM, Ray Serve, ONNX Runtime, or Hugging Face Optimum.
    • Experience optimizing GPU load, memory efficiency, and performance bottlenecks in production systems.
    • Familiarity with distributed inference strategies including model parallelism and tensor parallelism.
    • Experience working with Kubernetes or equivalent orchestration systems in cloud environments.
    • Excellent communication skills for technical and non-technical audiences.
    • Ability to work autonomously and collaboratively in cross-functional teams.

    What You'll Do

    • Design and operate GPU cluster infrastructure including orchestration, autoscaling, resource isolation, and workload management.
    • Optimize high-throughput inference systems for token throughput, batching efficiency, GPU occupancy, and cost effectiveness.
    • Enable distributed inference strategies such as model parallelism and tensor parallelism.
    • Implement model optimization and compilation workflows that integrate acceleration stacks such as TensorRT, ONNX Runtime, vLLM, and FlashAttention.
    • Schedule heterogeneous workloads, managing multiple models, users, and mixed workload types across different accelerator types.
    • Build observability into ML infrastructure to measure GPU load, memory utilization, batching efficiency, queue depth, and token throughput.
    • Collaborate with infrastructure, ML, and product teams to transition models from experimentation to production-grade services.

    Nice to Have

    • Familiarity with heterogeneous accelerators such as AWS Inferentia.
    • CUDA familiarity and experience debugging GPU-related issues.
    • Adaptability to changing goals and a fast-paced environment.

    Benefits

    • Opportunity to work on impactful projects that build a safer financial system worldwide.
    • High-velocity, high-ownership team culture with fast shipping and experimentation.
    • AI fluency is a baseline expectation and part of the work culture.
    • Distributed-first company with hubs in multiple major cities.

    About TRM Labs

    TRM Labs is a software company that provides blockchain analytics and transaction monitoring to financial institutions and governments.

    Headquarters: San Francisco, CA, US
    Company size: 100-250 employees
    Industry: Blockchain & Cryptocurrency