
    Staff AI Engineer, Model Post-Training and Alignment

    OKX
    San Jose, California, United States
    Remote
    Senior
    Full Time
    about 1 month ago
    💰$313,055 - $450,000
    AI, Machine Learning, Large Language Models, Post-Training, Reinforcement Learning, Model Alignment

    Requirements

    • Bachelor's degree in Computer Science, AI, Machine Learning, or a related field, with at least 8 years of industry experience.
    • Strong hands-on experience across the full post-training pipeline for large models.
    • Deep familiarity with preference learning and alignment techniques, including DPO, GRPO, and RL-based post-training methodologies.
    • Proven experience designing domain-specific data strategies and training methodologies.
    • Experience training and post-training specialized small models from scratch.
    • Solid understanding of reinforcement learning fundamentals and their application to model alignment.
    • Experience deploying models in low-latency production environments using frameworks such as vLLM, SGLang, or similar.

    What You'll Do

    • Lead and execute the full post-training pipeline for large language models (LLMs), including supervised fine-tuning, preference optimization, and reinforcement learning–based methods.
    • Design and implement advanced training paradigms such as DPO (Direct Preference Optimization) and GRPO (Group Relative Policy Optimization).
    • Develop domain-specific data recipes, curation strategies, and augmentation pipelines to optimize task performance.
    • Train and post-train specialized small models from scratch, including architecture selection, dataset construction, and optimization strategy.
    • Build and refine reward models to support alignment and downstream optimization.
    • Design and implement RLAIF (Reinforcement Learning from AI Feedback) closed-loop systems.
    • Optimize inference efficiency and deploy models using low-latency serving frameworks such as vLLM and SGLang.
    • Evaluate model performance using both automated benchmarks and human/AI feedback loops.
    • Collaborate with research and infrastructure teams to productionize training and deployment workflows.

    Benefits

    • Competitive total compensation package
    • Learning and development (L&D) programs and an education subsidy to support employees' growth
    • Various team building programs and company events
    • Wellness and meal allowances
    • Comprehensive healthcare schemes for employees and dependants
    • More that we would love to tell you about during the process!

    About OKX

    OKX is the second-largest global crypto exchange by trading volume and a leading Web3 ecosystem.

    Victoria, Seychelles
    1000 - 5000
    Blockchain & Cryptocurrency