San Jose, California, United States
Remote
Senior
Full Time
about 1 month ago
💰$313,055 - $450,000
AI, Machine Learning, Large Language Models, Post-Training, Reinforcement Learning, Model Alignment
Requirements
- Bachelor's degree in Computer Science, AI, Machine Learning, or a related field, with at least 8 years of industry experience.
- Strong hands-on experience across the full post-training pipeline for large models.
- Deep familiarity with preference learning and alignment techniques, including DPO, GRPO, and RL-based post-training methodologies.
- Proven experience designing domain-specific data strategies and training methodologies.
- Experience training and post-training specialized small models from scratch.
- Solid understanding of reinforcement learning fundamentals and their application to model alignment.
- Experience deploying models in low-latency production environments using frameworks such as vLLM, SGLang, or similar.
What You'll Do
- Lead and execute the full post-training pipeline for large language models (LLMs), including supervised fine-tuning, preference optimization, and reinforcement learning–based methods.
- Design and implement advanced training paradigms such as DPO (Direct Preference Optimization) and GRPO (Group Relative Policy Optimization).
- Develop domain-specific data recipes, curation strategies, and augmentation pipelines to optimize task performance.
- Train and post-train specialized small models from scratch, including architecture selection, dataset construction, and optimization strategy.
- Build and refine reward models to support alignment and downstream optimization.
- Design and implement RLAIF (Reinforcement Learning from AI Feedback) closed-loop systems.
- Optimize inference efficiency and deploy models using low-latency serving frameworks such as vLLM and SGLang.
- Evaluate model performance using both automated benchmarks and human/AI feedback loops.
- Collaborate with research and infrastructure teams to productionize training and deployment workflows.
Benefits
- Competitive total compensation package
- Learning and development (L&D) programs and an education subsidy for employees' growth and development
- Various team-building programs and company events
- Wellness and meal allowances
- Comprehensive healthcare schemes for employees and dependants
- More that we would love to tell you about during the process!
