MLE (Reinforcement Learning Training Infrastructure) - NOUS RESEARCH

We’re looking for an MLE to build and scale distributed reinforcement learning systems for model training. You’ll deploy elastic environment microservices, design reward systems and optimize multi-node and multi-datacenter training pipelines.
Responsibilities:
- Designing and implementing RL pipelines from reward modeling to policy optimization
- Optimizing RL training stability and sample efficiency for large models
- Verifying numerical correctness across inference and training
- Performance engineering on trainer-inference communication
- Validating methods from recent publications
Qualifications:
- Hands-on experience with reinforcement learning in production systems
- Deep understanding of policy-space methods (GRPO, PPO, etc.)
- Experience profiling distributed systems

Preferred:
- History of OSS contributions
- Knowledge of TorchTitan and SGLang or vLLM
Total raised: $70.0M
Last stage: Series A
Investors
Jeffrey Quesnelle, CEO & Co-Founder
Karan Malhotra, Head of Behavior & Co-Founder
Shivani Mitra, Co-Founder
No applications, no recruiter spam. Just the intro.
A few questions to make sure this role is the right shape for you. Two minutes.
I write the intro, send it to the founder, and handle the back-and-forth.
If they’re a yes, I book the chat. You show up — that’s the whole job-hunt.