HUD is building infrastructure to create RL training data and evals for frontier AI agents, along with a marketplace for selling them to frontier labs. Our platform is used by frontier labs, Fortune 500 companies, and startups. We’ve raised $15M from top VCs and were part of YC W25.
We're looking for research engineers to help build out QA for training data created by companies using HUD’s infrastructure. You’ll build the systems that keep quality high as volume grows, helping us meet continued strong demand.
You may be a good fit if you have:
Strong candidates may also:
We prioritize technical aptitude and learning potential over years of experience. Motivated candidates are encouraged to apply even if they don't meet all criteria.
Due to high volume, we may not actively respond to every application, but feel free to contact us if we missed your application!
HUD (YC W25) is developing agentic evals and RL environments for frontier AI labs, focused on Computer Use Agents (CUAs) that browse the web. Our CUA Evals framework is the first comprehensive evaluation tool for CUAs.
People don't actually know if AI agents are working reliably. To make AI agents work in the real world, we need detailed evals for a huge range of tasks.
We're backed by Y Combinator, and work closely with frontier AI labs to provide agent evaluation and training infrastructure at scale.
Salary: $140,000 - $250,000
Location: San Francisco
Experience: 3+ years
Total raised: $21.0M
Last stage: Seed