Goal: 99.99% uptime
We serve custom inference stacks that have irregular GPU load.
We're looking for people who have done genuinely excellent infrastructure work and are interested in a challenge: working with both traditional infrastructure (load balancers, NLBs, etc.) and the very different infrastructure around inference engines and GPU workloads.
This role inherently requires deep experience with inference engines.
Contributions to vLLM, SGLang, trtllm, or other inference frameworks are a plus.
Salary
$130,000 - $185,000
Equity
1.5%
Location
San Francisco, CA, US
Experience
3+ years
Investors
Tejas Bhakta
No applications, no recruiter spam. Just the intro.
A few questions to make sure this role is the right shape for you. Two minutes.
I write the intro, send it to the founder, and handle the back-and-forth.
If they’re a yes, I book the chat. You show up — that’s the whole job-hunt.