Site Reliability Engineer (SRE) at Thinking Machines Lab
San Francisco

Thinking Machines Lab's mission is to empower humanity through advancing collaborative general intelligence. We're building a future where everyone has access to the knowledge and tools to make AI work for their unique needs and goals.

We are scientists, engineers, and builders who’ve created some of the most widely used AI products, including ChatGPT and Character.ai, open-weights models like Mistral, as well as popular open source projects like PyTorch, OpenAI Gym, Fairseq, and Segment Anything.

About Tinker

Tinker is our fine-tuning API that empowers researchers and developers to customize frontier AI to their needs, opening access to capabilities that have previously been concentrated in a handful of labs. We manage the infrastructure while giving Tinkerers full flexibility to train open-weights models with their own data and algorithms, for their own needs. Tinker is rapidly adding new customers, features, and novel use cases. We’re hiring to grow the platform alongside the Tinker community.
About the Role

We're looking for a Site Reliability Engineer to drive the reliability of Tinker end to end. You'll work alongside the engineers building the platform and the research teams to make every layer of the system more robust and resilient.

What You’ll Do

Define and own end-to-end reliability, from CI/CD flows to production observability and incident response.
Develop appropriate Service Level Objectives (SLOs) for distributed training systems, balancing job-completion reliability and scheduling latency against development velocity.
Design and implement monitoring and observability across the full training path.
Drive incident response for Tinker platform issues, ensuring rapid recovery, thorough incident reviews, and systematic improvements that prevent recurrence.
Harden multi-tenant isolation and resource scheduling so that co-scheduling LoRA-based workloads maximizes utilization without compromising reliability or data separation.
Collaborate with security teams to address production vulnerabilities.

Skills and Qualifications

Minimum qualifications:
Bachelor's degree or equivalent experience in computer science, engineering, or a similar field.
Experience in distributed systems, cloud infrastructure, or site reliability engineering.
Proficiency writing software to solve reliability problems, including building tooling and automation.
Experience with production incident response, postmortems, and systematic reliability improvement.
Strong communication skills and a track record of coordinating across engineering and research teams.

Preferred qualifications (we encourage you to apply if you meet some but not all of these):
Deep experience operating production cloud services at scale (e.g., public cloud platforms, internal cloud services).
Background in distributed training frameworks and how infrastructure failures surface in training behavior.
Track record of building checkpoint and recovery systems for long-running distributed jobs.
Expertise in Kubernetes at scale: deploying, operating, debugging, and tuning clusters that handle heterogeneous GPU workloads.

Logistics

Location: This role is based in San Francisco, California.
Compensation: Depending on background, skills, and experience, the expected annual salary range for this position is $350,000 – $475,000 USD.

Visa sponsorship: We sponsor visas. While we can't guarantee success for every candidate or role, if you're the right fit, we're committed to working through the visa process together.
Benefits: Thinking Machines offers generous health, dental, and vision benefits, unlimited PTO, paid parental leave, and relocation support as needed.

As set forth in Thinking Machines' Equal Employment Opportunity policy, we do not discriminate on the basis of any protected group status under any applicable law.
Total raised: $2.0B
Last stage: Seed
Mira Murati, Founder and CEO
John Schulman, Cofounder
Barret Zoph, VP of Research