About the role

About SF Tensor

At The San Francisco Tensor Company, we believe the future of AI and high-performance computing depends on rethinking the entire software and infrastructure stack. Today's developers face bottlenecks across hardware, cloud, and code optimization that slow progress before ideas can reach their full potential. Our mission is to remove those barriers and make compute faster, cheaper, and universally portable.

We are building a Kernel Optimizer that automatically transforms code into its most efficient form, combined with Tensor Cloud for adaptive, cross-cloud compute and Emma Lang, a new programming language for high-performance, hardware-aware computation. Together, these technologies reinvent the foundations of AI and HPC.

SF Tensor is proudly backed by Susa Ventures and Y Combinator, as well as a group of angels including Max Mullen and Paul Graham as well as founders and executives of NeuraLink, Notion and AMD. We are partnering with researchers, engineers, and organizations who share our belief that the next breakthroughs in AI require breakthroughs in compute.

About the Role

We're hiring a Founding GPU Compiler Engineer to build the core compilation infrastructure for our AI compiler. That means taking models from PyTorch, JAX, and TensorFlow and turning them into highly optimized binaries for large-scale AI pre-training.

You'll own the entire compiler stack, from ingesting StableHLO all the way to backend code generation, and you'll work across targets like NVIDIA, AMD, Trainium, and TPU. You'll help shape our architecture, tooling, and overall engineering culture from the very beginning.

What You'll Do

Design and implement the main compilation pipeline, from StableHLO to executable GPU and host binaries

AI researchers should be pushing the boundaries of what's possible with new architectures and training methods. Instead, they waste weeks configuring cloud infrastructure, debugging distributed systems, and optimizing their GPU code. We know because we lived it: While training our own models across thousands of GPUs earlier this year, we spent more time fighting our infrastructure than doing actual research.

That's why we're building two things. First, Elastic Cloud: a managed platform that automatically finds the cheapest GPUs across all providers, handles spot instance preemption, and cuts compute costs by up to 80%. Second, automatic kernel optimization that makes training code run faster by modeling hardware topology, often beating hand-tuned implementations.

The problem is that getting high performance across different hardware is genuinely hard. NVIDIA's CUDA moat exists because writing fast kernels requires deep expertise. Most teams either accept vendor lock-in or hire expensive kernel engineers. Our goal is to break the CUDA moat.

The compute bottleneck is the biggest constraint on AI progress. NVIDIA can't manufacture enough GPUs, and their monopoly keeps prices astronomical. Meanwhile, AMD, Google, and Amazon are shipping capable alternative hardware that nobody uses because the software is too hard. We're breaking that moat. If we succeed, anyone will be able to train state-of-the-art models without thinking past their PyTorch code.

Founding GPU Compiler Engineer

About the role

About SF Tensor

About the Role

What You'll Do

What We're Looking For

Nice to Have

Why Join Us

About SF Tensor

Required skills

Other roles at SF Tensor

Job details

Company

Funding

Founders

What happens next.

Confirm the fit

I pitch you to the company

A meeting lands on your calendar