TLDR

LiteLLM is an open-source LLM Gateway with 28K+ stars on GitHub, trusted by companies like NASA, Rocket Money, Samsara, Lemonade, and Adobe. We’re expanding rapidly and seeking a performance engineer to help scale the platform to handle 5K RPS (requests per second). We’re based in San Francisco.

What is LiteLLM

LiteLLM provides an open-source Python SDK and a Python FastAPI server for calling 100+ LLM APIs (Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic) in the OpenAI format.

We just hit $2.5M ARR and have raised a $1.6M seed round from Y Combinator, Gravity Fund, and Pioneer Fund. You can find more information on our website, GitHub, and technical documentation.

About the Role

We're hiring a Python performance engineer to own maximizing throughput, minimizing latency and ensuring our platform is reliable in production.

Roadmap for Performance Engineer:

By the end of this year, our RPS and latency overhead should be at parity with industry benchmarks. This covers streaming and non-streaming requests for /chat/completions, /completions, /embeddings, /realtime, and /audio/transcriptions.
- Reduce e2e overhead latency for cache misses. It is currently 100ms-500ms; bring it in line with industry standards.
- Reduce e2e overhead latency for cache hits to meet industry benchmarks.
- Ensure overhead latency scales well as other components are added to the platform, e.g. Redis, Redis Cluster, DB, Non-Admin Virtual Keys.
- Ensure overhead latency scales well with payload size: a 1MB prompt with streaming should be under 100ms.
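
To make the latency goals above concrete, here is a minimal sketch of how overhead might be summarized. It assumes "overhead" means the time a request spends in the gateway on top of the upstream provider call. The function names and the sample timings are hypothetical and are not taken from the LiteLLM codebase.

```python
import math
import statistics


def overhead_ms(gateway_ms, upstream_ms):
    """Per-request overhead: time through the gateway minus upstream provider time.

    Assumption: both lists hold timings in milliseconds for the same requests,
    in the same order.
    """
    if len(gateway_ms) != len(upstream_ms):
        raise ValueError("need one upstream timing per gateway timing")
    return [g - u for g, u in zip(gateway_ms, upstream_ms)]


def summarize(samples):
    """Return the median and the nearest-rank p99 of a list of samples."""
    ordered = sorted(samples)
    # Nearest-rank p99: smallest value with at least 99% of samples at or below it.
    rank = max(1, math.ceil(0.99 * len(ordered)))
    return {"p50": statistics.median(ordered), "p99": ordered[rank - 1]}


# Hypothetical timings for four requests (milliseconds).
gateway = [120.0, 150.0, 400.0, 130.0]
upstream = [100.0, 100.0, 100.0, 100.0]
stats = summarize(overhead_ms(gateway, upstream))
```

Reporting a tail percentile alongside the median matters here because a target such as "under 100ms" can hold at the median while a few slow requests still miss it.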
