Senior ML Engineer – Kimchi (LLM Inference Optimization)
Locations: Austria; France; Germany; Italy; Netherlands; Poland; Spain; United Kingdom

Why Cast AI?
Cast AI is an automation platform that operates cloud-native and AI infrastructure at scale. By embedding autonomous decision-making directly into Kubernetes and cloud environments, Cast AI continuously optimizes performance, reliability, and efficiency in production.

The old way doesn't work. As Kubernetes and AI environments grow, manual decisions don't scale. Cast AI replaces tickets, alerts, and manual tuning with continuous automation that adapts infrastructure as conditions change. Efficiency and cost savings follow naturally from that automation. Over 2,100 companies already rely on Cast AI, including Akamai, BMW, Cisco, FICO, HuggingFace, NielsenIQ, Swisscom, and TGS.

Global team, diverse perspectives
We're headquartered in Miami, but our impact is international. We take a global and intentional approach to diversity.
Today, Cast AI operates across 34 countries spanning Europe, North America, Latin America, and APAC, bringing a wide range of perspectives into how we build and lead.

Unicorn momentum
In January 2026, we achieved unicorn status with a strategic investment from Pacific Alliance Ventures, the corporate venture arm of Shinsegae Group (a $50+ billion Korean conglomerate). Our valuation now exceeds $1 billion, and we're just getting started. Join us as we build the future of autonomous infrastructure.

About the role
Throughput. Latency. KV cache utilization. Move those three numbers in the right direction, and two things happen: customers get faster, cheaper inference, and our margins improve. That's the entire thesis of this role. Every kernel you tune, every quantization scheme you ship, every scheduler tweak you land shows up directly in a customer's p99 and on our P&L.

This is a high-impact seat. It is also a high-autonomy seat: you'll be given the room to lead the technical direction of inference optimization at Kimchi, not execute someone else's roadmap.

The problem: running LLMs in production is a moving target. The "right" model and serving configuration for a workload depend on traffic shape, sequence-length distribution, batch dynamics, GPU SKU, memory bandwidth, quantization tolerance, and a dozen other variables that shift week to week. Most teams pick a model once, over-provision GPUs, and absorb the cost. Kimchi is the system that makes that decision automatically – continuously matching workloads to the most cost-efficient, best-performing LLM and serving configuration on a customer's infrastructure. We're building the optimization layer between the model and the hardware, and we need engineers who understand both sides deeply.

Stack
Python · vLLM · SGLang · TensorRT-LLM · PyTorch · CUDA-adjacent tooling · Kubernetes · gRPC · ClickHouse · PostgreSQL · GCP Pub/Sub · AWS / GCP / Azure · GitLab CI · ArgoCD · Prometheus · Grafana · Loki · Tempo.
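To make the matching problem concrete, here is a toy sketch of how a workload-to-configuration scorer could work. All names and numbers below are hypothetical illustrations, not Kimchi's actual logic: the real system weighs far more signals than cost, throughput, and a single latency SLO.

```python
from dataclasses import dataclass


@dataclass
class ServingConfig:
    # Hypothetical candidate: a model + quantization + GPU pairing,
    # with throughput and p99 as measured on this specific workload.
    name: str
    cost_per_hour: float      # USD per GPU-hour
    tokens_per_second: float  # measured decode throughput
    p99_latency_ms: float     # measured p99 for this traffic shape


def score(cfg: ServingConfig, latency_slo_ms: float) -> float:
    """Cost per million tokens; configs that miss the SLO are rejected."""
    if cfg.p99_latency_ms > latency_slo_ms:
        return float("inf")
    tokens_per_hour = cfg.tokens_per_second * 3600
    return cfg.cost_per_hour / tokens_per_hour * 1e6


# Illustrative candidates with made-up prices and benchmarks.
candidates = [
    ServingConfig("llama-int8-on-L4", 0.70, 900.0, 420.0),
    ServingConfig("llama-fp16-on-A100", 3.20, 2400.0, 180.0),
    ServingConfig("llama-fp8-on-H100", 5.10, 6100.0, 95.0),
]

best = min(candidates, key=lambda c: score(c, latency_slo_ms=200.0))
print(best.name)  # the cheapest config that still meets the 200 ms p99 SLO
```

Note the shape of the answer: the cheapest GPU per hour loses because it misses the SLO, and the most expensive one can still win on cost per token once throughput is factored in. That inversion is exactly why the decision has to be re-evaluated as traffic shape and sequence lengths drift.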
Requirements
- 5+ years building real ML systems, with a portfolio that shows depth in inference or training infrastructure (not just model training notebooks).
- Strong Python – production services, not scripts.
- Hands-on experience with at least one of vLLM, SGLang, or TensorRT-LLM, and a working mental model of why an inference engine performs the way it does on a given GPU.
- Fluency with quantization tradeoffs – you've measured quality regressions, not just compression ratios.
- Comfort with distributed systems: collective communication, sharding strategies, and the practical failure modes of multi-GPU and multi-node setups.
- A bias toward measurement. You instrument before you optimize, and you can tell the difference between a real win and a benchmark artifact.
- Self-direction. This role comes with a wide mandate; you should be excited by that, not unsettled by it.

Responsibilities
- Push throughput. Continuous batching, speculative decoding, chunked prefill, kernel-level tuning across vLLM, SGLang, and TensorRT-LLM. Find the ceiling on each GPU SKU, then raise it.
- Cut latency. Attack TTFT and TPOT separately. Profile, identify the actual bottleneck (compute, memory bandwidth, scheduling, networking), and fix it – not the bottleneck you assumed.
- Get more out of the KV cache. Paged attention, prefix caching, eviction policies, cache reuse across requests, quantized KV. This is where a lot of the unrealized throughput lives, and it's an area you'll own.
- Quantize without regressing quality. INT8, INT4, FP8 across weights, activations, and KV. Empirical work: measure quality on real workloads, not just perplexity benchmarks.
- Shrink cold starts and memory footprint. Faster init, smarter weight loading, tighter memory accounting – the difference between a model that scales and one that doesn't.
- Scale across nodes. Distributed inference topologies, network-aware placement, checkpointing strategies that don't bottleneck on storage or interconnect.
- Set the technical direction.
Decide what we benchmark, what we adopt, and what we build ourselves. Bring the team along with strong writeups and reproducible experiments.

What's in it for you?
- Competitive salary, depending on your level of experience.
- A flexible, remote-first global environment.
- Collaboration with a global team of cloud experts and innovators, passionate about pushing the boundaries of Kubernetes technology.
- Equity options.
- Quick feedback in a fast-paced workflow: most feature projects are completed in 1 to 4 weeks.
- 10% of your work time for personal projects or self-improvement.
- A learning budget for professional and personal development, including access to international conferences and courses that elevate your skills.
- An annual hackathon to spark new ideas and strengthen team bonds.
- A team-building budget and company events to connect with your colleagues.
- An equipment budget to ensure you have everything you need.
- Extra days off to help maintain a healthy work-life balance.

Hiring process
1. Screening call with a recruiter
2. Hiring manager interview
3. Technical interview (system design)
4. Live coding
5. Culture check interview with an executive

*As part of our standard hiring process, a background check may be conducted at the final stage of recruitment through our third-party provider, Checkr.
*Please note that Cast AI does not provide any form of visa sponsorship/work permit.
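A flavor of the KV-cache work described above: the win from prefix caching is reusing the KV entries of a shared prompt prefix across requests instead of recomputing them. The sketch below is a deliberately tiny, framework-free illustration of that bookkeeping (toy token IDs, a dict standing in for paged KV blocks); production engines such as vLLM's automatic prefix caching or SGLang's RadixAttention use radix-tree-indexed block tables rather than anything like this.

```python
class PrefixCache:
    """Toy prefix cache: 'KV' entries keyed by token-ID prefixes.

    Storing one entry per prefix length is O(n^2) memory and is only
    for illustration; real engines share paged KV blocks via a radix tree.
    """

    def __init__(self):
        self._cache = {}   # prefix tuple -> placeholder KV entry
        self.hits = 0      # prompt tokens whose KV was reused
        self.misses = 0    # prompt tokens that needed a fresh prefill

    def prefill(self, tokens):
        """Return how many leading tokens were served from cache."""
        reused = 0
        # Find the longest already-cached prefix of this prompt.
        for i in range(len(tokens), 0, -1):
            if tuple(tokens[:i]) in self._cache:
                reused = i
                break
        # "Compute" and cache KV for the remaining suffix.
        for i in range(reused + 1, len(tokens) + 1):
            self._cache[tuple(tokens[:i])] = object()  # placeholder KV
        self.hits += reused
        self.misses += len(tokens) - reused
        return reused


cache = PrefixCache()
system_prompt = [1, 2, 3, 4]                   # shared system-prompt tokens
cache.prefill(system_prompt + [10])            # first request: full prefill
reused = cache.prefill(system_prompt + [20])   # second request reuses the prefix
print(reused)  # 4 tokens of KV reused across requests
```

With a long shared system prompt and short per-request suffixes, most prefill compute disappears on every request after the first, which is why cache reuse across requests moves TTFT so directly.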
Salary: $65,000 - $220,000
Location: Remote
Experience: 5+ years