Senior AI Engineer - Harness Engineering (Kimchi) - Careers @ Cast AI

Bulgaria; Croatia; Estonia; Greece; Hungary; Latvia; Lithuania; Poland; Romania; Slovakia; Slovenia; Ukraine

Why Kimchi?

Kimchi is the AI platform inside CAST AI. We started by helping companies run LLMs on their own Kubernetes clusters – now we’re building the execution layer where agents do real work.

Our infrastructure today: multi-model inference (MiniMax, Kimi, GLM-5, Nemotron, DeepSeek) with intelligent routing, an OpenAI-compatible API, and deployment flexibility from our GPUs to your VPC.

The inference layer is the foundation. What we’re hiring for sits on top of it: coding agents, agent runtimes, orchestration systems, and the reliability engineering that makes them actually finish things.

Tech Stack: TypeScript, Go, Kubernetes, AWS/GCP/Azure, MCP, Prometheus/Grafana/Loki, GitLab CI, ArgoCD.

Why harness engineering matters here

OpenAI and Anthropic ship models.
They also ship one harness each – the scaffolding that turns a raw model into something that can plan, execute, recover, and complete work. We ship a different kind of harness: one built for cost-conscious, long-horizon autonomy, running on inference infrastructure we control end-to-end.

A decent model with a great harness beats a great model with a bad harness. We’ve watched this play out. The gap between what today’s models can do and what you see them doing is largely a harness gap – and that gap is where we operate.

What you’ll build

The ratchet. Every time our agent makes a mistake, we engineer a solution so it never makes that mistake again. That means hooks that enforce constraints the model “knows” but forgets: pre-commit lint checks, permission gates, context compaction before the window fills. Success is silent, failures are verbose.

Long-horizon execution. Our harness is built around spec-driven autonomy: meta-prompting, fresh context per task, worktree-per-slice git strategy, automatic replanning, crash recovery, stuck detection. We’re implementing Ralph loops – when the model tries to exit, we intercept and reinject the goal into a fresh context. The agent reads state from disk and continues. Multi-session, multi-day work, without context rot.

Planner/executor splits. Planning with a reasoning model, executing with a fast one, evaluating with a third. Separating generation from evaluation beats self-verification because agents reliably skew positive when grading their own work.

The harness surface. CLI, TUI, MCP integration, sandboxed execution, telemetry. Our AGENTS.md is short – every line traces to a specific thing that went wrong. TypeScript on the surface, Go where it matters.

Memory and context. Moving agents off laptops, giving them state that survives across sessions, managing context so information lands where it’s actionable. Compaction, tool-call offloading, progressive skill disclosure.
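To make the Ralph loop idea above concrete, here is a minimal TypeScript sketch. It is an illustration under stated assumptions, not Kimchi's implementation: every name in it (TaskState, StateStore, AgentSession, ralphLoop, createMemoryStore) is hypothetical, an in-memory store stands in for state on disk, and a scripted function stands in for a real model call. It shows the three moves the description names: read state from the store, start each session from a fresh prompt built only from that state, and refuse to accept the agent's own claim that it is finished.

```typescript
// Hypothetical sketch of a Ralph-style loop. None of these names come from a real codebase.

interface TaskState {
  goal: string;
  completedSteps: string[];
  done: boolean;
}

// Stand-in for durable state on disk.
interface StateStore {
  load(): TaskState;
  save(state: TaskState): void;
}

// One agent session: takes a fresh prompt, reports the step it performed
// (or null) and whether it wants to exit. Stand-in for a real model call.
type AgentSession = (prompt: string) => { step: string | null; claimsDone: boolean };

// Serializes on save and parses on load, so no object reference leaks between sessions.
function createMemoryStore(initial: TaskState): StateStore {
  let serialized = JSON.stringify(initial);
  return {
    load: () => JSON.parse(serialized) as TaskState,
    save: (state) => {
      serialized = JSON.stringify(state);
    },
  };
}

// Fresh context per iteration: only the goal and a progress summary, no prior transcript.
function buildPrompt(state: TaskState): string {
  const progress = state.completedSteps.join(", ") || "nothing";
  return `Goal: ${state.goal}\nCompleted so far: ${progress}`;
}

function ralphLoop(
  store: StateStore,
  runSession: AgentSession,
  isGoalMet: (state: TaskState) => boolean,
  maxIterations: number,
): { state: TaskState; iterations: number; interceptedExits: number } {
  let iterations = 0;
  let interceptedExits = 0;
  while (iterations < maxIterations) {
    const state = store.load();
    if (state.done) break;
    iterations++;
    const result = runSession(buildPrompt(state));
    if (result.step !== null) state.completedSteps.push(result.step);
    // The agent's exit claim is not trusted; an external check decides.
    state.done = isGoalMet(state);
    if (result.claimsDone && !state.done) interceptedExits++;
    store.save(state);
  }
  return { state: store.load(), iterations, interceptedExits };
}

// Usage: a scripted agent that does one step per session and always tries to exit.
const plannedSteps = ["write spec", "implement", "run tests"];
const store = createMemoryStore({ goal: "ship feature", completedSteps: [], done: false });
const scriptedAgent: AgentSession = (prompt) => ({
  step: plannedSteps.find((s) => !prompt.includes(s)) ?? null,
  claimsDone: true,
});
const outcome = ralphLoop(
  store,
  scriptedAgent,
  (state) => state.completedSteps.length === plannedSteps.length,
  10,
);
console.log(outcome.iterations, outcome.interceptedExits, outcome.state.completedSteps);
```

In this sketch the completion check lives outside the agent, which mirrors the point about separating generation from evaluation. The iteration cap is an assumed safeguard so a stuck agent cannot loop forever.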
What makes this different (with receipts)

You’ve seen the pitch: “we route to the best model.” Everyone says that. Here’s what we actually have:

GPU infrastructure we own. Not just an API reseller. From GPU placement across clouds to the inference endpoint your agent calls – we control the cost curve.

A harness-first thesis. We treat agent failures as configuration problems, not model problems. When we moved from a stock harness to our own, completion rates on internal benchmarks improved by 40%+ on the same model.

An AGENTS.md that earns every line. No brainstormed rules – every constraint in our system prompt traces to a real failure we saw and fixed.

Requirements:

You’ve used AI coding agents in anger. Not demos – real work. You have opinions about Claude Code, Codex, OpenCode, Cursor. You know what it feels like when an agent gets stuck and why.

Strong TypeScript or Go in production. Comfort moving between them. Our surface is TypeScript; our core is Go.

You think in harness terms. You read “the agent hallucinated” and your first instinct is to ask what context it was missing, what hook should have caught it, what constraint should exist.

You drive features end-to-end. Design → build → ship → measure → iterate. We don’t have layers that absorb ambiguity for you.

Responsibilities:

Build and evolve the agent harness – ship hooks, permission gates, and context compaction. Every AGENTS.md constraint traces to a failure you personally diagnosed.

Own long-horizon execution – multi-session task completion via spec-driven prompting, worktree-per-slice git, Ralph loop recovery, and stuck detection. Completion rate is your metric.

Architect planner/executor/evaluator pipelines – planning with a reasoning model, execution with a fast one, evaluation with a third. No self-verification.

Manage agent memory and context – state persistence across sessions, context compaction, tool-call offloading. Zero context rot on multi-day work.
Own the harness surface – CLI, TUI, MCP integrations, sandboxed execution, telemetry. TypeScript on the surface, Go where it matters.

What success looks like (after 6 months):

You’ve shipped at least one major harness feature end-to-end: designed it, built it, measured it, iterated.
You’ve added constraints to our AGENTS.md based on failures you personally observed and diagnosed.
You’ve improved a measurable reliability metric – completion rate, context efficiency, or cost per successful task.
You’ve formed strong opinions about where our harness is load-bearing and where it’s dead weight.

What’s in it for you?

Competitive salary (€6,500 – €9,000 gross, depending on the level of experience).
Enjoy a flexible, remote-first global environment.
Collaborate with a global team of cloud experts and innovators, passionate about pushing the boundaries of Kubernetes technology.
Equity options.
Get quick feedback with a fast-paced workflow. Most feature projects are completed in 1 to 4 weeks.
Spend 10% of your work time on personal projects or self-improvement.
Learning budget for professional and personal development – including access to international conferences and courses that elevate your skills.
Annual hackathon to spark new ideas and strengthen team bonds.
Team-building budget and company events to connect with your colleagues.
Equipment budget to ensure you have everything you need.
Extra days off to help maintain a healthy work-life balance.

This is a location-specific opportunity. We are currently accepting applications from candidates residing in the following European countries: Bulgaria, Croatia, Estonia, Greece, Hungary, Latvia, Lithuania, Poland, Romania, Slovakia, Slovenia, and Ukraine.

*As part of our standard hiring process, we would like to inform you that a background check may be conducted at the final stage of recruitment through our third-party provider, Checkr.

*Please note that Cast AI does not provide any form of visa sponsorship/work permit.
Cast AI is the leading Application Performance Automation platform, enabling customers to cut cloud costs, improve performance, and boost productivity.
Location
Remote