About the role

Senior AI Engineer - Harness Engineering (Kimchi) - Careers @ Cast AI

Locations: Bulgaria; Croatia; Estonia; Greece; Hungary; Latvia; Lithuania; Poland; Romania; Slovakia; Slovenia; Ukraine

Why Kimchi?

Kimchi is the AI platform inside CAST AI. We started by helping companies run LLMs on their own Kubernetes clusters; now we’re building the execution layer where agents do real work.

Our infrastructure today: multi-model inference (MiniMax, Kimi, GLM-5, Nemotron, DeepSeek) with intelligent routing, an OpenAI-compatible API, and deployment flexibility from our GPUs to your VPC. The inference layer is the foundation. What we’re hiring for sits on top of it: coding agents, agent runtimes, orchestration systems, and the reliability engineering that makes them actually finish things.

Tech Stack: TypeScript, Go, Kubernetes, AWS/GCP/Azure, MCP, Prometheus/Grafana/Loki, GitLab CI, ArgoCD.

Why harness engineering matters here

OpenAI and Anthropic ship models. They also ship one harness each: the scaffolding that turns a raw model into something that can plan, execute, recover, and complete work. We ship a different kind of harness, one built for cost-conscious, long-horizon autonomy and running on inference infrastructure we control end-to-end.

A decent model with a great harness beats a great model with a bad harness. We’ve watched this play out. The gap between what today’s models can do and what you see them doing is largely a harness gap, and that gap is where we operate.

What you’ll build

The ratchet.
Every time our agent makes a mistake, we engineer a solution so it never makes that mistake again. That means hooks that enforce constraints the model “knows” but forgets: pre-commit lint checks, permission gates, and context compaction before the window fills. Success is silent; failures are verbose.

Long-horizon execution.
Our harness is built around spec-driven autonomy: meta-prompting, fresh context per task, a worktree-per-slice git strategy, automatic replanning, crash recovery, and stuck detection. We’re implementing Ralph loops: when the model tries to exit, we intercept and reinject the goal into a fresh context. The agent reads its state from disk and continues. The result is multi-session, multi-day work without context rot.

Planner/executor splits.
We plan with a reasoning model, execute with a fast one, and evaluate with a third. Separating generation from evaluation beats self-verification, because agents reliably skew positive when grading their own work.

The harness surface.
CLI, TUI, MCP integration, sandboxed execution, telemetry. Our AGENTS.md is short: every line traces to a specific thing that went wrong. TypeScript on the surface, Go where it matters.

Memory and context.
Moving agents off laptops, giving them state that survives across sessions, and managing context so information lands where it’s actionable: compaction, tool-call offloading, progressive skill disclosure.

What makes this different (with receipts)

You’ve seen the pitch: “we route to the best model.” Everyone says that. Here’s what we actually have:

GPU infrastructure we own.
We’re not just an API reseller. From GPU placement across clouds to the inference endpoint your agent calls, we control the cost curve.

A harness-first thesis.
We treat agent failures as configuration problems, not model problems. When we moved from a stock harness to our own, completion rates on internal benchmarks improved by 40%+ on the same model.

An AGENTS.md that earns every line.
No brainstormed rules: every constraint in our system prompt traces to a real failure we saw and fixed.
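To make “The ratchet” concrete, here is a minimal sketch of a pre-action hook pipeline in TypeScript. It is not Kimchi’s implementation: the names `ProposedAction`, `HarnessState`, `Hook`, `runHooks`, `permissionGate`, and `compactionGate` are hypothetical, and the 80% compaction threshold is an illustrative assumption. It shows the two properties the posting names: a passing hook says nothing, and a failing hook returns a detailed message.

```typescript
// Hypothetical types: a proposed agent action and the harness state it is checked against.
interface ProposedAction {
  tool: string; // e.g. "shell", "write_file", "git_push"
  args: Record<string, string>;
}

interface HarnessState {
  contextTokens: number; // tokens currently in the model's context
  contextLimit: number; // size of the model's context window
  allowedTools: string[]; // tools this task has been granted
}

// A hook returns null when its check passes (success is silent)
// or a detailed, actionable message when it fails (failures are verbose).
type Hook = (action: ProposedAction, state: HarnessState) => string | null;

// Permission gate: block tools the current task was not granted.
const permissionGate: Hook = (action, state) =>
  state.allowedTools.indexOf(action.tool) !== -1
    ? null
    : `Blocked: tool "${action.tool}" is not permitted for this task. ` +
      `Allowed tools: ${state.allowedTools.join(", ")}.`;

// Compaction gate: request compaction before the window fills, not after.
// The 80% threshold is an illustrative assumption, not a known Kimchi setting.
const COMPACTION_THRESHOLD = 0.8;
const compactionGate: Hook = (_action, state) =>
  state.contextTokens >= COMPACTION_THRESHOLD * state.contextLimit
    ? `Context is at ${state.contextTokens}/${state.contextLimit} tokens; ` +
      `compact it before taking the next action.`
    : null;

// Run every hook against a proposed action and keep only the failure messages.
// An empty result means the action may proceed.
function runHooks(hooks: Hook[], action: ProposedAction, state: HarnessState): string[] {
  const failures: string[] = [];
  for (const hook of hooks) {
    const message = hook(action, state);
    if (message !== null) failures.push(message);
  }
  return failures;
}
```

A pre-commit lint check would be one more `Hook` that shells out to the project’s linter; it is left out so the sketch stays self-contained. The “ratchet” part is organizational: each new failure becomes one more entry in the hooks array.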
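The Ralph loop described under “Long-horizon execution” can be sketched as a driver that never lets one model session be the last word. This is a minimal illustration under stated assumptions, not the team’s code. `TaskState`, `StateStore`, `Session`, `ralphLoop`, and `memoryStore` are hypothetical names, the “disk” is replaced by an in-memory store so the example runs on its own, and the stuck-detection rule (no new progress notes) is a deliberately crude placeholder.

```typescript
// Hypothetical persisted task state. In a real harness this would be read
// from and written to disk (for example a progress file in the worktree).
interface TaskState {
  done: boolean;
  notes: string[]; // progress notes the agent leaves for its next fresh context
}

interface StateStore {
  load(): TaskState;
  save(state: TaskState): void;
}

// One agent session: it gets a freshly built prompt, works until the model
// decides to exit, and returns the updated state. Assumed interface.
type Session = (prompt: string, state: TaskState) => TaskState;

// Build a brand-new context from the goal plus whatever state was persisted.
function buildPrompt(goal: string, state: TaskState): string {
  return [`Goal: ${goal}`, "Progress so far:"].concat(state.notes).join("\n");
}

// Ralph-style loop: whenever the model exits before the goal is done,
// start a new session with the goal re-injected into a fresh context.
function ralphLoop(
  goal: string,
  store: StateStore,
  session: Session,
  maxIterations = 10,
  maxStuck = 2
): TaskState {
  let stuck = 0;
  for (let i = 0; i < maxIterations; i++) {
    const before = store.load();
    if (before.done) return before;
    const after = session(buildPrompt(goal, before), before);
    store.save(after); // persist before the next fresh context reads it
    // Crude stuck detection: no new notes and not done means no visible progress.
    const progressed = after.done || after.notes.length > before.notes.length;
    stuck = progressed ? 0 : stuck + 1;
    if (stuck >= maxStuck) break;
  }
  return store.load();
}

// In-memory stand-in for the on-disk store, for illustration only.
function memoryStore(initial: TaskState): StateStore {
  let saved = initial;
  return {
    load: () => saved,
    save: (state) => {
      saved = state;
    },
  };
}
```

The design point is that continuity lives in the persisted state rather than in the model’s context window: each session starts clean, so a long task does not accumulate stale context.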
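The planner/executor/evaluator split can be sketched in the same spirit. Here each model is just a prompt-in, text-out function, so the roles can be backed by any three models. The names `Model`, `Roles`, `StepResult`, and `runPlan`, the one-step-per-line plan format, and the PASS/FAIL verdict protocol are all hypothetical choices for illustration. The sketch’s main constraint is that the evaluator must be a different model from the executor.

```typescript
// A model is abstracted as prompt in, text out. Which concrete models back
// each role (reasoning, fast, evaluator) is an assumption left to the caller.
type Model = (prompt: string) => string;

interface Roles {
  planner: Model; // reasoning model: turns a goal into steps
  executor: Model; // fast model: carries out one step
  evaluator: Model; // third model: grades the executor's output
}

interface StepResult {
  step: string;
  output: string;
  accepted: boolean;
}

function runPlan(goal: string, roles: Roles, maxAttempts = 2): StepResult[] {
  // Generation and evaluation must not be the same model: no self-grading.
  if (roles.evaluator === roles.executor) {
    throw new Error("evaluator must be a different model from executor");
  }
  // Assumed plan format: one step per non-empty line.
  const steps = roles
    .planner(`Break this goal into steps, one per line: ${goal}`)
    .split("\n")
    .map((s) => s.trim())
    .filter((s) => s.length > 0);

  return steps.map((step) => {
    let feedback = "";
    let output = "";
    for (let attempt = 0; attempt < maxAttempts; attempt++) {
      output = roles.executor(`Do this step: ${step}${feedback}`);
      const verdict = roles.evaluator(
        `Step: ${step}\nOutput: ${output}\nReply PASS, or FAIL: <reason>.`
      );
      if (/^PASS\b/.test(verdict)) return { step, output, accepted: true };
      // Feed the evaluator's reason back to the executor on the next attempt.
      feedback = `\nPrevious attempt rejected: ${verdict}`;
    }
    return { step, output, accepted: false };
  });
}
```

Comparing function identity only catches the most literal form of self-grading. A real harness would more likely enforce the split through configuration, for example by requiring distinct model identifiers per role.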
