About the role

The intern will design and evaluate Windmill's internal agentic loop for generating scripts, flows and full-stack apps, bring it to the state of the art, and build the benchmarking system that measures its progress. The work tackles several open questions:

How to evaluate a generated workflow or app objectively, beyond "it compiles" (functional tests, end-to-end execution, UX quality, semantic correctness)
How an agent should decompose a natural-language specification into coherent atomic steps
How to inject Windmill-specific context (hub, types, resource schemas) efficiently without saturating the context window
How to exploit execution feedback for self-correction
How to keep the dependency graph of scripts, flows and apps coherent across iterative multi-file edits
How to detect hallucinations, silent regressions and "fake successes", where tests pass for the wrong reasons
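
One way to frame the dependency-graph question is as impact analysis. After an edit, find every script, flow or app that transitively depends on what changed, then re-validate those items with dependencies before dependents. The TypeScript sketch below assumes a hypothetical in-memory graph keyed by Windmill-style paths (`f/...`), where each entry lists the items it uses. The function name, paths and data shape are illustrative only and are not part of Windmill. The sketch runs a breadth-first search over reverse edges to collect the affected set, then applies Kahn's algorithm to order it.

```typescript
// Hypothetical model: each key is a script, flow or app path; its value lists
// the items it uses (e.g. a flow step calling a script). Not a Windmill API.
type Graph = Map<string, string[]>;

// Returns every item that (transitively) depends on an edited item, including
// the edited items themselves, ordered so dependencies come before dependents.
function affectedInRevalidationOrder(deps: Graph, edited: string[]): string[] {
  // Reverse graph: item -> items that use it.
  const users = new Map<string, string[]>();
  for (const [node, uses] of deps) {
    for (const dep of uses) {
      if (!users.has(dep)) users.set(dep, []);
      users.get(dep)!.push(node);
    }
  }

  // Breadth-first search over reverse edges collects the affected set.
  const affected = new Set<string>(edited);
  const queue = [...edited];
  while (queue.length > 0) {
    const current = queue.shift()!;
    for (const user of users.get(current) ?? []) {
      if (!affected.has(user)) {
        affected.add(user);
        queue.push(user);
      }
    }
  }

  // In-degree restricted to the affected subgraph.
  const inDegree = new Map<string, number>();
  for (const node of affected) {
    const uses = deps.get(node) ?? [];
    inDegree.set(node, uses.filter((d) => affected.has(d)).length);
  }

  // Kahn's algorithm: emit an item once all its affected dependencies are emitted.
  const order: string[] = [];
  const ready = [...affected].filter((n) => inDegree.get(n) === 0);
  while (ready.length > 0) {
    const node = ready.shift()!;
    order.push(node);
    for (const user of users.get(node) ?? []) {
      if (!affected.has(user)) continue;
      const remaining = inDegree.get(user)! - 1;
      inDegree.set(user, remaining);
      if (remaining === 0) ready.push(user);
    }
  }

  // Items left unemitted can only be part of a cycle.
  if (order.length !== affected.size) {
    throw new Error("dependency cycle among affected items");
  }
  return order;
}

// Illustrative graph: a flow uses two scripts, an app uses the flow.
const deps: Graph = new Map([
  ["f/etl/flow", ["f/etl/extract", "f/etl/transform"]],
  ["f/etl/transform", ["f/etl/utils"]],
  ["f/etl/dashboard", ["f/etl/flow"]],
  ["f/etl/extract", []],
  ["f/etl/utils", []],
]);
console.log(affectedInRevalidationOrder(deps, ["f/etl/utils"]).join(", "));
// → f/etl/utils, f/etl/transform, f/etl/flow, f/etl/dashboard
```

`f/etl/extract` is not re-validated because nothing it depends on changed. Narrowing the check to the affected subgraph keeps each iteration cheap, and failing loudly on a cycle makes it obvious when an edit has broken the graph.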

Expected deliverables: the Windmill benchmark (corpus, harness and tracking dashboard); an improved agentic loop shipped to production, with documented progression metrics; a weekly lab notebook; the final thesis report; and possibly a publication or an open-source release.

The intern will work directly with Ruben Fiszel (co-founder & CEO) and the Windmill R&D / AI team, with daily interaction and weekly reviews. The intern will also have full access to the codebase, anonymized usage data, frontier-model API budgets and GPU infrastructure for fine-tuning experiments.

State of the art

Code-generation agents:

Inline assistants: Copilot, Cursor, Codeium - local completion and editing, short context
Autonomous agents: Claude Code, Aider, SWE-agent, OpenHands, Devin - planning, execution, self-correction
RL / fine-tuning approaches: AgentCoder, Reflexion, Self-Refine, agent tuning on execution traces
Retrieval methods: RAG over documentation, code embeddings, graph-RAG
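
The self-correction approaches listed above share a basic pattern: generate, execute, and feed the observed failures back into the next attempt. The sketch below is a generic execute-and-repair loop in that spirit, not the exact algorithm of Reflexion, Self-Refine or any Windmill component. `generate` and `execute` are hypothetical callbacks: one would call a model, the other would run the candidate in a sandbox against its checks. As one simple guard against "fake successes", the loop treats a run that executed zero checks as a failure rather than a pass.

```typescript
// Hypothetical result of running a candidate against its checks.
interface ExecutionResult {
  checksRun: number;
  failures: string[];
}

// Placeholder callbacks: a model call and a sandboxed test run.
type Generate = (spec: string, feedback: string[]) => Promise<string>;
type Execute = (code: string) => Promise<ExecutionResult>;

interface LoopOutcome {
  code: string;
  attempts: number;
  success: boolean;
}

async function selfCorrect(
  spec: string,
  generate: Generate,
  execute: Execute,
  maxAttempts = 3,
): Promise<LoopOutcome> {
  let feedback: string[] = [];
  let code = "";
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    code = await generate(spec, feedback);
    const result = await execute(code);
    // A run with no executed checks proves nothing, so it is not a success.
    if (result.checksRun === 0) {
      feedback = ["no checks were executed; the result cannot be trusted"];
      continue;
    }
    if (result.failures.length === 0) {
      return { code, attempts: attempt, success: true };
    }
    // Feed the concrete execution errors into the next generation.
    feedback = result.failures;
  }
  return { code, attempts: maxAttempts, success: false };
}
```

This sketch passes only the latest failures, which keeps the prompt small. Reflexion instead accumulates verbal reflections across attempts, which trades context-window space for memory of earlier mistakes.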

About Windmill

Windmill is an open-source developer platform that turns scripts into workflows, internal tools, and full-stack apps.

Write scripts in Python, TypeScript, Go, Bash, Rust or SQL: Windmill auto-generates UIs from their parameters and handles dependencies, credentials, permissions and scheduling, so you can focus on business logic rather than infrastructure.
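
For illustration, here is a minimal TypeScript script. It assumes Windmill's convention that a script exports a `main` function whose parameters define its inputs. Under that assumption, `name` would presumably appear as a required text field, `times` as a number defaulting to 1, and `shout` as a toggle. The exact widget mapping is an assumption, and the parameter names and logic are hypothetical.

```typescript
// Entrypoint: assumes Windmill infers the input form from main's signature.
// Parameters with defaults would be optional inputs. Logic is illustrative.
export async function main(
  name: string,
  times: number = 1,
  shout: boolean = false,
): Promise<string> {
  const greeting = `Hello, ${name}!`;
  const text = shout ? greeting.toUpperCase() : greeting;
  // Clamp to a non-negative integer so Array() never throws on bad input.
  const count = Math.max(0, Math.floor(times));
  return Array(count).fill(text).join(" ");
}
```

The function's return value is the script's result. A flow step or an app component would consume that result, not a UI rendered by the script itself.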

Open-source alternative to Airflow, Temporal, Retool and n8n. Chain scripts into flows with branching, parallelism and retries, build dashboards with the app builder, and trigger them via cron, webhook or UI. An all-in-one runtime, editor, secret manager and OAuth platform, enterprise-ready out of the box.

Stack: Rust / TypeScript + Svelte / PostgreSQL. Self-hostable, easy to deploy, built for performance and DX.

Agentic Code-Generation Loop Research Intern

Work plan (5–6 months)
