Loop Engineering — Stop Prompting, Start Designing the System

The shift from writing prompts turn-by-turn to designing autonomous loops that discover work, assign it, verify results, persist state, and know when to hand off to you.

Tags: loops, orchestration, claude-code, agents

The instinct when a coding agent gets good is to prompt it more — bigger tasks, sharper instructions, better examples. That works for a while and then hits a ceiling: the human is still the driver of every turn. Every task means someone opens the terminal, types the prompt, reads the output, types the next prompt. The agent got faster; the pipeline didn't.

Loop Engineering, a framing popularized by Cobus Greyling based on quotes from practitioners at Anthropic, OpenClaw, and elsewhere, names the next move: replace yourself as the prompter. Design a small autonomous system — the loop — that runs on a schedule or until a goal is met, discovers new work, spawns helper agents, verifies their output, persists state so it can pick up where it left off, and hands off to a human only when it hits something it can't decide on its own.

THREE PRACTITIONERS, ONE FRAMING
Boris Cherny (head of Claude Code, Anthropic): ‘I don't prompt Claude anymore. I have loops running that prompt Claude and figuring out what to do. My job is to write loops.'

Peter Steinberger (creator of OpenClaw): ‘You shouldn't be prompting coding agents anymore. You should be designing loops that prompt your agents.'

Addy Osmani: ‘Loop engineering is replacing yourself as the person who prompts the agent. You design the system that does it instead.'

Where it sits in the stack

Greyling positions Loop Engineering as the third layer of a progression in which each layer solves a distinct problem:

Context → Harness → Loop — each layer answers a different question
[Figure: three-layer stack, top to bottom]
Loop engineering (the system around the sessions): Who prompts the agent, and when? Autonomous systems that discover, assign, verify, persist state, hand off.
Harness engineering (the shell around one session): What can the agent do in one session? Settings, permissions, hooks, MCP servers, subagents.
Context engineering (the message): What does the model see on this one turn?

Each layer up is a wider scope. Context engineering asks what a single prompt should carry. Harness engineering asks what a single session can reach, edit, and enforce. Loop engineering asks what happens when nobody is at the keyboard and the work still needs to move.

The six building blocks

Greyling identifies five capabilities plus one spine that together make a real loop — as opposed to a cron job that fires the same prompt into the void.

The loop stack — five capabilities around a durable memory spine
[Figure: six building blocks of a loop, arranged around the memory spine]
Memory (the durable spine): STATE.md, LOOP-STATE.json
Scheduling (the heartbeat): /loop, /schedule, /goal, cron, GitHub Actions. Turns ‘I should check X every morning' into a fact.
Worktrees (safe parallel execution): isolation: "worktree". No merge disasters.
Skills (persistent project knowledge): SKILL.md, CLAUDE.md. So day-two isn't day-one.
Connectors, MCP (act on real systems): Linear, GitHub, Slack, DB. The common substrate.
Subagents (maker / checker split): implementer + verifier. So ‘done' isn't self-graded.
Hand-off (when the loop stops): state says ‘needs human' → notify → pause.

1. Scheduling — the heartbeat

Without a schedule, a loop is a script you have to remember to run. With one — a cron entry, a GitHub Action, a Claude Code /loop, a Grok scheduler — the intention becomes durable. ‘I should triage new PRs every morning' turns into a fact, not a hope.

2. Worktrees — safe parallel execution

The moment a loop spawns two or three agents that all want to edit the codebase, isolated git worktrees are what stop them from writing over each other. Frameworks that let you say isolation: "worktree" per agent handle the setup and teardown; without it, one long-running loop with a merge conflict is a bad Tuesday.
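A minimal sketch of what that setup and teardown might look like if you scripted it yourself, assuming plain `git worktree` commands. The `loop/<name>` branch prefix, the sibling-directory layout, and the injectable `run` parameter are illustrative choices, not something the frameworks above prescribe.

```python
import subprocess
from contextlib import contextmanager
from pathlib import Path


def worktree_add_cmd(path: Path, branch: str, base: str = "HEAD") -> list[str]:
    # `git worktree add -b <branch> <path> <commit-ish>` creates a new branch
    # checked out in its own directory, separate from the main checkout.
    return ["git", "worktree", "add", "-b", branch, str(path), base]


def worktree_remove_cmd(path: Path) -> list[str]:
    # --force also removes a worktree that still has uncommitted changes.
    return ["git", "worktree", "remove", "--force", str(path)]


@contextmanager
def isolated_worktree(repo: Path, name: str, run=subprocess.run):
    """Give one agent its own checkout and clean it up even if the agent fails."""
    # Hypothetical layout: worktrees live next to the repo, one per agent.
    path = repo.parent / f"{repo.name}-wt-{name}"
    run(worktree_add_cmd(path, f"loop/{name}"), cwd=repo, check=True)
    try:
        yield path
    finally:
        run(worktree_remove_cmd(path), cwd=repo, check=True)
```

Usage would be `with isolated_worktree(Path("myrepo"), "pr-421") as wt:` followed by pointing the agent at `wt`. Passing a fake `run` makes the helper testable without touching a real repository.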

3. Skills — persistent project knowledge

A skill file (SKILL.md, or Claude Code's CLAUDE.md) is what stops each loop run from being day-one. Conventions, build commands, standards, the project's ‘we do X here' all live externally so the agent picks them up automatically instead of re-inferring them from scratch every session.

4. Connectors — MCP is the substrate

Loops need to act on real systems: open a PR, update a Linear ticket, post to Slack, query a database, trigger a runbook. MCP has become the shared vocabulary. When your Zendesk MCP works with Claude Code, Grok, and Codex, the loop shape becomes portable across whichever tool you happen to use this month.

5. Sub-agents — the maker / checker split

The single most important pattern for loops that run unattended. One agent implements. A different agent — often stronger, always with different instructions — verifies against the spec, runs the tests, checks the acceptance criteria. The implementer doesn't get to grade its own homework. The verifier is what lets you walk away.
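The split can be sketched as a small control function, with the two agents passed in as plain callables. Everything here is hypothetical scaffolding: the `Verdict` and `Outcome` types, the retry-with-feedback step, and the `max_attempts` cap are assumptions for illustration, and a real maker and checker would be separate agent sessions with different instructions.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Verdict:
    passed: bool
    notes: str = ""


@dataclass
class Outcome:
    status: str    # "done" or "needs_human"
    result: str    # the last draft the maker produced
    attempts: int


def make_and_check(
    task: str,
    maker: Callable[[str, str], str],        # (task, feedback) -> draft
    checker: Callable[[str, str], Verdict],  # (task, draft) -> verdict
    max_attempts: int = 2,
) -> Outcome:
    """Accept a draft only when a separate checker passes it."""
    feedback = ""
    draft = ""
    for attempt in range(1, max_attempts + 1):
        draft = maker(task, feedback)
        verdict = checker(task, draft)
        if verdict.passed:
            return Outcome("done", draft, attempt)
        # The checker's notes go back to the maker for the next attempt.
        feedback = verdict.notes
    # Out of attempts: stop and hand off rather than accept unverified work.
    return Outcome("needs_human", draft, max_attempts)
```

The design choice worth noting is that the maker never decides its own status. Only the checker's verdict can produce "done", and exhausting the attempts produces a hand-off instead of a silent pass.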

6. Memory — the durable spine

Good state files answer three questions on every read: what are we working on right now? what did we try last time and what happened? what is waiting for a human? Without state, the loop starts every run from zero, redoes finished work, forgets its own decisions. Files like STATE.md, LOOP-STATE.json, or a shared Linear board are the difference between a loop and a scheduled prompt.
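As a rough illustration, the three questions map naturally onto three keys in a JSON state file. The schema and helper names below are assumptions made for this sketch, not a format defined by Greyling or by any particular tool.

```python
import copy
import json
import os
import tempfile
from pathlib import Path

# Hypothetical schema: one key per question the state file should answer.
EMPTY_STATE = {
    "in_progress": [],   # what are we working on right now?
    "history": [],       # what did we try last time and what happened?
    "needs_human": [],   # what is waiting for a human?
}


def load_state(path: Path) -> dict:
    """Return the saved state, or a fresh empty one on the first run."""
    if not path.exists():
        return copy.deepcopy(EMPTY_STATE)
    return json.loads(path.read_text(encoding="utf-8"))


def save_state(path: Path, state: dict) -> None:
    """Write to a temp file, then rename, so a crash never leaves half a file."""
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(state, handle, indent=2)
    os.replace(tmp_name, path)


def record_attempt(state: dict, item: str, result: str) -> None:
    state["history"].append({"item": item, "result": result})


def flag_for_human(state: dict, item: str, reason: str) -> None:
    state["needs_human"].append({"item": item, "reason": reason})
```

A run would call `load_state` first, `record_attempt` or `flag_for_human` as it works, and `save_state` last, so the next run starts from what this one learned.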

Worked examples

Two loops in the wild — one from engineering, one not — showing how the six blocks combine into an actual working system.

Example 1 — Nightly PR triage loop (engineering)

A team of four wants incoming PRs from open-source contributors triaged overnight: label them, request changes where obvious, run the tests, and either merge trivially-safe ones or leave a clear summary for the human reviewer at 9 a.m. Currently one engineer does this in the first hour of every workday.

Turned into a loop: a GitHub Actions schedule runs a Claude Code session every night at 2 a.m. The session reads its STATE.md to see what was open at the last run, uses the GitHub MCP to fetch new/updated PRs, and spawns one worktree-isolated subagent per PR. Each subagent's SKILL.md tells it the project's conventions; the maker subagent proposes the review; a separate checker subagent verifies the review is factually consistent with the diff. Anything ambiguous gets flagged with a one-paragraph brief in the state file for the human. Everything else — labels, obvious change requests, safe merges — happens autonomously.

Nightly PR triage — schedule → worktree fan-out → maker/checker → state
[Figure: nightly PR triage flow]
Schedule: GitHub Action, cron: 0 2 * * *
Memory read: what was open? (STATE.md)
GitHub MCP: list new PRs since last run. 7 PRs found, fanned out one per subagent.
Worktree-isolated subagents, maker / checker split per PR:
PR #421, worktree A. Maker: reads diff, drafts review, labels: docs. Checker: verifies vs diff, runs tests ✓, merge safe.
PR #422, worktree B. Maker: reads diff, drafts review, labels: bug. Checker: verifies vs diff, runs tests ✗, needs human.
PRs #423-#427, worktrees C-G: same shape, in parallel.
Memory write: STATE.md updated. 5 PRs auto-merged, 1 change-requested, 1 flagged for human at 9 a.m.
Slack notification: ‘1 PR needs your review this morning.'

What changed: the engineer no longer reads through seven diffs at 9 a.m.; they read one summary. The loop discovered the work (GitHub MCP), assigned it (worktree-isolated subagents), verified it (checker), persisted the result (STATE.md), and handed off only what needed a human decision. All six blocks are in play.
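The fan-out and the summary step of a single run can be sketched as follows. This is an assumption-heavy outline: `discover` stands in for the GitHub MCP call, `handle` stands in for the whole maker/checker pass on one PR, and the three outcome strings and state keys are invented for illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable


def run_once(
    state: dict,
    discover: Callable[[dict], Iterable[int]],  # stand-in for the connector call
    handle: Callable[[int], str],               # stand-in for maker + checker on one PR
    max_workers: int = 4,
) -> dict:
    """One loop run: discover PRs, fan out, collect outcomes, return new state."""
    prs = list(discover(state))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the input PRs.
        outcomes = list(pool.map(handle, prs))
    return {
        "last_seen": prs,
        "needs_human": [pr for pr, o in zip(prs, outcomes) if o == "needs_human"],
        "summary": dict(Counter(outcomes)),
    }
```

The returned dictionary is what would be written back to the state file, and its `needs_human` list is what the morning notification would be built from.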

Example 2 — Weekly competitor watch loop (non-technical)

A product manager wants a Monday-morning briefing on what competitors shipped last week — new features, pricing changes, blog posts, notable hires visible on LinkedIn. Doing this manually eats three hours every week.

Turned into a loop: a scheduler runs a session every Sunday at 6 p.m. The state file holds the current competitor list, the last-seen version of each competitor's pricing page, and any items waiting for the human to acknowledge. The session pulls the RSS feed of each competitor's blog (a simple MCP), diffs the pricing pages (another MCP that stores a snapshot), and scans LinkedIn for changes at the target companies. A maker subagent writes the draft brief; a checker subagent verifies every claim has a URL and every diff quote is actually present on the linked page. Monday at 9 a.m., the brief is in the PM's inbox with the state file updated.
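The checker's two rules (every claim has a URL, every quote appears on the linked page) are mechanical enough to sketch directly. The `Claim` type and the injected `fetch_text` function are hypothetical; in a real loop the fetch would go through whichever connector retrieves page text.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Claim:
    text: str
    url: Optional[str] = None
    quote: Optional[str] = None  # verbatim snippet the claim relies on


def _squash(text: str) -> str:
    # Collapse runs of whitespace so line breaks on the page don't cause misses.
    return " ".join(text.split())


def check_claims(claims: list[Claim], fetch_text: Callable[[str], str]):
    """Split claims into confirmed ones and (claim, reason) rejections."""
    confirmed, rejected = [], []
    for claim in claims:
        if not claim.url:
            rejected.append((claim, "no source URL"))
            continue
        if claim.quote and _squash(claim.quote) not in _squash(fetch_text(claim.url)):
            rejected.append((claim, "quote not found on page"))
            continue
        confirmed.append(claim)
    return confirmed, rejected
```

Only the confirmed list would go into the brief. The rejections, with their reasons, are natural candidates for the state file's waiting-for-human section.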

Weekly competitor watch — same shape, different domain
[Figure: weekly competitor watch flow]
Schedule: Sunday 6 p.m., weekly cron.
State read: 12 competitors, last snapshots.
Connectors, three MCPs pulling fresh signal: rss-feeds MCP → 12 blogs; page-snapshot MCP → diffs; linkedin-search MCP → hiring signal at each company.
Maker + checker, one loop, two roles:
Maker drafts the brief: Anthropic shipped Skills API; OpenAI dropped pricing 12%; Perplexity announced Enterprise SSO; Cursor hired 3 infra engineers; 6 competitor blog posts (linked); no pricing changes at 8 others.
Checker verifies every claim: ✓ Skills API, URL live, dated Fri; ✓ pricing diff, page-snapshot matches; ✓ Enterprise SSO, press release exists; ! Cursor hires, could not verify #3; ✓ 6/6 blog URLs load, dates match → 1 item removed, 4 confirmed.
State write + hand-off: snapshots updated, brief queued for Monday 9 a.m. delivery. Human sees a verified 4-item digest, not 12 blog feeds and a spreadsheet.

The pattern is identical to the PR triage — schedule, state, connectors, maker/checker, hand-off — but the domain is completely different. That's the point: the loop shape is portable. Once you've built one, the second one is mostly filling in different connectors and a different skill file.

The realities

Greyling is candid about the traps, and they're worth naming before you build.

Cost economics

A loop that fires every five minutes and spawns an implementer plus a verifier on every run will burn through a limited plan before breakfast. Triage should be cheap; subagents should only spawn when the state file indicates something actionable. Log the token spend per run: if you can't see it, you can't control it.
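One way to make both habits concrete is a spawn gate plus an append-only spend log. The `pending` state key, the token budget, and the JSON-lines log format are assumptions for this sketch; where the token counts come from depends on the tool you run.

```python
import json
from pathlib import Path


def should_spawn_subagents(state: dict, spent_tokens: int, budget_tokens: int) -> bool:
    """Spawn the expensive maker/checker pair only when there is work and budget."""
    has_work = bool(state.get("pending"))  # hypothetical key for actionable items
    return has_work and spent_tokens < budget_tokens


def log_spend(log_path: Path, run_id: str, tokens: int) -> None:
    """Append one JSON line per run so spend stays visible after the fact."""
    with log_path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps({"run": run_id, "tokens": tokens}) + "\n")


def total_spend(log_path: Path) -> int:
    """Sum the logged tokens; an absent log means nothing has been spent."""
    if not log_path.exists():
        return 0
    lines = log_path.read_text(encoding="utf-8").splitlines()
    return sum(json.loads(line)["tokens"] for line in lines if line.strip())
```

A run would check `should_spawn_subagents(state, total_spend(log), budget)` after the cheap triage step and call `log_spend` before exiting, whether or not it spawned anything.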

Verification is a claim, not proof

‘Done' is a claim until you confirm it. The checker subagent buys confidence — it doesn't remove the need for a human to periodically audit the loop's decisions. Unattended loops make unattended mistakes; the audit is what catches them.

Comprehension debt

Loops let you ship faster than you understand. The gap between what exists and what you can explain grows silently. A monthly ‘what has the loop actually done?' review, done by a human reading the state file top-to-bottom, is the cheapest fix.

Cognitive surrender

This is Greyling's most pointed warning. The same loop that accelerates a good engineer lets a less-experienced one abdicate judgment. ‘Build the loop like someone who intends to stay the engineer — not just the person who presses go.'

WATCH FOR: A loop that never hands off to a human. If your state file has never grown a ‘needs review' item in two weeks, the loop is either wildly conservative (accepting nothing risky) or wildly permissive (accepting everything). Both mean you've stopped being in the loop.
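The two-week check is simple enough to automate as part of the periodic audit. In this sketch the function name and the idea of storing a timestamp per hand-off are assumptions; the 14-day window comes from the warning above.

```python
from datetime import datetime, timedelta


def handoff_overdue(handoff_times: list[datetime], now: datetime, window_days: int = 14) -> bool:
    """Return True when no item was flagged for review within the window."""
    cutoff = now - timedelta(days=window_days)
    return not any(t >= cutoff for t in handoff_times)
```

A True result does not say whether the loop is too conservative or too permissive. It only says a human should go and read the state file to find out.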

Tool convergence

The reason loop engineering is worth learning as a shape and not as a specific product feature: Claude Code, Grok CLI, and OpenAI Codex have all landed on similar primitives. Scheduling, worktrees, skills, MCP, subagents, memory files — the vocabulary is now roughly shared. A loop you design against Claude Code today ports to whichever tool you use next quarter with mostly a change of syntax, not shape.

KEY INSIGHT (via Greyling): Loop engineering isn't ‘a bigger prompt' — it's a system that discovers, assigns, verifies, persists and knows when to hand off to you. Direct prompting still works and is often correct. But the leverage point has moved: from crafting one better message to designing the small autonomous system that decides when a message needs to be sent at all.
