The Agent Loop

Build a complete tool-calling AI agent in 15 lines of Python. Understand the core agent loop pattern that powers all LLM agents.

Last updated 2026-06-12

The Agent Loop: Think → Act → Observe

Strip away the frameworks and every agent is the same fifteen lines: a model, some tools, and a while-loop. Master this loop and every “agentic” product on earth becomes legible.

THE CENTRAL IDEA

An agent is a model using tools in a loop. That’s the whole definition — the one Anthropic’s engineers and most practitioners have converged on. The model thinks (decides whether it needs a tool), acts (your code executes the tool), and observes (the result is appended to the conversation) — and the loop repeats until the model decides it’s done or you cut the budget. Everything else — planning, reflection, multi-agent teams — is decoration on this loop. If you can write it from memory on a whiteboard, agent interviews become easy.

Figure: one cycle of the loop. State lives in messages[], control lives in stop_reason.

The loop in ~15 lines

This is a complete, working tool-calling agent. Everything an “agent framework” sells you is an elaboration of these lines:

from anthropic import Anthropic

client = Anthropic()
TOOLS = [{
    "name": "get_weather",
    "description": "Get current weather for a city.",
    "input_schema": {"type": "object",
                     "properties": {"city": {"type": "string"}},
                     "required": ["city"]},
}]

def run_tool(name: str, args: dict) -> str:
    if name == "get_weather":
        return weather_api(args["city"])       # your real implementation
    return f"Unknown tool: {name}"

def agent(user_message: str, max_steps: int = 10) -> str:
    messages = [{"role": "user", "content": user_message}]
    for _ in range(max_steps):                 # a budget, never `while True`
        resp = client.messages.create(
            model="claude-sonnet-4-5", max_tokens=1024,
            tools=TOOLS, messages=messages,
        )
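        if resp.stop_reason != "tool_use":     # THINK: model chose to finish
            # Join every text block; content[0] is not guaranteed to be text.
            return "".join(b.text for b in resp.content if b.type == "text")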
        messages.append({"role": "assistant", "content": resp.content})
        results = [{"type": "tool_result", "tool_use_id": b.id,
                    "content": str(run_tool(b.name, b.input))}   # ACT
                   for b in resp.content if b.type == "tool_use"]
        messages.append({"role": "user", "content": results})    # OBSERVE
    return "Step budget exhausted — escalating to a human."

The four moving parts

Every agent system — Claude Code, Cursor, Deep Research, your side project — decomposes into exactly four pieces. Interviewers love asking where a given feature lives; the answer is always one of these:

Part	What it owns	Where it lives
The model	All decisions: which tool, what arguments, when to stop	Provider API — you never see inside
The tools	All side effects: reads, writes, searches, purchases	Your code — schemas sent to the model, execution stays with you
The loop + state	Carrying context forward: `messages[]` is the agent’s entire working memory	Your code — ~15 lines above
The stop conditions	Termination: the model’s `end_turn`, plus your hard budgets	Split — and the hard ones must be yours

KEY MENTAL MODEL · THE MODEL NEVER EXECUTES ANYTHING

The model only ever emits JSON saying “I would like get_weather(city="Bengaluru") to run.” Your code decides whether to honor it, sandbox it, rate-limit it, or ask a human first. This is the security boundary of every agent system — and the reason “the agent deleted my database” is always, technically, “my loop executed a delete without a guard.” Say this sentence in an interview and watch the interviewer relax.
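One way to make that boundary concrete is a small gate that sits between the model's request and your tool code. The sketch below is illustrative only: `POLICY`, `guarded_run_tool`, the `confirm` callback, and the tool names `send_email` and `drop_table` are hypothetical, not part of any SDK.

```python
from typing import Callable

# Hypothetical policy table: which tools run freely, which need a human,
# and which never run. Real systems would load this from configuration.
POLICY = {"get_weather": "allow", "send_email": "confirm", "drop_table": "deny"}

def guarded_run_tool(name: str, args: dict,
                     run_tool: Callable[[str, dict], str],
                     confirm: Callable[[str, dict], bool]) -> str:
    """Decide whether to honor the model's request before anything executes."""
    decision = POLICY.get(name, "deny")      # unknown tools are denied by default
    if decision == "deny":
        return f"Refused: tool '{name}' is not permitted."
    if decision == "confirm" and not confirm(name, args):
        return f"Refused: a human declined '{name}'."
    return run_tool(name, args)              # only now does a side effect happen
```

The refusal strings go back to the model as an ordinary tool result, so it can explain the refusal to the user or pick a different route.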

Watch the state grow: one task, four messages

The loop’s only memory is the messages array. Here is its shape after one full cycle — this exact structure, growing turn by turn, is what “context window management” manages:

[
  {"role": "user",      "content": "What's the weather in Bengaluru?"},
  {"role": "assistant", "content": [
      {"type": "text", "text": "I'll check that."},
      {"type": "tool_use", "id": "tu_01", "name": "get_weather",
       "input": {"city": "Bengaluru"}}]},
  {"role": "user",      "content": [
      {"type": "tool_result", "tool_use_id": "tu_01",
       "content": "{\"temp_c\": 28, \"condition\": \"partly cloudy\"}"}]},
  {"role": "assistant", "content": [
      {"type": "text", "text": "It's 28°C and partly cloudy in Bengaluru."}]}
]

Three details worth memorizing: tool results are sent back as a user-role message (the transcript alternates assistant/user even mid-task); every tool_result must reference its tool_use_id (this is how parallel calls stay matched); and the model re-reads the whole array every step — which is why long agent runs get slow and expensive without context management.
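The pairing rule is easy to check mechanically. The helper below is a rough sketch, assuming the transcript is held as plain dicts in the shape shown above (not SDK block objects); `unmatched_tool_calls` is a hypothetical name.

```python
def unmatched_tool_calls(messages: list[dict]) -> set[str]:
    """Return ids of tool_use blocks with no tool_result in the following message."""
    missing: set[str] = set()
    for i, msg in enumerate(messages):
        # Only assistant messages with block-list content can request tools.
        if msg["role"] != "assistant" or isinstance(msg["content"], str):
            continue
        requested = {b["id"] for b in msg["content"] if b["type"] == "tool_use"}
        nxt = messages[i + 1]["content"] if i + 1 < len(messages) else []
        answered = ({b["tool_use_id"] for b in nxt if b["type"] == "tool_result"}
                    if isinstance(nxt, list) else set())
        missing |= requested - answered
    return missing
```

Running a check like this before each API call can turn a confusing request error into a clear local assertion, which helps once trimming or compaction starts rewriting the array.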

The production loop: same skeleton, armored

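Production adds four things to the 15 lines: retries for the API, error-passing for tools, budgets and size caps, and observability. Nothing about the shape changes. Tool calls within a turn still run one after another here; parallel execution is an optimization layered on top of the same loop.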

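import time, logging
from anthropic import Anthropic, APIConnectionError, APIStatusError

log = logging.getLogger("agent")
# max_retries=0 turns off the SDK's built-in retries so this wrapper is the only retry layer
client = Anthropic(timeout=60.0, max_retries=0)

def call_llm_with_retry(messages, tools, retries: int = 3):
    for attempt in range(retries + 1):
        try:
            return client.messages.create(
                model="claude-sonnet-4-5", max_tokens=2048,
                tools=tools, messages=messages)
        except (APIConnectionError, APIStatusError) as e:
            status = getattr(e, "status_code", None)   # None for network errors/timeouts
            # Retry only transient failures; other 4xx errors will fail again unchanged.
            retryable = status is None or status == 429 or status >= 500
            if not retryable or attempt == retries:
                raise
            delay = 2 ** attempt                    # 1s, 2s, 4s
            log.warning("LLM call failed (%s), retry in %ss", e, delay)
            time.sleep(delay)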

def safe_run_tool(name, args, registry) -> tuple[str, bool]:
    """Tool errors are DATA for the model, not exceptions for your loop."""
    try:
        if name not in registry:
            return f"Unknown tool '{name}'. Available: {list(registry)}", True
        return str(registry[name](**args)), False
    except Exception as e:                          # noqa: BLE001 — by design
        return f"Tool '{name}' failed: {type(e).__name__}: {e}", True

def agent(user_message: str, registry: dict, tools: list,
          max_steps: int = 15, token_budget: int = 200_000) -> str:
    messages = [{"role": "user", "content": user_message}]
    spent = 0
    for step in range(max_steps):
        resp = call_llm_with_retry(messages, tools)
        spent += resp.usage.input_tokens + resp.usage.output_tokens
        log.info("step=%d stop=%s tokens=%d", step, resp.stop_reason, spent)

        if resp.stop_reason != "tool_use":
            return "".join(b.text for b in resp.content if b.type == "text")
        if spent > token_budget:
            return "Token budget exceeded — partial progress logged."

        messages.append({"role": "assistant", "content": resp.content})
        results = []
        for block in resp.content:
            if block.type != "tool_use":
                continue
            output, is_err = safe_run_tool(block.name, block.input, registry)
            results.append({"type": "tool_result",
                            "tool_use_id": block.id,
                            "content": output[:8_000],   # cap observation size
                            "is_error": is_err})
        messages.append({"role": "user", "content": results})
    return "Step budget exhausted — escalating to a human."

The non-obvious design choice: a failed tool call feeds the error message back to the model (with is_error: true) instead of crashing the loop. Models are genuinely good at reading “ConnectionError: timeout” and deciding to retry, switch tools, or tell the user — self-healing you get for free. The [:8_000] cap matters too: one verbose tool response can flood the context and starve every later step.
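When a single turn requests several independent, I/O-bound tools, the calls can run concurrently. The following is a minimal sketch using the standard library's thread pool; `run_tools_parallel` is a hypothetical helper, and it assumes `run_one` never raises and returns an `(output, is_error)` pair, as `safe_run_tool` does.

```python
from concurrent.futures import ThreadPoolExecutor

def run_tools_parallel(calls: list[tuple[str, str, dict]], run_one,
                       max_workers: int = 4) -> list[dict]:
    """calls holds (tool_use_id, name, args); run_one(name, args) -> (output, is_error)."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so outcomes line up with the requests.
        outcomes = list(pool.map(lambda c: run_one(c[1], c[2]), calls))
    return [{"type": "tool_result", "tool_use_id": call_id,
             "content": output[:8_000],          # same observation cap as the loop
             "is_error": is_err}
            for (call_id, _, _), (output, is_err) in zip(calls, outcomes)]
```

In the loop, `calls` would be built from the turn's `tool_use` blocks and `run_one` could be `lambda n, a: safe_run_tool(n, a, registry)`. This only pays off when the tools do not depend on each other's side effects; tools that must run in order should stay sequential.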

When the loop runs long: context management

A 40-step run accumulates 40 tool results, most of them dead weight after they’ve been used. The three strategies, in escalating order of effort:

Trim: drop or truncate old tool results (keep the model’s own text — its reasoning summarizes what the results said). Cheap and surprisingly effective.
Compact: when context approaches the limit, summarize the transcript into a briefing and restart the loop with it. This is what Claude Code’s auto-compact does; the Claude API now offers server-side compaction and context editing (auto-clearing stale tool results) as platform features.
Externalize: the agent writes findings to files or a memory store and reads them back on demand — context becomes a working set, not an archive. This is the bridge to the memory sections of this guide.
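The first strategy, trimming, can be sketched in a few lines. This is an illustration under stated assumptions, not a library feature: `trim_old_tool_results`, `keep_last`, and the placeholder text are hypothetical, and the tool-result messages are assumed to be the plain dicts the loop itself appends.

```python
def trim_old_tool_results(messages: list[dict], keep_last: int = 3,
                          placeholder: str = "[result trimmed]") -> list[dict]:
    """Return a copy in which all but the newest keep_last tool results are stubbed."""
    # Indices of user messages that carry tool results, oldest first.
    carriers = [i for i, m in enumerate(messages)
                if m["role"] == "user" and isinstance(m["content"], list)
                and any(b.get("type") == "tool_result" for b in m["content"])]
    stale = set(carriers[:-keep_last] if keep_last else carriers)
    trimmed = []
    for i, m in enumerate(messages):
        if i in stale:
            # Keep each block and its tool_use_id so pairing stays valid;
            # replace only the bulky payload.
            m = {**m, "content": [
                {**b, "content": placeholder} if b.get("type") == "tool_result" else b
                for b in m["content"]]}
        trimmed.append(m)
    return trimmed
```

The input list is left untouched, so the full transcript can still be logged while the trimmed copy is what gets sent to the model.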

How the loop fails — and the guard for each

Failure	What it looks like	Guard
Infinite loop	Same tool, same args, step after step	`max_steps`; detect repeated (tool, args) hashes and inject “you already tried this”
Tool thrash	Wandering across tools without progress	Step budget + a “progress or plan” nudge; better tool descriptions fix the root cause
Context overflow	Slower, dumber, then a hard API error	Cap observation sizes; compact; externalize
Runaway cost	A $40 answer to a $0.04 question	Token budget in the loop (as above) + per-request cost alerting
Silent tool failure	Model reasons over an empty/garbage result as if real	`is_error` flag + explicit error text the model can react to
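The repeated-call guard from the first row might look like the sketch below. `RepeatDetector`, its `limit` parameter, and the nudge wording are hypothetical choices for illustration; it assumes tool arguments are JSON-serializable, which holds for inputs the model emits.

```python
import hashlib
import json
from collections import Counter
from typing import Optional

class RepeatDetector:
    """Counts identical (tool, args) requests within one agent run."""

    def __init__(self, limit: int = 2):
        self.limit = limit
        self.seen = Counter()

    def check(self, name: str, args: dict) -> Optional[str]:
        """Return a nudge once a call has been made more than limit times, else None."""
        # sort_keys makes the hash independent of argument order.
        key = hashlib.sha256(
            json.dumps([name, args], sort_keys=True).encode()).hexdigest()
        self.seen[key] += 1
        if self.seen[key] > self.limit:
            return (f"You already called {name} with these arguments "
                    f"{self.seen[key] - 1} times. Try a different approach.")
        return None
```

Inside the loop, a non-None return could be sent back as the tool result with `is_error` set, in place of executing the tool again.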

THE AUGMENTED LLM

Anthropic’s “Building Effective Agents” names the unit this section just built: the augmented LLM — one model call wired to tools, retrieval, and memory. It is the atom of agentic design. Chains, routers, orchestrators, and full agents (next sections) are all just arrangements of this atom. Get the atom right — clean tool definitions, honest budgets, error passing — before composing it into anything grander.

INTERVIEW TIP

“Build an agent” questions are almost never about frameworks — they are checking whether you know that an agent is a loop you control. Write the 15-line version first, narrate think–act–observe as you go, then armor it out loud: “budgets here, retries here, tool errors go back as data, observations get capped.” Framework name-drops are optional; the loop is not.

GO DEEPER

Anthropic, “Building Effective Agents” (2024) — the augmented-LLM framing and the case for simplicity · Claude docs: Tool use — the exact request/response contract used here · Simon Willison, “Agents are models using tools in a loop” — the definition that stuck · Next up: The Agentic Spectrum — what to build when one loop isn’t enough.
