ReAct Pattern

ReAct (Reasoning + Acting) pattern for AI agents: how to combine chain-of-thought reasoning with tool use for better agent performance.

Last updated 2026-06-12

ReAct: Reasoning + Acting in One Loop

The 2022 paper that taught models to think between tool calls — and quietly became the blueprint for every tool-calling API you use today. If you only know one agent pattern, know this one.

THE CENTRAL IDEA

Before ReAct (Yao et al., ICLR 2023), the field had reasoning or acting: chain-of-thought models that reasoned beautifully from stale memorized facts, and action models that clicked buttons without ever planning. ReAct’s move was to interleave them in one transcript: Thought → Action → Observation, repeat. Each thought plans the next action; each observation grounds the next thought in reality. Reasoning steers acting, acting feeds reasoning — and hallucination drops because the model reasons over fresh evidence it just retrieved instead of whatever pre-training left behind.

Three ways to answer a question — and what each one is missing

Anatomy of a trace — the paper’s own famous example

ReAct is a prompting format before it is anything else. The model emits labeled steps into one growing transcript; your runtime executes each Action and appends the Observation:

Question:  Aside from the Apple Remote, what other device can control
           the program Apple Remote was originally designed to interact with?

Thought 1: I need to find the program the Apple Remote was originally
           designed to interact with.
Action 1:  search[Apple Remote]
Observation 1: The Apple Remote is a remote control introduced in 2005
           designed to control the Front Row media center program...

Thought 2: So the program is Front Row. Now: what other devices control
           Front Row?
Action 2:  search[Front Row software]
Observation 2: Front Row is a discontinued media center... controlled by
           an Apple Remote or the keyboard function keys.

Thought 3: The other device is the keyboard function keys.
Action 3:  finish[keyboard function keys]

Notice what the interleaving buys: Thought 2 could not have been written in advance — it depends on Observation 1. That is the whole trick. A chain-of-thought model would have answered from its parametric memory (and in the paper’s experiments, frequently hallucinated a device); ReAct looked, then reasoned about what it found.

What the paper actually showed

HotpotQA & FEVER (knowledge tasks): ReAct beats act-only prompting and dramatically cuts the hallucination rate vs chain-of-thought — the trace is grounded in retrieved Wikipedia passages, and when it’s wrong, the error is visible in the trace as a bad search rather than buried in fluent confabulation. Best results came from combining ReAct with CoT fallback.
ALFWorld (interactive environments): +34 percentage points absolute over imitation-learning baselines — the thoughts let the model decompose goals (“find a lamp” → “check the desk first”) and recover when a location surprised it.
WebShop (web navigation): +10 points absolute over prior methods, with one to two orders of magnitude less training data than RL approaches — prompting, not gradient descent.
The interpretability dividend: a human can read the trace and see where reasoning went wrong. This audit-trail property is why regulated industries still like explicit ReAct formats.

Implementation: the text-parsing loop

Classic ReAct predates tool-use APIs, so the runtime parses actions out of raw text. Two details do all the safety work — the stop sequence and the strict parser:

import re

REACT_SYSTEM = """Answer by interleaving Thought / Action / Observation steps.
Available actions:
  search[query]  - web search, returns top snippets
  lookup[term]   - find a term in the last result
  finish[answer] - return the final answer
Emit exactly one Action per turn. Never write an Observation yourself."""

ACTION_RE = re.compile(r"Action(?:\s*\d*)?:\s*(\w+)\[(.*?)\]", re.S)

def react(question: str, llm, tools: dict, max_steps: int = 8) -> str:
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        out = llm(system=REACT_SYSTEM, prompt=transcript,
                  stop=["Observation"])          # <- the crucial trick
        transcript += out
        m = ACTION_RE.search(out)
        if not m:
            return out.strip()                   # answered inline
        name, arg = m.group(1), m.group(2).strip()
        if name == "finish":
            return arg
        handler = tools.get(name)
        obs = handler(arg) if handler else f"Unknown action '{name}'"
        transcript += f"\nObservation: {str(obs)[:2000]}\n"
    return "Step limit reached — best effort: " + transcript[-500:]

THE STOP-SEQUENCE TRICK

Without stop=["Observation"], the model will happily hallucinate its own observations — writing “Observation: The API returned success” for an action that never ran, then reasoning over fiction. Cutting generation at the word Observation forces the runtime to be the only source of ground truth. This single line is the difference between a grounded agent and a very confident novelist.

ReAct in 2026: the pattern became the platform

You will rarely hand-roll the parser today — not because ReAct lost, but because it won so completely that it moved into the API layer. The mapping is exact:

ReAct (2022 paper)	Modern tool-use stack
Thought: free-text reasoning	Extended thinking / interleaved thinking blocks — reasoning between tool calls, now a model feature
Action: `search[query]` parsed by regex	`tool_use` block — schema-validated JSON, no parser to break
Observation: appended text	`tool_result` message keyed by `tool_use_id`
The growing transcript	`messages[]` — the agent loop’s state

Explicit ReAct-style prompting still earns its keep in three places: models without native tool use (small open-weights, edge deployments), audit-first domains where a plain-text reasoning trace is a compliance artifact, and forcing deliberation — requiring a written Thought before expensive or dangerous tools measurably reduces impulsive calls by weaker models.

Failure modes and their fixes

Failure	Trace smell	Fix
Hallucinated observation	“Observation:” text the runtime never wrote	Stop sequences (above); native tool calling makes it structurally impossible
Search loop	Same query, slightly reworded, five times	Hash (action, args); on repeat, inject “already tried — change strategy or answer”
Malformed action	Regex misses; loop treats plan text as final answer	Strict parser + one corrective reprompt; or migrate to schema-validated tool use
Observation flooding	One 30KB result; later thoughts degrade	Truncate/summarize observations before appending (the `[:2000]` above)
Derailment	Thoughts drift to an adjacent, unasked question	Re-inject the original question every N steps; keep thoughts one sentence
Overthinking trivia	Four searches for a fact the model knows cold	Route easy queries to direct answering; ReAct only above a complexity threshold

INTERVIEW TIP

Two ReAct answers separate seniors from juniors. One: why it works — “each thought is conditioned on fresh evidence, so reasoning stays grounded and the plan can change mid-task.” Two: where it lives now — “function calling is ReAct industrialized; I’d use native tool use in production and reserve hand-rolled ReAct for models without it or for audit-trail requirements.” Bonus points for naming the stop-sequence trick — it proves you’ve actually run one.

GO DEEPER

Yao et al., “ReAct: Synergizing Reasoning and Acting in Language Models” (ICLR 2023) — the paper; the ALFWorld traces are worth reading whole · Lilian Weng, “LLM Powered Autonomous Agents” (2023) — ReAct in the wider agent taxonomy · This guide’s Reflexion section — what happens when a ReAct episode fails and you want attempt #2 to be smarter.

ReAct Pattern

ReAct: Reasoning + Acting in One Loop

Anatomy of a trace — the paper’s own famous example

What the paper actually showed

Implementation: the text-parsing loop

ReAct in 2026: the pattern became the platform

Failure modes and their fixes

More in LLM & Agentic

LLM Lifecycle

How LLMs Call Tools

When to Fine-Tune: The Decision Framework

LoRA, QLoRA & PEFT Methods