Harness Engineering — Configuring the Shell Around the Model

The Claude Code harness is a stack of files, layers, and scopes. Reading the stack top-to-bottom pays back in the first week.

The model is only half of a Claude Code session. The other half is the harness: the CLI, the settings files, the permission system, the hooks, the MCP servers, the subagents, the plugins. The harness decides what tools exist, what runs before and after each tool call, what needs your approval, and what happens when a subagent finishes.

Most people configure the harness ad-hoc as friction shows up. That works for a while and then starts to hurt in three predictable ways: prompts you approve without reading (because there are too many), broken hooks in one repo but not another (because settings drift), and secrets you can't remember where they live (because MCP servers accumulate). Harness engineering is the practice of thinking about the shell around the model on purpose.

The layer stack

Everything about the harness lives in three files, each at a different scope:

Settings precedence — later layers override earlier ones

The layers merge additively for most keys (permissions, hooks, env), so a project-level rule doesn't erase a user-level rule — it stacks on top of it. Local layer is the escape hatch: personal experiments that shouldn't hit git go here.

What goes where

User layer — your defaults across every project

Hooks you always want: block .env edits, notify on session end, cost tracker.
Environment variables you always want set (ANTHROPIC_API_KEY, editor preferences).
Permissions for read-only commands you trust anywhere: ls, git status, tree.
MCP servers that make sense everywhere: context7 for docs, memory for cross-session notes.

Project layer — team-shared, this repo

Hooks that make sense only for this stack: ESLint on TS files, pytest on Python files.
Permissions for repo-specific commands: npm run test, bq query, terraform plan.
MCP servers wired to this repo's infrastructure: Supabase pointing at the project ref, Linear pointing at the workspace.
Subagents that encode team standards: payments-security-reviewer, content-consistency-reviewer.

Local layer — you, this repo, not committed

Personal secrets in env vars (until you've moved them somewhere better).
Experimental hooks you don't want to inflict on the team yet.
Permissions you approved once during exploration and want to remember.

The permission wildcard problem

Every time you approve a tool call in Claude Code, the CLI offers to remember it — usually with the option to remember it by exact command, by prefix, or by wildcard. Wildcards are the tempting choice because they end all future prompts of that type. They are also the choice that most often ends in a bad afternoon.

ANTI-PATTERN: Bash(*) in your allow list. It means: any shell command Claude generates runs without prompting. Any. Including rm -rf, git push --force, psql statements you didn't read. Every wildcard in your permission list is a category of mistake you've opted out of catching.

The right unit for a permission is a prefix that captures a family of safe operations. Bash(git status:*), Bash(npm run test:*), Bash(curl -s http://localhost:*). Each is specific enough that reading the actual command still catches surprises.

If you find your settings.local.json has grown to 100+ lines of scoped permissions, that's a sign the harness is working — you've captured your actual usage without giving up the ability to notice something new. Wildcard would have been shorter and much more dangerous.

Hook vs subagent vs skill vs MCP

Once you understand the layers, the next question is where to put a new capability. Four choices, one clean division:

WHAT YOU WANT	REACH FOR	WHY
Deterministic reaction to a tool event	Hook	Runs synchronously, sub-second, no model call
A second opinion in a fresh context	Subagent	Isolated context window, own tool list, own prompt
Packaged workflow / knowledge / template	Skill	Model-invocable or user-invocable; can bundle scripts
Access to an external system	MCP server	New tools the model can call; not code you own

The failure mode when you pick the wrong one: hooks that try to reason (they can't — they're bash), subagents that try to react to events (they can't — they're only spawned on request), skills that try to enforce (they can't — the model chooses whether to invoke), MCP servers that try to encode workflow (they don't — they only expose tools).

A starter project layer

A good default project settings.json has four sections, in this order of importance:

Hooks — a PreToolUse block on .env edits, a PostToolUse formatter on your primary language.
Permissions — scoped allows for your test runner, build command, and any repo-specific curl endpoints. Never Bash(*).
Subagents — one for security-sensitive reviews (auth, payments, RLS), one for content or code-style consistency.
MCP servers — one for docs (context7), one for your database if applicable, one for your issue tracker if applicable.

Those four sections are enough to remove most of the recurring friction in a real project. Everything else — plugins, custom output styles, statusline scripts — is polish you add when the primitives are in place.

When to reach for a plugin

Plugins are the packaging layer above skills. If a set of skills, subagents, or hooks is stable enough to reuse across three or more projects, wrapping them as a plugin is a nice way to keep them versioned and updatable. Before you have that reuse, plugins are premature — they add a distribution step for one consumer.

HEURISTIC: Write the skill or subagent inline in the project first. Move it to your user layer when a second project needs it. Wrap it as a plugin only when a third project (or a teammate) does.

The maintenance cost

Every layer you add to the harness is a layer you need to remember exists. Six months in, a mysterious hook firing in one repo but not another can eat an afternoon. Two mitigations help. First, prefer the highest layer that makes sense — user-layer hooks are one file to reason about, not N. Second, document non-obvious pieces at the top of each settings file — comment lines aren't legal JSON, but a companion settings.README.md is.

Worked examples

Two people, very different lives, same three-layer stack solving the same problem in different shapes.

Example 1 — Freelance dev with five clients (engineering)

A solo consultant works on five client codebases over the course of a week: a React SaaS for Client A, a Django API for Client B, a Go microservice for Client C, an Elixir monolith for Client D, and a legacy PHP admin for Client E. Each has different lint rules, different deploy targets, different secrets stores, different code review conventions. Without harness discipline, every context switch is a 15-minute rediscovery: which formatter, which test runner, which deploy command, which linter overrides.

With the layer stack: the USER layer holds what's true everywhere (block .env, cost warning, prefer verbose git commits). Each PROJECT layer holds what's true for that client (formatter, lint config, test runner, deploy MCP). Nothing about Client A leaks into Client B, because the project layer only loads when Claude Code is opened inside that repo.

Five client repos — shared user layer, distinct project layers

The kicker: each project layer file is ~40 lines. Twelve minutes to write, forever thereafter to save. Compare to the alternative of ‘remember what stack you're in and configure it in your head every morning,’ which never converges.

Example 2 — Novelist writing a series (non-technical)

A novelist is writing a five-book fantasy series over several years. Consistency is the killer problem: character voice, world lore, place names, timeline. Every book Claude drafts a chapter of, the model needs to know: what does this character sound like? What year is it? Who's still alive? What are we not supposed to reveal until Book 4?

The stack, mapped to the novelist:

USER layer — writing style defaults (present tense, cut adverbs, sentence variety), a hook that runs a passive-voice scanner on every save, a cost tracker so a runaway agent doesn't burn through the month's API budget.
PROJECT layer, per book — a series-bible skill loaded automatically (‘read bible/ before writing’), a character-voice skill per protagonist, an MCP server pointing at a local timeline database, permissions scoped to the book's directory.
LOCAL layer — spoilers directory permissions (Book 5 outline sits in drafts/; only unlock when actually writing Book 5).

Novelist's harness — universal voice above, book-scoped world below

Two years into the series, the novelist can drop into any book directory and Claude ‘knows’ the world without a re-briefing prompt. Book 5's outline stays firewalled by a local-layer deny rule — even if the model gets curious, the permission scope stops it from reading. The harness is doing consistency work that no amount of prompting could do reliably.

KEY INSIGHT: The harness is invisible until it's broken. Then it's the only thing that matters. Spend an hour treating it as a real system — layers, scopes, precedence, ownership — and you get months of not thinking about it.