Drip · Agents & RAG · 16 min read

Agentic Context Engineering

“Prompt engineering” describes a single turn. Production agents run hundreds. The skill that determines whether they hold their behavior over a long loop is context engineering — and it has four named operations.

The bottom line. Lance Martin’s June 2025 Context Engineering for Agents (LangChain) named four operations, and Anthropic’s September 2025 Effective context engineering for AI agents independently described the same production techniques. As Phil Schmid put it, most agent failures are context failures, not model failures. Four operations cover most fixes: Write (persist memory outside the window), Select (retrieve only what this turn needs), Compress (summarize and prune as history grows), Isolate (give each agent only its slice via sub-agents). The lab below runs the same 30-turn agent under each strategy. Raw context loses the early rules after enough turns; Write stabilizes adherence; Isolate flatlines cost.

If you haven’t read it yet: Context Engineering is the sibling drip on the static side of this — what to put into a single window, the lost-in-the-middle problem, the token budget. This piece picks up where that one stops: what happens to that context across a long agent loop.

§ 00 · WHY THE REFRAME
Prompt engineering describes a turn. Agents run hundreds.

Prompt engineering as a discipline was shaped by the chatbot era: one user message, one model response, optimize the prompt that sits between them. It was the right frame for the work — until agents arrived.

A production agent in 2026 isn’t shaped like a chat completion. It runs in a loop. The model decides on an action, a tool returns a result, the model reads the result, decides on the next action. The loop runs for dozens or hundreds of turns until the goal is met or the orchestrator cuts it off. The “prompt” isn’t a string anymore — it’s a sliding window over an ever-growing conversation history, plus whatever the tools dump in, plus whatever the model itself produced last turn.

Most agent bugs in production look like model bugs. The agent forgets a constraint it was told at turn 1. It contradicts a rule it just acknowledged. It loops because it can’t see the tool error from three turns ago. Engineers reach for a better model. The bug doesn’t go away. The bug isn’t in the model — it’s in the context the model sees at the moment of failure.

Lance Martin’s June 2025 Context Engineering for Agents crystallized what production teams had been doing for months under different names: a single four-operation framework that covers most of the fixes. Anthropic’s September 2025 Effective context engineering for AI agents independently arrived at the same four production techniques.

§ 01 · WRITE: PERSIST OUTSIDE THE WINDOW
Your prompt is not memory. Stop using it like one.

The first operation is to recognize that the context window isn’t storage. It’s a hot working buffer. Long-term constraints, accumulated facts, identity rules, anything the agent needs to remember across the run — all of it belongs outside the window, in a structured place the agent can read and write deliberately.

The canonical shape is a small markdown file of stable rules. Anthropic’s Claude Code calls it CLAUDE.md, a user-authored file of persistent instructions, and separately maintains its own MEMORY.md that it updates from your corrections. Cursor calls it project rules. OpenAI’s Codex reads AGENTS.md for the same purpose. In 2025, OpenAI, Google, Cursor, Sourcegraph and others converged on AGENTS.md as a shared convention, now read by many major agent tools. It complements rather than replaces tool-specific files like CLAUDE.md, which Claude Code still reads natively. The convergence is the tell: production teams independently arrived at the same shape because the same shape worked.

The discipline is to reinject this file every single turn, regardless of how long the conversation has run. The model sees it again at turn 1, turn 15, turn 50. The rules don’t decay because they’re structurally renewed. Five hundred tokens of stable, well-written rules beats 200,000 tokens of raw history every time — and the lab below shows you exactly why.
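
A minimal sketch of that reinjection step, assuming a generic chat-message shape. The names `ChatMessage`, `buildTurnMessages`, and `recentWindow` are hypothetical and not any vendor's API, and trimming older history to a fixed window is a simplification added for the example.

```typescript
// Hypothetical message shape; real chat SDKs differ in detail.
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

// Build the message list for one turn. The memory file is always placed first,
// so its presence never depends on how long the run has been going.
function buildTurnMessages(
  memoryFile: string,
  pastTurns: ChatMessage[],
  recentWindow = 8, // assumed cutoff for this sketch, not a recommendation
): ChatMessage[] {
  const recent = pastTurns.slice(Math.max(0, pastTurns.length - recentWindow));
  return [{ role: "system", content: memoryFile }, ...recent];
}

// Example: 50 earlier messages, and the rules still arrive at position 0.
const rules = "Respond in JSON. Max 100 words. Cite at least one source.";
const pastTurns: ChatMessage[] = [];
for (let i = 1; i <= 50; i++) {
  pastTurns.push({ role: i % 2 === 1 ? "user" : "assistant", content: `message ${i}` });
}
const turnMessages = buildTurnMessages(rules, pastTurns);
```

In a real agent the `memoryFile` string would be re-read from disk each turn, so edits to the rules take effect on the next turn without touching the history.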

§ 02 · SELECT: RETRIEVE, DON’T DUMP
Facts live in a database. The prompt carries only what this turn needs.

The second operation comes from RAG and looks like RAG, but it’s narrower: every turn, pull the specific facts this turn requires. Nothing more. The retrieval target isn’t a vector store full of marketing copy — it’s the user’s own past state, the system’s ground truth, the documents relevant to the question the agent is currently asking.

Two tables, drawn carefully, cover most cases. A state table for current facts — the user’s preferences, the agent’s current task, the running goal. An events table for the audit trail — every tool call, every mutation, every decision the agent made. The state table answers “what is true right now”; the events table answers “how did we get here.” The agent reads from both, but only the slice it needs.
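
The two-table shape can be sketched with in-memory arrays standing in for a database. Everything here is an assumption made for illustration: the row types, the sample rows, and the `selectSlice` / `renderSlice` helpers are hypothetical, and a real agent would run queries instead of array filters.

```typescript
// Hypothetical in-memory stand-ins for the two tables.
type StateRow = { key: string; value: string };
type EventRow = { turn: number; tool: string; outcome: string };

// "What is true right now."
const stateTable: StateRow[] = [
  { key: "user.plan", value: "pro" },
  { key: "user.locale", value: "en-GB" },
  { key: "task.goal", value: "resolve refund request" },
];

// "How did we get here."
const eventsTable: EventRow[] = [
  { turn: 2, tool: "getUserPlan", outcome: "plan = pro" },
  { turn: 5, tool: "fetchInvoice", outcome: "invoice found" },
  { turn: 9, tool: "fetchInvoice", outcome: "invoice refreshed" },
  { turn: 12, tool: "computeRefund", outcome: "refund computed" },
];

// Pull only the state keys and tool events that this turn's question touches.
function selectSlice(keys: string[], tools: string[], maxEvents = 2) {
  const state = stateTable.filter((row) => keys.indexOf(row.key) !== -1);
  const matching = eventsTable.filter((row) => tools.indexOf(row.tool) !== -1);
  // Keep the most recent matches; older ones stay in the table, out of the window.
  const events = matching.slice(Math.max(0, matching.length - maxEvents));
  return { state, events };
}

// Render the slice as the compact text that actually enters the prompt.
function renderSlice(slice: { state: StateRow[]; events: EventRow[] }): string {
  const stateLines = slice.state.map((row) => `${row.key} = ${row.value}`);
  const eventLines = slice.events.map((row) => `turn ${row.turn}: ${row.tool} -> ${row.outcome}`);
  return stateLines.concat(eventLines).join("\n");
}

const selected = selectSlice(["user.plan", "task.goal"], ["fetchInvoice"]);
const selectedText = renderSlice(selected);
```

The locale row and the unrelated tool calls never reach the window, which is the whole point: the tables keep everything, the prompt carries four lines.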

The cost dynamics are striking. A raw-context agent at turn 30 is carrying 30 × 320 = 9,600 tokens of history (call it 10,000), most of which is irrelevant to the current decision. A select-shaped agent at turn 30 is carrying maybe 800 tokens of relevant retrieval plus the 500-token memory file, about 1,300 in total. Same model, roughly one-seventh the input cost at that turn, and better recall, because the model isn’t triaging which of the 30 turns matters.
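
The arithmetic is simple enough to check in a few lines. This sketch only restates the figures quoted above (320 tokens per turn, 800 retrieved, 500 for the memory file); the constant and function names are made up for illustration, and the numbers are illustrative, not measurements.

```typescript
// Figures taken from the paragraph above; illustrative, not measured.
const TOKENS_PER_TURN = 320;
const RETRIEVAL_TOKENS = 800;
const MEMORY_FILE_TOKENS = 500;

// Raw context: every earlier turn rides along, so input grows linearly with the turn number.
const rawInputAt = (turn: number): number => turn * TOKENS_PER_TURN;

// Select-shaped context: a fixed-size slice, independent of the turn number.
const selectInputAt = (_turn: number): number => RETRIEVAL_TOKENS + MEMORY_FILE_TOKENS;

const rawAt30 = rawInputAt(30);         // 9,600 tokens
const selectAt30 = selectInputAt(30);   // 1,300 tokens
const ratioAt30 = rawAt30 / selectAt30; // a little over 7x
```

The gap widens with every further turn, because only the raw-context side of the ratio depends on the turn number.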

§ 03 · COMPRESS: PRUNE AS YOU GO
Old turns are summaries waiting to happen.

Some conversation history does matter — the user’s tone, the running narrative of what the agent has tried, the acknowledgements that bind a multi-step plan together. You can’t throw it all away. But you also can’t carry every word of it forward indefinitely.

The compress operation runs in the background of every long conversation: once a turn falls out of the active window (typically 6–10 turns back), summarize it into a single short paragraph and replace the raw exchange with the summary in the running context. The agent now has a compact narrative of the whole run that fits in 2–3K tokens regardless of conversation length, plus the verbatim recent turns for the immediate next-step decision.

The execution detail that matters: the summarizer model doesn’t need to be the same model as the agent. A small, cheap summarizer (Haiku, GPT-4o-mini) running async between turns is enough — and lets you reserve frontier tokens for the actual reasoning.
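
A rough sketch of the eviction mechanics, under a few stated assumptions: the summarizer is passed in as a plain function (a stand-in for a call to a small model), the sketch is synchronous for brevity where production code would run the summarizer async between turns, and `RunContext`, `rollForward`, and `noteSummarizer` are hypothetical names.

```typescript
type Turn = { index: number; text: string };
type RunContext = { summary: string; recent: Turn[] };

// Stand-in for a call to a small, cheap model: takes the running summary plus
// the turns being evicted and returns the new running summary.
type Summarizer = (runningSummary: string, evicted: Turn[]) => string;

// Add one turn; anything that falls out of the active window is folded into the summary.
function rollForward(
  ctx: RunContext,
  turn: Turn,
  activeWindow: number,
  summarize: Summarizer,
): RunContext {
  const turns = ctx.recent.concat([turn]);
  const overflow = turns.length - activeWindow;
  if (overflow <= 0) return { summary: ctx.summary, recent: turns };
  return {
    summary: summarize(ctx.summary, turns.slice(0, overflow)),
    recent: turns.slice(overflow),
  };
}

// Toy summarizer for the sketch: keeps a short note per evicted turn.
// It does not bound the summary size; a real model-backed summarizer would.
const noteSummarizer: Summarizer = (runningSummary, evicted) =>
  [runningSummary]
    .concat(evicted.map((t) => `t${t.index}: ${t.text.slice(0, 24)}`))
    .filter((part) => part.length > 0)
    .join("; ");

let runContext: RunContext = { summary: "", recent: [] };
for (let i = 1; i <= 30; i++) {
  runContext = rollForward(runContext, { index: i, text: `exchange ${i}` }, 8, noteSummarizer);
}
// runContext.recent now holds turns 23..30 verbatim; turns 1..22 live only in the summary.
```

Passing the summarizer as a parameter is what makes the model swap cheap: the agent loop never needs to know which model produced the summary.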

§ 04 · ISOLATE: SUB-AGENTS FOR SLICES
The orchestrator carries the goal, not the work.

The fourth operation acknowledges what the first three can’t solve: some tasks have irreducibly large context requirements. The agent needs to read three different documents, cross-reference an API schema, run a query against a database, and reconcile the results. No amount of clever compression makes all of that fit cleanly in a single window.

The fix is to stop trying. Spawn a sub-agent per domain. The document-reading sub-agent only sees the documents. The query-running sub-agent only sees the schema and the database. Each runs to completion in its own clean context and returns a structured result. The orchestrator agent — which carries only the goal — composes those results into the final answer.

Cost-wise this is a force multiplier. Each sub-agent context is small and fast and finite. The orchestrator carries roughly the same 2K tokens at turn 1 as at turn 100, because the “work” lives in disposable child contexts that get torn down when their slice is done. The lab below makes this visible — the Isolate strategy is the only one whose token-per-turn graph flatlines.
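
One way to picture the split, as a sketch under assumptions: sub-agents are modeled as plain synchronous functions (real ones would be async model loops), context sizes are supplied as pre-counted token numbers, and the four-characters-per-token estimate is a crude rule of thumb. `SubTask`, `SubResult`, `runSubAgent`, and `orchestrate` are hypothetical names, as are the sample file names.

```typescript
type ContextItem = { name: string; tokens: number }; // tokens: assumed pre-counted size
type SubTask = { domain: string; slice: ContextItem[] };
type SubResult = { domain: string; finding: string; tokensRead: number };

// Stand-in for a sub-agent: it sees only its own slice and returns a short structured result.
function runSubAgent(task: SubTask): SubResult {
  const tokensRead = task.slice.reduce((sum, item) => sum + item.tokens, 0);
  return {
    domain: task.domain,
    finding: `${task.domain}: reviewed ${task.slice.length} item(s)`,
    tokensRead,
  };
}

// Rough size estimate (about 4 characters per token), an assumption for this sketch.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

function orchestrate(goal: string, tasks: SubTask[]) {
  const results = tasks.map(runSubAgent);
  // Only the goal and the short findings enter the orchestrator's window.
  const orchestratorWindow = [goal].concat(results.map((r) => r.finding)).join("\n");
  return { results, orchestratorTokens: estimateTokens(orchestratorWindow) };
}

const isolatedRun = orchestrate("Reconcile invoice totals against the billing API", [
  {
    domain: "documents",
    slice: [
      { name: "contract.pdf", tokens: 12000 },
      { name: "invoice.pdf", tokens: 9000 },
    ],
  },
  { domain: "schema", slice: [{ name: "openapi.json", tokens: 8000 }] },
  { domain: "database", slice: [{ name: "query result", tokens: 3000 }] },
]);
```

The child contexts read tens of thousands of tokens between them, while the orchestrator's window stays at a few dozen, because only the findings cross the boundary.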

With all four operations on the table, the system is easier to see as one picture: each operation is governed by a single question per turn — what enters the window, from where, and what gets evicted.

[Figure 1: the context window (this turn) fed by four operations. WRITE: MEMORY.md, persist outside, reinject every turn (note-taking). SELECT: fact store, pull only this turn’s slice (just-in-time retrieval). COMPRESS: summarize tail past 6–10 turns (compaction). ISOLATE: sub-agents return structured results (sub-agent architecture).]
Fig 1. The four operations as a single data flow. Write persists rules outside the window and reinjects them; Select pulls only this turn's slice from a fact store; Compress summarizes the history tail; Isolate runs sub-agents that return structured results. Technique names follow Anthropic's September 2025 'Effective context engineering.'

All four are implementation-agnostic: they describe where information lives and when it enters the window, not which datastore you reach for.

§ 05 · CONTEXT DRIFT, DEMONSTRATED
Same prompt, four strategies, 30 turns

The lab below simulates one canonical scenario: an agent given a clear set of rules at turn 1 (“respond in JSON”, “max 100 words”, “cite at least one source”) runs for 30 turns under each of the four strategies. The synthetic adherence score reflects what teams measure in production — rule-following degrades non-linearly as the early context gets diluted, then collapses. Stack the operations and the collapse moves further out, or vanishes.

Lab · context drift over 30 turns
Same agent loop, four context strategies: watch rule adherence fall as raw context inflates.

Raw context: every turn carries the entire history forward. Tokens explode and the model loses early rules.

Tokens per turn (turn 1 to turn 30): final 9,960
Rule adherence (synthetic, 0–100%, with 60% marked as failing): final 32%
Avg adherence: 83%
Total tokens: 160K
Cost / run: $0.48

Raw context degrades around the midpoint of a long run because the model loses recall of early rules — the context-rot phenomenon documented by Chroma and others in 2025. Layering Write, Select, Compress, and Isolate addresses different failure modes; Isolate is the only one that bounds cost regardless of run length.

Two takeaways. First: the strategies aren’t alternatives, they’re a stack. Write alone helps but doesn’t flatten cost. Select alone cuts cost but doesn’t preserve identity rules. The mature production agent uses all four, layered. Second: notice when each one earns its keep. Write for any agent with persistent rules. Select for any agent with fact-based questions. Compress for long conversations. Isolate when a single task has slices that don’t need to share context.

§ 06 · HALLUCINATION BY OMISSION
The failure mode RLHF baked in

There’s a fifth failure mode worth naming because it explains a class of agent bugs that look mysterious. Consumer agents (the chat-shaped products) are RLHF-trained to be helpful. When a tool returns an error, an unhelpful answer is “the tool failed.” A helpful answer is the result the user was hoping for. The model’s RLHF objective rewards the latter. Production agents that inherit that training objective will, by default, paper over tool failures by making up the result the user wanted to see. This is a documented consequence of helpfulness-oriented post-training; see the sycophancy literature and work on tool hallucination.

The fix is mechanical. Every tool returns a shape that explicitly distinguishes success from failure:

type ToolResult<T> =
  | { ok: true; data: T }
  | { ok: false; error: string };

And the system prompt makes the contract explicit: “if a tool returns ok: false, report the failure to the user. Do not invent a result. Do not paraphrase the error away.”

This pattern doesn’t solve hallucination in general — that problem is much harder. But it solves the specific case of hallucination-as-helpfulness in tool-using agents, which is the version that shows up most in production. The agent fails loud instead of soft. Errors propagate to the orchestrator and to the user. Bugs get fixed instead of laundered.
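
A minimal sketch of enforcing the contract in the harness as well as in the prompt. The `ToolResult` type is the one given above; the `safely` wrapper, the `Step` and `RunOutcome` types, the `runSteps` runner, and the stubbed tool bodies are all hypothetical, written only to show the fail-loud behavior.

```typescript
type ToolResult<T> =
  | { ok: true; data: T }
  | { ok: false; error: string };

// Wrap a tool that may throw so it always returns a ToolResult instead.
function safely<T>(tool: () => T): ToolResult<T> {
  try {
    return { ok: true, data: tool() };
  } catch (err) {
    return { ok: false, error: err instanceof Error ? err.message : String(err) };
  }
}

type Step = { name: string; run: () => unknown };
type RunOutcome = { completed: string[]; failure?: { step: string; error: string } };

// Fail-loud runner: stop at the first ok: false and surface the error verbatim.
function runSteps(steps: Step[]): RunOutcome {
  const completed: string[] = [];
  for (const step of steps) {
    const result = safely(step.run);
    if (result.ok === false) {
      return { completed, failure: { step: step.name, error: result.error } };
    }
    completed.push(step.name);
  }
  return { completed };
}

// Stubbed tools; the third one is rigged to fail.
const outcome = runSteps([
  { name: "getUserPlan", run: () => "pro" },
  { name: "fetchInvoice", run: () => ({ total: 42 }) },
  { name: "computeRefund", run: () => { throw new Error("refund service timeout"); } },
  { name: "applyCredit", run: () => "credited" },
  { name: "sendEmail", run: () => "sent" },
]);
```

Because the runner halts at `computeRefund`, `applyCredit` and `sendEmail` never execute on a made-up refund amount, and the error string reaches the caller unparaphrased.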

Step the run below with one tool call rigged to fail, and toggle between an RLHF-helpful agent and one that honors the ok: false contract. Watch a single soft failure quietly corrupt the final answer in helpful mode.

Lab · the ToolResult contract
Step a 5-call run with one failing tool and watch a soft failure silently corrupt the answer.

Controls: agent mode; inject failure at call #.

  1. getUserPlan()
  2. fetchInvoice()
  3. computeRefund()
  4. applyCredit()
  5. sendEmail()

One silently-swallowed tool error is enough to make a confident final answer wrong. The ok: false contract is what converts a soft, invisible failure into a loud, fixable one — flip to Fail-loud and the same failing call halts the run instead of laundering it. Illustrative scenario, not a benchmark.

CHECK · Your agent runs a 40-turn conversation. At turn 35 it produces an answer that contradicts a constraint you set at turn 1. Which operation would address this most directly?

§ 07 · WHAT THIS REPLACES
The reframe is real, the prompt is still doing work

It would be tidy to say prompt engineering is dead. It isn’t. The system prompt still matters: every operation above feeds into a prompt the model sees. What changed is that the prompt is no longer the unit of design. The prompt is the output of a context-engineering process; it’s composed at runtime from the memory file plus the retrieved facts plus the compressed history plus the user’s current turn. The skill is upstream of the string.
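
In code, "composed at runtime" might look something like the sketch below. The `TurnInputs` fields, the `composeWindow` function, and the bracketed section labels are illustrative choices made for this example, not a format any model or framework requires.

```typescript
// The four sources named above; field names are illustrative.
type TurnInputs = {
  memoryFile: string;        // Write: stable rules
  retrievedFacts: string[];  // Select: this turn's slice
  compressedHistory: string; // Compress: summary of older turns
  userTurn: string;          // the current request
};

// Assemble the window in a fixed order, skipping any source that is empty.
function composeWindow(inputs: TurnInputs): string {
  const sections = [
    { title: "RULES", body: inputs.memoryFile },
    { title: "FACTS", body: inputs.retrievedFacts.join("\n") },
    { title: "HISTORY SUMMARY", body: inputs.compressedHistory },
    { title: "USER", body: inputs.userTurn },
  ];
  return sections
    .filter((section) => section.body.trim().length > 0)
    .map((section) => `[${section.title}]\n${section.body}`)
    .join("\n\n");
}

// Early in a run there is no compressed history yet, so that section is omitted.
const composed = composeWindow({
  memoryFile: "Respond in JSON. Max 100 words. Cite at least one source.",
  retrievedFacts: ["user.plan = pro"],
  compressedHistory: "",
  userTurn: "Am I eligible for a refund?",
});
```

The string that reaches the model is a return value, which is the practical meaning of "upstream": the design work happens in whatever fills `TurnInputs`.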

Make that literal: the composer below builds a single turn’s window from those four sources. Where the §05 lab traced behavior over 30 turns, this dissects the anatomy of one.

Lab · anatomy of one turn
Compose a single turn’s context window from four sources and watch recall fall as you stuff it.

Memory file · Write · 500 tok. Stable rules reinjected every turn (CLAUDE.md / MEMORY.md).
Retrieved facts · Select · 800 tok. Just-in-time retrieval: only this turn’s slice of state + events.
Compressed history · Compress · 1,200 tok. Summarized tail past 6–10 turns; fits a long run in a few thousand tokens.
Current user turn · 180 tok. The actual request this turn, always present.

Composed window: 2,680 tokens
Window used: 2.1% of 128K
Est. recall: 100% (crisp)
Cost / turn: $0.0080 at $3/M input

The window the model sees is composed, not given. Stuffing it with raw history instead of Write + Select + Compress is what pushes the answer into the lost-in-the-middle zone. The recall curve here is a simplified model of Liu et al. 2024 — illustrative, not a measured benchmark.
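
The lab’s headline numbers can be reproduced with a few lines of arithmetic. This sketch takes the four token counts, the 128K window, and the $3/M input price from the lab panel as given; `WindowSource` and `composeBudget` are hypothetical names, and the recall estimate is deliberately left out because the lab’s curve is not specified here.

```typescript
type WindowSource = { label: string; tokens: number };

// Token counts as shown in the lab panel.
const sources: WindowSource[] = [
  { label: "Memory file (Write)", tokens: 500 },
  { label: "Retrieved facts (Select)", tokens: 800 },
  { label: "Compressed history (Compress)", tokens: 1200 },
  { label: "Current user turn", tokens: 180 },
];

const WINDOW_TOKENS = 128_000;
const USD_PER_MILLION_INPUT = 3;

function composeBudget(parts: WindowSource[]) {
  const total = parts.reduce((sum, part) => sum + part.tokens, 0);
  return {
    total,
    windowUsedPct: (total / WINDOW_TOKENS) * 100,
    costPerTurnUsd: (total / 1_000_000) * USD_PER_MILLION_INPUT,
  };
}

const budget = composeBudget(sources);
// total = 2,680 tokens; about 2.09% of a 128K window; about $0.00804 per turn
```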

This is why teams in 2026 measure context-engineering decisions with evals (covered in a sibling drip) and harness the resulting agent with retries and circuit breakers (another sibling drip). The four operations don’t replace any of the rest of the production stack — they’re the missing layer between the prompt-engineering era and the agent-as-a-system era. Write, Select, Compress, Isolate. The skill of 2026.

§ · FURTHER READING
References & deeper sources

  1. Lance Martin (2025). Context Engineering for Agents · rlancemartin.github.io · LangChain (June 23, 2025)
  2. Anthropic (2025). Effective context engineering for AI agents · Anthropic Engineering (Sept 29, 2025)
  3. Phil Schmid (2025). The New Skill in AI is Not Prompting, It's Context Engineering · philschmid.de (June 2025)
  4. Chroma Research (2025). Context Rot: How Increasing Input Tokens Impacts LLM Performance · research.trychroma.com
  5. Zhang, Lin, et al. (2025). Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models · arXiv:2510.04618 (Stanford / SambaNova / UC Berkeley)
  6. Liu, Lin, Hewitt, et al. (2024). Lost in the Middle: How Language Models Use Long Contexts · TACL vol. 12, pp. 157–173 · arXiv:2307.03172
  7. Lewis et al. (2020). Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks · NeurIPS 2020 · arXiv:2005.11401
  8. Anthropic (2025). Manage Claude's memory (CLAUDE.md) in Claude Code · Anthropic Documentation
  9. OpenAI, Google, Sourcegraph, Cursor, Factory (2025). AGENTS.md — open format for guiding coding agents · agents.md

Original figures live in the linked sources — open the papers for the canonical visuals in their full context.