What are the four context engineering strategies?

Write (save context outside the window), Select (pull relevant context in), Compress (retain only necessary tokens), and Isolate (split context across sub-agents). LangChain published a repo with notebook code for all four patterns.

Why does context window size matter for agent performance?

Mem0's BEAM benchmark shows accuracy drops from 64.1 at 1M tokens to 48.6 at 10M tokens. That's a 15.5-point drop, roughly 24% in relative terms, when context scales 10x. Bigger context means more noise, more confusion, and more cost.

Should I put my agent's memory in the context window?

No. The context window is a workspace for current reasoning. Long-term memory should live in a structured database outside the context. The agent reads from it on boot and writes to it on tool execution.

What is the best memory architecture for agents?

Three memory types: episodic (what happened), semantic (facts), and procedural (how things should be done). The Continuum Memory Architecture paper shows that RAG alone can't accumulate, mutate, or disambiguate memory over time.

Your context window is a workspace, not a filing cabinet

Q: What is context engineering?

Context engineering is the practice of deciding what information enters an AI model's context window, when it enters, and how it's formatted. Andrej Karpathy calls it 'the delicate art and science of filling the context window with just the right information for the next step.' It's distinct from prompt engineering.

Context engineering decides what your agent remembers, what it forgets, and what it never sees. Four strategies with code. Cognition calls it the #1 job for agent builders.

TL;DR: Context engineering is deciding what your agent remembers, forgets, and never sees. Four strategies: write, select, compress, isolate. Mem0’s BEAM benchmark shows a roughly 24% relative accuracy drop (64.1 to 48.6) when context scales from 1M to 10M tokens. LangChain published a GitHub repo with notebook code for all four patterns. Memory belongs in Postgres, not the prompt.

Key takeaways:

Cognition calls context engineering “effectively the #1 job of engineers building AI agents.” It’s not prompt engineering.

Four strategies: write (save outside), select (pull in), compress (trim), isolate (split across agents).

LangChain’s GitHub repo (langchain-ai/context_engineering) has runnable code for all four patterns.

BEAM benchmark: 64.1 accuracy at 1M tokens drops to 48.6 at 10M (a 15.5-point, roughly 24% relative loss).

Three memory types: episodic, semantic, procedural. RAG alone doesn’t handle temporal continuity.

The first time I watched an agent fill its context window with conversation history, I thought it was being thorough. It was being stupid. The model spent 80% of its working memory recalling what it said five turns ago instead of reasoning about the current problem.

Your context window is a workspace, not a filing cabinet. It’s where your agent thinks, not where it stores things. Every token of memory you shove in there is a token of working memory you took away.

Andrej Karpathy calls context engineering “the delicate art and science of filling the context window with just the right information for the next step” (LangChain: Context Engineering for Agents). Cognition says it’s “effectively the #1 job of engineers building AI agents.” Anthropic laid it out: “Agents often engage in conversations spanning hundreds of turns, requiring careful context management strategies.”

LangChain published a GitHub repo in 2025 with runnable notebook code for all four strategies (langchain-ai/context_engineering, 183 stars). Each strategy maps to a specific code pattern.

What does a kitchen have to do with context windows?

A kitchen has a cutting board and a pantry. The cutting board is where you work. Small, active, everything on it relevant to the dish you’re making right now. The pantry is where you store ingredients. Large, organized, pull from it when you need something.

The context window is the cutting board. The database is the pantry.

Most agents treat the context window like a walk-in freezer. They dump everything in there: conversation history, tool results, past decisions, reference documents. The model sorts through a pile of frozen ingredients to find the one it needs.

Strategy 1: Write. Save context outside the window.

Don’t keep everything in the context. Save plans, decisions, and intermediate results to external storage.

Anthropic’s multi-agent researcher saves plans to Memory because “if the context window exceeds 200,000 tokens it will be truncated” (LangChain blog). The plan doesn’t disappear. It moves to the pantry.

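In LangGraph, scratchpad reads and writes are ordinary node functions: they take the state and return a partial update. A minimal version that keeps the scratchpad in a JSON file (the path is illustrative):

# Write: persist plans and intermediate results to external storage
import json
from pathlib import Path

# The scratchpad lives on disk, not in the context window
SCRATCHPAD = Path("scratchpad.json")

def load_scratchpad() -> dict:
    if SCRATCHPAD.exists():
        return json.loads(SCRATCHPAD.read_text())
    return {"plan": "", "findings": []}

def write_plan(state: dict) -> dict:
    # Node the agent routes through to save its plan outside the context window
    scratchpad = load_scratchpad()
    scratchpad["plan"] = state["agent_output"]
    SCRATCHPAD.write_text(json.dumps(scratchpad))
    return {}  # no state update: the plan stays out of the message history

def read_plan(state: dict) -> dict:
    # Restore the plan from storage on subsequent turns
    return {"plan": load_scratchpad()["plan"]}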

In practice: when your agent completes a step, write the result to a file or database. Don’t carry it forward in the context. The next step reads what it needs from storage, not from the conversation history.

Strategy 2: Select. Pull relevant context in.

Don’t dump everything in. Pull only what’s relevant. RAG over tool descriptions improves tool selection accuracy by 3x. That’s not a typo.

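# Select: retrieve only relevant context from a vector store
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings

# Example tool registry (names and descriptions are illustrative)
TOOLS = {
    "search_web": "Search the public web for recent information.",
    "query_sql": "Run a read-only SQL query against the analytics database.",
    "send_email": "Send an email to a user.",
}

# Store tool descriptions in a vector database, tagged with the tool name
vectorstore = Chroma(
    collection_name="tool_descriptions",
    embedding_function=OpenAIEmbeddings()
)
vectorstore.add_texts(
    texts=list(TOOLS.values()),
    metadatas=[{"tool_name": name} for name in TOOLS],
)

def select_tools(task_description: str, k: int = 5) -> list[str]:
    # Only retrieve tools relevant to this specific task
    results = vectorstore.similarity_search(task_description, k=k)
    return [r.metadata["tool_name"] for r in results]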

The kitchen equivalent: don’t put every ingredient on the cutting board. Pull the salt, the pepper, and the specific spice this dish needs. Leave the rest in the pantry.
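The snippet above needs an OpenAI key and a Chroma install. To see the selection logic on its own, here is a dependency-free sketch that swaps embeddings for simple word-overlap scoring. The tool names, descriptions, and stopword list are hypothetical, and word overlap is only a crude stand-in for semantic similarity, not what a vector store actually computes.

```python
# Select, offline: rank tool descriptions by word overlap with the task.
# Word overlap is a crude stand-in for embedding similarity (illustration only).
import re

# Hypothetical tool registry: name -> description
TOOLS = {
    "search_web": "Search the public web for recent news and articles.",
    "query_sql": "Run a read-only SQL query against the analytics database.",
    "send_email": "Send an email message to a user.",
    "create_invoice": "Create an invoice with line items and pricing for a customer.",
}

# Common words that would otherwise create spurious matches
STOPWORDS = {"a", "an", "and", "for", "the", "their", "to", "with"}

def tokenize(text: str) -> set[str]:
    # Lowercased word set, punctuation and stopwords removed
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOPWORDS

def select_tools(task: str, k: int = 2) -> list[str]:
    task_words = tokenize(task)
    # Score each tool by how many task words appear in its description
    scored = [(len(task_words & tokenize(desc)), name) for name, desc in TOOLS.items()]
    # Highest score first; ties broken by tool name so results are deterministic
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    # Keep the top-k tools, dropping any with no overlap at all
    return [name for score, name in scored[:k] if score > 0]

print(select_tools("Email the customer their invoice with pricing"))
# ['create_invoice', 'send_email']
```

Only the two matching tools reach the context; the web-search and SQL tools stay in the pantry.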

Strategy 3: Compress. Retain only necessary tokens.

Claude Code runs auto-compact once usage exceeds 95% of the context window. It summarizes older conversation turns and keeps recent ones raw. Cognition uses a fine-tuned model for this summarization step.

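# Compress: summarize older turns, keep recent ones raw
from langchain_core.messages import BaseMessage, SystemMessage, get_buffer_string
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini")  # any chat model works; this one is an example

compress_prompt = ChatPromptTemplate.from_messages([
    ("system", "Summarize the following conversation history. Keep all decisions, results, and unfinished tasks. Discard discussion patterns and dead ends."),
    ("human", "{history}")
])

def compress_history(messages: list[BaseMessage], keep_recent: int = 5) -> list[BaseMessage]:
    if len(messages) <= keep_recent:
        return messages
    split = len(messages) - keep_recent
    older, recent = messages[:split], messages[split:]
    # Render older turns as plain text and summarize them in a single call
    summary = (compress_prompt | llm).invoke({"history": get_buffer_string(older)})
    # Replace the older turns with one summary message
    return [SystemMessage(content=f"Summary of earlier conversation: {summary.content}")] + recent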

The kitchen equivalent: after you’ve prepped the vegetables, you don’t need the raw vegetables on the board. Move them to a prep bowl. Keep the board clear for cooking.
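Claude Code's auto-compact is threshold-based: nothing is summarized until usage crosses 95% of the window. A minimal sketch of that trigger follows. Token counts are approximated by word counts, and `summarize` is a hypothetical placeholder for an LLM call such as the one inside `compress_history`; neither reflects how Claude Code actually measures or summarizes.

```python
# Compress on a threshold: compact only once the window is about 95% full.
# Assumptions: word count approximates tokens; summarize() stands in for an LLM call.

def count_tokens(messages: list[str]) -> int:
    # Crude proxy: whitespace-separated words, not real tokenizer output
    return sum(len(m.split()) for m in messages)

def summarize(messages: list[str]) -> str:
    # Hypothetical placeholder for an LLM summarization call
    return f"[summary of {len(messages)} earlier messages]"

def maybe_compact(messages: list[str], window: int, threshold: float = 0.95,
                  keep_recent: int = 2) -> list[str]:
    # Below the threshold (or nothing old enough to fold): leave history untouched
    if count_tokens(messages) < threshold * window or len(messages) <= keep_recent:
        return messages
    # Above it: fold older turns into one summary, keep the recent turns raw
    split = len(messages) - keep_recent
    return [summarize(messages[:split])] + messages[split:]

history = ["one two three"] * 10                  # about 30 "tokens"
print(len(maybe_compact(history, window=100)))    # 10: 30 is under 95% of 100
print(maybe_compact(history, window=31)[0])       # [summary of 8 earlier messages]
```

Keeping the trigger separate from the summarizer lets you swap in a better token counter or a fine-tuned summarization model without touching the policy.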

Strategy 4: Isolate. Split context across sub-agents.

Anthropic found that “many agents with isolated contexts outperformed single-agent, largely because each subagent context window can be allocated to a more narrow sub-task.”

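# Isolate: give each sub-agent its own clean context window
from typing import TypedDict
from langgraph.graph import StateGraph, START, END

# Separate state schemas: neither agent sees the other's history
class ResearchState(TypedDict):
    query: str
    summary: str

class WritingState(TypedDict):
    summary: str
    draft: str

def research_worker(state: ResearchState) -> dict:
    # Placeholder: run research tools here, return only a summary
    return {"summary": f"Key findings for: {state['query']}"}

def write_worker(state: WritingState) -> dict:
    # Placeholder: the writer sees the summary, not the research transcript
    return {"draft": f"Draft based on: {state['summary']}"}

def build_agent(schema, name, worker):
    # One-node graph per agent: START -> worker -> END
    graph = StateGraph(schema)
    graph.add_node(name, worker)
    graph.add_edge(START, name)
    graph.add_edge(name, END)
    return graph.compile()

# Only pass the summary between agents, not the full history
research = build_agent(ResearchState, "research", research_worker).invoke({"query": "user question"})
draft = build_agent(WritingState, "write", write_worker).invoke({"summary": research["summary"]})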

The kitchen equivalent: the prep cook doesn’t need to know about the dessert station. Give each station its own cutting board.

Why do context windows degrade at scale?

Mem0’s State of AI Agent Memory 2026 report includes the BEAM benchmark, designed to show why 1M context windows aren’t enough (Mem0: State of AI Agent Memory 2026):

BEAM 1M: 64.1 accuracy
BEAM 10M: 48.6 accuracy

That’s a 15.5-point drop, roughly 24% in relative terms, when context scales 10x. Bigger context doesn’t mean better performance. It means more noise.

The token cost difference is stark. Mem0’s memory system uses 6,956 tokens per retrieval call. Full-context approaches use approximately 26,000 tokens. That’s 73% fewer tokens for the same information, with better accuracy because the retrieved context is relevant instead of comprehensive.
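The percentages in this section are easy to check. A quick calculation using only the figures quoted above:

```python
# Check the relative changes quoted above (figures from Mem0's report as cited)
beam_1m, beam_10m = 64.1, 48.6
retrieval_tokens, full_context_tokens = 6_956, 26_000

def relative_change(old: float, new: float) -> float:
    # Fractional change from old to new (negative means a decrease)
    return (new - old) / old

accuracy_drop = -relative_change(beam_1m, beam_10m)
token_savings = -relative_change(full_context_tokens, retrieval_tokens)

print(f"Accuracy: -{beam_1m - beam_10m:.1f} points, -{accuracy_drop:.1%}")
print(f"Tokens: {token_savings:.1%} fewer with retrieval")
# Accuracy: -15.5 points, -24.2%
# Tokens: 73.2% fewer with retrieval
```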

Multi-agent systems amplify this. Anthropic reports that its multi-agent research system uses about 15x more tokens than chat interactions. If you’re not managing context aggressively, your multi-agent system burns tokens on overhead.

What memory architecture does your agent need?

A Reddit post with 51 upvotes put it bluntly: “Stop putting your AI agent’s memory inside the LLM context window.” Durable state and memory must live outside the agent in a structured transactional database like Postgres. The agent reads from it on boot and writes to it on tool execution.

Three memory types, which LangChain’s framework also identifies:

Episodic memory. What happened. Past interactions, decisions, outcomes. “Last Tuesday, the user asked about pricing and I quoted ₹12,000/month.”

Semantic memory. What is true. Facts, rules, preferences. “The user prefers INR-first pricing. The user’s project uses Astro 6.”

Procedural memory. How to do things. Workflows, patterns, lessons. “When the user says ‘fix the post,’ check the attribution footer format first.”
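A minimal sketch of the three memory types living outside the context window. It uses Python's built-in sqlite3 in memory so it runs anywhere; in production the same table would sit in a transactional database such as Postgres. The table layout and the `remember`/`recall` names are illustrative assumptions, not LangChain's or any library's schema.

```python
# Three memory types in one table, stored outside the context window.
# sqlite3 stands in for Postgres here; schema and function names are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE memory (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        user_id TEXT NOT NULL,
        kind TEXT NOT NULL CHECK (kind IN ('episodic', 'semantic', 'procedural')),
        content TEXT NOT NULL
    )
""")

def remember(user_id: str, kind: str, content: str) -> None:
    # Write path: called on tool execution or session end, inside a transaction
    with conn:
        conn.execute(
            "INSERT INTO memory (user_id, kind, content) VALUES (?, ?, ?)",
            (user_id, kind, content),
        )

def recall(user_id: str, kind: str, limit: int = 3) -> list[str]:
    # Read path: newest entries of one type, for the boot-time state packet
    rows = conn.execute(
        "SELECT content FROM memory WHERE user_id = ? AND kind = ? ORDER BY id DESC LIMIT ?",
        (user_id, kind, limit),
    )
    return [row[0] for row in rows]

remember("u1", "episodic", "Quoted ₹12,000/month on the pricing call.")
remember("u1", "semantic", "Prefers INR-first pricing.")
remember("u1", "procedural", "On 'fix the post', check the attribution footer first.")
print(recall("u1", "semantic"))  # ['Prefers INR-first pricing.']
```

The `CHECK` constraint keeps the three types explicit, and `limit` caps how much of each type can ever reach the context.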

The Continuum Memory Architecture paper from arXiv (January 2026) formalizes this (Continuum Memory Architectures). The core argument: “RAG treats memory as a stateless lookup table. Information persists indefinitely, retrieval is read-only, and temporal continuity is absent.” Real memory needs to accumulate, mutate, and disambiguate over time. CMA showed consistent advantages on knowledge updates, temporal association, associative recall, and contextual disambiguation.

What are the four ways context goes wrong?

Drew Breunig identifies four ways context goes wrong, all a function of stuffing too much into the window:

Context poisoning. A hallucination enters context and propagates. The model believes something false because it appeared in a previous turn.

Context distraction. The context grows so long that the model over-focuses on it and neglects what it learned in training, for example repeating earlier actions instead of working out a new plan.

Context confusion. Superfluous context influences the response. The model pays attention to irrelevant information.

Context clash. Parts of context disagree. The model chooses between contradictory instructions and picks wrong.

Every one of these is a reason to keep the context window small and focused.
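One blunt way to enforce that is a hard budget when assembling the context: rank candidate items by priority and stop admitting them once the budget is spent. A minimal sketch, assuming you assign priorities yourself and approximating tokens by word count; the items and budget are hypothetical.

```python
# Assemble a context under a hard token budget, highest-priority items first.
# Assumptions: priorities are set by you; word count approximates tokens.
from typing import NamedTuple

class Item(NamedTuple):
    priority: int  # higher = more important for the current step
    text: str

def build_context(items: list[Item], budget: int) -> list[str]:
    chosen, used = [], 0
    for item in sorted(items, key=lambda it: it.priority, reverse=True):
        cost = len(item.text.split())
        # Skip anything that would overflow the budget rather than truncating it
        if used + cost > budget:
            continue
        chosen.append(item.text)
        used += cost
    return chosen

items = [
    Item(3, "Current task: fix the attribution footer"),
    Item(2, "User prefers INR-first pricing"),
    Item(1, "Full transcript of last week's pricing discussion " + "blah " * 50),
]
print(build_context(items, budget=20))
# ['Current task: fix the attribution footer', 'User prefers INR-first pricing']
```

The long transcript never makes it in, which is exactly the kind of superfluous context that feeds confusion and distraction.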

What does this look like in practice?

The agent boot sequence: read episodic, semantic, and procedural memory from the database. Load it into the context window as a compact state packet. Not the full history. The current state.

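def boot_agent(user_id: str) -> str:
    # Read from the database, not from context history.
    # get_*_memory are placeholders for your own database queries.
    episodic = get_episodic_memory(user_id)
    semantic = get_semantic_memory(user_id)
    procedural = get_procedural_memory(user_id)

    # Compact state packet, built without the stray indentation a triple-quoted string adds
    return "\n".join([
        f"Current session: {episodic['recent_summary']}",
        f"User preferences: {semantic['preferences']}",
        f"Standard workflow: {procedural['standard_flow']}",
    ])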

The agent working loop: execute tool calls, write results to the database, keep only the current task in context.

The agent shutdown sequence: write session outcomes to episodic memory. Update semantic memory if new facts emerged. Update procedural memory if a new pattern emerged. Clear the context window.
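The boot, working loop, and shutdown sequence can be sketched as a small session object. A plain dict stands in for the database, and `run_tool` is a hypothetical placeholder for real tool calls. The point is the data flow: results go to storage, the context holds only the state packet and the current task, and shutdown writes outcomes before clearing the window.

```python
# Session lifecycle: boot -> working loop -> shutdown.
# Assumptions: a plain dict stands in for the database; run_tool() is a placeholder.

def run_tool(task: str) -> str:
    # Hypothetical tool call
    return f"done: {task}"

class AgentSession:
    def __init__(self, db: dict, user_id: str) -> None:
        self.db, self.user_id = db, user_id
        self.context: list[str] = []  # the workspace: only what the next step needs
        self.completed = 0

    def boot(self) -> None:
        # Load a compact state packet from storage, not the full history
        summary = self.db.get(("episodic", self.user_id), "no prior sessions")
        self.context = [f"Previous session: {summary}"]

    def step(self, task: str) -> None:
        # Keep the state packet plus the current task; drop the previous task
        self.context = self.context[:1] + [f"Current task: {task}"]
        # Write the tool result to storage instead of carrying it in context
        self.db.setdefault(("results", self.user_id), []).append(run_tool(task))
        self.completed += 1

    def shutdown(self) -> None:
        # Write session outcomes to episodic memory, then clear the window
        self.db[("episodic", self.user_id)] = f"{self.completed} tasks completed"
        self.context.clear()

db: dict = {}
session = AgentSession(db, "u1")
session.boot()
for task in ["draft post", "check footer"]:
    session.step(task)
print(session.context)  # ['Previous session: no prior sessions', 'Current task: check footer']
session.shutdown()
print(db[("episodic", "u1")])  # 2 tasks completed
```

A second session for the same user boots with "Previous session: 2 tasks completed" in context, not the transcript of the first.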

The context window stays clean. The database stays rich. The agent thinks clearly.

What should you do today?

Context engineering is the #1 job of engineers building AI agents. Most agents fail not because the model is bad but because the context is cluttered with history. Cognition said it directly, and the BEAM numbers point the same way.

Your 128K context window is not a feature. It’s a workspace with strict limits. Use it for thinking, not for storage. Put your memory in a database. Pull what you need. Compress what you’ve done. Isolate what doesn’t belong together.

The kitchen works because the cutting board is small and the pantry is separate. Your agent works the same way.

Agent mode

Your context window is a workspace, not a filing cabinet. Write plans to storage, select only relevant context, compress older conversation history, and isolate sub-agents with their own windows. Memory lives in Postgres, not the prompt. Run this pattern and your agent will stop drowning in its own history.

FAQ

What is context engineering? Deciding what information enters an AI model’s context window, when it enters, and how it’s formatted. Karpathy calls it “the delicate art and science of filling the context window with just the right information for the next step.”

What are the four strategies? Write (save outside), Select (pull in), Compress (trim), Isolate (split across agents). LangChain’s repo has runnable notebook code for all four.

Why does context size matter? Mem0’s BEAM benchmark: 64.1 accuracy at 1M tokens drops to 48.6 at 10M tokens. A roughly 24% relative loss.

Should I put memory in the context window? No. Context is a workspace. Memory belongs in a database. The agent reads on boot and writes on execution.

What memory types does an agent need? Episodic (what happened), semantic (facts), procedural (how things should be done). RAG alone doesn’t handle temporal continuity.


This article was published on Agentic Up (https://agenticup.dev): practical guides for developers and founders building with AI agents.