---
title: Why your agent forgets conversations (and how to fix it with a branching tree)
canonical: "https://agenticup.dev/posts/ai-agent-branching-sessions/"
pubDate: "2026-06-06T00:00:00.000Z"
description: "You ask your agent to try something different. It forgets the original conversation. You try to go back and the agent is confused. That's not a memory problem. It's a data structure problem. Here's why sessions are stored wrong and how to fix it."
tags: [sessions, memory, context, state-management, branching, fork-resume]
---

The first time I tried to branch a conversation with my agent, I lost the original thread.

I was working on a code review agent. The user asked: "what if we had used a different framework?" I wanted to explore that branch without losing the main conversation. What I got instead was either: the agent forgot the original context and got confused, or I had to start a new session and manually copy-paste the history.

That's not a memory problem. It's a data structure problem.

Most agents store conversation history as a line. Each turn appends to the end. The line works fine for linear conversations. It breaks the moment you want to explore a branch.

Here's the fix: store sessions as a branching tree.

**TL;DR:** Your agent forgets conversations because sessions are stored as a line, not a tree. A branching tree stores every turn as a node with optional children, so exploring "what if?" doesn't lose the main thread. Fork creates a new branch from the last node the two paths share. Resume walks from a branch tip back to the root to rebuild the context you left off with. The model is simple and you can implement it without any framework.

> **Key takeaways:**
> - Linear session storage breaks the moment you want to explore a branch
> - A branching tree stores every turn as a node with parent_id and child_ids
> - Forking creates a new node that points back to the last common ancestor
> - Resume walks the tree from the branch tip back to the root, so the full context is there
> - Branching and context window compaction solve different problems and work together

## Why the line breaks

Imagine a 30-turn conversation. The user has been working with the agent on a feature implementation. At turn 20, the user says: "Let's try a different approach: what if we used a cache instead?"

In a linear store, you have two options:

**Option A:** Overwrite the main thread. The new approach becomes the conversation. The old approach, everything from turns 1-19, is gone. You can't go back.

**Option B:** Create a copy. Manually duplicate turns 1-19 into a new session. Now you have two sessions but no connection between them. If the branch works and you want to merge insights back into the main conversation, you have to do it manually.

Both options lose something. The line doesn't support exploration without loss.

## The tree model in plain terms

Think about how a restaurant kitchen's ticket log works when orders change mid-way.

A customer orders the beef. The kitchen starts prepping. The customer changes their mind: "I want the fish." The kitchen writes a new ticket that branches off the original order. The original order, the beef prep that was already started, is still in the log. The kitchen can go back and look at it.

Now imagine instead that the kitchen had one linear list of tickets. When the customer changes their order, the kitchen either: cancels the beef ticket entirely and starts fresh, or creates a completely separate list with no connection to the first. Either way, the kitchen loses the ability to compare what happened on each path.

A branching tree is the kitchen's ticket log with the branching capability built in.

## The data structure

```python
import itertools
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class TurnNode:
    turn_id: str
    parent_id: Optional[str] = None  # None for the root turn
    # Mutable defaults must use default_factory; a bare [] raises ValueError
    children_ids: list[str] = field(default_factory=list)  # branches that forked from this node
    user_message: str = ""
    agent_response: str = ""
    tool_calls: list[dict] = field(default_factory=list)  # tool calls made in this turn
    tool_results: list[dict] = field(default_factory=list)  # results returned
    metadata: dict = field(default_factory=dict)  # timestamps, cost, etc.


class BranchingSessionStore:
    def __init__(self):
        self.nodes: dict[str, TurnNode] = {}  # turn_id -> node
        self.current_tip: Optional[str] = None  # the active branch tip
        self._counter = itertools.count(1)

    def _new_id(self) -> str:
        """Generate sequential ids (turn_1, turn_2, ...). A UUID would work too."""
        return f"turn_{next(self._counter)}"

    def add_turn(self, user_message: str, agent_response: str,
                 tool_calls: Optional[list] = None,
                 tool_results: Optional[list] = None) -> str:
        """Add a turn to the current branch."""
        node = TurnNode(
            turn_id=self._new_id(),
            parent_id=self.current_tip,
            user_message=user_message,
            agent_response=agent_response,
            tool_calls=tool_calls or [],
            tool_results=tool_results or [],
            metadata={"created_at": datetime.now(timezone.utc).isoformat()},
        )
        self.nodes[node.turn_id] = node

        # Update parent's children
        if node.parent_id:
            self.nodes[node.parent_id].children_ids.append(node.turn_id)

        self.current_tip = node.turn_id
        return node.turn_id

    def fork(self, at_turn_id: str) -> str:
        """Create a new branch starting from at_turn_id."""
        # Verify the node exists
        if at_turn_id not in self.nodes:
            raise ValueError(f"Turn {at_turn_id} does not exist")

        # Create an empty placeholder node that marks the start of the branch.
        # Nothing is copied: earlier history is reached through parent_id.
        fork_node = TurnNode(
            turn_id=self._new_id(),
            parent_id=at_turn_id,
            metadata={
                "created_at": datetime.now(timezone.utc).isoformat(),
                "forked_from": at_turn_id,
            },
        )
        self.nodes[fork_node.turn_id] = fork_node
        self.nodes[at_turn_id].children_ids.append(fork_node.turn_id)

        self.current_tip = fork_node.turn_id
        return fork_node.turn_id

    def get_context_for_turn(self, turn_id: str, max_turns: int = 50) -> list[dict]:
        """Walk back from turn_id to build context for the LLM."""
        turns: list[list[dict]] = []
        current_id: Optional[str] = turn_id

        # Walk back toward the root, collecting at most max_turns turns
        while current_id and len(turns) < max_turns:
            node = self.nodes.get(current_id)
            if not node:
                break
            # Fork placeholders carry no messages, so skip them
            if node.user_message or node.agent_response:
                turns.append([
                    {"role": "user", "content": node.user_message},
                    {"role": "assistant", "content": node.agent_response},
                ])
            current_id = node.parent_id

        # Turns were collected tip-first; flatten them in chronological order
        return [message for turn in reversed(turns) for message in turn]
```

## Forking: what happens

When you fork, you're not copying the entire conversation. You're creating a new node that points back to the last common ancestor.

```python
# Fork at turn 20 of a 30-turn conversation
new_branch_tip = session.fork(at_turn_id="turn_20")

# What just happened:
# - Created 1 new node (the fork point)
# - The new node's parent_id = "turn_20"
# - turn_20's children_ids now includes the new branch
# - All nodes turn_1 through turn_20 are SHARED between branches
# - Storage cost: 1 node, not 20
```

The common ancestor is shared. Only the nodes after the fork point are stored separately. The fork itself always costs one node, whether you fork at turn 2 or turn 200. After that, a branch grows only by the turns you add to it. So the storage cost is proportional to the new turns you create on each branch, not to the total session length multiplied by the number of branches.
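To put rough numbers on the sharing, here is a minimal, self-contained sketch. The `Node`, `add_child`, and `path_to_root` names are illustrative stand-ins rather than part of the store above, and for brevity the branch turns attach directly under the fork point instead of going through a placeholder node.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """Illustrative stand-in for a turn node; only the tree links matter here."""
    node_id: str
    parent_id: Optional[str] = None
    children_ids: list[str] = field(default_factory=list)


def add_child(nodes: dict[str, Node], parent_id: Optional[str], node_id: str) -> str:
    """Attach a new node under parent_id (None means it is the root)."""
    nodes[node_id] = Node(node_id=node_id, parent_id=parent_id)
    if parent_id is not None:
        nodes[parent_id].children_ids.append(node_id)
    return node_id


def path_to_root(nodes: dict[str, Node], tip_id: str) -> list[str]:
    """Return node ids from the root down to tip_id."""
    path = []
    current: Optional[str] = tip_id
    while current is not None:
        path.append(current)
        current = nodes[current].parent_id
    return path[::-1]


nodes: dict[str, Node] = {}

# Main branch: a 30-turn conversation.
tip = None
for i in range(1, 31):
    tip = add_child(nodes, tip, f"turn_{i}")
main_tip = tip

# Fork at turn 20 and add three turns on the new branch.
tip = "turn_20"
for i in range(1, 4):
    tip = add_child(nodes, tip, f"alt_{i}")
alt_tip = tip

main_path = path_to_root(nodes, main_tip)
alt_path = path_to_root(nodes, alt_tip)
alt_ids = set(alt_path)
shared = [node_id for node_id in main_path if node_id in alt_ids]

print(len(nodes))     # 33 nodes stored in total
print(len(alt_path))  # 23 turns of context on the alternative branch
print(len(shared))    # 20 turns shared by both branches
```

Copying the session instead would have stored turns 1 through 20 twice. Here the alternative branch sees 23 turns of context while adding only 3 nodes to storage.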

## Resuming: walking the tree

When you want to resume a conversation, you walk the tree from the target node back to the root and reconstruct the context in chronological order:

```python
def resume(session: BranchingSessionStore, branch_tip: str) -> list[dict]:
    """Resume from a specific branch tip, building full context."""
    # Get the node
    node = session.nodes.get(branch_tip)
    if not node:
        raise ValueError(f"Branch tip {branch_tip} not found")

    # Walk back to root to build the full path
    context = []
    current = node

    while current:
        turn = []
        # Fork placeholders have no messages, so they contribute nothing
        if current.user_message or current.agent_response:
            turn.append({"role": "user", "content": current.user_message})
            turn.append({"role": "assistant", "content": current.agent_response})
        # Tool calls and results stay grouped with the turn that produced them
        if current.tool_calls:
            turn.append({"role": "system", "content": f"tools_used: {current.tool_calls}"})
        if current.tool_results:
            turn.append({"role": "system", "content": f"tool_results: {current.tool_results}"})

        # Prepend the whole turn (we're walking backwards)
        context = turn + context
        current = session.nodes.get(current.parent_id) if current.parent_id else None

    return context
```

The key property: the context is always complete. When you resume from a branch, you get the full history from the root to the branch tip: no gaps, no confusion about what happened before the fork.
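Resuming assumes you already know which tip to resume from. One way to find candidates is to list the nodes that have no children. The sketch below is self-contained and uses a plain `parents` mapping instead of the store; `branch_tips` and `path_from_root` are hypothetical helper names, not part of any library.

```python
from typing import Optional

# Illustrative tree: node id -> parent id (None marks the root).
parents: dict[str, Optional[str]] = {
    "t1": None,
    "t2": "t1",
    "t3": "t2",      # main branch tip
    "alt1": "t2",    # branch forked at t2
    "alt2": "alt1",  # alternative branch tip
}


def branch_tips(parents: dict[str, Optional[str]]) -> list[str]:
    """Return ids of nodes with no children, i.e. tips you could resume from."""
    has_children = {p for p in parents.values() if p is not None}
    return [node_id for node_id in parents if node_id not in has_children]


def path_from_root(parents: dict[str, Optional[str]], tip_id: str) -> list[str]:
    """Walk parent links from tip_id back to the root, then reverse."""
    if tip_id not in parents:
        raise ValueError(f"Branch tip {tip_id} not found")
    path = []
    current: Optional[str] = tip_id
    while current is not None:
        path.append(current)
        current = parents[current]
    path.reverse()
    return path


for tip in branch_tips(parents):
    print(tip, path_from_root(parents, tip))
```

In a real store you would probably also keep a timestamp or a label per tip, so the user can pick "the cache experiment" rather than an opaque id.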

## The branching decision: when to fork

Forking is free in the sense that it doesn't destroy anything. But it does create a new branch that you need to manage. Here are the situations where forking is the right move:

**Exploration**: The user asks "what if we did X instead of Y?" Fork. The original path stays intact. If X works better, you can later merge insights. If it doesn't, you return to the main branch without losing anything.

**Rollback**: The agent took a wrong turn and you want to go back to a known good state. Fork from the last good turn. Work on the new branch. If it works, keep it. If it doesn't, the original branch is still there.

**A/B testing**: You're not sure which approach works better. Fork twice from the same turn and run the two branches in parallel. Compare outcomes. Keep the better one.

**Debugging**: Something went wrong at turn 15. Fork at turn 14 and replay with the same context. The original failure path is preserved so you can compare.

The trigger for forking is usually explicit: the user says "let's try a different approach" or you decide as the developer to explore an alternative. The tree structure supports that decision without making it final.
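"Return to the main branch" can be as small as moving the active tip. Here is a rough sketch of that idea, assuming a stripped-down tree that only tracks parent links. `TinyTree` and `switch_to` are hypothetical names, and this version branches by adding a turn under an older node rather than creating a placeholder.

```python
from typing import Optional


class TinyTree:
    """Illustrative minimal tree: parent links plus an active tip."""

    def __init__(self) -> None:
        self.parents: dict[str, Optional[str]] = {}
        self.current_tip: Optional[str] = None

    def add(self, node_id: str) -> str:
        """Append a node under the active tip and make it the new tip."""
        self.parents[node_id] = self.current_tip
        self.current_tip = node_id
        return node_id

    def switch_to(self, node_id: str) -> None:
        """Make an existing node the active tip; nothing is deleted."""
        if node_id not in self.parents:
            raise ValueError(f"Turn {node_id} does not exist")
        self.current_tip = node_id


tree = TinyTree()
for name in ["t1", "t2", "t3"]:
    tree.add(name)
main_tip = tree.current_tip

# Exploration: go back to t2 and try something else.
tree.switch_to("t2")
tree.add("what_if_1")
tree.add("what_if_2")
explore_tip = tree.current_tip

# The experiment did not pan out: return to the main branch and carry on.
tree.switch_to(main_tip)
tree.add("t4")

print(tree.parents["t4"])         # t3
print(tree.parents["what_if_1"])  # t2
```

The abandoned branch is still in `parents`, so you can switch back to `explore_tip` later if you change your mind.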

## Branching vs compaction: different problems

The [post on context window management](/posts/ai-agent-context-window-management/) covers a related problem: what happens when the conversation gets long and the context window fills up. That's compaction: summarization, selective forgetting, sliding windows.

Branching is a different problem. Compaction asks: "how do I keep the agent functional when the context is full?" Branching asks: "how do I preserve the full history so I can explore alternatives and audit what happened?"

They solve different problems and work together:

- **Branching** preserves the tree. Even if you compact a long-running conversation, the tree structure lets you access any branch's full history when you resume it.
- **Compaction** manages the tokens. A compact summary of the original conversation is cheaper to pass to the LLM, but the original history is still in the tree if you need it.

The model you want: branching tree for session persistence + compaction strategy for context management. Not either/or.
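Here is one way the two could fit together, as a sketch rather than a recommendation. It assumes you have already read the full root-to-tip message list from the tree, and that you have some summarizer available. `compacted_context`, `keep_recent`, and `naive_summary` are hypothetical names; the placeholder summarizer just counts messages where a real one would likely call an LLM.

```python
from typing import Callable


def compacted_context(
    path: list[dict],
    keep_recent: int,
    summarize: Callable[[list[dict]], str],
) -> list[dict]:
    """Summarize older messages and keep the most recent ones verbatim.

    `path` is the full root-to-tip message list read from the tree.
    It is not modified, so the complete history stays available.
    """
    if keep_recent <= 0:
        raise ValueError("keep_recent must be positive")
    if len(path) <= keep_recent:
        return list(path)
    older, recent = path[:-keep_recent], path[-keep_recent:]
    summary = {"role": "system", "content": summarize(older)}
    return [summary] + recent


def naive_summary(messages: list[dict]) -> str:
    """Placeholder summarizer used only for this example."""
    return f"Summary of {len(messages)} earlier messages."


full_path = [
    {"role": "user" if i % 2 == 0 else "assistant", "content": f"message {i}"}
    for i in range(10)
]
context = compacted_context(full_path, keep_recent=4, summarize=naive_summary)

print(len(context))           # 5: one summary plus four recent messages
print(context[0]["content"])  # Summary of 6 earlier messages.
print(len(full_path))         # 10: the tree's history is untouched
```

The point of the split is that compaction only shapes what you send to the model. The tree remains the source of truth.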

## What the tree enables

**Audit trails**: You can reconstruct exactly what happened on any branch. If a decision went wrong, you can see the full path that led there: not just the failed branch, but the alternative paths that were considered and abandoned.

**Multiplayer debugging**: When a user reports a problem, you can fork from their session and replay the exact conversation that led to the bug. The original session is undisturbed.

**Experiment tracking**: Run multiple approaches in parallel. Compare outcomes. The tree keeps every experiment's full history.

**Context preservation on reconnect**: User was talking to the agent, got interrupted, came back 3 hours later. The tree still has the full context. You fork from the last turn and resume without having to re-explain.
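For audit trails and experiment comparison, a useful primitive is finding where two branches diverged. A minimal sketch, assuming a plain `parents` mapping of node id to parent id; `ancestors`, `fork_point`, and `turns_after` are illustrative helpers I made up for this example.

```python
from typing import Optional


def ancestors(parents: dict[str, Optional[str]], node_id: str) -> list[str]:
    """Return node_id followed by its ancestors, ending at the root."""
    chain = []
    current: Optional[str] = node_id
    while current is not None:
        chain.append(current)
        current = parents[current]
    return chain


def fork_point(parents: dict[str, Optional[str]], tip_a: str, tip_b: str) -> Optional[str]:
    """Return the deepest node shared by both branches, or None if there is none."""
    seen = set(ancestors(parents, tip_a))
    for node_id in ancestors(parents, tip_b):
        if node_id in seen:
            return node_id
    return None


def turns_after(parents: dict[str, Optional[str]], tip_id: str,
                ancestor_id: Optional[str]) -> list[str]:
    """Return the turns on tip_id's branch that come after ancestor_id, oldest first."""
    chain = ancestors(parents, tip_id)
    if ancestor_id in chain:
        chain = chain[:chain.index(ancestor_id)]
    return chain[::-1]


parents: dict[str, Optional[str]] = {
    "t1": None, "t2": "t1", "t3": "t2", "t4": "t3",  # main branch
    "alt1": "t2", "alt2": "alt1",                    # branch forked at t2
}

common = fork_point(parents, "t4", "alt2")
print(common)                               # t2
print(turns_after(parents, "t4", common))   # ['t3', 't4']
print(turns_after(parents, "alt2", common)) # ['alt1', 'alt2']
```

With the divergence point in hand, you can show the shared history once and the two continuations side by side.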

## The minimum implementation

If you want to implement this without any framework, here's the core:

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TurnNode:
    id: str
    parent_id: Optional[str]
    children: list[str] = field(default_factory=list)
    messages: list[dict] = field(default_factory=list)  # [{"role": "user", "content": ".."}]
    metadata: dict = field(default_factory=dict)


class BranchingSession:
    def __init__(self):
        self.nodes: dict[str, TurnNode] = {}
        self.current: Optional[str] = None  # id of the active tip, None until the first message

    def _new_id(self) -> str:
        # Any unique string works; a random UUID is one simple choice
        return uuid.uuid4().hex

    def add_message(self, role: str, content: str) -> str:
        node = TurnNode(
            id=self._new_id(),
            parent_id=self.current,
            messages=[{"role": role, "content": content}],
        )
        self.nodes[node.id] = node
        if self.current:
            self.nodes[self.current].children.append(node.id)
        self.current = node.id
        return node.id

    def fork(self, at_id: str) -> str:
        if at_id not in self.nodes:
            raise ValueError(f"Node {at_id} does not exist")
        # Empty node that starts the new branch; history is shared via parent_id
        node = TurnNode(id=self._new_id(), parent_id=at_id)
        self.nodes[node.id] = node
        self.nodes[at_id].children.append(node.id)
        self.current = node.id
        return node.id

    def get_context(self, up_to_id: str, limit: int = 50) -> list[dict]:
        path = []
        current = self.nodes.get(up_to_id)
        # Walk back toward the root, prepending each node's messages
        while current and len(path) < limit:
            path = current.messages + path
            current = self.nodes.get(current.parent_id) if current.parent_id else None
        return path[-limit:]
```

That's the whole thing. Add messages, fork at any point, resume from any branch. The rest is storage, serialization, and UI.
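For the serialization part, a JSON round trip is probably the simplest starting point. The sketch below repeats the `TurnNode` dataclass so it runs on its own; `dump_nodes` and `load_nodes` are hypothetical helpers, and it assumes everything in `messages` and `metadata` is JSON-serializable.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class TurnNode:
    id: str
    parent_id: Optional[str]
    children: list[str] = field(default_factory=list)
    messages: list[dict] = field(default_factory=list)
    metadata: dict = field(default_factory=dict)


def dump_nodes(nodes: dict[str, TurnNode], current: str) -> str:
    """Serialize the whole tree and the active tip to a JSON string."""
    payload = {"current": current, "nodes": [asdict(n) for n in nodes.values()]}
    return json.dumps(payload)


def load_nodes(raw: str) -> tuple[dict[str, TurnNode], str]:
    """Rebuild the node map and the active tip from a JSON string."""
    payload = json.loads(raw)
    nodes = {item["id"]: TurnNode(**item) for item in payload["nodes"]}
    return nodes, payload["current"]


nodes = {
    "a": TurnNode(id="a", parent_id=None, children=["b"],
                  messages=[{"role": "user", "content": "hi"}]),
    "b": TurnNode(id="b", parent_id="a",
                  messages=[{"role": "assistant", "content": "hello"}]),
}
raw = dump_nodes(nodes, current="b")
restored, current = load_nodes(raw)

print(restored == nodes)  # True
print(current)            # b
```

Because nodes reference each other by id rather than by object, the tree flattens to a list without any special handling. The same shape maps onto a database table with `id` and `parent_id` columns.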

> **Agent mode:** The branching tree model is the foundation of session persistence that doesn't lose context. Every turn is a node. Forking creates a new branch from any node. Resume walks the tree. The data structure is simple enough to implement in an afternoon and robust enough for production.

## FAQ

> **What's wrong with how most agents store conversation history?**
> Most agents store sessions as a linear list: each turn appends to the end. This works until you want to explore a branch: 'what if I had done X instead of Y?' In a linear store, exploring a branch means losing the main thread or creating a messy copy. A branching tree stores each turn as a node with optional children, so every branch stays intact and accessible.
>
> **How does a branching tree let you fork and resume a conversation?**
> Forking creates a new branch node that points back to the node you fork from, which is the last node both branches share. The original branch stays unchanged. You can work on the branch, and if it doesn't work out, you return to the main branch and resume from there. The full history is still in the tree. Every node knows its parent, so you can traverse back from any tip to the root.
>
> **What's the difference between session branching and context window compaction?**
> Context window compaction (covered in my post on AI agent context window management) is about fitting more tokens into a limited context: summarization, sliding windows, selective forgetting. Session branching is about preserving the full history as a navigable tree so you can explore alternatives, resume from any point, and audit what happened. They solve different problems and work together.
>
> **Can I implement branching sessions without a framework?**
> Yes. The core data structure is a tree where each node is a turn (user message, agent response, tool calls). Each node has an optional parent_id and a list of child ids. Forking creates a new node whose parent_id points to the node you fork from. Resume walks from a branch tip back to the root to rebuild the context. That's the whole model.

## Related Posts

Read [AI agent context window management](/posts/ai-agent-context-window-management/) for the companion problem: how to keep the agent functional when the context window fills up, with strategies like sliding windows, summarization, and structured memory.

Read [AI agent logging and monitoring](/posts/ai-agent-logging-monitoring/) for how to log session state and replay agent runs: the debugging companion to branching sessions.

Read [AI agent multi-step workflows](/posts/ai-agent-multi-step-workflows/) for how multi-step workflows and branching sessions work together: parallel execution, conditional branching, and human-in-the-loop checkpoints.


[The Pydantic AI Harness discussion on session forking and conversation branching](https://github.com/pydantic/pydantic-ai-harness/issues/85) covers implementation patterns for branching session trees.




---

This article was published on Agentic Up (https://agenticup.dev): practical guides for developers and founders building with AI agents. Reach me at hello@agenticup.dev.
