Chat History
Feature Definition
Every multi-turn conversation with an LLM accumulates history. Each user message, assistant response, tool call, and tool result adds tokens to the context. After enough turns, the accumulated history exceeds the model’s context window, and the agent must decide what to keep and what to discard.
This is distinct from token budgeting, which covers how the total context window is allocated across system prompts, repo maps, file contents, and history. Chat history management is specifically about how the conversation history portion is stored, retrieved, pruned, and compressed. For persistence layout and resume mechanics, see Session Directory Layout and Session Resumption.
The hard parts are:
- What to cut: Not all history is equally valuable. The most recent exchange is almost always important. Tool results from five turns ago are usually noise. Intermediate reasoning that led to a correct conclusion can be discarded, but reasoning that led to an error the model is still debugging cannot. There’s no static rule — the value of a message depends on the current conversation state.
- Summarization vs truncation: Truncation (dropping old messages) is fast and deterministic but loses information permanently. Summarization (asking an LLM to compress old turns) preserves intent but is slow, costs tokens, and can introduce errors. Getting the balance right is critical.
- Persistence format: The in-memory representation during a session and the on-disk format for persistence serve different needs. In-memory needs fast append and scan. On-disk needs durability, query support, and the ability to resume sessions days later.
- Compaction markers: When history is summarized, the agent needs to know which messages have been compacted and which are still live. Without clear markers, resumed sessions can re-process already-summarized content or lose track of what the summary covers.
- Background processing: Summarization involves an LLM call that can take seconds. If this blocks the main conversation loop, the user waits. If it runs asynchronously, there are race conditions between the summary completing and new messages arriving.
Aider Implementation
Reference: references/aider/aider/ | Commit: b9050e1d5faf8096eae7a46a9ecc05a86231384b
In-Memory: Two Lists
Aider maintains conversation history as two separate Python lists in base_coder.py (lines 395-403):
- `self.done_messages` — completed turns (accumulated history from prior turns)
- `self.cur_messages` — current turn’s messages (active exchange, not yet committed)
Each message is a simple dict: {"role": "user"|"assistant", "content": "..."}. There’s no structured type, no message ID, no timestamps.
Turn Transition
When a turn completes, move_back_cur_messages() (lines 1036-1046) archives the current exchange:
```python
def move_back_cur_messages(self, message=None):
    self.done_messages += self.cur_messages
    self.summarize_start()  # check if summarization needed
    if message:
        self.done_messages += [
            {"role": "user", "content": message},
            {"role": "assistant", "content": "Ok."},
        ]
    self.cur_messages = []
```

The optional `message` parameter allows injecting a synthetic exchange — used when the system needs to add context (like “the user added file X to the chat”) as a user/assistant pair. The "Ok." assistant response is a placeholder to maintain role alternation.
Disk Persistence: Markdown Append Log
Chat history is persisted to a Markdown file via io.py:1117-1136:
```python
def append_chat_history(self, text, linebreak=False, blockquote=False):
    # Appends to ~/.aider.chat.history.md
    ...
```

The format is append-only Markdown with timestamps:
```markdown
# aider chat started at 2024-03-15 14:32:07

#### Fix the login handler to validate the JWT token before checking permissions.

I'll update the login handler...
```

User messages are prefixed with `####` (H4). Assistant responses are raw content. There’s no structured separation — the file is meant to be human-readable, not machine-parseable.
History Restoration
On session resume, if restore_chat_history=True (lines 519-523):
```python
history_md = self.io.read_text(self.io.chat_history_file)
done_messages = utils.split_chat_history_markdown(history_md)
self.done_messages = done_messages
self.summarize_start()
```

The `split_chat_history_markdown()` function parses the Markdown back into message dicts by splitting on `####` boundaries. This is fragile — if an assistant response contains `####`, the parser breaks.
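To make the fragility concrete, here is a minimal sketch of this kind of `####`-boundary parser (an illustration of the approach, not Aider’s actual `split_chat_history_markdown` code):

```python
# Minimal sketch of a "####"-boundary Markdown history parser.
# Any assistant output that itself starts a line with "#### " will be
# misread as a new user message -- the fragility described above.
def split_history_markdown(text: str) -> list[dict]:
    messages: list[dict] = []
    role = None          # current speaker, or None outside any message
    buf: list[str] = []  # lines of the message being accumulated

    def flush() -> None:
        if role and buf:
            messages.append({"role": role, "content": "\n".join(buf).strip()})

    for line in text.splitlines():
        if line.startswith("# aider chat started"):
            flush()
            role, buf = None, []                 # session boundary marker
        elif line.startswith("#### "):
            flush()
            role, buf = "user", [line[5:]]       # new user message
        elif role == "user":
            flush()
            role, buf = "assistant", [line]      # reply starts after the #### block
        elif role == "assistant":
            buf.append(line)
    flush()
    return messages
```

Running this over a transcript whose assistant reply contains a literal `#### Heading` line would split the reply into a spurious user message, which is exactly the failure mode the format invites.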
Summarization: Recursive LLM Compression
The ChatSummary class in history.py handles context overflow:
```python
class ChatSummary:
    def __init__(self, models, max_tokens):
        self.models = models          # [weak_model, primary_model]
        self.max_tokens = max_tokens  # from max_chat_history_tokens
```

Overflow detection (`too_big()`, line 23):
```python
def too_big(self, messages):
    sized = self.tokenize(messages)
    total = sum(tokens for tokens, _msg in sized)
    return total > self.max_tokens
```

Token counting uses the model’s tokenizer (via litellm). The max_tokens budget is typically set to the model’s total context minus reserved space for system prompt, repo map, and file contents.
Summarization algorithm (summarize(), lines 30-113):
- If messages fit within the budget, return them unchanged
- Find the split point — walk backwards through the messages to find the last assistant-message boundary in the first half
- Split into `head` (older messages, to be summarized) and `tail` (recent messages, kept verbatim)
- Recursively summarize `head` if it’s still too large (depth limit = 3)
- Send `head` to the LLM with a summarization prompt
- Replace `head` with a single summary message: `{"role": "user", "content": summary_text}`
- Return `[summary_message] + tail`
The summarization prompt asks the LLM to preserve key information: what files were discussed, what changes were made, what errors occurred, and what the user’s intent was.
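The split-and-summarize recursion can be sketched as follows (a hypothetical reconstruction, not Aider’s code: `count_fn` stands in for the tokenizer and `summarize_fn` for the LLM call):

```python
# Sketch of recursive history compression: split at an assistant boundary,
# recursively shrink the older half, replace it with one summary message.
def compress(messages, count_fn, summarize_fn, max_tokens, depth=0):
    total = sum(count_fn(m["content"]) for m in messages)
    if total <= max_tokens or depth >= 3:
        return messages  # fits, or depth limit reached

    # Walk backwards from the midpoint to the last assistant boundary.
    split = len(messages) // 2
    while split > 0 and messages[split - 1]["role"] != "assistant":
        split -= 1
    split = split or max(1, len(messages) // 2)  # fallback: plain midpoint

    head, tail = messages[:split], messages[split:]
    head = compress(head, count_fn, summarize_fn, max_tokens, depth + 1)

    # The (possibly re-compressed) head collapses into a single user message.
    summary = {"role": "user", "content": summarize_fn(head)}
    return [summary] + tail
```

Note how each recursion level can summarize an already-summarized head, which is the mechanism behind the “summary drift” pitfall discussed later.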
Model selection: Summarization tries the weak model first (self.models[0]), falling back to the primary model if the weak model fails (lines 114-122). This saves cost — summarization doesn’t need the best model.
Background Thread
Section titled “Background Thread”Summarization runs in a background thread to avoid blocking the user (lines 1002-1034 of base_coder.py):
```python
def summarize_start(self):
    if self.summarizer.too_big(self.done_messages):
        self.summarizer_thread = threading.Thread(
            target=self.summarize_worker
        )
        self.summarizer_thread.start()

def summarize_worker(self):
    result = self.summarizer.summarize(self.done_messages_snapshot)
    self.summarized_done_messages = result

def summarize_end(self):
    if self.summarizer_thread:
        self.summarizer_thread.join()
        self.done_messages = self.summarized_done_messages
```

`summarize_start()` is called after each turn. `summarize_end()` is called before the next LLM call, blocking until the summary is ready. This gives the summarizer the time between turns to complete — usually a few seconds while the user types their next message.
Race condition: If the user sends a new message before summarization completes, summarize_end() blocks the conversation until the summary finishes. There’s no cancellation mechanism.
Codex Implementation
Reference: references/codex/codex-rs/core/src/ | Commit: 4ab44e2c5cc54ed47e47a6729dfd8aa5a3dc2476
In-Memory: Vec&lt;ResponseItem&gt;
Codex maintains conversation history as `items: Vec<ResponseItem>` in the ContextManager struct (context_manager/history.rs:25-29). Items are ordered oldest-to-newest. The ResponseItem type comes from the OpenAI Responses API and includes all message types: user messages, assistant responses, function calls, function outputs.
Adding items (record_items(), lines 65-80):
```rust
pub fn record_items(&mut self, items: Vec<ResponseItem>) {
    // Filter and process items
    // Append to self.items
}
```

Prompt construction (`for_prompt()`, lines 86-91):
```rust
pub fn for_prompt(&self) -> Vec<ResponseItem> {
    // Return normalized history for LLM
    self.items.clone()
}
```

Disk Persistence: JSONL Append Log
Codex persists history to ~/.codex/history.jsonl via message_history.rs (lines 48-140). Each entry is a single JSON line:
```json
{"session_id":"550e8400-e29b-41d4-a716-446655440000","ts":1710523927,"text":"Fix the login handler"}
```

Atomic writes: All writes use O_APPEND and stay within PIPE_BUF size to guarantee atomicity on POSIX systems (lines 112-140). This means multiple concurrent Codex sessions can safely append to the same history file.
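The same bounded-append idea can be sketched in a few lines (an illustration of the technique, not a port of message_history.rs; strictly, POSIX guarantees PIPE_BUF atomicity for pipes, while for regular files O_APPEND plus a single small write avoids interleaved lines in practice):

```python
# Sketch: append one JSONL entry per write, refusing entries larger than
# PIPE_BUF so a single os.write() call is never torn across writers.
import json
import os

PIPE_BUF = 4096  # conservative POSIX minimum

def append_history(path: str, session_id: str, ts: int, text: str) -> bool:
    line = json.dumps(
        {"session_id": session_id, "ts": ts, "text": text},
        separators=(",", ":"),
    ) + "\n"
    data = line.encode("utf-8")
    if len(data) > PIPE_BUF:
        return False  # caller must truncate or skip oversized entries
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        os.write(fd, data)  # one O_APPEND write: no interleaving with other writers
    finally:
        os.close(fd)
    return True
```

The design choice worth copying is the refusal path: rather than splitting a large entry across multiple writes (which reintroduces interleaving), oversized entries are rejected up front.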
Advisory locking: Read operations acquire a shared advisory lock, write operations acquire an exclusive lock (line 346). This prevents corruption from concurrent trimming and appending.
History Trimming (Not Summarization)
Codex does not use LLM-based summarization. Instead, it trims the history file by byte size (message_history.rs:159-244):
```rust
const HISTORY_SOFT_CAP_RATIO: f64 = 0.8;

fn enforce_history_limit(&self, config: &Config) -> Result<()> {
    // If file exceeds config.history.max_bytes:
    // 1. Read entire file
    // 2. Drop oldest lines until file_size < max_bytes * 0.8
    // 3. Rewrite file with remaining lines
    // 4. Preserve the newest entry unconditionally
}
```

The soft cap at 80% of the hard limit prevents repeated trimming — once the file is trimmed, it has 20% headroom before the next trim is triggered. This is crude compared to Aider’s summarization (information is permanently lost, not compressed), but it’s fast and deterministic with no LLM dependency.
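A minimal sketch of the soft-cap trim, assuming the behavior described above (drop oldest lines down to 80% of the hard limit, never drop the newest entry):

```python
# Sketch: trim oldest lines until total size is under the soft cap
# (hard limit * ratio), always preserving the newest line.
def trim_history(lines: list[str], max_bytes: int, soft_cap_ratio: float = 0.8) -> list[str]:
    if sum(len(l) for l in lines) <= max_bytes:
        return lines                      # under the hard limit: no trim
    target = int(max_bytes * soft_cap_ratio)
    kept = list(lines)
    while len(kept) > 1 and sum(len(l) for l in kept) > target:
        kept.pop(0)                       # drop oldest first
    return kept
```

Trimming to 80% rather than 100% is what buys the headroom: the next append lands well under the hard limit, so the expensive rewrite does not fire on every turn.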
Context Window Overflow
For within-session context management, Codex uses truncation, not summarization (context_manager/history.rs:125-144):
```rust
pub fn remove_first_item(&mut self) -> Option<ResponseItem> {
    // Remove oldest item from history
}

pub fn drop_last_n_user_turns(&mut self, n: usize) {
    // Remove the last N user turns (for rollback)
}
```

Token tracking uses a byte-based heuristic (4 bytes ≈ 1 token) for items added after the last API response, combined with exact token counts from the API’s usage field for prior items (lines 251-282):
```rust
pub fn estimate_token_count_with_base_instructions(&self) -> usize {
    // API-reported tokens for items before last response
    // + byte-estimate for items after last response
    // + base instruction tokens
}
```

When the context overflows, Codex’s auto-compact feature iteratively removes the oldest items until the estimated token count fits within 95% of the model’s context window. This uses the same model at lower reasoning effort — it’s truncation with re-prompting, not summarization.
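The mixed estimate is simple enough to state exactly (a sketch of the described arithmetic, with hypothetical parameter names):

```python
# Sketch: exact API-reported tokens cover everything up to the last
# response; a 4-bytes-per-token heuristic covers items added since.
def estimate_tokens(api_reported: int, new_items: list[str], base_instructions: int) -> int:
    heuristic = sum(len(item.encode("utf-8")) for item in new_items) // 4
    return api_reported + heuristic + base_instructions
```

Only the tail of the history is ever estimated, so the heuristic’s 10-20% error applies to a small fraction of the total and shrinks back to zero after each API response reports exact usage.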
History Lookup
The history file supports random access by entry offset (message_history.rs:247-384):
```rust
pub fn history_metadata(config: &Config) -> (LogId, usize) {
    // Returns (log_id, entry_count)
}

pub fn lookup(log_id: LogId, offset: usize, config: &Config) -> Option<HistoryEntry> {
    // Retrieve specific entry by position
}
```

LogId is the file’s inode (Unix) or creation time (Windows), used to detect file rotation. If the log_id changes between metadata and lookup, the offset is invalid.
OpenCode Implementation
Reference: references/opencode/packages/opencode/src/session/ | Commit: 7ed449974864361bad2c1f1405769fd2c2fcdf42
Persistence: SQLite
OpenCode stores all conversation data in SQLite with three tables (session.sql.ts:11-88):
SessionTable:

- `id`, `project_id`, `parent_id`, `slug`, `directory`, `title`, `version`
- `share_url`, `summary_additions`, `summary_deletions`, `summary_files`, `summary_diffs`
- `revert`, `permission`
- `time_created`, `time_updated`, `time_compacting`, `time_archived`

MessageTable:

- `id`, `session_id`, `time_created`, `time_updated`, `data` (JSON)

PartTable:

- `id`, `message_id`, `session_id`, `time_created`, `time_updated`, `data` (JSON)

The `data` column in both MessageTable and PartTable stores JSON blobs — the structured message/part content serialized via Zod schemas. Cascade deletes ensure session deletion cleans up all associated messages and parts.
Message Format
Messages use a rich type system (message-v2.ts), with message info and message parts stored separately:
```typescript
interface UserMessageInfo {
  id: string
  sessionID: string
  role: "user"
  time: { created: number }
  agent: string
  model: { providerID: string; modelID: string }
  system?: string
  tools?: Record<string, boolean>
  summary?: { title?: string; body?: string; diffs: FileDiff[] }
}

interface AssistantMessageInfo {
  id: string
  sessionID: string
  role: "assistant"
  time: { created: number; completed?: number }
  parentID: string
  modelID: string
  providerID: string
  agent: string
  path: { cwd: string; root: string }
  cost: number
  tokens: {
    input: number
    output: number
    reasoning: number
    total?: number
    cache: { read: number; write: number }
  }
  summary?: boolean
  error?: unknown
}
```

`MessagePart[]` entries (TextPart, ToolPart, CompactionPart, PatchPart, etc.) are stored in PartTable and joined back to each message when streamed.
MessagePart is a discriminated union with 10+ variants:
- `TextPart` — user/assistant text
- `ReasoningPart` — model chain-of-thought (Claude thinking blocks)
- `ToolPart` — tool call with state (pending | running | completed | error)
- `FilePart` — attached files
- `CompactionPart` — summary marker (critical for the compaction system)
- `SubtaskPart` — delegated task reference
- `SnapshotPart` — filesystem state at that point
- `PatchPart` — diff hashes for undo
This is far more structured than Aider’s {"role", "content"} dicts or Codex’s ResponseItem enum. The trade-off is complexity — serialization, migration, and querying all become harder.
Message Loading: Streamed and Filtered
Messages are loaded via an async generator that streams from SQLite in batches (message-v2.ts:716-809):
```typescript
async function* stream(sessionID: string) {
  // Query MessageTable descending by time_created
  // Batch size: 50 messages per query
  // For each batch, join PartTable
  // Yield {info, parts} tuples
}
```

The generator yields messages newest-first. The consumer — `filterCompacted()` — scans backwards from the newest message and stops when it hits a completed compaction boundary (lines 794-809):
```typescript
function filterCompacted(messages: Message[]) {
  // Walk newest → oldest
  // Track: has an assistant message with summary=true been "finished"?
  // When a user message matches the completed summary's ID → stop
  // Reverse the collected messages → return oldest-first
}
```

This is how OpenCode avoids loading the entire history on session resume. The compaction marker tells the loader “everything before this point has been summarized — stop here.” Only messages after the last compaction are loaded into the LLM’s context.
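The boundary scan can be sketched as follows (an approximation of the described logic in Python; the `parent_id` link between a summary and the user turn it covers is an assumed field name):

```python
# Sketch: walk newest -> oldest, keep messages until the user turn that a
# completed summary covers, then stop -- older history is already compacted.
def filter_compacted(messages_newest_first: list[dict]) -> list[dict]:
    kept: list[dict] = []
    boundary_id = None  # id of the user message the completed summary covers
    for msg in messages_newest_first:
        kept.append(msg)
        if msg["role"] == "assistant" and msg.get("summary") and msg.get("completed"):
            boundary_id = msg.get("parent_id")  # assumed linking field
        if boundary_id and msg["role"] == "user" and msg["id"] == boundary_id:
            break  # everything older is covered by the summary
    kept.reverse()  # back to oldest-first for prompt construction
    return kept
```

The key property is that the scan never touches rows older than the boundary, so resume cost is proportional to the live suffix of the conversation, not its full length.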
Compaction: Two-Phase System
OpenCode’s compaction system (compaction.ts, 262 lines) has two phases:
Phase 1: Pruning (prune(), lines 58-99):
Before summarizing, prune large tool outputs that are unlikely to be relevant:
```typescript
function prune(messages: Message[]) {
  // Walk backwards through messages
  // Keep minimum 2 user turns untouched
  // For each tool output:
  //   If accumulated tokens > PRUNE_PROTECT (40,000): mark as compacted
  //   Skip protected tools (e.g., "skill")
  // Only prune if total pruned > PRUNE_MINIMUM (20,000)
  // Mark pruned parts with compacted: Date.now()
}
```

Pruned tool outputs aren’t deleted — they’re marked with `compacted: Date.now()`. When the prompt builder encounters a compacted tool result, it replaces the content with "[Old tool result content cleared]" (line 620 of prompt.ts). The original data stays in SQLite for forensic purposes.
Phase 2: Summarization (process(), lines 101-229):
After pruning, if the context is still too large, a full LLM-based summary is generated:
- Create an assistant message with a `summary: true` flag
- Build a prompt from the conversation above the overflow point
- Call the LLM to generate a summary via `SessionProcessor`
- Store the summary as a `CompactionPart` in the assistant message
- Optionally auto-continue with a synthetic user message (lines 202-225)
- Publish the `SessionCompaction.Event.Compacted` event
Overflow Detection
```typescript
function isOverflow(messages: Message[], model: Model) {
  const usable = model.limit.input - COMPACTION_BUFFER // 20,000 reserved
  const used = totalTokens(messages) // input + output
  return used > usable
}
```

The COMPACTION_BUFFER of 20,000 tokens ensures compaction triggers before the context window is completely full, leaving room for the next turn’s system prompt and user message.
Auto-Compaction Trigger
Compaction is triggered automatically in the main prompt loop (prompt.ts:542-554):
```typescript
// After each LLM response completes:
if (lastFinished && isOverflow(messages, model)) {
  await compact(sessionID)
  // Re-load messages (now with compaction marker)
  // Continue prompt loop
}
```

This runs synchronously between turns — unlike Aider’s background-thread approach. The user sees a “Compacting…” indicator in the TUI while the summary is generated.
Config
```typescript
// Disable auto-compaction:
config.compaction?.auto === false

// Constants:
COMPACTION_BUFFER = 20_000 // tokens reserved before overflow
PRUNE_PROTECT = 40_000     // token threshold before pruning starts
PRUNE_MINIMUM = 20_000     // minimum tokens to justify pruning
```

Claude Code Implementation
Claude Code takes a fundamentally different approach to chat history management by leveraging server-side API features for compaction and context editing, a hierarchical memory system for persistent knowledge, and a checkpoint system for session-level undo.
Memory System: Persistent Context Beyond History
Unlike Aider/Codex/OpenCode, which treat chat history as the primary persistence layer, Claude Code separates instructions (CLAUDE.md files) from session history (the conversation transcript) from learned knowledge (auto memory).
CLAUDE.md hierarchy (six levels, loaded at session start):
| Level | Location | Scope |
|---|---|---|
| Managed policy | /etc/claude-code/CLAUDE.md (Linux) | Organization-wide |
| Project memory | ./CLAUDE.md or ./.claude/CLAUDE.md | Team via VCS |
| Project rules | ./.claude/rules/*.md | Team, path-scoped via frontmatter |
| User memory | ~/.claude/CLAUDE.md | Personal, all projects |
| Project local | ./CLAUDE.local.md | Personal, current project |
| Auto memory | ~/.claude/projects/<project>/memory/ | Per-project, auto-generated |
CLAUDE.md files above CWD are loaded in full at launch. Files in child directories load on-demand when Claude reads files in those subtrees. More specific instructions take precedence over broader ones.
Auto memory is a separate subsystem where Claude records its own learnings:
- Stored at `~/.claude/projects/<project>/memory/` with `MEMORY.md` as an index file and optional topic files (e.g., `debugging.md`, `api-conventions.md`)
- Only the first 200 lines of `MEMORY.md` are loaded into the system prompt at session start
- Topic files are loaded on demand via standard file tools
- Each git repo gets one memory directory; git worktrees get separate directories
- Control via `CLAUDE_CODE_DISABLE_AUTO_MEMORY=0` (force on) or `=1` (force off)
CLAUDE.md imports: Files can reference other files with @path/to/import syntax, resolved relative to the containing file. Recursive imports up to depth 5. First-encounter approval dialog per project.
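The import mechanics can be sketched as a bounded recursive substitution (assumptions: whole-line `@path` references only, and unresolvable paths left in place; the real syntax and approval flow may differ):

```python
# Sketch: expand whole-line "@path" imports relative to the containing
# file, recursing at most 5 levels deep.
import re
from pathlib import Path

IMPORT_RE = re.compile(r"^@(\S+)\s*$", re.MULTILINE)

def resolve_imports(path: Path, depth: int = 0) -> str:
    text = path.read_text()
    if depth >= 5:
        return text  # stop expanding past the depth limit

    def expand(match: re.Match) -> str:
        target = (path.parent / match.group(1)).resolve()
        if not target.is_file():
            return match.group(0)  # leave unresolved references as-is
        return resolve_imports(target, depth + 1)

    return IMPORT_RE.sub(expand, text)
```

The depth cap doubles as cycle protection: two files importing each other terminate after five expansions instead of recursing forever.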
Project rules (.claude/rules/*.md): Modular, topic-specific instructions. Optional YAML frontmatter with paths: field for glob-based conditional activation. Subdirectories supported, recursively discovered. Symlinks allowed.
Checkpointing: Session-Level Undo
Claude Code tracks file edits as checkpoints — one per user prompt. Checkpoints are not conversation history management per se, but they interact with it via the rewind menu.
- Every user prompt creates a checkpoint
- Persists across sessions, auto-cleaned after 30 days (configurable)
- Only tracks changes made by Claude’s file editing tools — NOT bash commands, NOT external changes
- Complements (does not replace) version control
Rewind menu (via Esc + Esc or /rewind):
| Action | Code | Conversation |
|---|---|---|
| Restore code and conversation | Revert to checkpoint | Rewind to that message |
| Restore conversation | Keep current | Rewind to that message |
| Restore code | Revert to checkpoint | Keep current |
| Summarize from here | No change | Compress from selected point forward |
“Summarize from here” is a targeted compaction — it keeps early context in full detail and only compresses from the selected point forward. The original messages are preserved in the session transcript for reference. Accepts optional instructions to guide the summary focus.
Server-Side Compaction
Claude Code uses the Anthropic API’s server-side compaction (beta compact-2026-01-12) rather than implementing client-side summarization like Aider or OpenCode:
- The API detects that input tokens exceed the trigger threshold (default 150K, minimum 50K)
- Generates a summary of the conversation
- Creates a `compaction` content block containing the summary
- Continues the response with the compacted context
- On subsequent requests, the API auto-drops all message blocks prior to the last `compaction` block
Configuration:
- `trigger`: when to compact (input token threshold)
- `pause_after_compaction`: return immediately after the summary so the client can preserve recent messages before continuing
- `instructions`: custom summarization prompt (completely replaces the default)
The pause_after_compaction option enables a pattern where the client preserves the last N messages verbatim alongside the compaction summary, avoiding information loss for recent context.
Context Editing: Server-Side Pruning
In addition to compaction, Claude Code uses the API’s context editing features (beta context-management-2025-06-27) for lighter-weight context management:
Tool result clearing (clear_tool_uses_20250919):
- Trigger: configurable input token threshold (default 100K)
- Keeps: configurable number of recent tool use/result pairs (default 3)
- Cleared results replaced with placeholder text
- `exclude_tools`: never-clear list for important tool types
- `clear_at_least`: minimum tokens to clear (amortizes cache invalidation cost)
Thinking block clearing (clear_thinking_20251015):
- Manages extended thinking blocks to save context space
- Default: keep only last assistant turn’s thinking
- Can keep N turns or `"all"` (maximizes prompt cache hits)
Both strategies are server-side — the client maintains full unmodified history while the API applies edits before the prompt reaches Claude.
Context Awareness
Claude models (Sonnet 4.6, Sonnet 4.5, Haiku 4.5) receive their token budget at session start:
```xml
<budget:token_budget>200000</budget:token_budget>
```

After each tool call, remaining capacity is updated:
```xml
<system_warning>Token usage: 35000/200000; 165000 remaining</system_warning>
```

This is a model-training feature — the model is trained to use this information for planning tasks within its available context.
Status Line: Real-Time Context Monitoring
Claude Code exposes context state to users via a customizable status line — a shell script that receives JSON session data on stdin after each assistant message (debounced at 300ms). Key context-related fields:
- `context_window.used_percentage` / `remaining_percentage` — calculated from input tokens
- `context_window.context_window_size` — 200K default, 1M for extended context
- `context_window.current_usage` — token counts from the last API call (input, output, cache creation, cache read)
- `cost.total_cost_usd` — accumulated session cost
- `exceeds_200k_tokens` — boolean threshold indicator
Comparison Table
| Aspect | Aider | Codex | OpenCode | Claude Code |
|---|---|---|---|---|
| Compaction location | Client (background thread) | Client (post-response) | Client (synchronous) | Server-side API |
| Compaction trigger | History token budget exceeded | auto_compact_token_limit | isOverflow() post-response | API input token threshold (default 150K) |
| Pruning | None (summarize everything) | Truncate oldest items | Two-phase (prune tools, then summarize) | Server-side tool result clearing (configurable) |
| Token counting | Exact (litellm) | Byte heuristic (4 bytes/token) | Char heuristic (4 chars/token) + API-reported | Free API endpoint pre-send |
| Persistent memory | None | None | None | 6-level CLAUDE.md hierarchy + auto memory |
| Session undo | None | None | None | Checkpoint per prompt with rewind menu |
| Context awareness | None | None | None | Model-level budget injection |
| Thinking management | N/A | Encrypted reasoning | Reasoning parts | Server-side thinking block clearing |
Pitfalls & Hard Lessons
Summary Drift
When an LLM summarizes conversation history, it can subtly distort the intent. “The user asked to fix the login handler” might become “The user wants to update the authentication system” — close but not identical. Over multiple rounds of summarization (Aider’s recursive approach), these distortions compound. The model ends up working from a summary-of-a-summary that no longer accurately reflects what the user originally asked for.
Pruning vs Summarization
OpenCode’s two-phase approach (prune tool outputs first, then summarize if needed) is empirically better than Aider’s summarize-everything approach. Most context bloat comes from large tool outputs (file reads, search results, command output) that are only relevant for the turn they were generated. Pruning these before summarization reduces the summary’s job to compressing actual conversation, not reams of file contents.
Markdown Persistence Fragility
Aider’s Markdown history format is human-readable but machine-fragile. The `####` separator breaks when assistant responses contain Markdown headers. The timestamp-based session boundaries don’t support querying by session ID. There’s no transaction safety — a crash mid-write can corrupt the file. SQLite (OpenCode) and JSONL with atomic writes (Codex) are more robust, at the cost of human readability.
Background vs Synchronous Summarization
Aider runs summarization in a background thread, which avoids blocking the user but introduces a race condition: if the user sends a message before summarization completes, the main thread blocks on thread.join(). OpenCode runs compaction synchronously between turns, which is simpler and avoids races but makes the user wait. Codex avoids the problem entirely by not summarizing.
Token Counting Accuracy
Aider uses exact token counts via the model’s tokenizer (litellm). Codex uses a 4-byte heuristic for new items and exact counts from the API for prior items. OpenCode uses exact API-reported tokens for completed turns and heuristics for in-progress content. The heuristic approaches are faster but can be off by 10-20%, which means compaction triggers earlier or later than optimal. For models with large context windows (200k+), this margin is acceptable. For smaller windows (8k-32k), exact counting matters more.
Session Resume Semantics
Aider’s “restore chat history” replays the entire Markdown file into done_messages, then summarizes if too large. This means resuming a long session triggers an immediate summarization that can take 10+ seconds. Codex’s JSONL lookup supports random access by offset but doesn’t reconstruct the conversation context — it’s a history browser, not a session resume mechanism. OpenCode’s compaction markers allow efficient resume: load messages after the last compaction boundary, and the summary message provides the compressed context for everything before.
Cascade Deletes and Data Loss
OpenCode’s SQLite schema uses cascade deletes — deleting a session deletes all its messages and parts. This is clean but irreversible. There’s no soft-delete, no trash, no recovery mechanism. Codex’s JSONL file can be manually edited to recover deleted entries. Aider’s Markdown file is append-only and never deletes anything.
Server-Side Compaction Is Not Free
Claude Code’s server-side compaction requires an additional sampling step — the summary generation is billed as a separate API call within the same request. The usage.iterations array separates compaction costs from message costs, but developers must sum across all iterations for accurate billing. Top-level input_tokens/output_tokens exclude compaction iterations, which can be misleading.
Checkpoint Blind Spots
Claude Code’s checkpointing only tracks file editing tools. If Claude runs rm file.txt or mv old.txt new.txt via bash, those changes are invisible to the checkpoint system and cannot be undone via rewind. This creates a false sense of safety when Claude uses shell commands for file operations.
Memory Loading Budget
Auto memory’s 200-line limit on MEMORY.md is a hard cutoff. If the index file grows beyond that, content is silently dropped from the system prompt. Claude is instructed to keep it concise by moving details to topic files, but there’s no enforcement mechanism — a verbose auto-save could push important content past the limit.
Compaction Pause Complexity
The pause_after_compaction pattern (return after summary, client preserves recent messages, re-send) adds a second API round-trip. If the client doesn’t handle the stop_reason: "compaction" case, the conversation breaks. This is an easy integration bug in headless/CI deployments.
OpenOxide Blueprint
Architecture
OpenOxide should use a three-layer system:
- In-memory: a `Vec<Message>` ordered oldest-to-newest, with efficient append and backward scan
- On-disk: SQLite for structured persistence with message/part separation
- Compaction: two-phase (prune tool outputs, then LLM summarization) with marker-based filtering
Storage Schema
```rust
// SQLite tables
struct Session {
    id: Ulid,
    project_id: String,
    parent_id: Option<Ulid>,   // for forked sessions
    title: String,
    directory: PathBuf,
    created_at: i64,
    updated_at: i64,
    compacted_at: Option<i64>, // last compaction timestamp
}

struct Message {
    id: Ulid,
    session_id: Ulid,
    role: Role,                // User | Assistant
    created_at: i64,
    completed_at: Option<i64>,
}

struct Part {
    id: Ulid,
    message_id: Ulid,
    session_id: Ulid,          // denormalized for efficient session queries
    kind: PartKind,            // Text | Reasoning | ToolCall | ToolResult | Summary | Snapshot
    data: serde_json::Value,   // JSON blob
    compacted_at: Option<i64>, // None = live, Some = pruned
    created_at: i64,
}
```

Crates
| Crate | Purpose |
|---|---|
| `rusqlite` | SQLite bindings (WAL mode for concurrent reads) |
| `ulid` | Time-ordered unique IDs |
| `serde_json` | Part data serialization |
| `tiktoken-rs` | Exact token counting for OpenAI models |
Compaction Algorithm
Follow OpenCode’s two-phase approach:
```rust
pub async fn compact(session_id: Ulid, model: &Model) -> Result<()> {
    let messages = load_after_last_compaction(session_id).await?;

    // Phase 1: Prune old tool outputs
    let _pruned_tokens = prune_tool_outputs(&messages, PruneConfig {
        protect_threshold: 40_000, // start pruning after this many tokens
        minimum_savings: 20_000,   // don't bother if savings < this
        keep_recent_turns: 2,      // always keep last 2 user turns
    }).await?;

    // Phase 2: Summarize if still over budget
    let budget = model.context_window - COMPACTION_BUFFER;
    if estimate_tokens(&messages) > budget {
        let summary = generate_summary(&messages, model).await?;
        insert_compaction_marker(session_id, summary).await?;
    }
    Ok(())
}
```

Message Loading
Use a streaming loader that stops at compaction boundaries:

```rust
pub async fn load_context(session_id: Ulid) -> Vec<Message> {
    // Query messages descending by created_at
    // Stop when a completed compaction marker is found
    // Reverse to oldest-first order
    // Replace compacted tool outputs with placeholder text
}
```
Token Counting Strategy
Use exact counting for the primary model's tokenizer, with a 4-bytes-per-token fallback for unknown models:

```rust
pub fn count_tokens(text: &str, model: &str) -> usize {
    match tiktoken_rs::get_bpe_from_model(model) {
        Ok(bpe) => bpe.encode_with_special_tokens(text).len(),
        Err(_) => text.len() / 4, // fallback heuristic: ~4 bytes per token
    }
}
```
Background Compaction
Run compaction in a background Tokio task with channel-based result delivery, similar to Aider's threading but with proper cancellation:
```rust
let (tx, mut rx) = oneshot::channel();
tokio::spawn(async move {
    let result = compact(session_id, &model).await;
    let _ = tx.send(result);
});

// Before the next LLM call:
if let Ok(result) = rx.try_recv() {
    // Apply the compaction result
} else {
    // Compaction still running: block or proceed with current context
}
```
This gives the user the time between turns for compaction to complete (like Aider) while supporting cancellation via Tokio's cooperative cancellation (unlike Aider's uninterruptible thread).
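The same between-turns pattern can be sketched with std threads alone (the `compact_blocking` stand-in is hypothetical; the real implementation uses Tokio as above):

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical stand-in for the real async compaction call.
fn compact_blocking(session_id: &str) -> Result<String, String> {
    Ok(format!("summary for {session_id}"))
}

/// Spawn compaction on a worker thread; the caller polls the returned
/// receiver with try_recv() before the next LLM call, so compaction
/// runs during the user's think time between turns.
fn spawn_compaction(session_id: String) -> mpsc::Receiver<Result<String, String>> {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        let _ = tx.send(compact_blocking(&session_id));
    });
    rx
}
```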
Hierarchical Memory System (from Claude Code)
Adopt Claude Code's multi-level memory hierarchy:
```rust
pub struct MemoryConfig {
    /// System-wide managed policy (e.g., /etc/openoxide/MEMORY.md)
    pub managed_policy: Option<PathBuf>,
    /// Project memory (./OPENOXIDE.md or ./.openoxide/MEMORY.md)
    pub project_memory: Option<PathBuf>,
    /// Project rules (./.openoxide/rules/*.md, with optional path globs)
    pub project_rules: Vec<RuleFile>,
    /// User memory (~/.openoxide/MEMORY.md)
    pub user_memory: Option<PathBuf>,
    /// Project local (./OPENOXIDE.local.md, gitignored)
    pub project_local: Option<PathBuf>,
    /// Auto memory (~/.openoxide/projects/<project>/memory/)
    pub auto_memory: AutoMemoryConfig,
}

pub struct RuleFile {
    pub path: PathBuf,
    pub path_globs: Option<Vec<String>>, // YAML frontmatter paths
    pub content: String,
}

pub struct AutoMemoryConfig {
    pub dir: PathBuf,
    pub index_file: PathBuf,       // MEMORY.md
    pub index_line_limit: usize,   // 200
    pub topic_files: Vec<PathBuf>, // loaded on demand
}
```
Loading strategy:
- At session start: load managed policy + project + user + local + auto memory index (first N lines)
- On file access in child directory: load any MEMORY.md found there
- On path match: activate path-scoped rules
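The session-start step can be sketched as a pure assembly function (ordering and the line-limited auto-memory index follow the strategy above; file reading is elided):

```rust
/// Concatenate memory layers in precedence order (managed, project,
/// user, local), then cap the auto-memory index at `index_line_limit`
/// lines so a sprawling index can't eat the context budget.
fn assemble_memory(
    layers: &[Option<&str>],
    auto_index: &str,
    index_line_limit: usize,
) -> String {
    let mut prompt = String::new();
    for layer in layers.iter().flatten() {
        prompt.push_str(layer);
        prompt.push('\n');
    }
    for line in auto_index.lines().take(index_line_limit) {
        prompt.push_str(line);
        prompt.push('\n');
    }
    prompt
}
```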
Checkpoint System (from Claude Code)
```rust
pub struct Checkpoint {
    pub id: Ulid,
    pub session_id: Ulid,
    pub message_id: Ulid,            // the user prompt that triggered this checkpoint
    pub file_states: Vec<FileState>, // snapshot of files before the agent's edits
    pub created_at: i64,
    pub ttl_days: u32,               // default 30
}

pub struct FileState {
    pub path: PathBuf,
    pub content_hash: [u8; 32],   // SHA-256 of file content
    pub content: Option<Vec<u8>>, // stored if file was modified
}

pub enum RewindAction {
    RestoreCodeAndConversation,
    RestoreConversation,
    RestoreCode,
    SummarizeFromHere { instructions: Option<String> },
}
```
Only track files modified by the agent's file editing tools, not shell commands. Store file content before each edit for restoration.
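Restoration under `RestoreCode` can be sketched against an in-memory workspace (a map standing in for the filesystem; `Snapshot` is a simplified `FileState` with hashes omitted):

```rust
use std::collections::HashMap;

struct Snapshot { path: String, content: Option<Vec<u8>> }

/// Write back every snapshotted file; entries without stored content
/// (file tracked but never modified) are left untouched.
fn restore_code(snapshots: &[Snapshot], fs: &mut HashMap<String, Vec<u8>>) {
    for s in snapshots {
        if let Some(bytes) = &s.content {
            fs.insert(s.path.clone(), bytes.clone());
        }
    }
}
```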
Server-Side Compaction Support (from Claude Code)
When the provider API supports server-side compaction (like Anthropic's `compact_20260112`), prefer it over client-side compaction:
```rust
pub enum CompactionStrategy {
    /// Use the provider's server-side compaction API
    ServerSide {
        trigger_tokens: usize,        // default 150_000
        pause_after: bool,            // pause to preserve recent messages
        instructions: Option<String>, // custom summarization prompt
    },
    /// Client-side two-phase (prune + summarize)
    ClientSide {
        prune_config: PruneConfig,
        summary_model: Option<String>,
    },
    /// No compaction
    Disabled,
}
```
Server-side is preferred because:
- No extra client logic for summarization
- Summary quality benefits from provider-side optimization
- Integrated with prompt caching (compaction blocks can be cache-controlled)
- Usage tracking via the `iterations` array in the response
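Strategy selection then reduces to a capability check (the `supports_server_compaction` flag is a hypothetical provider capability; the trigger default follows the enum above):

```rust
enum Strategy { ServerSide { trigger_tokens: usize }, ClientSide, Disabled }

/// Prefer the provider's server-side compaction when advertised,
/// fall back to client-side two-phase, and honor an explicit opt-out.
fn choose(supports_server_compaction: bool, user_disabled: bool) -> Strategy {
    if user_disabled {
        Strategy::Disabled
    } else if supports_server_compaction {
        Strategy::ServerSide { trigger_tokens: 150_000 }
    } else {
        Strategy::ClientSide
    }
}
```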