
Agent Loop

The agent loop is the central execution primitive in every AI coding tool. It is the cycle that turns a user message into a sequence of actions: send a prompt, observe the model’s response, execute any requested tools, feed results back, and repeat until the model signals completion or a limit is reached. Getting this loop right determines whether the tool feels responsive, handles errors gracefully, and avoids runaway behavior.

This page traces the full loop implementation in Aider, Codex, OpenCode, and Claude Code, then proposes the OpenOxide design.


Pinned commit: b9050e1d

Aider’s loop lives entirely in aider/coders/base_coder.py (2485 lines). The architecture is a three-tier nesting: an outer input loop, a middle reflection loop, and an inner LLM call.

File: aider/coders/base_coder.py:876

The outermost loop is a simple REPL. It calls get_input() to read a user message via the prompt_toolkit-based prompt, then delegates to run_one(). This loops indefinitely until EOFError (Ctrl+D) or an explicit /exit command.

File: aider/coders/base_coder.py:924

run_one() handles one user message and its reflection cycles. Structure:

```python
def run_one(self, user_message, preproc):
    self.init_before_message()
    message = self.preproc_user_input(user_message) if preproc else user_message
    while message:
        self.reflected_message = None
        list(self.send_message(message))  # consume the generator
        if not self.reflected_message:
            break
        if self.num_reflections >= self.max_reflections:  # max_reflections = 3
            break
        self.num_reflections += 1
        message = self.reflected_message  # loop with reflection feedback
```

The preproc_user_input() step at line 912 handles slash commands (/add, /drop, /model, etc.) and URL extraction before the message enters the LLM path. If the input is a command, it is dispatched via commands.run() and the result (if any) becomes the message.

File: aider/coders/base_coder.py:1419

This is the core agent cycle. Each call to send_message() performs one full prompt-act-observe iteration:

Step 1 — Add user message to context (line 1425): The user message is appended to self.cur_messages as a {"role": "user", "content": inp} dict.

Step 2 — Format and assemble messages (line 1429): format_chat_chunks() (line 1226) builds a ChatChunks dataclass (aider/coders/chat_chunks.py) with eight segments assembled in order:

  1. system — system prompt with edit format instructions
  2. examples — few-shot examples for the edit format
  3. readonly_files — content of read-only files
  4. repo — repo map output (tree-sitter tags + PageRank)
  5. done — previous turns (summarized if over token budget)
  6. chat_files — current files in the editing context
  7. cur — the current turn’s messages
  8. reminder — optional system reminder

chunks.all_messages() concatenates all segments into a flat message list.
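The eight-segment assembly can be sketched as a minimal dataclass. This is a hypothetical illustration modeled on aider's ChatChunks, not its actual code; only the field names and ordering come from the list above.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of the eight-segment prompt assembly; field names
# mirror the segments listed above, in their fixed order.
@dataclass
class ChatChunks:
    system: list = field(default_factory=list)
    examples: list = field(default_factory=list)
    readonly_files: list = field(default_factory=list)
    repo: list = field(default_factory=list)
    done: list = field(default_factory=list)
    chat_files: list = field(default_factory=list)
    cur: list = field(default_factory=list)
    reminder: list = field(default_factory=list)

    def all_messages(self):
        # Concatenate segments in the fixed order shown above
        return (self.system + self.examples + self.readonly_files
                + self.repo + self.done + self.chat_files
                + self.cur + self.reminder)

chunks = ChatChunks(
    system=[{"role": "system", "content": "You edit code."}],
    cur=[{"role": "user", "content": "fix the bug"}],
)
assert [m["role"] for m in chunks.all_messages()] == ["system", "user"]
```

Keeping each segment separate until the final concatenation is what lets aider summarize or drop individual segments (e.g. `done`) without touching the others.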

Step 3 — Token validation (line 1431): check_tokens(messages) verifies the assembled prompt fits within the model’s context window.

Step 4 — LLM call with retry (lines 1449-1487): A while True loop calls self.send(messages, functions=self.functions) with exponential backoff on transient errors:

```python
retry_delay = 0.125
while True:
    try:
        yield from self.send(messages, functions=self.functions)
        break
    except ContextWindowExceededError:
        exhausted = True
        break
    except retryable_error:
        retry_delay *= 2
        if retry_delay > RETRY_TIMEOUT:
            break
        time.sleep(retry_delay)
        continue
```

The send() method (line 1783) calls model.send_completion() which delegates to litellm.completion(). If self.stream is True, it yields from show_send_output_stream() (line 1900), which iterates over the streaming completion and accumulates self.partial_response_content token by token, calling self.live_incremental_response() for real-time markdown rendering.

Step 5 — Extract and apply edits (line 1585): apply_updates() (line 2296) calls the subclass-specific get_edits() to parse the response format, then apply_edits() to write changes to disk. If parsing fails with a ValueError, the error message is assigned to self.reflected_message, triggering a reflection cycle.

Step 6 — Auto-commit (line 1589): If git integration is enabled, auto_commit(edited) creates a commit with a model-generated message.

Step 7 — Lint feedback (lines 1599-1607): If self.auto_lint is enabled and files were edited, lint_edited() runs the linter. If errors are found, they are assigned to self.reflected_message and the method returns — triggering another iteration of the run_one() while loop.

Step 8 — Test feedback (lines 1616-1618): If self.auto_test is enabled, test results are captured and can trigger reflection.

The Coder base class defines the loop; subclasses override get_edits() and apply_edits() to handle different edit formats:

| Subclass | edit_format | Strategy |
|---|---|---|
| EditBlockCoder | diff | SEARCH/REPLACE blocks with exact text matching |
| UnifiedDiffCoder | udiff | Unified diff hunks with flexible search-and-replace |
| WholeFileCoder | whole | Complete file content between fence markers |
| AskCoder | ask | No edits, question-only mode |
| ArchitectCoder | architect | Two-model flow (see below) |

Subclass selection happens in Coder.create() (line 125), which takes edit_format as a parameter and returns the appropriate subclass instance. Mid-session format switching is supported via the from_coder parameter, which transfers chat history.
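The registry-plus-factory shape can be sketched as follows. This is a hypothetical illustration in the spirit of Coder.create(), not aider's real implementation; the registry mechanism and the `history` attribute are invented for the example.

```python
# Hypothetical sketch of edit-format dispatch: subclasses register
# themselves under an edit_format key, and create() looks them up.
class Coder:
    registry = {}

    def __init_subclass__(cls, edit_format=None, **kw):
        super().__init_subclass__(**kw)
        if edit_format:
            Coder.registry[edit_format] = cls

    @classmethod
    def create(cls, edit_format="diff", from_coder=None):
        coder = cls.registry[edit_format]()
        if from_coder is not None:
            # mid-session format switch: carry chat history across
            coder.history = list(getattr(from_coder, "history", []))
        return coder

class EditBlockCoder(Coder, edit_format="diff"): ...
class WholeFileCoder(Coder, edit_format="whole"): ...

assert isinstance(Coder.create("whole"), WholeFileCoder)
```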

File: aider/coders/architect_coder.py

ArchitectCoder extends AskCoder (no edits). When its reply_completed() hook fires:

  1. The architect model’s plan text is captured
  2. User confirms with “Edit the files?”
  3. An EditorCoder is spawned with the editor_model and editor_edit_format
  4. The editor runs in a fresh conversation: editor_coder.run(with_message=content, preproc=False)
  5. Results are merged back: self.move_back_cur_messages("I made those changes to the files.")

This is Aider’s only multi-model agent pattern. The editor gets a clean context with just the plan text — no accumulated chat history.

```python
self.cur_messages = []               # current turn
self.done_messages = []              # previous turns (may be summarized)
self.reflected_message = None        # set to trigger reflection
self.num_reflections = 0             # counter, max 3
self.max_reflections = 3             # hard limit
self.partial_response_content = ""   # accumulated LLM output
```

Pinned commit: 4ab44e2c5

Codex’s loop is fully async, built on tokio, and uses channels for all communication between the TUI and the core engine. The architecture separates submission handling from turn execution.

File: codex-rs/core/src/codex.rs:287

spawn() creates the session infrastructure:

  1. Creates a bounded channel (capacity = 64) for Submission inputs
  2. Creates an unbounded channel for Event outputs
  3. Initializes a watch::channel(AgentStatus::PendingInit) for status tracking
  4. Builds a Session::new() with configuration
  5. Spawns the background submission_loop() task — this is the core event dispatcher
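The channel plumbing above can be approximated with asyncio queues in place of tokio channels. This is a hypothetical sketch; the op names and the echoed events are illustrative, not Codex's actual types.

```python
import asyncio

# Hypothetical sketch of Codex-style session plumbing: a bounded input
# queue, an unbounded output queue, and a background dispatcher task.
def spawn():
    submissions = asyncio.Queue(maxsize=64)   # bounded input channel
    events = asyncio.Queue()                  # unbounded output channel

    async def submission_loop():
        while True:
            op = await submissions.get()
            if op["type"] == "shutdown":
                await events.put({"type": "session_closed"})
                return
            # dispatch to a handler; here we just acknowledge the op
            await events.put({"type": "handled", "op": op["type"]})

    task = asyncio.ensure_future(submission_loop())
    return submissions, events, task

async def demo():
    subs, evs, task = spawn()
    await subs.put({"type": "user_input"})
    await subs.put({"type": "shutdown"})
    await task
    out = []
    while not evs.empty():
        out.append((await evs.get())["type"])
    return out

assert asyncio.run(demo()) == ["handled", "session_closed"]
```

The bounded submission queue gives natural backpressure: a flood of inputs blocks the producer rather than growing memory, while events flow out unbounded so the UI never stalls the core.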

File: codex-rs/core/src/codex.rs:3196

The submission loop runs indefinitely, receiving Op submissions and dispatching to handlers:

| Op Type | Handler | Purpose |
|---|---|---|
| UserInput / UserTurn | user_input_or_turn() | New user message, spawns turn task |
| ExecApproval | exec_approval() | Approval decision for command execution |
| PatchApproval | patch_approval() | Approval decision for file patches |
| Interrupt | interrupt() | Abort current task via CancellationToken |
| Shutdown | shutdown() | Graceful session termination |
| Compact | compact() | Manual context compaction |

Handler: handlers::user_input_or_turn() (line 3433)

  1. Parse Op::UserInput or Op::UserTurn to extract items and settings updates
  2. Create TurnContext with per-turn config (model, approval policy, sandbox policy, collaboration mode)
  3. Attempt sess.steer_input(items, None) to inject into an active task (if one exists)
  4. If no active task: call sess.spawn_task(current_context, items, RegularTask) to start a new turn
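The steer-or-spawn decision in steps 3 and 4 can be sketched as follows. This is a hypothetical illustration; the data structures are invented stand-ins for Codex's Session and task types.

```python
# Hypothetical sketch of steer-or-spawn: inject input into an active
# task if one exists, otherwise start a new turn.
class Session:
    def __init__(self):
        self.active_task = None

    def steer_input(self, items):
        if self.active_task is None:
            return False  # nothing running; caller must spawn a turn
        self.active_task.setdefault("pending_input", []).extend(items)
        return True

    def spawn_task(self, items):
        self.active_task = {"items": list(items), "pending_input": []}

def user_input_or_turn(sess, items):
    if not sess.steer_input(items):
        sess.spawn_task(items)

sess = Session()
user_input_or_turn(sess, ["fix the tests"])     # no active task -> spawn
user_input_or_turn(sess, ["also update docs"])  # active task -> steer
assert sess.active_task["pending_input"] == ["also update docs"]
```

The key property is that mid-turn input redirects the running task instead of queueing behind it, which is what makes steering feel immediate to the user.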

Task Spawning: Session::spawn_task() (line 116 in codex-rs/core/src/tasks/mod.rs)

  • Aborts all previous tasks
  • Creates a CancellationToken for the new task
  • Spawns a background tokio task that calls task.run()
  • On completion, emits TurnComplete event and flushes the rollout

File: codex-rs/core/src/codex.rs:4318

This is where the prompt-act-observe cycle lives. Structure:

Phase 1 — Setup (lines 4325-4462):

  1. Emit TurnStarted event
  2. Run pre-sampling compaction if token budget is tight
  3. Load skills for the current working directory
  4. Collect available tools via ToolsConfig
  5. Record user prompt to history
  6. Start ghost snapshot task for undo support

Phase 2 — Main Loop (lines 4476-4657):

```rust
loop {
    // Check for pending user input (user typed while model was running)
    let pending = sess.get_pending_input().await;
    // Build sampling request from conversation history
    let input: Vec<ResponseItem> = sess.clone_history().await.for_prompt(...);
    // Call API with retry logic
    match run_sampling_request(...).await {
        Ok(result) => {
            let SamplingRequestResult { needs_follow_up, last_agent_message } = result;
            // Check token limits
            let total_tokens = sess.get_total_token_usage().await;
            let limit_reached = total_tokens >= auto_compact_limit;
            if limit_reached && needs_follow_up {
                run_auto_compact(&sess, &turn_context).await?;
                continue; // compact and retry
            }
            if !needs_follow_up {
                break; // model is done
            }
            // else: model wants to continue, loop again
        }
        Err(CodexErr::TurnAborted) => break,
        Err(e) => { send_error_event; break; }
    }
}
```

The loop continues as long as needs_follow_up is true — meaning the model emitted tool calls that need results fed back. If the token limit is approached mid-turn, auto-compaction runs and the loop continues, enabling arbitrarily long multi-step turns.

Function: run_sampling_request() (line 4897)

Wraps try_run_sampling_request() in a retry loop:

  • Retryable errors (stream failures, timeouts, connection errors): exponential backoff with transport fallback (WebSocket to HTTPS)
  • Non-retryable errors (context window exceeded, quota, invalid request): fail immediately
  • Max retries per provider via provider.stream_max_retries()

Function: try_run_sampling_request() (line 5505)

Streams the API response and processes events in real-time:

| SSE Event | Action |
|---|---|
| ResponseEvent::Created | No-op |
| ResponseEvent::OutputItemAdded(item) | Emit TurnItemStarted |
| ResponseEvent::OutputTextDelta(delta) | Emit AgentMessageContentDelta |
| ResponseEvent::OutputItemDone(item) | Extract tool call, dispatch via ToolCallRuntime |
| ResponseEvent::Completed | Update token usage, break stream loop |
| ResponseEvent::ReasoningSummaryDelta | Emit reasoning delta event |
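The event-dispatch shape of this stream loop can be sketched as follows. This is a hypothetical illustration: the event names loosely follow the table above, but the dict-based events and handlers are invented for the example.

```python
# Hypothetical sketch of streaming-event dispatch: accumulate text
# deltas, collect finished tool calls, and stop on completion.
def process_stream(events):
    text, tool_calls, usage = [], [], None
    for ev in events:
        kind = ev["type"]
        if kind == "output_text_delta":
            text.append(ev["delta"])        # incremental assistant text
        elif kind == "output_item_done" and ev["item"].get("tool"):
            tool_calls.append(ev["item"])   # would dispatch to tool runtime
        elif kind == "completed":
            usage = ev["usage"]             # update token accounting
            break                           # end of this sampling request
    return "".join(text), tool_calls, usage

text, calls, usage = process_stream([
    {"type": "output_text_delta", "delta": "Hel"},
    {"type": "output_text_delta", "delta": "lo"},
    {"type": "completed", "usage": {"total": 42}},
])
assert text == "Hello" and calls == [] and usage == {"total": 42}
```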

File: codex-rs/core/src/tools/parallel.rs:49

ToolCallRuntime::handle_tool_call() spawns a tokio task per tool call:

```rust
tokio::select! {
    _ = cancellation_token.cancelled() => {
        aborted_response(elapsed)
    }
    res = router.dispatch_tool_call(call, source) => {
        // read-lock if parallel, write-lock if serial
        res
    }
}
```

Tool calls are collected in a FuturesOrdered for parallel execution. After the stream completes, drain_in_flight() waits for all pending tool results and collects them as ResponseInputItem::FunctionCallOutput items, which feed into the next API request.
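The FuturesOrdered behavior, concurrent execution with results collected in submission order, can be approximated with asyncio.gather, which also preserves argument order. A hypothetical sketch with invented tool names:

```python
import asyncio

# Hypothetical sketch of FuturesOrdered-style collection: tool calls run
# concurrently, but results come back in the order they were submitted.
async def run_tool(name, delay):
    await asyncio.sleep(delay)  # stand-in for actual tool execution
    return f"{name}-result"

async def drain_in_flight(calls):
    # asyncio.gather preserves argument order, like FuturesOrdered
    tasks = [asyncio.create_task(run_tool(n, d)) for n, d in calls]
    return await asyncio.gather(*tasks)

# "grep" finishes first, but "read" still comes back first in the list
results = asyncio.run(drain_in_flight([("read", 0.02), ("grep", 0.01)]))
assert results == ["read-result", "grep-result"]
```

Order preservation matters because the collected FunctionCallOutput items must pair up with the tool calls in the conversation history when they are fed into the next API request.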

When a tool requires approval (determined by ExecPolicyManager):

  1. Tool handler emits EventMsg::ExecApprovalRequest with an approval_id
  2. TUI renders the approval overlay
  3. User responds with Op::ExecApproval { approval_id, decision }
  4. Handler notifies the waiting tool via a channel
  5. Tool proceeds or aborts based on decision

File: codex-rs/core/src/error.rs:195

CodexErr::is_retryable() categorizes errors:

  • Non-retryable: TurnAborted, ContextWindowExceeded, UsageLimitReached, InvalidRequest, Sandbox, ServerOverloaded
  • Retryable: Stream, Timeout, UnexpectedStatus, ResponseStreamFailed, ConnectionFailed, InternalServerError, Io, Json

```rust
pub(crate) struct TurnContext {
    pub sub_id: String,
    pub config: Arc<Config>,
    pub model_info: ModelInfo,
    pub approval_policy: AskForApproval,
    pub sandbox_policy: SandboxPolicy,
    pub collaboration_mode: CollaborationMode,
    pub tools_config: ToolsConfig,
    pub final_output_json_schema: Option<Value>,
    pub dynamic_tools: Vec<DynamicToolSpec>,
    // ...
}
```

Pinned commit: 7ed449974

OpenCode’s loop is TypeScript async/await built on the Vercel AI SDK. The architecture separates prompt orchestration (SessionPrompt) from stream processing (SessionProcessor).

File: packages/opencode/src/session/prompt.ts:158

Accepts a PromptInput (sessionID, parts, model, agent, format) and:

  1. Creates a user message via createUserMessage() (line 951) — processes file parts, directory listings, MCP resources
  2. Persists via Session.updateMessage() and Session.updatePart()
  3. Unless noReply: true, invokes loop() to start the turn lifecycle

File: packages/opencode/src/session/prompt.ts:274

The loop runs until the assistant finishes or an error occurs:

Step 1 — Load message stream (line 298): Fetches all non-compacted messages via MessageV2.filterCompacted(). Identifies lastUser, lastAssistant, lastFinished.

Exit condition (lines 318-325): If the last assistant message is finished and its finish reason is not "tool-calls" or "unknown", break.

Step 2 — Step tracking (lines 327-334): Increments a step counter. On step 1, starts async title generation via ensureTitle().

Step 3 — Subtask handling (lines 352-526): If a pending subtask part is found (from the Task tool), it executes the subtask tool directly, creates the result message, and continues the loop.

Step 4 — Compaction handling (lines 529-554): If a pending compaction part is found, runs SessionCompaction.process() and continues.

Step 5 — Normal processing (lines 556-714): This is the core agent cycle:

  1. Get agent config: Agent.get(lastUser.agent)
  2. Insert mode reminders (plan/build mode switching)
  3. Create SessionProcessor wrapping an assistant message
  4. Resolve tools via SessionPrompt.resolveTools() (line 602)
  5. Inject StructuredOutput tool if JSON schema format requested
  6. Build system prompt and session messages
  7. Call processor.process() — this streams the LLM response

Step 6 — Loop control (lines 705-714):

  • Result "stop" — break
  • Result "compact" — create compaction and continue
  • Otherwise (tool calls finished) — continue loop

Stream Processing: SessionProcessor.process()


File: packages/opencode/src/session/processor.ts:55

Iterates stream.fullStream (from Vercel AI SDK’s streamText()) and processes events:

| Event Type | Action |
|---|---|
| start | Set session status to “busy” |
| text-delta | Accumulate text, call Session.updatePartDelta() |
| reasoning-delta | Accumulate reasoning, emit delta |
| tool-input-start | Create ToolPart with status “pending” |
| tool-call | Update status to “running”, check doom loop guard |
| tool-result | Update status to “completed”, record output + timing |
| tool-error | Update status to “error”, check for permission rejection |
| finish-step | Record finish reason, token usage, check overflow |

File: packages/opencode/src/session/processor.ts:154

If the same tool with the same input is called 3 times consecutively, the processor triggers a doom loop permission check via PermissionNext.ask(). The user can approve (continue) or deny (stop the loop).
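The detection logic can be sketched as a small guard. This is a hypothetical illustration of the "3 identical consecutive calls" rule described above; the class and its API are invented.

```python
# Hypothetical sketch of doom-loop detection: flag the third
# consecutive identical (tool, input) call.
class DoomLoopGuard:
    def __init__(self, threshold=3):
        self.threshold = threshold
        self.last = None
        self.count = 0

    def check(self, tool, args):
        key = (tool, repr(sorted(args.items())))
        self.count = self.count + 1 if key == self.last else 1
        self.last = key
        return self.count >= self.threshold  # True -> ask for permission

guard = DoomLoopGuard()
assert not guard.check("grep", {"pattern": "foo"})
assert not guard.check("grep", {"pattern": "foo"})
assert guard.check("grep", {"pattern": "foo"})      # third identical call
assert not guard.check("grep", {"pattern": "bar"})  # different input resets
```

Any change in either the tool name or its arguments resets the counter, so only genuinely stuck repetition trips the permission check.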

OpenCode’s generic permission checks are enforced in the tool execution path, not in the processor’s tool-call handler. SessionPrompt.resolveTools() builds a Tool.Context with ctx.ask() (packages/opencode/src/session/prompt.ts:773), and individual tools (plus the MCP wrapper at line 852) call ctx.ask(...) before execution.

The processor itself uses PermissionNext.ask() at tool-call time only for doom-loop protection (3 repeated identical calls). If a tool-level permission is rejected, the stream emits tool-error, blocked is set, and the loop returns "stop".

When SessionCompaction.isOverflow() detects the token count approaching the context limit (with a 20k reserved buffer), the processor sets needsCompaction = true. On the next loop iteration:

  1. A compaction agent summarizes the conversation
  2. A synthetic “Continue if you have next steps…” message is injected
  3. The loop continues with a compressed context

File: packages/opencode/src/session/retry.ts

Retry decisions are made per-error:

  • ContextOverflowError — not retried (handled by compaction)
  • APIError with isRetryable: false — not retried
  • Rate limits, overloaded errors — retried with exponential backoff (2s initial, 2x factor, 30s cap)
  • Headers retry-after-ms or retry-after are respected when present
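The delay computation described above can be sketched as a pure function. This is a hypothetical illustration using the parameters stated in the text (2s initial, 2x factor, 30s cap, retry-after header override); the function name and signature are invented.

```python
# Hypothetical sketch of the retry delay policy: exponential backoff
# with a cap, overridden by server-provided retry-after headers.
def next_delay(attempt, headers=None):
    headers = headers or {}
    if "retry-after-ms" in headers:
        return int(headers["retry-after-ms"]) / 1000.0
    if "retry-after" in headers:
        return float(headers["retry-after"])
    return min(2.0 * (2 ** attempt), 30.0)  # 2s initial, 2x factor, 30s cap

assert next_delay(0) == 2.0
assert next_delay(1) == 4.0
assert next_delay(10) == 30.0                          # capped
assert next_delay(0, {"retry-after-ms": "1500"}) == 1.5  # header wins
```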

```typescript
// SessionPrompt.state() - per session
{
  abort: AbortController,               // cancellation
  callbacks: Array<{resolve, reject}>,  // waiting clients
}
// SessionProcessor - per turn
{
  toolcalls: Record<string, ToolPart>,  // active tool calls
  snapshot: string,                     // file state for undo
  blocked: boolean,                     // permission denied
  attempt: number,                      // retry counter
  needsCompaction: boolean,             // overflow flag
}
```

Source: Public documentation at code.claude.com/docs/ (closed source — architecture inferred from docs, not inspected code).

Claude Code is Anthropic’s production coding agent. Unlike the open-source references above, its internals are not available for inspection. What follows is inferred from public documentation and compared against the patterns we’ve already traced.

Claude Code describes its loop as three blended phases: gather context, take action, verify results. Critically, these are NOT discrete states or separate code paths. The documentation is explicit: “These phases blend together. Claude uses tools throughout.” The model dynamically decides what each step requires based on what it learned from the previous step.

The system is described as an “agentic harness” around Claude: it provides tools, context management, and an execution environment. Two core components drive the loop: models (reasoning) and tools (acting). Each tool use returns information that feeds back into the loop, informing the next decision.

This aligns most closely with Codex’s needs_follow_up pattern. There is no indication of Aider-style reflection loops or explicit reflection counters.

The loop exits when:

  1. The model signals completion (no more tool calls)
  2. The user interrupts (type a correction mid-loop and press Enter)
  3. A hard limit is reached (--max-turns or --max-budget-usd in headless mode)
  4. Context fills up — but this triggers compaction, not termination; the loop continues after compaction

No explicit reflection cap is mentioned. Errors are handled through the model observing structured tool results and adjusting, not through a separate reflection mechanism.

Claude Code explicitly supports mid-loop user interruption: “You can interrupt at any point to steer Claude in a different direction, provide additional context, or ask it to try a different approach. Claude will stop what it’s doing and adjust.” This maps to Codex’s steer_input() pattern where user input during an active turn is injected rather than queued.

Claude Code organizes tools into five categories plus orchestration:

| Category | Capabilities |
|---|---|
| File operations | Read, edit, create, rename, reorganize |
| Search | Find files by pattern, search content with regex, explore codebases |
| Execution | Shell commands, servers, tests, git |
| Web | Search the web, fetch documentation, look up error messages |
| Code intelligence | Type errors/warnings after edits, go to definition, find references |
| Orchestration | Spawn subagents, ask user questions, task management |

Code intelligence is delivered via plugins, not built-in. This is a deliberate extension point — the core loop doesn’t hardcode language-specific analysis.

What loads into the context window per session: conversation history, file contents, command outputs, CLAUDE.md instructions, loaded skills (descriptions only until invoked), and system instructions.

Compaction strategy when approaching the context limit:

  1. Clear older tool outputs first (cheapest, most recoverable)
  2. Summarize conversation if needed (LLM-based compaction)
  3. Preserve user requests and key code snippets
  4. Persistent instructions from early in conversation may be lost (put them in CLAUDE.md instead)

Users control compaction via:

  • A “Compact Instructions” section in CLAUDE.md (survives compaction)
  • /compact command with optional focus: /compact focus on the API changes
  • /context to visualize what’s consuming space

Key insight: Skills load on demand — Claude sees skill descriptions at session start, but full content loads only when invoked. Subagents get completely fresh context windows, separate from the main conversation. They return only a summary. This isolation is the primary scaling mechanism for long sessions.

Each subagent spawned by Claude Code runs in its own fresh context window. It does not inherit the parent conversation’s full history. When the subagent completes, it returns a summary to the parent. This prevents long sessions from degrading — complex subtasks run in clean contexts while the parent conversation stays compact.

This is architecturally distinct from all three open-source references:

  • Aider: Architect mode gives the editor a fresh conversation, but it’s a specific two-model pattern, not a general subagent mechanism
  • Codex: No subagent isolation; all work runs in the same session context
  • OpenCode: Agents run in the same session context with shared message history

On Opus 4.6, Claude Code uses adaptive reasoning: instead of a fixed thinking token budget, the model dynamically allocates thinking based on an effort level setting (low/medium/high). Other models use a fixed budget up to 31,999 tokens.

This means the loop must handle variable-length thinking phases. The MAX_THINKING_TOKENS environment variable can cap the budget (ignored on Opus 4.6 except when set to 0, which disables thinking entirely).

Before every file edit, Claude Code snapshots the current file contents. Users can rewind with Esc+Esc to restore to any previous checkpoint. Checkpoints are local to the session, separate from git. They only cover file changes — remote actions (databases, APIs, deployments) cannot be checkpointed.

This is similar to OpenCode’s snapshot mechanism and Codex’s ghost snapshots.

Three modes, cycled with Shift+Tab:

  1. Default: Asks before file edits and shell commands
  2. Auto-accept edits: Edits without asking, still asks for shell commands
  3. Plan mode: Read-only tools only; creates a plan for user approval before execution

Allowed commands are configurable in .claude/settings.json. Settings scope from organization-wide policies down to personal preferences. In headless mode, --permission-prompt-tool delegates permission decisions to an external MCP tool — enabling CI/CD pipelines where an external system handles approvals.

Key Architectural Differences from Open-Source References

| Aspect | Claude Code | Codex | OpenCode | Aider |
|---|---|---|---|---|
| Reflection mechanism | None — structured tool results only | None — structured tool results only | Doom loop detection (3x same call) | Explicit reflection loop (max 3) |
| Subagent isolation | Fresh context per subagent | No subagents | Shared session context | Architect mode only |
| Mid-turn compaction | Yes (clear tool outputs, then summarize) | Yes (auto-compact and continue) | Yes (overflow detection, compaction marker) | No (fails with error) |
| User mid-loop steering | Yes (interrupt and redirect) | Yes (steer_input) | Yes (AbortController) | Yes (KeyboardInterrupt) |
| Thinking budget | Adaptive per effort level (Opus 4.6) | Fixed | Fixed | Fixed |
| Permission delegation | MCP tool for headless mode | Approval channel | Permission promise | N/A (single-user) |

Aider caps reflections at 3. Without this, a malformed response that always fails parsing would loop forever. Codex avoids the problem by not having reflection — tool results are always fed back as structured data, not error strings. OpenCode has doom loop detection (3x same call) but no explicit reflection cap for parse errors since tool results are always structured.

In Codex, when a tool requires approval, the entire turn blocks on the approval channel. If the user is slow to respond, the API connection may time out. Codex mitigates this by holding the connection open, but it creates backpressure in the event loop. OpenCode’s approach is similar but uses async promises.

All three tools handle overflow differently:

  • Aider: Fails with ContextWindowExceededError and tells the user to reduce context
  • Codex: Auto-compacts mid-turn and continues the loop — most resilient
  • OpenCode: Detects overflow at finish-step, creates a compaction marker, and continues

Codex uses tokio::select! with CancellationToken on every spawned task, allowing clean abort mid-tool-execution. Aider uses KeyboardInterrupt (Python signal), which can leave partial state. OpenCode uses AbortController signals passed through the AI SDK.

Aider’s architect mode spawns a fresh editor coder with an empty conversation. If the plan text is ambiguous, the editor has no context to resolve it. Codex avoids this by not supporting multi-model loops. OpenCode has agent-based routing but each agent runs in the same session context.

Aider uses litellm’s token counters (tiktoken for OpenAI, approximations for others). Codex uses a 4-byte heuristic. OpenCode uses char / 4. All three can miscalculate, leading to unexpected context overflow. Only Codex’s mid-turn auto-compact provides a safety net.
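The two cheap heuristics can be contrasted directly. This is a hypothetical sketch; the function names are invented, and real tokenizers diverge from both estimates.

```python
# Hypothetical sketch contrasting the two cheap token heuristics
# mentioned above; both undercount code-heavy or non-ASCII text.
def estimate_tokens_chars(text):
    return len(text) // 4                  # OpenCode-style: chars / 4

def estimate_tokens_bytes(text):
    return len(text.encode("utf-8")) // 4  # Codex-style: bytes / 4

ascii_text = "let x = 1;" * 100
# For pure ASCII the two agree, since 1 char == 1 byte
assert estimate_tokens_chars(ascii_text) == estimate_tokens_bytes(ascii_text)
# Multi-byte characters make them diverge
assert estimate_tokens_bytes("émojis 🦀") > estimate_tokens_chars("émojis 🦀")
```

Because both estimates can undershoot the real token count, a hard check against the model's context window at request time (or a mid-turn compaction safety net) is still needed.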


OpenOxide adopts Codex’s channel architecture with OpenCode’s event granularity:

```rust
// Core agent loop crate
pub struct AgentLoop {
    rx_submission: Receiver<Submission>,
    tx_event: Sender<Event>,
    session: Arc<Session>,
}
```

The submission loop receives Op variants (user input, approval responses, interrupt, compact) and dispatches to handlers. Each turn spawns a background tokio task with a CancellationToken.

The run_turn() function follows Codex’s pattern:

```rust
loop {
    let request = session.build_sampling_request().await;
    match api_client.stream(request).await {
        Ok(stream) => {
            let result = process_stream(stream, &tool_runtime).await?;
            if !result.needs_follow_up { break; }
            if result.token_limit_reached {
                run_auto_compact(&session).await?;
            }
        }
        Err(e) if e.is_retryable() => {
            backoff_and_retry(&mut retries)?;
            continue;
        }
        Err(e) => { emit_error(e); break; }
    }
}
```

OpenOxide does NOT use Aider’s reflection pattern. Tool results are always fed back as structured FunctionCallOutput items. Parse errors in edit formats are returned as tool error strings, which the model can observe and retry without a separate reflection mechanism.

Follows Codex’s pattern: approval requests are emitted as events, the turn blocks on a oneshot::channel, and the TUI (or MCP client) sends the decision back. A configurable timeout prevents indefinite blocking.
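The one-shot approval flow with a timeout can be approximated with an asyncio future standing in for the oneshot channel. This is a hypothetical sketch; the broker class and its API are invented.

```python
import asyncio

# Hypothetical sketch of the blocking approval flow: the turn awaits a
# one-shot future, the UI resolves it, and a timeout bounds the wait.
class ApprovalBroker:
    def __init__(self):
        self.pending = {}

    def request(self, approval_id):
        fut = asyncio.get_running_loop().create_future()
        self.pending[approval_id] = fut
        return fut  # turn task awaits this

    def decide(self, approval_id, decision):
        self.pending.pop(approval_id).set_result(decision)

async def demo():
    broker = ApprovalBroker()
    fut = broker.request("exec-1")
    # UI side responds shortly; a slow user would trip the timeout instead
    asyncio.get_running_loop().call_later(0.01, broker.decide, "exec-1", "approve")
    return await asyncio.wait_for(fut, timeout=1.0)

assert asyncio.run(demo()) == "approve"
```

On timeout, asyncio.wait_for raises TimeoutError, which the loop can translate into a denial rather than blocking the turn indefinitely.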

Mid-turn auto-compaction as in Codex, with OpenCode’s two-phase approach:

  1. Prune: Remove old tool outputs beyond a 40k token protect window
  2. Summarize: LLM-based compaction using a dedicated compaction model
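The prune phase can be sketched as a single pass over the history, newest first. This is a hypothetical illustration of the 40k-token protect window described above; the message shape and the chars/4 estimator are invented for the example.

```python
# Hypothetical sketch of the prune phase: drop old tool outputs while
# protecting the most recent window of tokens (40k in the text above).
def prune(messages, protect_tokens=40_000,
          est=lambda m: len(m["content"]) // 4):
    kept, budget = [], 0
    for msg in reversed(messages):  # walk newest first
        budget += est(msg)
        if msg["role"] == "tool" and budget > protect_tokens:
            continue                # old tool output: drop it
        kept.append(msg)            # everything else survives
    return list(reversed(kept))

msgs = [
    {"role": "tool", "content": "x" * 200_000},  # old, large tool output
    {"role": "user", "content": "please fix"},
    {"role": "tool", "content": "ok" * 10},      # recent, inside window
]
pruned = prune(msgs)
assert pruned[0]["role"] == "user"  # old tool output dropped
assert len(pruned) == 2
```

Only tool outputs are eligible for pruning; user requests survive regardless of age, so the summarize phase still sees the full intent of the conversation.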

Adopt OpenCode’s pattern: track the last 3 tool calls, and if the same tool+input appears 3 times consecutively, inject a warning into the context and optionally halt.

| Crate | Responsibility |
|---|---|
| openoxide-loop | Core agent loop, submission handling, turn execution |
| openoxide-session | Session state, message history, compaction |
| openoxide-tools | Tool registry, dispatch, parallel execution |
| openoxide-exec | Command execution, sandbox integration |
| openoxide-provider | API client, streaming, retry logic |
  1. No reflection loops. Structured tool results only. Simpler, more predictable. Validated by both Codex and Claude Code — neither uses Aider-style reflection.
  2. Mid-turn compaction. Enables long multi-step tasks without manual intervention. Claude Code confirms this is the production-proven approach: clear tool outputs first, summarize conversation second.
  3. Parallel tool execution. FuturesOrdered with per-tool serialization control.
  4. Event-driven communication. TUI and MCP server are just event consumers — the loop is transport-agnostic.
  5. Cancellation at every await point. tokio::select! with CancellationToken throughout.
  6. User steering mid-turn. Accept user input during an active turn as a redirect, not just as an interrupt-and-restart. Claude Code and Codex both support this via their submission channels. Add a Steer variant to Op that injects into the active turn’s context.
  7. Subagent context isolation. Each subagent gets a fresh context window. Parent sends task description + relevant context; subagent returns a summary. This is Claude Code’s primary mechanism for keeping long sessions manageable. Implement as a spawn_subagent() method on Session that creates a child Session with an independent message history.
  8. Adaptive thinking budget support. The loop must not assume a fixed thinking token allocation. Support a configurable effort level (low/medium/high) that the provider translates into model-specific thinking parameters. Cap with MAX_THINKING_TOKENS env var.
  9. Permission delegation in headless mode. Support an external permission handler (MCP tool or callback) for CI/CD pipelines where no human is available to approve. Mirrors Claude Code’s --permission-prompt-tool pattern.