Reasoning & Thinking Tokens
Source attribution: Implementation details traced from
references/aider/ at commit b9050e1d, references/codex/ at commit 4ab44e2c5, and references/opencode/ at commit 7ed449974.
Feature Definition
Modern reasoning models (OpenAI o-series, Claude with extended thinking, DeepSeek R1, Gemini 3.1) produce two distinct output streams: reasoning tokens (internal chain-of-thought) and visible tokens (the answer the user sees). Coding agents must handle both streams — configuring how much reasoning the model should do, streaming the reasoning content to the user in real-time, accounting for reasoning tokens in budgets and billing, and dealing with provider-specific parameter formats.
The core challenges are:
- Provider fragmentation: Anthropic uses thinking.budget_tokens, OpenAI uses reasoning_effort + reasoning_summary, Google uses thinkingConfig.thinkingBudget, and Bedrock wraps everything differently. A coding agent targeting multiple providers needs a unified abstraction over all of them.
- Token economics: Reasoning tokens count toward output token limits but aren’t part of the visible response. A model with a 16k output limit might spend 12k on reasoning, leaving only 4k for the actual code edit. Budget control matters.
- Streaming complexity: Reasoning content arrives before (or interleaved with) visible content. The agent must detect transition boundaries, display reasoning blocks distinctly from answer blocks, and handle malformed sequences gracefully.
- Temperature interaction: Anthropic’s extended thinking requires temperature=1.0 (no override allowed). Setting a thinking budget must silently disable temperature controls.
- Display and storage: Reasoning content is useful for debugging but noisy for chat history. Agents need strategies for showing it during streaming, optionally collapsing it afterward, and deciding whether to persist it.
This page focuses on the token-level mechanics. For model selection and role assignment, see Multi-Model Orchestration. For stream transport and rendering behavior, see Streaming. For context-window partitioning pressure from reasoning output, see Token Budgeting. For the static instruction prefix that carries reasoning policy, see System Prompt.
Aider Implementation
Reference: references/aider/aider/reasoning_tags.py, aider/models.py, aider/coders/base_coder.py, aider/args.py, aider/commands.py | Commit: b9050e1d
Aider implements a dual-parameter system: --reasoning-effort for OpenAI-style effort levels and --thinking-tokens for Anthropic-style budget tokens. It uses a tag-based extraction system to separate reasoning content from visible output in the response stream.
Two Configuration Axes
The ModelSettings dataclass in aider/models.py (lines 115-139) tracks which parameters each model accepts via an accepts_settings list. Models declare support for "reasoning_effort", "thinking_tokens", or both:
```python
@dataclass
class ModelSettings:
    name: str
    reasoning_tag: Optional[str] = None       # line 135
    remove_reasoning: Optional[str] = None    # line 136 (deprecated)
    accepts_settings: Optional[list] = None   # line 138
```

CLI arguments in aider/args.py (lines 139-150):
```python
group.add_argument(
    "--reasoning-effort",
    type=str,
    help="Set the reasoning_effort API parameter (default: not set)",
)
group.add_argument(
    "--thinking-tokens",
    type=str,
    help="Set the thinking token budget for models that support it. Use 0 to disable.",
)
```

Both map to environment variables AIDER_REASONING_EFFORT and AIDER_THINKING_TOKENS via configargparse’s auto_env_var_prefix.
Provider-Specific Parameter Formatting
set_reasoning_effort(effort) in models.py (lines 776-790) writes different wire formats depending on the provider:
- OpenRouter models (name.startswith("openrouter/")): extra_body.reasoning.effort = effort
- All other models: extra_body.reasoning_effort = effort

set_thinking_tokens(value) in models.py (lines 823-849) is more involved:

- Parses flexible token formats via parse_token_value() (lines 792-821) — accepts 8096, "8k", "10.5k", "0.5M" with K=1024 and M=1024² multipliers.
- Disables temperature: self.use_temperature = False (line 831). This is critical — Anthropic’s extended thinking rejects requests with temperature != 1.0.
- Writes provider-specific params:
  - OpenRouter: extra_body.reasoning.max_tokens = num_tokens
  - Standard (Anthropic): extra_params.thinking = {"type": "enabled", "budget_tokens": num_tokens}
- Setting 0 disables thinking by removing the parameter entirely.
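The flexible token format can be sketched in Python. This is a hypothetical reimplementation of the parsing rules described above (K=1024, M=1024² multipliers), not a copy of aider’s parse_token_value():

```python
def parse_token_value(value):
    """Parse a thinking-token budget like 8096, "8k", "10.5k", or "0.5M".

    Sketch of the rules above: a trailing K/k multiplies by 1024,
    M/m by 1024**2; bare integers pass through unchanged.
    """
    if isinstance(value, int):
        return value
    text = str(value).strip()
    multiplier = 1
    if text.lower().endswith("k"):
        multiplier = 1024
        text = text[:-1]
    elif text.lower().endswith("m"):
        multiplier = 1024 ** 2
        text = text[:-1]
    # Float parse allows fractional budgets like "10.5k" and "0.5M".
    return int(float(text) * multiplier)
```

Accepting suffixed strings keeps CLI and config values readable while the API always receives a plain integer.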
Validation Gating in main.py
aider/main.py (lines 830-864) validates settings against the model’s accepts_settings list before applying them. If --check-model-accepts-settings is enabled (the default) and the model doesn’t declare support, the setting is ignored with a warning:
```python
if args.reasoning_effort is not None:
    if not args.check_model_accepts_settings or (
        main_model.accepts_settings
        and "reasoning_effort" in main_model.accepts_settings
    ):
        main_model.set_reasoning_effort(args.reasoning_effort)
```

Model Auto-Detection
apply_generic_model_settings() in models.py (lines 373-583) auto-configures reasoning support based on model name patterns:
| Pattern | accepts_settings |
|---|---|
| gpt-5.2-codex | ["reasoning_effort"] |
| claude-4-6-sonnet*, sonnet-4-6*, opus-4-6*, haiku-4-5* | ["thinking_tokens"] |
| openrouter/* | Auto-adds both thinking_tokens and reasoning_effort |
Reasoning Tag Extraction
Models like DeepSeek R1 and Qwen QWQ emit reasoning as inline XML-style tags (<think>...</think>). Aider’s reasoning_tags.py (83 lines) handles extraction and formatting.
Constants (lines 8-11):
```python
REASONING_TAG = "thinking-content-" + "7bbeb8e1441453ad999a0bbba8a46d4b"
REASONING_START = "--------------\n► **THINKING**"
REASONING_END = "------------\n► **ANSWER**"
```

The hash-suffixed tag name prevents collision with user content containing literal <thinking> tags.
remove_reasoning_content(res, reasoning_tag) (lines 14-40): Strips <tag>...</tag> blocks from the response. Handles malformed sequences where the opening tag is missing but the closing tag exists.
replace_reasoning_tags(text, tag_name) (lines 43-64): Replaces XML tags with formatted display markers (► THINKING / ► ANSWER) for terminal output.
format_reasoning_content(reasoning_content, tag_name) (lines 67-82): Wraps standalone reasoning text (from response.choices[0].message.reasoning_content) in XML tags for uniform processing.
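A minimal sketch of the malformed-sequence tolerance described for remove_reasoning_content(). This is a hypothetical simplification, not aider’s actual function body:

```python
import re


def remove_reasoning_content(res, reasoning_tag):
    """Strip <tag>...</tag> blocks; tolerate a missing opening tag."""
    if not reasoning_tag:
        return res
    # Well-formed case: drop every complete <tag>...</tag> block.
    pattern = re.compile(rf"<{reasoning_tag}>.*?</{reasoning_tag}>", re.DOTALL)
    res = pattern.sub("", res)
    # Malformed case: a closing tag survives without its opening tag,
    # so keep only the text after the last closing tag.
    closing = f"</{reasoning_tag}>"
    if closing in res:
        res = res.split(closing)[-1]
    return res.strip()
```

Handling the dangling-closing-tag case matters because streaming providers can drop the opening tag when the response is truncated or resumed.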
Streaming Pipeline
In base_coder.py, the streaming handler (lines 1900-1975) tracks reasoning state with two flags:
```python
self.got_reasoning_content = False
self.ended_reasoning_content = False
```

Streaming flow:
- Each chunk is checked for delta.reasoning_content or delta.reasoning (lines 1927-1933).
- First reasoning chunk: emit the <{REASONING_TAG}>\n\n opening marker (line 1937), set got_reasoning_content = True.
- Subsequent reasoning chunks: accumulate text directly.
- First non-reasoning content chunk: emit the </{reasoning_tag_name}>\n\n closing marker (line 1946), set ended_reasoning_content = True.
- After streaming completes, remove_reasoning_content() strips the blocks before edit parsing.
For non-streaming responses (lines 1857-1892), reasoning_content is extracted from completion.choices[0].message.reasoning_content (with fallback to .reasoning), wrapped with format_reasoning_content(), and prepended to the main response.
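The two-flag transition logic can be condensed into a short generator. This is a hypothetical sketch of the flow above; chunk dicts and the yielded markers stand in for aider’s delta objects and output path:

```python
def stream_chunks(chunks, tag="reasoning"):
    """Two-flag state tracking for reasoning/content boundaries.

    Each chunk is a dict that may carry "reasoning" or "content" text.
    Yields output text with tag markers at the transition boundaries.
    """
    got_reasoning = False
    ended_reasoning = False
    for chunk in chunks:
        reasoning = chunk.get("reasoning")
        content = chunk.get("content")
        if reasoning:
            if not got_reasoning:
                got_reasoning = True
                yield f"<{tag}>\n\n"   # opening marker on first reasoning chunk
            yield reasoning
        if content:
            if got_reasoning and not ended_reasoning:
                ended_reasoning = True
                yield f"</{tag}>\n\n"  # closing marker on first content chunk
            yield content
```

The second flag prevents re-emitting the closing marker when reasoning and content chunks interleave late in the stream.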
Interactive Commands
Two runtime commands allow adjusting reasoning parameters mid-session:
- /think-tokens [value] (commands.py, lines 1566-1599): Displays or sets the thinking token budget. Shows a formatted value like "Current thinking token budget: 8,192 tokens (8k)".
- /reasoning-effort [level] (commands.py, lines 1601-1622): Displays or sets the reasoning effort level.
Startup Display
base_coder.py (lines 222-230) announces reasoning configuration at startup:
```python
thinking_tokens = main_model.get_thinking_tokens()
if thinking_tokens:
    output += f", {thinking_tokens} think tokens"
reasoning_effort = main_model.get_reasoning_effort()
if reasoning_effort:
    output += f", reasoning {reasoning_effort}"
```

Codex Implementation
Reference: references/codex/codex-rs/protocol/, codex-rs/codex-api/, codex-rs/core/, codex-rs/tui/, codex-rs/app-server-protocol/ | Commit: 4ab44e2c5
Codex implements reasoning as a first-class concept with dedicated enums, per-turn override capability, streaming events for both raw reasoning and summaries, separate token tracking, and a TUI rendering pipeline that extracts bold headers from reasoning blocks.
ReasoningEffort Enum
Defined in protocol/src/openai_models.rs (lines 23-49):
```rust
#[derive(Debug, Serialize, Deserialize, Default, Clone, Copy, PartialEq, Eq)]
#[serde(rename_all = "lowercase")]
pub enum ReasoningEffort {
    None,
    Minimal,
    Low,
    #[default]
    Medium,
    High,
    XHigh,
}
```

Six levels: none, minimal, low, medium (default), high, xhigh. This is a superset of OpenAI’s supported levels — the extra granularity (none, minimal, xhigh) is gated by model release date on the server side.
ReasoningSummary Enum
Defined in protocol/src/config_types.rs (lines 235-264):
```rust
pub enum ReasoningSummary {
    #[default]
    Auto,
    Concise,
    Detailed,
    None,
}
```

Controls how the API summarizes the model’s internal reasoning. Auto lets the server decide. None disables summaries entirely. This is distinct from reasoning effort — you can have high effort with no summary, or low effort with a detailed summary.
API Request Structure
The Reasoning struct in codex-api/src/common.rs (lines 88-94) bundles both parameters:
```rust
#[derive(Debug, Serialize, Clone, PartialEq)]
pub struct Reasoning {
    #[serde(skip_serializing_if = "Option::is_none")]
    pub effort: Option<ReasoningEffortConfig>,
    #[serde(skip_serializing_if = "Option::is_none")]
    pub summary: Option<ReasoningSummaryConfig>,
}
```

This gets embedded in ResponsesApiRequest alongside a key detail — the include field:
```rust
pub struct ResponsesApiRequest {
    pub reasoning: Option<Reasoning>,
    pub include: Vec<String>, // Includes "reasoning.encrypted_content"
    // ...
}
```

When reasoning is enabled, include contains "reasoning.encrypted_content" — this tells the API to return opaque encrypted reasoning content that can be replayed in future requests for context continuity.
Request Building Logic
core/src/client.rs (lines 460-520) builds the reasoning parameters conditionally:
```rust
let default_reasoning_effort = model_info.default_reasoning_level;
let reasoning = if model_info.supports_reasoning_summaries {
    Some(Reasoning {
        effort: effort.or(default_reasoning_effort),
        summary: if summary == ReasoningSummaryConfig::None {
            None
        } else {
            Some(summary)
        },
    })
} else {
    None
};
```

The logic: only attach reasoning parameters if the model declares supports_reasoning_summaries. Use the per-turn effort override if provided, otherwise fall back to the model’s default_reasoning_level.
Per-Turn Overrides
Reasoning effort and summary can be overridden on every turn. The TurnStartParams protocol message (from app-server-protocol/src/protocol/v2.rs) includes:
```typescript
type TurnStartParams = {
  threadId: string,
  input: Array<UserInput>,
  effort?: ReasoningEffort | null,
  summary?: ReasoningSummary | null,
  // ...
}
```

These flow through TurnContext in core/src/codex.rs (lines 533-567):
```rust
pub(crate) struct TurnContext {
    pub(crate) reasoning_effort: Option<ReasoningEffortConfig>,
    pub(crate) reasoning_summary: ReasoningSummaryConfig,
    // ...
}
```

And they persist at the session level in CodexThread (core/src/codex_thread.rs, lines 18-30):

```rust
pub struct CodexThread {
    pub reasoning_effort: Option<ReasoningEffort>,
    // ...
}
```

Token Tracking
Reasoning tokens are tracked separately in the wire protocol (app-server-protocol/schema/typescript/TokenUsage.ts):
```typescript
export type TokenUsage = {
  input_tokens: number,
  cached_input_tokens: number,
  output_tokens: number,
  reasoning_output_tokens: number,
  total_tokens: number,
}
```

The reasoning_output_tokens field enables the TUI and telemetry to show reasoning cost independently. In core/src/client.rs (line 1038), these flow to OpenTelemetry:
```rust
otel_manager.sse_event_completed(
    usage.input_tokens,
    usage.output_tokens,
    Some(usage.cached_input_tokens),
    Some(usage.reasoning_output_tokens),
    usage.total_tokens,
);
```

Streaming Events
Codex defines three reasoning-specific streaming events in codex-api/src/common.rs (lines 54-86):
```rust
pub enum ResponseEvent {
    ReasoningContentDelta {
        delta: String,
        content_index: i64,
    },
    ReasoningSummaryDelta {
        delta: String,
        summary_index: i64,
    },
    ReasoningSummaryPartAdded {
        summary_index: i64,
    },
    ServerReasoningIncluded(bool),
    // ...
}
```

These map to app-server-protocol notification types:
| Event | Wire Type | Purpose |
|---|---|---|
| ReasoningContentDelta | ReasoningTextDeltaNotification | Raw reasoning chunks (may be encrypted or summarized) |
| ReasoningSummaryDelta | ReasoningSummaryTextDeltaNotification | Summary text chunks |
| ReasoningSummaryPartAdded | ReasoningSummaryPartAddedNotification | Section boundary markers |
| ServerReasoningIncluded | — | Flag indicating server pre-accounted reasoning in token budget |
TUI Rendering
The TUI (tui/src/chatwidget.rs, lines 542-545 and 804-820) accumulates reasoning content in a buffer and extracts display-friendly headers:
```rust
reasoning_buffer: String,      // Current reasoning block
full_reasoning_buffer: String, // Full transcript-only reasoning
```

Display flow:
- on_agent_reasoning_delta(delta): Accumulates text. Does not stream to visible history — reasoning is transcript-only.
- on_reasoning_section_break(): Starts a new reasoning block. Resets the extraction state.
- on_agent_reasoning_final(): Records the full reasoning buffer to transcript. Creates a new_reasoning_summary_block() in the history cell.
The summary block parser (tui/src/history_cell.rs) expects **Header**\n\nSummary text format — it extracts the bold header for a collapsed display and shows the summary as expandable bullet points.
Model Presets
core/src/models_manager/model_presets.rs defines reasoning effort presets per model:
```rust
pub struct ModelPreset {
    pub default_reasoning_effort: ReasoningEffort,
    pub supported_reasoning_efforts: Vec<ReasoningEffortPreset>,
    // ...
}

pub struct ReasoningEffortPreset {
    pub effort: ReasoningEffort,
    pub description: String, // e.g., "Fast responses with lighter reasoning"
}
```

The TUI uses these to populate the effort selector with model-appropriate options.
Configuration Persistence
Reasoning effort is stored in the user’s config.toml:
```toml
[profile]
model_reasoning_effort = "high"
```

Programmatic updates use the ConfigEdit::SetModelReasoningEffort(Option<ReasoningEffort>) variant in core/src/config/edit.rs.
OpenCode Implementation
Reference: references/opencode/packages/opencode/src/provider/, src/session/, src/config/ | Commit: 7ed449974
OpenCode has the most complex reasoning implementation because it targets the widest range of providers through the Vercel AI SDK. Each provider has a different parameter schema, and OpenCode builds a variants() abstraction that maps unified effort levels to provider-specific wire formats.
Model Capability Declaration
In provider/models.ts (lines 17-70), models declare reasoning support:
```typescript
export const Model = z.object({
  reasoning: z.boolean(), // line 23
  interleaved: z.union([
    z.literal(true),
    z.object({
      field: z.enum(["reasoning_content", "reasoning_details"]),
    }),
  ]).optional(),
})
```

The interleaved field is significant: it indicates the model supports reasoning interleaved with text output (not just reasoning-first-then-answer). The field discriminator tells the normalization layer which JSON field the provider uses.
Provider-Specific Variant Maps
The variants() function in provider/transform.ts (lines 329-658) is the heart of the reasoning system. It returns a Record<string, providerOptions> mapping effort level names to provider-specific parameter objects.
Anthropic (lines 500-533):
```typescript
// Opus 4.6: adaptive thinking with effort levels
if (model.api.id.includes("opus-4-6")) {
  return Object.fromEntries(
    ["low", "medium", "high", "max"].map((effort) => [
      effort,
      { thinking: { type: "adaptive" }, effort },
    ])
  )
}
// Other Claude models: fixed budget thinking
return {
  high: {
    thinking: {
      type: "enabled",
      budgetTokens: Math.min(16_000, Math.floor(model.limit.output / 2 - 1)),
    },
  },
  max: {
    thinking: {
      type: "enabled",
      budgetTokens: Math.min(31_999, model.limit.output - 1),
    },
  },
}
```

Two modes: Claude Opus 4.6 uses type: "adaptive" with an effort parameter (the server decides the budget). Other Claude models use type: "enabled" with an explicit budgetTokens value — capped at half the output limit for high and output_limit - 1 for max.
OpenAI (lines 469-498):
```typescript
return Object.fromEntries(
  openaiEfforts.map((effort) => [
    effort,
    {
      reasoningEffort: effort,
      reasoningSummary: "auto",
      include: ["reasoning.encrypted_content"],
    },
  ])
)
```

Effort levels vary by model and release date: codex-5 gets "minimal", post-2025-11-13 models get "none", post-2025-12-04 models get "xhigh", and Codex 5.2/5.3 models get "xhigh".
Google Gemini (lines 582-610):
```typescript
// Gemini 3.1: explicit thinking budget
return {
  high: { thinkingConfig: { includeThoughts: true, thinkingBudget: 16000 } },
  max: { thinkingConfig: { includeThoughts: true, thinkingBudget: 24576 } },
}
// Older Gemini: level-based
return Object.fromEntries(
  ["low", "high"].map((effort) => [
    effort,
    { includeThoughts: true, thinkingLevel: effort },
  ])
)
```

Amazon Bedrock (lines 535-580) wraps Anthropic models with a different schema: reasoningConfig with type: "adaptive" or type: "enabled" plus budgetTokens or maxReasoningEffort.
Default Reasoning Configuration
The options() function in provider/transform.ts (lines 660-776) sets default reasoning parameters that apply even without user override:
- Codex 5.2/5.3 models: reasoningEffort: "medium", reasoningSummary: "auto" by default.
- Google Gemini 3.1: thinkingConfig: { includeThoughts: true, thinkingLevel: "high" }.
- Kimi K2.5 on Anthropic: thinking: { type: "enabled", budgetTokens: min(16000, output_limit/2 - 1) }.
- Alibaba reasoning models: enable_thinking: true.
Streaming Event Processing
session/processor.ts (lines 62-109) handles reasoning streaming with three event types:
```typescript
case "reasoning-start":
  const reasoningPart = {
    id: Identifier.ascending("part"),
    type: "reasoning" as const,
    text: "",
    time: { start: Date.now() },
    metadata: value.providerMetadata,
  }
  await Session.updatePart(reasoningPart)
  break

case "reasoning-delta":
  part.text += value.text
  await Session.updatePartDelta({
    partID: part.id,
    field: "text",
    delta: value.text,
  })
  break

case "reasoning-end":
  part.text = part.text.trimEnd()
  part.time = { ...part.time, end: Date.now() }
  await Session.updatePart(part)
  break
```

Each reasoning block becomes a ReasoningPart in the message part list, tracked with start/end timestamps and provider-specific metadata. The delta path uses Session.updatePartDelta() for efficient incremental updates to the SQLite store.
ReasoningPart Storage Schema
Defined in session/message-v2.ts (lines 116-127):
```typescript
export const ReasoningPart = PartBase.extend({
  type: z.literal("reasoning"),
  text: z.string(),
  metadata: z.record(z.string(), z.any()).optional(),
  time: z.object({
    start: z.number(),
    end: z.number().optional(),
  }),
})
```

Reasoning tokens are tracked separately in StepFinishPart (line 250):
```typescript
tokens: z.object({
  input: z.number(),
  output: z.number(),
  reasoning: z.number(),
  cache: z.object({ read: z.number(), write: z.number() }),
})
```

Interleaved Reasoning Normalization
For models declaring interleaved: { field: "reasoning_content" | "reasoning_details" }, the message normalization in provider/transform.ts (lines 136-169) converts stored reasoning parts back to provider-specific format when replaying conversation history:
```typescript
if (typeof model.capabilities.interleaved === "object") {
  const field = model.capabilities.interleaved.field
  return msgs.map((msg) => {
    if (msg.role === "assistant" && Array.isArray(msg.content)) {
      const reasoningParts = msg.content.filter((p) => p.type === "reasoning")
      const reasoningText = reasoningParts.map((p) => p.text).join("")
      const filteredContent = msg.content.filter((p) => p.type !== "reasoning")
      if (reasoningText) {
        return {
          ...msg,
          content: filteredContent,
          providerOptions: {
            openaiCompatible: { [field]: reasoningText },
          },
        }
      }
    }
    return msg
  })
}
```

This ensures that when replaying a conversation to a model that expects reasoning_content on the message object (rather than as a separate content block), the stored reasoning parts are correctly remapped.
Anthropic Beta Headers
provider/provider.ts (lines 117-127) enables extended thinking via beta headers:
```typescript
async anthropic() {
  return {
    options: {
      headers: {
        "anthropic-beta":
          "claude-code-20250219,interleaved-thinking-2025-05-14,fine-grained-tool-streaming-2025-05-14",
      },
    },
  }
}
```

The interleaved-thinking-2025-05-14 beta enables thinking blocks interleaved with tool calls and text — without it, thinking only appears at the start of the response.
Copilot Provider Specifics
The Copilot provider (provider/sdk/copilot/chat/openai-compatible-chat-language-model.ts, lines 450-503) handles reasoning via the delta.reasoning_text field:
```typescript
const reasoningContent = delta.reasoning_text
if (reasoningContent) {
  if (!isActiveReasoning) {
    controller.enqueue({ type: "reasoning-start", id: "reasoning-0" })
    isActiveReasoning = true
  }
  controller.enqueue({
    type: "reasoning-delta",
    id: "reasoning-0",
    delta: reasoningContent,
  })
}
```

When main content starts arriving and reasoning was active, it closes the reasoning block and transitions to text streaming.
Pitfalls & Hard Lessons
Temperature Conflicts
Anthropic’s extended thinking API rejects requests with any temperature value other than 1.0. Aider handles this by setting use_temperature = False when thinking tokens are enabled. Codex avoids the issue by using the Responses API (which doesn’t accept temperature). OpenCode relies on the AI SDK to handle it. If you don’t handle this, enabling thinking tokens will crash every request.
Budget Arithmetic
With Anthropic’s budget_tokens, the value must be strictly less than max_output_tokens. OpenCode uses output_limit - 1 for the max variant and output_limit / 2 - 1 for high. Off-by-one here means a 400 error from the API.
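The cap arithmetic is worth writing down explicitly. A sketch of OpenCode's variant rule (hypothetical helper, not its code):

```python
def anthropic_budget_tokens(output_limit, variant):
    """Thinking budget caps following OpenCode's variant rules.

    budget_tokens must be strictly less than max_output_tokens,
    so both variants subtract 1 before applying the fixed ceiling.
    """
    if variant == "high":
        return min(16_000, output_limit // 2 - 1)
    if variant == "max":
        return min(31_999, output_limit - 1)
    raise ValueError(f"unknown variant: {variant}")
```

Keeping the subtraction inside the helper means a small-output model (e.g. an 8k limit) still produces a valid budget rather than a 400 error.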
Encrypted Reasoning Content
OpenAI’s reasoning.encrypted_content is opaque — it can’t be read or displayed. It’s returned so it can be included in subsequent requests for context continuity. Codex includes it via the include field in the request. If you forget this, each turn starts without reasoning context from previous turns.
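Shape-wise, the fix is a single extra field on the request body. A hypothetical request builder, following the ResponsesApiRequest shape described earlier rather than any specific SDK:

```python
def build_responses_request(model, input_items, effort=None, summary="auto"):
    """Sketch of a Responses-API request body with reasoning enabled.

    Hypothetical helper: field names mirror the ResponsesApiRequest
    struct discussed above, not a real client library.
    """
    request = {"model": model, "input": input_items}
    if effort is not None:
        request["reasoning"] = {"effort": effort, "summary": summary}
        # Ask the server to return opaque reasoning blobs so they can
        # be replayed verbatim in the next turn's input.
        request["include"] = ["reasoning.encrypted_content"]
    return request
```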
Tag-Based Reasoning Detection
DeepSeek R1 and Qwen QWQ models emit reasoning as literal <think>...</think> tags in the response text. This requires tag-based extraction rather than field-based extraction. Aider uses a hash-suffixed wrapper tag (thinking-content-7bbeb8e1...) to avoid collision with user content that might contain <thinking> literally. The tag approach is fragile — malformed responses (missing opening tag, partial closing tag) need explicit handling.
Interleaved vs Sequential Reasoning
Some models (with Anthropic’s interleaved-thinking beta) produce reasoning blocks between tool calls and text blocks. Others produce reasoning only at the start. The conversation replay logic must handle both: sequential reasoning gets prepended, interleaved reasoning gets embedded at its original position. Getting this wrong corrupts the model’s context for subsequent turns.
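The replay rule can be sketched in a few lines. This is a hypothetical helper illustrating the prepend-vs-preserve distinction; real implementations differ per provider:

```python
def replay_reasoning(parts, interleaved):
    """Reorder reasoning parts for conversation replay.

    Interleaved models expect each reasoning part at its original
    position; sequential models expect reasoning as one leading run.
    `parts` is a list of (kind, payload) tuples where kind is
    "reasoning", "text", or "tool_call".
    """
    if interleaved:
        return list(parts)  # preserve original positions
    reasoning = [p for p in parts if p[0] == "reasoning"]
    other = [p for p in parts if p[0] != "reasoning"]
    return reasoning + other  # prepend reasoning, keep the rest in order
```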
Summary vs Content
Codex distinguishes ReasoningContentDelta (the full reasoning text) from ReasoningSummaryDelta (a shorter summary). The TUI shows summaries in the transcript but records full content separately. OpenCode stores the full reasoning text as a ReasoningPart. Aider strips reasoning entirely after display. Each choice has tradeoffs for token budget on replayed conversations.
OpenOxide Blueprint
Section titled “OpenOxide Blueprint”Unified Reasoning Configuration
Define a ReasoningMode enum in a protocol crate:
```rust
pub enum ReasoningMode {
    /// No reasoning tokens requested.
    Off,
    /// Provider-determined budget.
    Adaptive { effort: ReasoningEffort },
    /// Explicit token budget.
    Budget { tokens: u32 },
}

pub enum ReasoningEffort {
    None,
    Minimal,
    Low,
    Medium,
    High,
    Max,
}

pub enum ReasoningSummary {
    Auto,
    Concise,
    Detailed,
    Off,
}
```

The ReasoningMode enum captures both Anthropic-style budgets and OpenAI-style effort levels in a single type. Provider adapters translate this to wire format.
Provider Adapter Trait
Extend the provider trait with reasoning parameter generation:
```rust
trait ProviderAdapter {
    fn reasoning_params(
        &self,
        mode: &ReasoningMode,
        summary: &ReasoningSummary,
        model: &ModelInfo,
    ) -> serde_json::Value;
}
```

Each provider (Anthropic, OpenAI, Google, Bedrock) implements this with its own wire format. Budget arithmetic (the output_limit - 1 cap) lives inside the adapter, not in generic code.
Streaming Event Types
Section titled “Streaming Event Types”pub enum ReasoningEvent { Start { block_id: u32 }, Delta { block_id: u32, text: String }, End { block_id: u32 }, SummaryDelta { block_id: u32, summary_index: u32, text: String }, SummaryPartAdded { block_id: u32, summary_index: u32 },}The block_id supports interleaved reasoning — multiple reasoning blocks in a single response, separated by tool calls or text.
Token Tracking
Section titled “Token Tracking”pub struct TokenUsage { pub input: u32, pub cached_input: u32, pub output: u32, pub reasoning_output: u32, pub total: u32,}Track reasoning_output separately. Display it in the TUI status bar alongside regular output tokens. Include it in OpenTelemetry spans.
Tag Extraction
For models that embed reasoning in tags (DeepSeek R1, Qwen QWQ), implement a TagExtractor that processes the response stream:
```rust
struct TagExtractor {
    tag: String,
    state: TagState, // Outside | InsideTag | InsideContent
    buffer: String,
}

impl TagExtractor {
    fn process_chunk(&mut self, chunk: &str) -> Vec<ReasoningEvent> {
        // ...
    }
}
```

Use a state machine rather than regex for streaming — regex requires the full text, but we process chunks incrementally.
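The chunk-boundary handling is the subtle part. A Python sketch of the same state machine (hypothetical; event tuples stand in for the ReasoningEvent enum, and it assumes a single reasoning block per response):

```python
class TagExtractor:
    """Streaming extractor for <tag>...</tag> reasoning blocks.

    Buffers text so a tag split across chunk boundaries is still
    recognized, and holds back a possible partial closing tag.
    """

    def __init__(self, tag):
        self.open_tag = f"<{tag}>"
        self.close_tag = f"</{tag}>"
        self.inside = False
        self.buffer = ""

    def process_chunk(self, chunk):
        self.buffer += chunk
        events = []
        while True:
            if not self.inside:
                idx = self.buffer.find(self.open_tag)
                if idx == -1:
                    break  # opening tag not complete yet; keep buffering
                self.inside = True
                events.append(("start",))
                self.buffer = self.buffer[idx + len(self.open_tag):]
            else:
                idx = self.buffer.find(self.close_tag)
                if idx == -1:
                    # Flush all but a suffix long enough to hide a
                    # partially-arrived closing tag.
                    safe = len(self.buffer) - len(self.close_tag) + 1
                    if safe > 0:
                        events.append(("delta", self.buffer[:safe]))
                        self.buffer = self.buffer[safe:]
                    break
                if idx > 0:
                    events.append(("delta", self.buffer[:idx]))
                events.append(("end",))
                self.inside = False
                self.buffer = self.buffer[idx + len(self.close_tag):]
        return events
```

Holding back `len(close_tag) - 1` characters while inside a block is what lets a `</think>` split across two chunks still terminate the block cleanly.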
Crates
Section titled “Crates”openoxide-protocol:ReasoningMode,ReasoningEffort,ReasoningSummary,ReasoningEvent,TokenUsagetypes.openoxide-provider:ProviderAdaptertrait implementations per provider, including reasoning parameter generation and tag extraction.openoxide-core: Per-turn reasoning override logic, token tracking aggregation, reasoning content storage decisions.openoxide-tui: Reasoning block rendering (collapsed/expanded), summary extraction, token display.
Key Design Decision: Store or Discard?
Follow Codex’s approach: store full reasoning content in the session log (for debugging and context replay) but display only summaries in the TUI transcript. This gives the best of both worlds — reasoning context is available for subsequent turns (via encrypted content or full text replay), while the UI stays clean.
For providers that support encrypted reasoning content (OpenAI), include it in the include parameter and store the opaque blob. For providers that return plaintext reasoning (Anthropic extended thinking, DeepSeek R1), store the full text but render it collapsed in the TUI with the first bold line as the header (matching Codex’s pattern).