
Reasoning & Thinking Tokens

Source attribution: Implementation details traced from references/aider/ at commit b9050e1d, references/codex/ at commit 4ab44e2c5, and references/opencode/ at commit 7ed449974.

Modern reasoning models (OpenAI o-series, Claude with extended thinking, DeepSeek R1, Gemini 3.1) produce two distinct output streams: reasoning tokens (internal chain-of-thought) and visible tokens (the answer the user sees). Coding agents must handle both streams — configuring how much reasoning the model should do, streaming the reasoning content to the user in real-time, accounting for reasoning tokens in budgets and billing, and dealing with provider-specific parameter formats.

The core challenges are:

  1. Provider fragmentation: Anthropic uses thinking.budget_tokens, OpenAI uses reasoning_effort + reasoning_summary, Google uses thinkingConfig.thinkingBudget, and Bedrock wraps everything differently. A coding agent targeting multiple providers needs a unified abstraction over all of them.
  2. Token economics: Reasoning tokens count toward output token limits but aren’t part of the visible response. A model with a 16k output limit might spend 12k on reasoning, leaving only 4k for the actual code edit. Budget control matters.
  3. Streaming complexity: Reasoning content arrives before (or interleaved with) visible content. The agent must detect transition boundaries, display reasoning blocks distinctly from answer blocks, and handle malformed sequences gracefully.
  4. Temperature interaction: Anthropic’s extended thinking requires temperature=1.0 (no override allowed). An agent that sets a thinking budget must therefore disable its own temperature controls.
  5. Display and storage: Reasoning content is useful for debugging but noisy for chat history. Agents need strategies for showing it during streaming, optionally collapsing it afterward, and deciding whether to persist it.
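To make challenge 1 concrete, the fan-out can be sketched as a single dispatch from a unified setting to the three wire formats named above. This is illustrative (the dispatch function and its defaults are not any one agent's implementation; the parameter names come from the providers' public APIs):

```python
# Hypothetical sketch: one unified reasoning setting fanned out to
# provider-specific wire formats.
def reasoning_params(provider: str, effort: str = "high", budget: int = 8192) -> dict:
    if provider == "anthropic":
        return {"thinking": {"type": "enabled", "budget_tokens": budget}}
    if provider == "openai":
        return {"reasoning": {"effort": effort, "summary": "auto"}}
    if provider == "google":
        return {"thinkingConfig": {"includeThoughts": True, "thinkingBudget": budget}}
    raise ValueError(f"unknown provider: {provider}")

print(reasoning_params("anthropic"))
# {'thinking': {'type': 'enabled', 'budget_tokens': 8192}}
```

Each of the three agents below implements some version of this dispatch, differing mainly in where the budget arithmetic and validation live.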

This page focuses on the token-level mechanics. For model selection and role assignment, see Multi-Model Orchestration. For stream transport and rendering behavior, see Streaming. For context-window partitioning pressure from reasoning output, see Token Budgeting. For the static instruction prefix that carries reasoning policy, see System Prompt.


Reference: references/aider/aider/reasoning_tags.py, aider/models.py, aider/coders/base_coder.py, aider/args.py, aider/commands.py | Commit: b9050e1d

Aider implements a dual-parameter system: --reasoning-effort for OpenAI-style effort levels and --thinking-tokens for Anthropic-style budget tokens. It uses a tag-based extraction system to separate reasoning content from visible output in the response stream.

The ModelSettings dataclass in aider/models.py (lines 115-139) tracks which parameters each model accepts via an accepts_settings list. Models declare support for "reasoning_effort", "thinking_tokens", or both:

@dataclass
class ModelSettings:
    name: str
    reasoning_tag: Optional[str] = None      # line 135
    remove_reasoning: Optional[str] = None   # line 136 (deprecated)
    accepts_settings: Optional[list] = None  # line 138

CLI arguments in aider/args.py (lines 139-150):

group.add_argument(
    "--reasoning-effort",
    type=str,
    help="Set the reasoning_effort API parameter (default: not set)",
)
group.add_argument(
    "--thinking-tokens",
    type=str,
    help="Set the thinking token budget for models that support it. Use 0 to disable.",
)

Both map to environment variables AIDER_REASONING_EFFORT and AIDER_THINKING_TOKENS via configargparse’s auto_env_var_prefix.

set_reasoning_effort(effort) in models.py (lines 776-790) writes different wire formats depending on the provider:

  • OpenRouter models (name.startswith("openrouter/")): extra_body.reasoning.effort = effort
  • All other models: extra_body.reasoning_effort = effort

set_thinking_tokens(value) in models.py (lines 823-849) is more involved:

  1. Parses flexible token formats via parse_token_value() (lines 792-821) — accepts 8096, "8k", "10.5k", "0.5M" with K=1024 and M=1024² multipliers.
  2. Disables temperature: self.use_temperature = False (line 831). This is critical — Anthropic’s extended thinking rejects requests with temperature != 1.0.
  3. Writes provider-specific params:
    • OpenRouter: extra_body.reasoning.max_tokens = num_tokens
    • Standard (Anthropic): extra_params.thinking = {"type": "enabled", "budget_tokens": num_tokens}
  4. Setting 0 disables thinking by removing the parameter entirely.
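A re-implementation of the parsing described in step 1, matching the examples given (aider's actual parse_token_value may handle more edge cases):

```python
def parse_token_value(value) -> int:
    # Accepts ints, plain digit strings, and "8k" / "10.5k" / "0.5M"
    # style suffixes, with K = 1024 and M = 1024**2 multipliers.
    if isinstance(value, int):
        return value
    s = str(value).strip().lower()
    if s.endswith("k"):
        return int(float(s[:-1]) * 1024)
    if s.endswith("m"):
        return int(float(s[:-1]) * 1024 * 1024)
    return int(s)

print(parse_token_value("8k"))    # 8192
print(parse_token_value("0.5M"))  # 524288
```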

aider/main.py (lines 830-864) validates settings against the model’s accepts_settings list before applying them. If --check-model-accepts-settings is enabled (the default) and the model doesn’t declare support, the setting is ignored and a warning is printed:

if args.reasoning_effort is not None:
    if not args.check_model_accepts_settings or (
        main_model.accepts_settings and "reasoning_effort" in main_model.accepts_settings
    ):
        main_model.set_reasoning_effort(args.reasoning_effort)

apply_generic_model_settings() in models.py (lines 373-583) auto-configures reasoning support based on model name patterns:

| Pattern | accepts_settings |
| --- | --- |
| gpt-5.2-codex | ["reasoning_effort"] |
| claude-4-6-sonnet*, sonnet-4-6*, opus-4-6*, haiku-4-5* | ["thinking_tokens"] |
| openrouter/* | Auto-adds both thinking_tokens and reasoning_effort |

Models like DeepSeek R1 and Qwen QWQ emit reasoning as inline XML-style tags (<think>...</think>). Aider’s reasoning_tags.py (83 lines) handles extraction and formatting.

Constants (lines 8-11):

REASONING_TAG = "thinking-content-" + "7bbeb8e1441453ad999a0bbba8a46d4b"
REASONING_START = "--------------\n► **THINKING**"
REASONING_END = "------------\n► **ANSWER**"

The hash-suffixed tag name prevents collision with user content containing literal <thinking> tags.

remove_reasoning_content(res, reasoning_tag) (lines 14-40): Strips <tag>...</tag> blocks from the response. Handles malformed sequences where the opening tag is missing but the closing tag exists.

replace_reasoning_tags(text, tag_name) (lines 43-64): Replaces XML tags with formatted display markers (► THINKING / ► ANSWER) for terminal output.

format_reasoning_content(reasoning_content, tag_name) (lines 67-82): Wraps standalone reasoning text (from response.choices[0].message.reasoning_content) in XML tags for uniform processing.
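A minimal sketch of this kind of tag stripping for the non-streaming case, including the missing-opening-tag situation mentioned above (regex-based; aider's actual implementation may differ on edge cases):

```python
import re

def strip_reasoning(text: str, tag: str) -> str:
    # Remove well-formed <tag>...</tag> blocks outright.
    pattern = re.compile(rf"<{re.escape(tag)}>.*?</{re.escape(tag)}>\s*", re.DOTALL)
    text = pattern.sub("", text)
    # Malformed case: a closing tag with no opening tag. Treat everything
    # before the stray close as reasoning and drop it.
    closing = f"</{tag}>"
    if closing in text and f"<{tag}>" not in text:
        text = text.split(closing, 1)[1]
    return text.strip()

print(strip_reasoning("pondering...</think>final answer", "think"))  # final answer
```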

In base_coder.py, the streaming handler (lines 1900-1975) tracks reasoning state with two flags:

self.got_reasoning_content = False
self.ended_reasoning_content = False

Streaming flow:

  1. Each chunk is checked for delta.reasoning_content or delta.reasoning (lines 1927-1933).
  2. First reasoning chunk: emit <{REASONING_TAG}>\n\n opening marker (line 1937), set got_reasoning_content = True.
  3. Subsequent reasoning chunks: accumulate text directly.
  4. First non-reasoning content chunk: emit </{reasoning_tag_name}>\n\n closing marker (line 1946), set ended_reasoning_content = True.
  5. After streaming completes, remove_reasoning_content() strips the blocks before edit parsing.
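The five steps above reduce to a small state machine over the two flags; a condensed, illustrative sketch (chunk shape and marker text follow the description, the loop itself is mine):

```python
def wrap_stream(chunks, tag="thinking-content-x"):
    # chunks: (reasoning_text_or_None, visible_text_or_None) pairs,
    # mimicking delta.reasoning_content vs. delta.content.
    got, ended = False, False  # the two flags from base_coder.py
    out = []
    for reasoning, content in chunks:
        if reasoning:
            if not got:
                out.append(f"<{tag}>\n\n")   # opening marker, first reasoning chunk
                got = True
            out.append(reasoning)
        if content:
            if got and not ended:
                out.append(f"</{tag}>\n\n")  # closing marker, first visible chunk
                ended = True
            out.append(content)
    return "".join(out)

text = wrap_stream([("let me think", None), (" more", None), (None, "answer")])
```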

For non-streaming responses (lines 1857-1892), reasoning_content is extracted from completion.choices[0].message.reasoning_content (with fallback to .reasoning), wrapped with format_reasoning_content(), and prepended to the main response.

Two runtime commands allow adjusting reasoning parameters mid-session:

  • /think-tokens [value] (commands.py, lines 1566-1599): Displays or sets the thinking token budget. Shows formatted value like "Current thinking token budget: 8,192 tokens (8k)".
  • /reasoning-effort [level] (commands.py, lines 1601-1622): Displays or sets reasoning effort level.

base_coder.py (lines 222-230) announces reasoning configuration at startup:

thinking_tokens = main_model.get_thinking_tokens()
if thinking_tokens:
    output += f", {thinking_tokens} think tokens"
reasoning_effort = main_model.get_reasoning_effort()
if reasoning_effort:
    output += f", reasoning {reasoning_effort}"

Reference: references/codex/codex-rs/protocol/, codex-rs/codex-api/, codex-rs/core/, codex-rs/tui/, codex-rs/app-server-protocol/ | Commit: 4ab44e2c5

Codex implements reasoning as a first-class concept with dedicated enums, per-turn override capability, streaming events for both raw reasoning and summaries, separate token tracking, and a TUI rendering pipeline that extracts bold headers from reasoning blocks.

Defined in protocol/src/openai_models.rs (lines 23-49):

#[derive(Debug, Serialize, Deserialize, Default, Clone, Copy, PartialEq, Eq)]
#[serde(rename_all = "lowercase")]
pub enum ReasoningEffort {
    None,
    Minimal,
    Low,
    #[default]
    Medium,
    High,
    XHigh,
}

Six levels: none, minimal, low, medium (default), high, xhigh. This is a superset of OpenAI’s supported levels — the extra granularity (none, minimal, xhigh) is gated by model release date on the server side.

Defined in protocol/src/config_types.rs (lines 235-264):

pub enum ReasoningSummary {
    #[default]
    Auto,
    Concise,
    Detailed,
    None,
}

Controls how the API summarizes the model’s internal reasoning. Auto lets the server decide. None disables summaries entirely. This is distinct from reasoning effort — you can have high effort with no summary, or low effort with a detailed summary.

The Reasoning struct in codex-api/src/common.rs (lines 88-94) bundles both parameters:

#[derive(Debug, Serialize, Clone, PartialEq)]
pub struct Reasoning {
    #[serde(skip_serializing_if = "Option::is_none")]
    pub effort: Option<ReasoningEffortConfig>,
    #[serde(skip_serializing_if = "Option::is_none")]
    pub summary: Option<ReasoningSummaryConfig>,
}

This gets embedded in ResponsesApiRequest alongside a key detail — the include field:

pub struct ResponsesApiRequest {
    pub reasoning: Option<Reasoning>,
    pub include: Vec<String>, // Includes "reasoning.encrypted_content"
    // ...
}

When reasoning is enabled, include contains "reasoning.encrypted_content" — this tells the API to return opaque encrypted reasoning content that can be replayed in future requests for context continuity.
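Concretely, the serialized request body looks roughly like this (field names mirror the structs above; the model name and input are illustrative placeholders):

```python
import json

# Illustrative Responses API request body with reasoning enabled.
request = {
    "model": "o3",  # hypothetical model name
    "input": [{"role": "user", "content": "refactor this function"}],
    "reasoning": {"effort": "medium", "summary": "auto"},
    "include": ["reasoning.encrypted_content"],
}
print(json.dumps(request, indent=2))
```

The encrypted content blobs that come back are then replayed verbatim in the next turn's input, preserving reasoning context without exposing its plaintext.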

core/src/client.rs (lines 460-520) builds the reasoning parameters conditionally:

let default_reasoning_effort = model_info.default_reasoning_level;
let reasoning = if model_info.supports_reasoning_summaries {
    Some(Reasoning {
        effort: effort.or(default_reasoning_effort),
        summary: if summary == ReasoningSummaryConfig::None {
            None
        } else {
            Some(summary)
        },
    })
} else {
    None
};

The logic: only attach reasoning parameters if the model declares supports_reasoning_summaries. Use the per-turn effort override if provided, otherwise fall back to the model’s default_reasoning_level.

Reasoning effort and summary can be overridden on every turn. The TurnStartParams protocol message (from app-server-protocol/src/protocol/v2.rs) includes:

type TurnStartParams = {
    threadId: string,
    input: Array<UserInput>,
    effort?: ReasoningEffort | null,
    summary?: ReasoningSummary | null,
    // ...
}

These flow through TurnContext in core/src/codex.rs (lines 533-567):

pub(crate) struct TurnContext {
    pub(crate) reasoning_effort: Option<ReasoningEffortConfig>,
    pub(crate) reasoning_summary: ReasoningSummaryConfig,
    // ...
}

And persist at the session level in CodexThread (core/src/codex_thread.rs, lines 18-30):

pub struct CodexThread {
    pub reasoning_effort: Option<ReasoningEffort>,
    // ...
}

Reasoning tokens are tracked separately in the wire protocol (app-server-protocol/schema/typescript/TokenUsage.ts):

export type TokenUsage = {
    input_tokens: number,
    cached_input_tokens: number,
    output_tokens: number,
    reasoning_output_tokens: number,
    total_tokens: number
}

The reasoning_output_tokens field enables the TUI and telemetry to show reasoning cost independently. In core/src/client.rs (line 1038), these flow to OpenTelemetry:

otel_manager.sse_event_completed(
    usage.input_tokens,
    usage.output_tokens,
    Some(usage.cached_input_tokens),
    Some(usage.reasoning_output_tokens),
    usage.total_tokens,
);
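A status line can derive the visible output count from the split; a sketch (the format string is mine, and it assumes, per the earlier point about output limits, that reasoning tokens are counted inside output_tokens):

```python
def usage_line(u: dict) -> str:
    # Assumes reasoning_output_tokens is a subset of output_tokens,
    # as described above ("count toward output token limits").
    visible = u["output_tokens"] - u["reasoning_output_tokens"]
    return (f"in {u['input_tokens']} "
            f"(cached {u['cached_input_tokens']}) | "
            f"out {visible} + {u['reasoning_output_tokens']} reasoning")

print(usage_line({
    "input_tokens": 1200, "cached_input_tokens": 800,
    "output_tokens": 16000, "reasoning_output_tokens": 12000,
    "total_tokens": 17200,
}))
# in 1200 (cached 800) | out 4000 + 12000 reasoning
```

This is the 16k-limit scenario from the introduction: 12k of the output window went to reasoning, leaving 4k of visible answer.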

Codex defines three reasoning-specific streaming events in codex-api/src/common.rs (lines 54-86):

pub enum ResponseEvent {
    ReasoningContentDelta {
        delta: String,
        content_index: i64,
    },
    ReasoningSummaryDelta {
        delta: String,
        summary_index: i64,
    },
    ReasoningSummaryPartAdded {
        summary_index: i64,
    },
    ServerReasoningIncluded(bool),
    // ...
}

These map to app-server-protocol notification types:

| Event | Wire Type | Purpose |
| --- | --- | --- |
| ReasoningContentDelta | ReasoningTextDeltaNotification | Raw reasoning chunks (may be encrypted or summarized) |
| ReasoningSummaryDelta | ReasoningSummaryTextDeltaNotification | Summary text chunks |
| ReasoningSummaryPartAdded | ReasoningSummaryPartAddedNotification | Section boundary markers |
| ServerReasoningIncluded | — | Flag indicating server pre-accounted reasoning in token budget |

The TUI (tui/src/chatwidget.rs, lines 542-545 and 804-820) accumulates reasoning content in a buffer and extracts display-friendly headers:

reasoning_buffer: String,      // Current reasoning block
full_reasoning_buffer: String, // Full transcript-only reasoning

Display flow:

  1. on_agent_reasoning_delta(delta): Accumulates text. Does not stream to visible history — reasoning is transcript-only.
  2. on_reasoning_section_break(): Starts a new reasoning block. Resets the extraction state.
  3. on_agent_reasoning_final(): Records the full reasoning buffer to transcript. Creates a new_reasoning_summary_block() in the history cell.

The summary block parser (tui/src/history_cell.rs) expects **Header**\n\nSummary text format — it extracts the bold header for a collapsed display and shows the summary as expandable bullet points.
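A sketch of that header extraction (the **Header**\n\nSummary shape comes from the description above; the regex and fallback behavior are illustrative):

```python
import re

def split_summary(block: str):
    # Expects the "**Header**\n\nSummary text" shape described above.
    m = re.match(r"\*\*(.+?)\*\*\n\n(.*)", block, re.DOTALL)
    if m:
        return m.group(1), m.group(2)
    return None, block  # no bold header: show everything as body

header, body = split_summary("**Exploring the codebase**\n\nI looked at models.py first.")
# header == "Exploring the codebase"
```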

core/src/models_manager/model_presets.rs defines reasoning effort presets per model:

pub struct ModelPreset {
    pub default_reasoning_effort: ReasoningEffort,
    pub supported_reasoning_efforts: Vec<ReasoningEffortPreset>,
    // ...
}

pub struct ReasoningEffortPreset {
    pub effort: ReasoningEffort,
    pub description: String, // e.g., "Fast responses with lighter reasoning"
}

The TUI uses these to populate the effort selector with model-appropriate options.

Reasoning effort is stored in the user’s config.toml:

[profile]
model_reasoning_effort = "high"

Programmatic updates use the ConfigEdit::SetModelReasoningEffort(Option<ReasoningEffort>) variant in core/src/config/edit.rs.


Reference: references/opencode/packages/opencode/src/provider/, src/session/, src/config/ | Commit: 7ed449974

OpenCode has the most complex reasoning implementation because it targets the widest range of providers through the Vercel AI SDK. Each provider has a different parameter schema, and OpenCode builds a variants() abstraction that maps unified effort levels to provider-specific wire formats.

In provider/models.ts (lines 17-70), models declare reasoning support:

export const Model = z.object({
  reasoning: z.boolean(), // line 23
  interleaved: z.union([
    z.literal(true),
    z.object({
      field: z.enum(["reasoning_content", "reasoning_details"]),
    }),
  ]).optional(),
})

The interleaved field is significant: it indicates the model supports reasoning interleaved with text output (not just reasoning-first-then-answer). The field discriminator tells the normalization layer which JSON field the provider uses.

The variants() function in provider/transform.ts (lines 329-658) is the heart of the reasoning system. It returns a Record<string, providerOptions> mapping effort level names to provider-specific parameter objects.

Anthropic (lines 500-533):

// Opus 4.6: adaptive thinking with effort levels
if (model.api.id.includes("opus-4-6")) {
  return Object.fromEntries(
    ["low", "medium", "high", "max"].map((effort) => [
      effort,
      { thinking: { type: "adaptive" }, effort },
    ])
  )
}
// Other Claude models: fixed budget thinking
return {
  high: {
    thinking: {
      type: "enabled",
      budgetTokens: Math.min(16_000, Math.floor(model.limit.output / 2 - 1)),
    },
  },
  max: {
    thinking: {
      type: "enabled",
      budgetTokens: Math.min(31_999, model.limit.output - 1),
    },
  },
}

Two modes: Claude Opus 4.6 uses type: "adaptive" with an effort parameter (the server decides the budget). Other Claude models use type: "enabled" with an explicit budgetTokens value — capped at half the output limit for high and output_limit - 1 for max.
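The cap arithmetic can be checked in isolation; a sketch mirroring the two expressions above:

```python
def anthropic_budgets(output_limit: int) -> dict:
    # Mirrors the caps above: "high" is bounded by half the output window,
    # "max" by the window itself, both kept strictly below the limit.
    return {
        "high": min(16_000, output_limit // 2 - 1),
        "max": min(31_999, output_limit - 1),
    }

print(anthropic_budgets(8_192))   # {'high': 4095, 'max': 8191}
print(anthropic_budgets(64_000))  # {'high': 16000, 'max': 31999}
```

For small output windows both variants are dominated by the window-derived cap; the fixed 16,000/31,999 ceilings only bite on large-window models.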

OpenAI (lines 469-498):

return Object.fromEntries(
  openaiEfforts.map((effort) => [
    effort,
    {
      reasoningEffort: effort,
      reasoningSummary: "auto",
      include: ["reasoning.encrypted_content"],
    },
  ])
)

Effort levels vary by model and release date: codex-5 gets "minimal", post-2025-11-13 models get "none", post-2025-12-04 models get "xhigh", Codex 5.2/5.3 models get "xhigh".

Google Gemini (lines 582-610):

// Gemini 3.1: explicit thinking budget
return {
  high: { thinkingConfig: { includeThoughts: true, thinkingBudget: 16000 } },
  max: { thinkingConfig: { includeThoughts: true, thinkingBudget: 24576 } },
}
// Older Gemini: level-based
return Object.fromEntries(
  ["low", "high"].map((effort) => [
    effort,
    { includeThoughts: true, thinkingLevel: effort },
  ])
)

Amazon Bedrock (lines 535-580) wraps Anthropic models with a different schema: reasoningConfig with type: "adaptive" or type: "enabled" plus budgetTokens or maxReasoningEffort.

The options() function in provider/transform.ts (lines 660-776) sets default reasoning parameters that apply even without user override:

  • Codex 5.2/3 models: reasoningEffort: "medium", reasoningSummary: "auto" by default.
  • Google Gemini 3.1: thinkingConfig: { includeThoughts: true, thinkingLevel: "high" }.
  • Kimi K2.5 on Anthropic: thinking: { type: "enabled", budgetTokens: min(16000, output_limit/2 - 1) }.
  • Alibaba reasoning models: enable_thinking: true.

session/processor.ts (lines 62-109) handles reasoning streaming with three event types:

case "reasoning-start":
  const reasoningPart = {
    id: Identifier.ascending("part"),
    type: "reasoning" as const,
    text: "",
    time: { start: Date.now() },
    metadata: value.providerMetadata,
  }
  await Session.updatePart(reasoningPart)
  break
case "reasoning-delta":
  part.text += value.text
  await Session.updatePartDelta({
    partID: part.id,
    field: "text",
    delta: value.text,
  })
  break
case "reasoning-end":
  part.text = part.text.trimEnd()
  part.time = { ...part.time, end: Date.now() }
  await Session.updatePart(part)
  break

Each reasoning block becomes a ReasoningPart in the message part list, tracked with start/end timestamps and provider-specific metadata. The delta path uses Session.updatePartDelta() for efficient incremental updates to the SQLite store.

Defined in session/message-v2.ts (lines 116-127):

export const ReasoningPart = PartBase.extend({
  type: z.literal("reasoning"),
  text: z.string(),
  metadata: z.record(z.string(), z.any()).optional(),
  time: z.object({
    start: z.number(),
    end: z.number().optional(),
  }),
})

Reasoning tokens are tracked separately in StepFinishPart (line 250):

tokens: z.object({
  input: z.number(),
  output: z.number(),
  reasoning: z.number(),
  cache: z.object({ read: z.number(), write: z.number() }),
})

For models declaring interleaved: { field: "reasoning_content" | "reasoning_details" }, the message normalization in provider/transform.ts (lines 136-169) converts stored reasoning parts back to provider-specific format when replaying conversation history:

if (typeof model.capabilities.interleaved === "object") {
  const field = model.capabilities.interleaved.field
  return msgs.map((msg) => {
    if (msg.role === "assistant" && Array.isArray(msg.content)) {
      const reasoningParts = msg.content.filter(p => p.type === "reasoning")
      const reasoningText = reasoningParts.map(p => p.text).join("")
      const filteredContent = msg.content.filter(p => p.type !== "reasoning")
      if (reasoningText) {
        return {
          ...msg,
          content: filteredContent,
          providerOptions: {
            openaiCompatible: { [field]: reasoningText },
          },
        }
      }
    }
    return msg
  })
}

This ensures that when replaying a conversation to a model that expects reasoning_content on the message object (rather than as a separate content block), the stored reasoning parts are correctly remapped.

provider/provider.ts (lines 117-127) enables extended thinking via beta headers:

async anthropic() {
  return {
    options: {
      headers: {
        "anthropic-beta":
          "claude-code-20250219,interleaved-thinking-2025-05-14,fine-grained-tool-streaming-2025-05-14",
      },
    },
  }
}

The interleaved-thinking-2025-05-14 beta enables thinking blocks interleaved with tool calls and text — without it, thinking only appears at the start of the response.

The Copilot provider (provider/sdk/copilot/chat/openai-compatible-chat-language-model.ts, lines 450-503) handles reasoning via the delta.reasoning_text field:

const reasoningContent = delta.reasoning_text
if (reasoningContent) {
  if (!isActiveReasoning) {
    controller.enqueue({ type: "reasoning-start", id: "reasoning-0" })
    isActiveReasoning = true
  }
  controller.enqueue({
    type: "reasoning-delta",
    id: "reasoning-0",
    delta: reasoningContent,
  })
}

When main content starts arriving and reasoning was active, it closes the reasoning block and transitions to text streaming.


Anthropic’s extended thinking API rejects requests with any temperature value other than 1.0. Aider handles this by setting use_temperature = False when thinking tokens are enabled. Codex avoids the issue by using the Responses API (which doesn’t accept temperature). OpenCode relies on the AI SDK to handle it. If you don’t handle this, enabling thinking tokens will crash every request.

With Anthropic’s budget_tokens, the value must be strictly less than max_output_tokens. OpenCode uses output_limit - 1 for the max variant and output_limit / 2 - 1 for high. Off-by-one here means a 400 error from the API.
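A guard like the following (illustrative, not taken from any of the three codebases) catches the off-by-one before the request is sent rather than surfacing it as a provider error:

```python
def validate_budget(budget_tokens: int, max_output_tokens: int) -> None:
    # Anthropic requires budget_tokens strictly below max_tokens;
    # violating this yields a 400 before the model runs.
    if budget_tokens >= max_output_tokens:
        raise ValueError(
            f"budget_tokens ({budget_tokens}) must be < max_tokens ({max_output_tokens})"
        )

validate_budget(8_191, 8_192)  # ok: strictly below the limit
```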

OpenAI’s reasoning.encrypted_content is opaque — it can’t be read or displayed. It’s returned so it can be included in subsequent requests for context continuity. Codex includes it via the include field in the request. If you forget this, each turn starts without reasoning context from previous turns.

DeepSeek R1 and Qwen QWQ models emit reasoning as literal <think>...</think> tags in the response text. This requires tag-based extraction rather than field-based extraction. Aider uses a hash-suffixed wrapper tag (thinking-content-7bbeb8e1...) to avoid collision with user content that might contain <thinking> literally. The tag approach is fragile — malformed responses (missing opening tag, partial closing tag) need explicit handling.

Some models (with Anthropic’s interleaved-thinking beta) produce reasoning blocks between tool calls and text blocks. Others produce reasoning only at the start. The conversation replay logic must handle both: sequential reasoning gets prepended, interleaved reasoning gets embedded at its original position. Getting this wrong corrupts the model’s context for subsequent turns.

Codex distinguishes ReasoningContentDelta (the full reasoning text) from ReasoningSummaryDelta (a shorter summary). The TUI shows summaries in the transcript but records full content separately. OpenCode stores the full reasoning text as a ReasoningPart. Aider strips reasoning entirely after display. Each choice has tradeoffs for token budget on replayed conversations.


Define a ReasoningConfig enum in a protocol crate:

pub enum ReasoningMode {
    /// No reasoning tokens requested.
    Off,
    /// Provider-determined budget.
    Adaptive { effort: ReasoningEffort },
    /// Explicit token budget.
    Budget { tokens: u32 },
}

pub enum ReasoningEffort {
    None,
    Minimal,
    Low,
    Medium,
    High,
    Max,
}

pub enum ReasoningSummary {
    Auto,
    Concise,
    Detailed,
    Off,
}

The ReasoningMode enum captures both Anthropic-style budgets and OpenAI-style effort levels in a single type. Provider adapters translate this to wire format.

Extend the provider trait with reasoning parameter generation:

trait ProviderAdapter {
    fn reasoning_params(
        &self,
        mode: &ReasoningMode,
        summary: &ReasoningSummary,
        model: &ModelInfo,
    ) -> serde_json::Value;
}

Each provider (Anthropic, OpenAI, Google, Bedrock) implements this with its own wire format. Budget arithmetic (the output_limit - 1 cap) lives inside the adapter, not in generic code.

pub enum ReasoningEvent {
    Start { block_id: u32 },
    Delta { block_id: u32, text: String },
    End { block_id: u32 },
    SummaryDelta { block_id: u32, summary_index: u32, text: String },
    SummaryPartAdded { block_id: u32, summary_index: u32 },
}

The block_id supports interleaved reasoning — multiple reasoning blocks in a single response, separated by tool calls or text.

pub struct TokenUsage {
    pub input: u32,
    pub cached_input: u32,
    pub output: u32,
    pub reasoning_output: u32,
    pub total: u32,
}

Track reasoning_output separately. Display it in the TUI status bar alongside regular output tokens. Include it in OpenTelemetry spans.

For models that embed reasoning in tags (DeepSeek R1, Qwen QWQ), implement a TagExtractor that processes the response stream:

struct TagExtractor {
    tag: String,
    state: TagState, // Outside | InsideTag | InsideContent
    buffer: String,
}

impl TagExtractor {
    fn process_chunk(&mut self, chunk: &str) -> Vec<ReasoningEvent>;
}

Use a state machine rather than regex for streaming — regex requires the full text, but we process chunks incrementally.
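As a sketch of such a state machine in Python (the Rust skeleton above is the target shape; the closure-based API here is illustrative), with holdback so a tag split across chunk boundaries is never emitted as text:

```python
def make_tag_extractor(tag: str):
    # Streaming splitter: feed(chunk) returns a list of
    # ("reasoning" | "answer", text) events; flush() drains the buffer
    # at end of stream.
    open_t, close_t = f"<{tag}>", f"</{tag}>"
    inside = False
    buf = ""

    def feed(chunk: str):
        nonlocal inside, buf
        buf += chunk
        events = []
        while True:
            marker = close_t if inside else open_t
            i = buf.find(marker)
            if i == -1:
                # Hold back a tail that could be the start of a split marker.
                safe = len(buf) - (len(marker) - 1)
                if safe > 0:
                    events.append(("reasoning" if inside else "answer", buf[:safe]))
                    buf = buf[safe:]
                return events
            if i > 0:
                events.append(("reasoning" if inside else "answer", buf[:i]))
            buf = buf[i + len(marker):]
            inside = not inside

    def flush():
        nonlocal buf
        out, buf = buf, ""
        return [("reasoning" if inside else "answer", out)] if out else []

    return feed, flush

feed, flush = make_tag_extractor("think")
events = feed("<think>deep") + feed(" thought</think>the answer") + flush()
# reasoning events total "deep thought"; answer events total "the answer"
```

Note the deliberate tradeoff: up to len(marker) - 1 characters of real text are buffered until the next chunk, which trades a little latency for never mis-splitting a tag.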

  • openoxide-protocol: ReasoningMode, ReasoningEffort, ReasoningSummary, ReasoningEvent, TokenUsage types.
  • openoxide-provider: ProviderAdapter trait implementations per provider, including reasoning parameter generation and tag extraction.
  • openoxide-core: Per-turn reasoning override logic, token tracking aggregation, reasoning content storage decisions.
  • openoxide-tui: Reasoning block rendering (collapsed/expanded), summary extraction, token display.

Follow Codex’s approach: store full reasoning content in the session log (for debugging and context replay) but display only summaries in the TUI transcript. This gives the best of both worlds — reasoning context is available for subsequent turns (via encrypted content or full text replay), while the UI stays clean.

For providers that support encrypted reasoning content (OpenAI), include it in the include parameter and store the opaque blob. For providers that return plaintext reasoning (Anthropic extended thinking, DeepSeek R1), store the full text but render it collapsed in the TUI with the first bold line as the header (matching Codex’s pattern).