
Multi-Model Orchestration

Source attribution: Implementation details traced from references/aider/ at commit b9050e1d, references/codex/ at commit 4ab44e2c5, and references/opencode/ at commit 7ed4499.

A single LLM isn’t enough for an efficient coding agent. The main model — typically the most capable and expensive — handles complex code generation, but using it for every task wastes money and time. Summarizing a long chat history doesn’t need Codex 5.2; gpt-5-mini does it at 1/10th the cost. Generating a commit message from a diff is a simple task. Applying edits from an architect’s plan can be done by a specialized editor model.

Multi-model orchestration assigns different model roles to different tasks within a single session:

  1. Main model: The primary reasoning engine for code generation and conversation. Highest capability, highest cost.
  2. Weak model: A cheaper, faster model for auxiliary tasks — commit message generation, chat history summarization, simple classification.
  3. Editor model: A model specialized for applying file edits, used when the main model operates in “architect” mode (generating plans rather than edits directly).

The challenge is configuration. Users need sensible defaults (if you pick Claude Sonnet as your main model, Haiku should automatically be the weak model) while retaining the ability to override any role. The system must prevent infinite recursion (a weak model trying to instantiate its own weak model) and fall back gracefully when a model fails.

For per-model reasoning controls and thinking-token behavior, see Reasoning & Thinking Tokens. For prefix-level role instructions, see System Prompt. For context pressure created by role-specific outputs (summaries, edits, tool results), see Token Budgeting.


Reference: references/aider/aider/models.py, aider/history.py, aider/coders/ | Commit: b9050e1d

Aider has the most mature multi-model system of the three reference implementations. It defines three explicit model roles — main, weak, and editor — with per-model YAML configuration and CLI overrides.

The ModelSettings dataclass in aider/models.py (lines 115-139) includes fields for all three roles:

```python
@dataclass
class ModelSettings:
    name: str
    edit_format: str = "whole"
    weak_model_name: Optional[str] = None
    editor_model_name: Optional[str] = None
    editor_edit_format: Optional[str] = None
    use_repo_map: bool = False
    streaming: bool = True
    reasoning_tag: Optional[str] = None
    accepts_settings: Optional[list] = None
    # ... other fields
```

The Model class inherits from ModelSettings and adds runtime instances for each role.

The Model.__init__() constructor (lines 318-357) follows a strict sequence:

```python
class Model(ModelSettings):
    def __init__(self, model, weak_model=None, editor_model=None,
                 editor_edit_format=None, verbose=False):
        model = MODEL_ALIASES.get(model, model)
        self.name = model
        self.info = self.get_model_info(model)

        res = self.validate_environment()
        self.missing_keys = res.get("missing_keys")

        max_input_tokens = self.info.get("max_input_tokens") or 0
        self.max_chat_history_tokens = min(max(max_input_tokens / 16, 1024), 8192)

        self.configure_model_settings(model)

        if weak_model is False:
            self.weak_model_name = None
        else:
            self.get_weak_model(weak_model)

        if editor_model is False:
            self.editor_model_name = None
        else:
            self.get_editor_model(editor_model, editor_edit_format)
```
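
The `max_chat_history_tokens` clamp scales summarization headroom with context size: 1/16 of the context window, bounded to [1024, 8192] tokens. A quick worked example (a standalone sketch, not Aider code):

```python
# Sketch of Aider's history-token clamp: 1/16 of the context window,
# floored at 1024 and capped at 8192 tokens.
def max_chat_history_tokens(max_input_tokens: int) -> int:
    return int(min(max(max_input_tokens / 16, 1024), 8192))

print(max_chat_history_tokens(8_000))    # 8000/16 = 500 -> floored to 1024
print(max_chat_history_tokens(128_000))  # 128000/16 = 8000
print(max_chat_history_tokens(400_000))  # 400000/16 = 25000 -> capped at 8192
```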

The False sentinel is critical: passing weak_model=False explicitly disables weak model instantiation. This prevents infinite recursion — when creating a weak model instance, the constructor passes weak_model=False to prevent the weak model from trying to create its own weak model.
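
The pattern can be reduced to a few lines. The sketch below is simplified and uses hypothetical names (`default_weak_for` is an assumed lookup helper, not Aider code), but shows how the sentinel bounds recursion to exactly one level:

```python
# Simplified recursion-guard sketch: a sub-model is constructed with
# weak_model=False so it never builds its own weak model.
class Model:
    def __init__(self, name, weak_model=None):
        self.name = name
        if weak_model is False:       # sentinel: weak role explicitly disabled
            self.weak_model = None
        elif weak_model is None:      # no override: use the default pairing
            self.weak_model = Model(default_weak_for(name), weak_model=False)
        else:                         # explicit override from CLI/config
            self.weak_model = Model(weak_model, weak_model=False)

def default_weak_for(name):           # assumed pairing table, illustrative
    return {"claude-4-6-sonnet": "claude-4-haiku-5"}.get(name, name)

m = Model("claude-4-6-sonnet")
print(m.weak_model.name)        # claude-4-haiku-5
print(m.weak_model.weak_model)  # None -- recursion stopped one level down
```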

get_weak_model() (lines 588-605) follows a priority chain:

  1. CLI override: If --weak-model gpt-5-mini was passed, use it.
  2. YAML settings: If weak_model_name was set in model-settings.yml, use it.
  3. Self-reference: If no weak model is configured, self.weak_model = self — the main model doubles as its own weak model.

```python
def get_weak_model(self, provided_weak_model_name):
    if provided_weak_model_name:
        self.weak_model_name = provided_weak_model_name

    if not self.weak_model_name:
        self.weak_model = self
        return

    if self.weak_model_name == self.name:
        self.weak_model = self
        return

    self.weak_model = Model(
        self.weak_model_name,
        weak_model=False,  # Prevent infinite recursion
    )
```

Default weak model pairings from model-settings.yml:

| Main Model | Default Weak Model |
| --- | --- |
| gpt-5.2-codex | gpt-5-mini |
| gpt-5-mini | gpt-5-mini |
| claude-4-6-sonnet | claude-4-haiku-5 |
| claude-4-6-opus | claude-4-haiku-5 |

get_editor_model() (lines 610-630) works similarly but adds edit format logic:

```python
def get_editor_model(self, provided_editor_model_name, editor_edit_format):
    if provided_editor_model_name:
        self.editor_model_name = provided_editor_model_name
    if editor_edit_format:
        self.editor_edit_format = editor_edit_format

    if not self.editor_model_name or self.editor_model_name == self.name:
        self.editor_model = self
    else:
        self.editor_model = Model(self.editor_model_name, editor_model=False)

    if not self.editor_edit_format:
        self.editor_edit_format = self.editor_model.edit_format
        if self.editor_edit_format in ("diff", "whole", "diff-fenced"):
            self.editor_edit_format = "editor-" + self.editor_edit_format
```

The "editor-" prefix is significant: it selects a different prompt/parser pair optimized for applying edits from an architect’s plan. The editor-diff format tells the editor model to produce SEARCH/REPLACE blocks based on the architect’s instructions, rather than generating code from scratch.

Commit message generation (aider/repo.py, lines 342-363):

```python
def commit_message_models(self):
    return [self.weak_model, self]  # Try weak first, fall back to main

# In GitRepo:
for model in self.models:
    num_tokens = model.token_count(messages)
    max_tokens = model.info.get("max_input_tokens") or 0
    if max_tokens and num_tokens > max_tokens:
        continue  # Skip if diff too large for this model
    commit_message = model.simple_send_with_retries(messages)
    if commit_message:
        break
```

The weak model is tried first. If it fails (API error, context overflow), the main model is used as fallback. This “try cheap, fall back to expensive” pattern appears throughout.
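
The pattern generalizes to any auxiliary task. A minimal sketch (hypothetical helper, not Aider's code) of a first-success-wins loop:

```python
# "Try cheap, fall back to expensive": run the task against each model in
# priority order and return the first non-empty result.
def first_success(models, task):
    last_error = None
    for model in models:
        try:
            result = task(model)
            if result is not None:
                return result          # cheap model succeeded; stop here
        except Exception as err:
            last_error = err           # remember the failure, try next model
    raise RuntimeError("all models failed") from last_error

calls = []
def make_commit_message(model):
    calls.append(model)
    if model == "weak":
        raise RuntimeError("rate limited")
    return "fix: handle None info"

print(first_success(["weak", "main"], make_commit_message))  # fix: handle None info
```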

Chat history summarization (aider/history.py, lines 7-123):

```python
class ChatSummary:
    def __init__(self, models=None, max_tokens=1024):
        self.models = models if isinstance(models, list) else [models]
        self.max_tokens = max_tokens

    def summarize_all(self, messages):
        summarize_messages = [
            dict(role="system", content=prompts.summarize),
            dict(role="user", content=content),
        ]
        for model in self.models:
            try:
                summary = model.simple_send_with_retries(summarize_messages)
                if summary is not None:
                    return [dict(role="user", content=summary)]
            except Exception:
                pass
        raise ValueError("summarizer unexpectedly failed for all models")
```

Initialized in aider/main.py (lines 949-952) with [main_model.weak_model, main_model] — the weak model handles summarization unless it fails, in which case the main model takes over. The recursive depth limit is 3 to prevent cascading summaries.
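
The depth limit can be sketched as a recursive helper (hypothetical names, not Aider's exact code): if a summary is still over budget, summarize the summary, but only a bounded number of times.

```python
# Depth-limited re-summarization sketch: recurse while the summary is over
# budget, decrementing `depth` so cascading model calls are bounded.
def summarize(messages, send, count_tokens, max_tokens, depth=3):
    summary = send(messages)  # one model call over the joined messages
    if depth > 0 and count_tokens(summary) > max_tokens:
        return summarize([summary], send, count_tokens, max_tokens, depth - 1)
    return summary

# Toy "model" that halves its input; token counting is just len().
halve = lambda msgs: "".join(msgs)[: len("".join(msgs)) // 2]
print(len(summarize(["x" * 100], halve, len, 10, depth=3)))  # 100 -> 50 -> 25 -> 12 -> 6
```

When depth is exhausted, an over-budget summary is returned as-is rather than looping forever.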

The ArchitectCoder in aider/coders/architect_coder.py (lines 22-44) implements the two-model workflow:

  1. The main model (architect) receives the user’s request and generates a natural-language plan describing what changes to make.
  2. The editor model receives the plan and produces actual file edits in the configured editor_edit_format.

```python
def reply_completed(self):
    editor_model = self.main_model.editor_model or self.main_model
    kwargs = dict(main_model=editor_model,
                  edit_format=self.main_model.editor_edit_format)
    editor_coder = Coder.create(**kwargs)
    editor_coder.run(with_message=content, preproc=False)
```

This separation allows using a stronger reasoning model (like o1) as the architect while a faster model (like claude-4-6-sonnet) applies the edits. The architect never sees file contents directly — it works from the repo map.

Users can switch models mid-session via commands in aider/commands.py:

  • /model <name> — switch the main model (line 93)
  • /weak-model <name> — switch the weak model (line 127)
  • /editor-model <name> — switch the editor model (line 110)

Each command creates a new Model instance preserving the other roles and raises SwitchCoder to reinitialize the coder with the new configuration.

sanity_check_models() (lines 1131-1146) validates all three models at startup:

```python
def sanity_check_models(io, main_model):
    problem_main = sanity_check_model(io, main_model)

    problem_weak = None
    if main_model.weak_model and main_model.weak_model is not main_model:
        problem_weak = sanity_check_model(io, main_model.weak_model)

    problem_editor = None
    if (main_model.editor_model
            and main_model.editor_model is not main_model
            and main_model.editor_model is not main_model.weak_model):
        problem_editor = sanity_check_model(io, main_model.editor_model)
```

Identity comparison (is not) avoids redundant checks when models share instances. Each model is checked for missing API keys, unknown metadata, and provider-specific dependencies (boto3 for Bedrock, etc.).
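
The dedup logic is easy to get wrong with equality comparison (two instances of the same model name would still be checked twice, or a shared instance checked once per role). A small sketch with stand-in objects:

```python
# Identity-based dedup sketch: each distinct model *object* is validated
# once, even when the weak/editor roles alias the main instance.
class FakeModel:
    def __init__(self, name):
        self.name = name

main = FakeModel("claude-4-6-sonnet")
main.weak_model = main                     # weak role aliases main
main.editor_model = FakeModel("gpt-5.2-codex")

def models_to_check(model):
    checked = []
    for m in (model, model.weak_model, model.editor_model):
        if m is not None and all(m is not seen for seen in checked):
            checked.append(m)
    return checked

print([m.name for m in models_to_check(main)])
# ['claude-4-6-sonnet', 'gpt-5.2-codex'] -- weak skipped, it *is* main
```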


Reference: references/codex/codex-rs/core/src/, codex-rs/protocol/src/config_types.rs | Commit: 4ab44e2c5

Codex takes a fundamentally different approach: instead of multiple model instances, it uses a single model with adjustable reasoning effort. The equivalent of Aider’s “weak model” is the same model running at reasoning_effort: "low".

The CollaborationMode system in protocol/src/config_types.rs (lines 169-304) bundles model + behavior configuration:

```rust
pub enum ModeKind {
    Plan,     // Strategic planning
    Default,  // Standard execution
    // Legacy: PairProgramming, Execute
}

pub struct Settings {
    pub model: String,
    pub reasoning_effort: Option<ReasoningEffort>,
    pub developer_instructions: Option<String>,
}
```

ReasoningEffort levels: None, Minimal, Low, Medium (default), High, XHigh.

Built-in presets in collaboration_mode_presets.rs:

  • Plan preset (line 16): Plan mode with medium reasoning effort
  • Default preset (line 26): Standard execution mode

Both presets inherit the configured model — model: None means “use whatever the session is configured with.” Users switch modes via the TUI, which applies a CollaborationModeMask — a partial update template that overrides only specified fields:

```rust
pub struct CollaborationModeMask {
    pub mode: Option<ModeKind>,
    pub model: Option<String>,
    pub reasoning_effort: Option<Option<ReasoningEffort>>,
    pub developer_instructions: Option<Option<String>>,
}
```
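
The nested `Option<Option<...>>` distinguishes "leave this field unchanged" (outer `None`) from "explicitly clear it" (inner `None`). A sketch of mask application in Python, modeling the outer `None` with a sentinel:

```python
# Partial-update mask sketch: only fields set on the mask override the
# current settings. UNSET models Rust's outer None ("don't touch"), while
# a literal None models the inner None ("explicitly clear").
UNSET = object()

def apply_mask(settings: dict, mask: dict) -> dict:
    updated = dict(settings)
    for key, value in mask.items():
        if value is not UNSET:
            updated[key] = value  # may set a real value or None (clear)
    return updated

current = {"model": "gpt-5.2-codex", "reasoning_effort": "medium"}
mask = {"model": UNSET, "reasoning_effort": "low"}
print(apply_mask(current, mask))
# {'model': 'gpt-5.2-codex', 'reasoning_effort': 'low'}
```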

Codex does not have separate weak/editor models. The single model handles all tasks:

  • Auto-compaction: When token usage exceeds auto_compact_token_limit, Codex sends an iterative oldest-first trim request to the same model with lower reasoning effort. This is configured as a behavior flag on the model, not a separate model instance.
  • Commit messages: Generated by the same model.
  • Planning: Same model, different ModeKind (Plan vs Default) which changes the system prompt but not the model.

The SessionConfiguration in codex.rs (lines 699-746) holds the active collaboration_mode and model_reasoning_summary config. A previous_model field (in state/session.rs, line 29) tracks model switches for task handling.

ModelsManager.get_default_model() (lines 112-135 of manager.rs) selects the default model based on auth mode. get_model_info() (lines 139-153) resolves metadata with config overrides using longest-prefix matching against remote model data.


Reference: references/opencode/packages/opencode/src/agent/, src/session/llm.ts, src/provider/transform.ts | Commit: 7ed4499

OpenCode implements multi-model via an agent system where each agent can optionally specify its own model override.

agent/agent.ts (lines 24-49) defines the Agent.Info schema:

```typescript
export const Info = z.object({
  name: z.string(),
  mode: z.enum(["subagent", "primary", "all"]),
  model: z
    .object({
      modelID: z.string(),
      providerID: z.string(),
    })
    .optional(),
  temperature: z.number().optional(),
  topP: z.number().optional(),
  options: z.record(z.string(), z.any()),
  prompt: z.string().optional(),
  steps: z.number().int().positive().optional(),
})
```

The key field is model — optional, meaning each agent can use a different model than the session default.

Seven built-in agents (lines 76-201):

| Agent | Mode | Purpose | Model Override |
| --- | --- | --- | --- |
| build | primary | Default coding agent, executes tools | None (session default) |
| plan | primary | Plan mode, disallows edit tools | None |
| general | subagent | Multi-step parallel work | None |
| explore | subagent | Read-only codebase search | None |
| compaction | primary (hidden) | Context summarization | None |
| title | primary (hidden) | Session title generation | None, temperature: 0.5 |
| summary | primary (hidden) | Context summarization | None |

By default, all agents inherit the session model. But users can override per agent in config:

```json
{
  "agent": {
    "compaction": {
      "model": { "providerID": "openai", "modelID": "gpt-5-mini" }
    }
  }
}
```
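
Resolution is a simple override-or-default: the agent's optional model wins, otherwise the session default applies. A sketch (hypothetical helper, not OpenCode's code):

```python
# Per-agent model resolution sketch: an agent's optional override wins,
# otherwise fall back to the session default.
def resolve_model(agent_config: dict, session_default: dict) -> dict:
    return agent_config.get("model") or session_default

session_default = {"providerID": "anthropic", "modelID": "claude-4-6-sonnet"}
compaction = {"model": {"providerID": "openai", "modelID": "gpt-5-mini"}}
build = {}  # no override: inherits the session model

print(resolve_model(compaction, session_default)["modelID"])  # gpt-5-mini
print(resolve_model(build, session_default)["modelID"])       # claude-4-6-sonnet
```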

OpenCode’s equivalent of a “weak model” is a boolean small flag on StreamInput in session/llm.ts (line 38):

```typescript
export type StreamInput = {
  small?: boolean
  // ... other fields
}
```

When small: true, the system uses the same model but with reduced reasoning effort via smallOptions() in provider/transform.ts (lines 778-809):

```typescript
export function smallOptions(model: Provider.Model) {
  if (model.providerID === "openai" || model.api.npm === "@ai-sdk/openai") {
    if (model.api.id.includes("-codex")) {
      if (model.api.id.includes("5.")) {
        return { store: false, reasoningEffort: "low" }
      }
      return { store: false, reasoningEffort: "minimal" }
    }
    return { store: false }
  }
  if (model.providerID === "google") {
    if (model.api.id.includes("gemini-3")) {
      return { thinkingConfig: { thinkingLevel: "minimal" } }
    }
    return { thinkingConfig: { thinkingBudget: 0 } }
  }
  // ... per-provider logic for Anthropic, etc.
}
```

This approach is provider-aware: OpenAI models get reasoningEffort: "low", Google models get thinkingLevel: "minimal", each provider’s cost-reduction lever mapped to its native API parameter.

session/llm.ts (lines 100-102) selects between full and small options:

```typescript
const base = input.small
  ? ProviderTransform.smallOptions(input.model)
  : ProviderTransform.options({ model: input.model, ... })
```

The small flag is set by internal callers (compaction, title generation) — never by users directly.
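
The same dispatch can be expressed compactly in any language. The sketch below simplifies the branching and the Anthropic shape is an assumption, not traced from OpenCode:

```python
# Provider-aware cost-reduction sketch: map each provider's "think less"
# knob to its native API parameter (simplified from smallOptions()).
def small_options(provider_id: str, model_id: str) -> dict:
    if provider_id == "openai":
        # Simplification: real logic also distinguishes codex model versions.
        effort = "low" if "codex" in model_id else "minimal"
        return {"reasoningEffort": effort, "store": False}
    if provider_id == "google":
        return {"thinkingConfig": {"thinkingBudget": 0}}
    if provider_id == "anthropic":
        return {"thinking": {"type": "disabled"}}  # assumed parameter shape
    return {}

print(small_options("openai", "gpt-5.2-codex"))
print(small_options("google", "gemini-2.5-flash"))
```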


Infinite recursion: Aider’s Model constructor creates sub-model instances for weak and editor roles. Without the weak_model=False sentinel, the weak model would try to create its own weak model, creating infinite recursion. This is a solved problem in Aider but a trap for reimplementers.

Fallback chains mask failures: When the weak model fails silently and falls back to the main model, users don’t notice the extra cost. Aider’s “try weak, fall back to main” pattern works well for reliability but can be expensive if the weak model consistently fails (wrong API key, rate limited).

Context window mismatch: The weak model often has a smaller context window than the main model. Chat summaries generated by the main model might not fit in the weak model’s context for the next summarization pass. Aider handles this via token counting before dispatch, but the check is per-call, not global.

Editor model format coupling: The editor model must produce output in a specific edit format (editor-diff, editor-whole). If you pair a model that hasn’t been tested with that format, edit parsing failures increase. Aider’s model-settings.yml encodes tested pairings.

Reasoning effort vs. separate model: Codex and OpenCode use reasoning effort levels instead of separate models. This is cheaper (no second API key needed) but less flexible — you can’t use Claude for summarization while using GPT for coding.


```rust
pub struct ModelOrchestrator {
    pub main: Arc<dyn LlmProvider>,
    pub weak: Option<Arc<dyn LlmProvider>>,
    pub editor: Option<Arc<dyn LlmProvider>>,
}

impl ModelOrchestrator {
    /// Returns models in priority order for auxiliary tasks.
    /// Tries weak first, falls back to main.
    pub fn auxiliary_models(&self) -> Vec<Arc<dyn LlmProvider>> {
        let mut models = Vec::new();
        if let Some(weak) = &self.weak {
            models.push(Arc::clone(weak));
        }
        models.push(Arc::clone(&self.main));
        models
    }
}
```

```toml
[model]
main = "claude-4-6-sonnet"
weak = "claude-4-haiku-5"      # Optional, defaults based on main
editor = "claude-4-6-sonnet"   # Optional, defaults to main
editor_format = "editor-diff"  # Optional, auto-derived
```

Default pairings shipped as a bundled TOML table — when the user picks a main model, look up the default weak/editor model. Allow CLI overrides (--weak-model, --editor-model).

For providers that support it, implement a SmallOptions trait:

```rust
pub trait SmallOptions {
    fn small_options(&self, model: &ModelInfo) -> HashMap<String, Value>;
}
```

Provider implementations return { "reasoning_effort": "low" } for OpenAI, { "thinking": { "budget_tokens": 1024 } } for Anthropic, etc. This supplements the multi-model system — even when using the same model for auxiliary tasks, reasoning effort can be reduced.

Adopt OpenCode’s pattern of per-agent model selection for extensibility:

```rust
pub struct AgentConfig {
    pub name: String,
    pub model_override: Option<ModelId>,
    pub temperature: Option<f32>,
    pub max_steps: Option<u32>,
}
```

Built-in agents (build, plan, explore, compaction) inherit from the orchestrator unless overridden. User-defined agents in config can specify any model/provider combination.

```rust
pub async fn with_fallback<T, F>(
    models: &[Arc<dyn LlmProvider>],
    task: F,
) -> Result<T>
where
    F: Fn(Arc<dyn LlmProvider>) -> BoxFuture<Result<T>>,
{
    for model in models {
        match task(Arc::clone(model)).await {
            Ok(result) => return Ok(result),
            Err(e) if e.is_retriable() => continue,
            Err(e) => return Err(e),
        }
    }
    Err(Error::AllModelsFailed)
}
```

The is_retriable() check distinguishes between “this model can’t handle it” (context overflow, rate limit) and “this is a permanent error” (invalid API key, unsupported feature).
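
The same distinction in a runnable sketch (hypothetical error types): retriable failures advance to the next model, permanent failures abort immediately instead of burning main-model tokens on a doomed call.

```python
# Fallback with a retriable/permanent split: only retriable errors
# continue down the priority list; permanent errors propagate at once.
class RetriableError(Exception):  # e.g. rate limit, context overflow
    pass

class PermanentError(Exception):  # e.g. invalid API key, unsupported feature
    pass

def with_fallback(models, task):
    for model in models:
        try:
            return task(model)
        except RetriableError:
            continue  # the next model may still succeed
    raise RuntimeError("all models failed")

def generate(model):
    if model == "weak":
        raise RetriableError("rate limited")
    return "ok"

print(with_fallback(["weak", "main"], generate))  # ok
```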