
Provider Abstraction

Source attribution: Implementation details traced from references/aider/ at commit b9050e1d, references/codex/ at commit 4ab44e2c5, and references/opencode/ at commit 7ed4499.

Every AI coding agent must talk to multiple LLM providers — OpenAI, Anthropic, Google, local models via Ollama, proxy services like OpenRouter. Each provider has different API shapes, authentication schemes, endpoint URLs, and model metadata. The provider abstraction layer exists to hide these differences behind a uniform interface so the rest of the system (chat loop, edit application, tool execution) never needs to know which provider it’s talking to.

This is harder than it sounds. Providers differ not just in their REST endpoints, but in subtle ways: Anthropic uses max_tokens as a required field while OpenAI treats it as optional. Google requires different authentication flows. Some providers support streaming, others don’t. Function calling schemas vary. Rate limit headers come in different formats. The abstraction must handle all of this without leaking provider-specific logic into the coder or session layers.

The secondary challenge is model metadata. Every model has a context window size, input/output pricing, supported features (vision, function calling, reasoning tokens), and a preferred edit format. This metadata must be discoverable at runtime — ideally cached locally with a TTL to avoid hitting remote APIs on every startup.

This page focuses on provider wiring and capability metadata. For how those capabilities are consumed in per-turn prompts, see System Prompt, Streaming, and Multi-Model Orchestration. For budget-level consequences of provider context limits, see Token Budgeting.


Reference: references/aider/aider/models.py, aider/llm.py | Commit: b9050e1d

Aider delegates all provider routing to litellm, a Python library that wraps 100+ LLM providers behind a single completion() call. The model name string encodes the provider: gpt-5.2-codex routes to OpenAI, claude-4-6-opus to Anthropic, deepseek/deepseek-chat to DeepSeek, openrouter/anthropic/claude-4-haiku-5 to OpenRouter. No explicit provider selection is needed — litellm parses the model name prefix and dispatches accordingly.
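
The routing rule can be sketched as a prefix parse. This is an illustrative helper, not litellm's actual code — litellm's real dispatcher also consults its model database for unprefixed names:

```python
def route_model(model: str) -> tuple[str, str]:
    """Resolve a litellm-style model string to (provider, model_id). Sketch only."""
    known_prefixes = {"deepseek", "openrouter", "gemini", "groq", "ollama"}
    prefix, _, rest = model.partition("/")
    if rest and prefix in known_prefixes:
        # Explicit provider prefix: everything after the first "/" is the model id.
        return prefix, rest
    # Unprefixed names fall back to pattern matching on the model family.
    if model.startswith("claude"):
        return "anthropic", model
    return "openai", model
```

Note that for proxy routes like openrouter/anthropic/claude-4-haiku-5, only the first segment selects the provider; the rest is passed through as the proxied model path.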

litellm is expensive to import (~1.5 seconds). Aider defers it via a LazyLiteLLM wrapper in aider/llm.py (lines 21-45):

import importlib

class LazyLiteLLM:
    _lazy_module = None

    def __getattr__(self, name):
        if name == "_lazy_module":
            return super()  # avoid recursing before the module is loaded
        self._load_litellm()
        return getattr(self._lazy_module, name)

    def _load_litellm(self):
        if self._lazy_module is not None:
            return
        self._lazy_module = importlib.import_module("litellm")
        self._lazy_module.suppress_debug_info = True
        self._lazy_module.set_verbose = False
        self._lazy_module.drop_params = True  # silently drop unsupported params per provider

litellm = LazyLiteLLM()

The drop_params=True setting is critical — it tells litellm to silently ignore parameters that a specific provider doesn’t support, rather than raising errors. This allows Aider to pass a superset of parameters (temperature, tools, cache_control headers) and let litellm filter per provider.
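
In spirit, drop_params works like this filter. The per-provider sets below are illustrative subsets; the real support lists live inside litellm's model database:

```python
SUPPORTED_PARAMS = {
    # Illustrative subsets, not litellm's actual support lists.
    "openai": {"model", "messages", "stream", "temperature", "tools", "tool_choice"},
    "anthropic": {"model", "messages", "stream", "temperature", "tools", "max_tokens"},
}

def filter_params(provider: str, kwargs: dict) -> dict:
    """Keep only the parameters the target provider understands; drop the rest silently."""
    allowed = SUPPORTED_PARAMS[provider]
    return {k: v for k, v in kwargs.items() if k in allowed}

superset = {"model": "m", "messages": [], "temperature": 0, "tool_choice": "auto"}
```

The caller always passes the superset; what actually reaches each provider differs, and nothing signals the drop — which is exactly the tradeoff discussed later in this page.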

aider/models.py (lines 86-111) defines friendly aliases:

MODEL_ALIASES = {
    "sonnet": "claude-4-6-sonnet",
    "haiku": "claude-4-haiku-5",
    "opus": "claude-4-6-opus",
    "codex": "gpt-5.2-codex",
    "deepseek": "deepseek/deepseek-chat",
    "flash": "gemini/gemini-3-flash-preview",
    "r1": "deepseek/deepseek-reasoner",
    "grok3": "xai/grok-3-beta",
}

Users type --model sonnet and the alias resolves to claude-4-6-sonnet before any provider logic runs.

The ModelInfoManager class (lines 149-262) fetches and caches model metadata from two sources:

  1. BerriAI litellm database: A JSON file from raw.githubusercontent.com/BerriAI/litellm/main/model_prices_and_context_window.json, cached at ~/.aider/caches/ with a 24-hour TTL. Contains context windows, pricing, and provider mappings for thousands of models.

  2. OpenRouter API: For OpenRouter-specific models, falls back to web scraping from openrouter.ai/<model_route> to extract context window and pricing via regex.

The metadata dict per model includes max_input_tokens, max_output_tokens, input_cost_per_token, output_cost_per_token, and litellm_provider.
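
A TTL-guarded metadata cache along these lines might look like the following sketch. The function and file names are illustrative; Aider's real ModelInfoManager also handles download failures and SSL-verification options:

```python
import json
import time
from pathlib import Path

CACHE_TTL = 24 * 60 * 60  # 24 hours, matching the litellm-database cache described above

def load_model_db(cache_file: Path, fetch, ttl: int = CACHE_TTL) -> dict:
    """Return cached JSON if fresh, otherwise call fetch() and rewrite the cache."""
    if cache_file.exists() and time.time() - cache_file.stat().st_mtime < ttl:
        return json.loads(cache_file.read_text())
    data = fetch()  # e.g. GET the BerriAI model_prices_and_context_window.json
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(json.dumps(data))
    return data
```

The mtime-based freshness check means no extra metadata file is needed; touching or deleting the cache file is enough to force a refetch.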

Aider uses a two-tier validation strategy to check API keys:

Fast path (fast_validate_environment(), lines 697-726): Checks hardcoded provider-to-env-var mappings without importing litellm. Handles common providers (OpenAI, Anthropic, DeepSeek, Gemini, Groq, OpenRouter, Fireworks) via a simple keymap dict and model name pattern matching against OPENAI_MODELS and ANTHROPIC_MODELS lists.

Fallback path (validate_environment(), lines 728-765): Imports litellm and calls litellm.validate_environment(model). Handles edge cases like AWS Bedrock with AWS_PROFILE and non-standard providers.

The actual API call happens in Model.send_completion() (lines 970-1022):

def send_completion(self, messages, functions, stream, temperature=None):
    kwargs = dict(model=self.name, stream=stream)
    if self.use_temperature is not False:
        kwargs["temperature"] = temperature or float(self.use_temperature)
    if functions is not None:
        kwargs["tools"] = [dict(type="function", function=functions[0])]
        kwargs["tool_choice"] = {"type": "function", "function": {"name": functions[0]["name"]}}
    if self.extra_params:
        kwargs.update(self.extra_params)
    # Ollama-specific: calculate context size
    if self.is_ollama() and "num_ctx" not in kwargs:
        kwargs["num_ctx"] = int(self.token_count(messages) * 1.25) + 8192
    kwargs["timeout"] = kwargs.get("timeout", 600)
    kwargs["messages"] = messages

    # Hash of the request kwargs, returned alongside the response as a cache key
    key = json.dumps(kwargs, sort_keys=True, default=repr).encode()
    hash_object = hashlib.sha1(key)

    res = litellm.completion(**kwargs)
    return hash_object, res

The key insight: litellm.completion(**kwargs) is the single dispatch point. The model string (gpt-5.2-codex, claude-4-6-opus, deepseek/deepseek-chat) determines which provider API gets called. Provider-specific parameters like Anthropic’s cache_control headers or Ollama’s num_ctx are merged via extra_params from the model settings YAML.

aider/resources/model-settings.yml provides per-model configuration:

- name: claude-4-6-sonnet-20260115
  edit_format: diff
  weak_model_name: claude-4-haiku-5-20260115
  use_repo_map: true
  cache_control: true
  extra_params:
    extra_headers:
      anthropic-beta: prompt-caching-2024-07-31,pdfs-2024-09-25
    max_tokens: 8192
  editor_model_name: claude-4-6-sonnet-20260115
  editor_edit_format: editor-diff

The same model accessible via different providers gets separate entries:

- name: anthropic/claude-4-6-sonnet-20260115 # Direct Anthropic
- name: bedrock/anthropic.claude-4-6-sonnet-v1:0 # AWS Bedrock
- name: vertex_ai/claude-4-6-sonnet@20260115 # Google Vertex AI
- name: openrouter/anthropic/claude-4-6-sonnet # OpenRouter proxy

For unknown models, apply_generic_model_settings() (lines 421-583) applies heuristics based on name patterns — reasoning models get use_temperature=False and streaming=False, Claude models get cache_control=True, etc.

When users type a partial model name, fuzzy_match_models() (lines 1212-1254) first checks exact substring containment, then falls back to Levenshtein distance matching at 80% similarity threshold using difflib.get_close_matches().
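
difflib makes the two-stage match compact. A sketch consistent with the described behavior:

```python
import difflib

def fuzzy_match_models(partial: str, known: list) -> list:
    """Substring containment first, then close matches at a 0.8 similarity cutoff."""
    partial = partial.lower()
    contained = [m for m in known if partial in m.lower()]
    if contained:
        return contained
    return difflib.get_close_matches(partial, known, n=3, cutoff=0.8)
```

Checking containment first keeps exact-but-partial inputs like "sonnet" deterministic; the similarity fallback only fires for typos.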

| Provider | Model Prefix | Required Env Var | Example |
|---|---|---|---|
| OpenAI | (none) | OPENAI_API_KEY | gpt-5.2-codex |
| Anthropic | (none) | ANTHROPIC_API_KEY | claude-4-6-opus |
| AWS Bedrock | bedrock/ | AWS_PROFILE | bedrock/anthropic.claude-4-6-sonnet-v1:0 |
| Google Vertex | vertex_ai/ | Google credentials | vertex_ai/claude-4-6-sonnet@20260115 |
| DeepSeek | deepseek/ | DEEPSEEK_API_KEY | deepseek/deepseek-chat |
| Gemini | gemini/ | GEMINI_API_KEY | gemini/gemini-3-flash-preview |
| Groq | groq/ | GROQ_API_KEY | groq/mixtral-8x7b-32768 |
| OpenRouter | openrouter/ | OPENROUTER_API_KEY | openrouter/anthropic/claude-4-haiku-5 |
| Ollama | ollama/ | (none, local) | ollama/mistral |
| GitHub Copilot | (none) | GITHUB_COPILOT_TOKEN | Via token exchange |

Reference: references/codex/codex-rs/codex-api/, codex-rs/core/src/ | Commit: 4ab44e2c5

Codex implements provider abstraction in Rust with a clean layered architecture. Unlike Aider’s litellm delegation, Codex builds its own provider system from scratch — though it currently only supports OpenAI-compatible APIs (the Responses API, not Chat Completions).

The Provider struct in codex-api/src/provider.rs (lines 43-50) encapsulates HTTP endpoint configuration:

pub struct Provider {
    pub name: String,
    pub base_url: String,
    pub query_params: Option<HashMap<String, String>>,
    pub headers: HeaderMap,
    pub retry: RetryConfig,
    pub stream_idle_timeout: Duration,
}

Key methods:

  • url_for_path(&self, path: &str) — constructs full URLs (line 53)
  • build_request(&self, method, path) — creates HTTP requests with pre-configured headers (line 77)
  • is_azure_responses_endpoint(&self) — detects Azure deployments for special handling (line 88)
  • websocket_url_for_path(&self, path) — converts HTTP to WS/WSS schemes (line 92)

Codex exclusively uses the OpenAI Responses API. The WireApi enum in core/src/model_provider_info.rs (lines 34-55) enforces this:

pub enum WireApi {
    #[default]
    Responses,
}

Attempting to use wire_api = "chat" in config produces an error message directing users to switch to wire_api = "responses". This is a deliberate design choice — the Responses API is the modern OpenAI protocol and Codex doesn’t maintain backwards compatibility with Chat Completions.

The user-facing provider definition in core/src/model_provider_info.rs (lines 60-114):

pub struct ModelProviderInfo {
    pub name: String,
    pub base_url: Option<String>,
    pub env_key: Option<String>,
    pub env_key_instructions: Option<String>,
    pub experimental_bearer_token: Option<String>,
    pub wire_api: WireApi,
    pub query_params: Option<HashMap<String, String>>,
    pub http_headers: Option<HashMap<String, String>>,
    pub env_http_headers: Option<HashMap<String, String>>,
    pub request_max_retries: Option<u64>,
    pub stream_max_retries: Option<u64>,
    pub stream_idle_timeout_ms: Option<u64>,
    pub requires_openai_auth: bool,
    pub supports_websockets: bool,
}

The env_http_headers field deserializes header values from environment variables at runtime — useful for dynamic tokens or org IDs.

built_in_model_providers() (lines 271-292) ships three providers:

pub fn built_in_model_providers() -> HashMap<String, ModelProviderInfo> {
    [
        ("openai", create_openai_provider()),
        ("ollama", create_oss_provider(DEFAULT_OLLAMA_PORT, WireApi::Responses)),
        ("lmstudio", create_oss_provider(DEFAULT_LMSTUDIO_PORT, WireApi::Responses)),
    ]
    .into_iter()
    .map(|(name, provider)| (name.to_string(), provider))
    .collect()
}

The OpenAI provider (lines 218-257) defaults to https://api.openai.com/v1, supports WebSockets, and reads org/project headers from environment variables. The OPENAI_BASE_URL env var overrides the base URL for custom deployments.

Custom providers are defined in ~/.codex/config.toml:

[provider.azure]
name = "Azure"
base_url = "https://xxxxx.openai.azure.com/openai"
env_key = "AZURE_OPENAI_API_KEY"
query_params = { api-version = "2025-04-01-preview" }

The AuthProvider trait in codex-api/src/auth.rs (lines 10-15) is minimal:

pub trait AuthProvider: Send + Sync {
    fn bearer_token(&self) -> Option<String>;
    fn account_id(&self) -> Option<String> { None }
}

add_auth_headers_to_header_map() (lines 17-28) injects Authorization: Bearer <token> and optional ChatGPT-Account-ID headers into every request. The trait is intentionally cheap and non-blocking — I/O happens at higher layers.

The CoreAuthProvider implementation in core/src/api_bridge.rs (lines 284-298) resolves auth with a three-level priority chain:

  1. Provider env_key — API key from environment variable (e.g., OPENAI_API_KEY)
  2. Experimental bearer token — hardcoded token in provider config
  3. User auth — ChatGPT OAuth or API key from the AuthManager
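
The chain reads naturally as a series of fallbacks. A sketch of the described priority order (Python here for illustration; Codex implements this in Rust):

```python
import os

def resolve_bearer_token(env_key, experimental_token, auth_manager_token, env=os.environ):
    """First match wins: provider env key, then config token, then user auth.
    Returns None if no source can supply a token."""
    if env_key and env.get(env_key):
        return env[env_key]
    if experimental_token:
        return experimental_token
    return auth_manager_token  # ChatGPT OAuth / stored API key, may be None
```

Putting the env key first means a locally exported OPENAI_API_KEY always overrides whatever the auth manager has cached.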

The AuthManager (lines 946-953) handles token lifecycle including refresh (via https://auth.openai.com/oauth/token), caching with an 8-day staleness threshold, and 401 recovery state machines.

The ModelsClient in codex-api/src/endpoint/models.rs fetches model info from the /models endpoint with ETag-based caching. The ModelsManager in core/src/models_manager/manager.rs coordinates remote discovery with bundled metadata:

pub struct ModelsManager {
    local_models: Vec<ModelPreset>,
    remote_models: RwLock<Vec<ModelInfo>>,
    auth_manager: Arc<AuthManager>,
    etag: RwLock<Option<String>>,
    cache_manager: ModelsCacheManager,
    provider: ModelProviderInfo,
}
  • Cache TTL: 5 minutes
  • Network fetch timeout: 5 seconds
  • Cache location: ~/.codex/models_cache.json
  • Refresh strategies: Online, Offline, OnlineIfUncached

Model resolution uses longest-prefix matching: gpt-4.5 matches the remote gpt-4 entry if no exact version exists. Unknown models get conservative fallback metadata: 272k context, 10k byte truncation, no reasoning support, with used_fallback_model_metadata: true flagged.
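
Longest-prefix resolution is a small function. A sketch of the described behavior, not Codex's Rust code:

```python
def resolve_model(requested: str, known: list):
    """Exact match, else the longest known model name that prefixes the request;
    None means the caller should fall back to conservative default metadata."""
    if requested in known:
        return requested
    prefixes = [m for m in known if requested.startswith(m)]
    return max(prefixes, key=len, default=None)
```

Taking the longest prefix matters when the catalog holds both gpt-4 and gpt-4.1: a request for gpt-4.1-mini should resolve to the more specific entry.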

The ModelInfo struct in protocol/src/openai_models.rs (lines 217-261) carries rich metadata: context window, truncation policy, supported reasoning levels, shell tool type, visibility, parallel tool call support, input modalities (text/image), and reasoning summary support.

pub struct RetryConfig {
    pub max_attempts: u64,
    pub base_delay: Duration,
    pub retry_429: bool,
    pub retry_5xx: bool,
    pub retry_transport: bool,
}

Retry applies to both unary and streaming calls. 429 (rate limit) and 5xx (server error) responses are retried with exponential backoff. Transport errors (timeout, network) are retried separately.
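
The retry policy can be sketched as a loop (illustrative Python; Codex wraps its HTTP transport in Rust):

```python
import random
import time

def call_with_retry(send, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Retry 429 and 5xx responses with exponential backoff plus jitter.
    `send` returns (status, body); other statuses are returned immediately."""
    for attempt in range(max_attempts):
        status, body = send()
        if status == 429 or 500 <= status < 600:
            if attempt + 1 == max_attempts:
                raise RuntimeError(f"giving up after {max_attempts} attempts (HTTP {status})")
            # Exponential backoff: base * 2^attempt, with up to 2x random jitter
            sleep(base_delay * (2 ** attempt) * (1 + random.random()))
            continue
        return status, body
```

Jitter prevents a fleet of clients that were rate-limited together from retrying in lockstep and re-triggering the limit.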

The ResponsesClient in codex-api/src/endpoint/responses.rs handles streaming:

pub struct ResponsesClient<T: HttpTransport, A: AuthProvider> {
    session: EndpointSession<T, A>,
    sse_telemetry: Option<Arc<dyn SseTelemetry>>,
}

Streaming uses HTTP POST to /responses with Accept: text/event-stream, optional Zstd compression, conversation ID headers for multi-turn context, and idle timeout detection.


Reference: references/opencode/packages/opencode/src/provider/ | Commit: 7ed4499

OpenCode builds its provider abstraction on the Vercel AI SDK, using their @ai-sdk/* family of provider packages. Each provider is an npm package that implements the LanguageModelV2 interface.

provider/provider.ts (lines 84-107) defines the BUNDLED_PROVIDERS map:

const BUNDLED_PROVIDERS = {
  "@ai-sdk/openai": (opts) => createOpenAI(opts),
  "@ai-sdk/anthropic": (opts) => createAnthropic(opts),
  "@ai-sdk/google": (opts) => createGoogleGenerativeAI(opts),
  "@ai-sdk/google-vertex": (opts) => createVertex(opts),
  "@ai-sdk/azure": (opts) => createAzure(opts),
  "@ai-sdk/amazon-bedrock": (opts) => createAmazonBedrock(opts),
  "@ai-sdk/groq": (opts) => createGroq(opts),
  "@ai-sdk/mistral": (opts) => createMistral(opts),
  "@ai-sdk/xai": (opts) => createXai(opts),
  "@ai-sdk/cerebras": (opts) => createCerebras(opts),
  "@ai-sdk/cohere": (opts) => createCohere(opts),
  "@ai-sdk/deepinfra": (opts) => createDeepinfra(opts),
  "@ai-sdk/perplexity": (opts) => createPerplexity(opts),
  "@ai-sdk/togetherai": (opts) => createTogetherai(opts),
  "@ai-sdk/openai-compatible": (opts) => createOpenAICompatible(opts),
}

This gives OpenCode native support for 15+ providers out of the box, each with provider-specific optimizations.

CUSTOM_LOADERS (lines 116-250) apply provider-specific initialization logic:

Anthropic (lines 117-126): Injects beta headers for extended features (prompt-caching-2024-07-31, pdfs-2024-09-25, interleaved-thinking-2025-05-14).

OpenAI (lines 150-157): Uses .responses() API instead of .chat() for newer models, calling getModel() to select the right protocol.

GitHub Copilot (lines 159-177): Routes GPT-5+ models to the Responses API via shouldUseCopilotResponsesApi() (regex check on model ID).

Azure (lines 179-191): Conditional routing between .responses() and .chat() based on model detection.

OpenCode fetches model metadata from https://models.dev, an external model catalog:

provider/models.ts
const filepath = path.join(Global.Path.cache, "models.json")

Models are cached locally as JSON. The ModelsDev namespace provides model discovery and configuration lookup, referenced from Config.state() in config/config.ts (line 7).

provider/transform.ts handles provider-specific parameter adaptation:

Prompt caching varies by provider (lines around 778-809):

  • Anthropic: cacheControl: { type: "ephemeral" }
  • OpenAI: cachePoint: { type: "default" }
  • Google/Copilot: copilot_cache_control: { type: "ephemeral" } or cache_control: { type: "ephemeral" }
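
This kind of transform reduces to a per-provider table. A sketch using the field names from the list above (Python for illustration; OpenCode's transform.ts is TypeScript):

```python
def cache_part(provider_id: str) -> dict:
    """Attach the provider's flavor of an ephemeral cache marker to a message part."""
    if provider_id == "anthropic":
        return {"cacheControl": {"type": "ephemeral"}}
    if provider_id == "openai":
        return {"cachePoint": {"type": "default"}}
    if provider_id in ("google", "github-copilot"):
        return {"cache_control": {"type": "ephemeral"}}
    return {}  # provider without prompt caching: send nothing extra
```

The important design point is that the rest of the pipeline composes message parts without knowing which marker, if any, gets attached.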

Small model options (smallOptions(), lines 778-809): Provider-aware cost reduction:

export function smallOptions(model: Provider.Model) {
  if (model.providerID === "openai") {
    if (model.api.id.includes("-codex")) {
      return { store: false, reasoningEffort: "low" }
    }
    return { store: false }
  }
  if (model.providerID === "google") {
    return { thinkingConfig: { thinkingLevel: "minimal" } }
  }
  // ... per-provider logic
}

OpenCode loads provider/model config from multiple sources with cascading precedence (from config/config.ts, lines 68-193):

  1. Remote .well-known/opencode (org defaults, lowest priority)
  2. Global config (~/.config/opencode/opencode.json{,c})
  3. Custom config path (OPENCODE_CONFIG env var)
  4. Project config (opencode.json{,c} via findUp)
  5. .opencode/ directory configs
  6. Inline config (OPENCODE_CONFIG_CONTENT env var)
  7. Managed config directory (enterprise, highest priority: /etc/opencode on Linux, /Library/Application Support/opencode on macOS)

Each layer can define providers with custom API URLs, keys, and model overrides.
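
Precedence like this is a left-to-right merge where later layers win. A sketch (not OpenCode's actual merge code, which handles more cases):

```python
def merge_config(*layers: dict) -> dict:
    """Shallow-merge config layers in priority order; later layers override earlier ones,
    recursing into nested dicts so partial overrides don't clobber whole sections."""
    merged = {}
    for layer in layers:
        for key, value in layer.items():
            if isinstance(value, dict) and isinstance(merged.get(key), dict):
                merged[key] = merge_config(merged[key], value)
            else:
                merged[key] = value
    return merged

remote = {"provider": {"openai": {"baseURL": "https://org-proxy"}}}
project = {"provider": {"openai": {"baseURL": "https://api.openai.com/v1"}}}
```

Recursing into nested dicts is what lets a project config override just one model's baseURL without wiping out the org-level provider block.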

API keys come from environment variables, resolved per provider. The Auth module provides Auth.all() for collecting all configured credentials, including .well-known/opencode token exchange for enterprise deployments.


litellm version drift: Aider’s reliance on litellm means new provider support depends on litellm releases. When litellm introduces breaking changes (which happens regularly), Aider must pin versions and sometimes patch around regressions.

Provider-specific parameter leakage: The drop_params=True approach in Aider silently swallows parameters. This means you can accidentally send cache_control to a provider that doesn’t support it and never know it was dropped. Codex avoids this by only supporting one API shape (Responses API).

Token counting divergence: Each provider uses different tokenizers (cl100k_base for GPT, Claude’s proprietary tokenizer, etc.). Aider delegates to litellm.token_counter() which handles this, but the counts are approximate. Codex uses a simple 4-byte heuristic for pre-flight checks.

ETag caching failures: Codex’s model metadata cache depends on the /models endpoint returning proper ETags. If the server doesn’t support ETags, the cache becomes stale. The 5-minute TTL provides a safety net.

Rate limit header inconsistency: Different providers return rate limit information in different headers. OpenAI uses x-ratelimit-*, Anthropic uses their own format. Error classification must be provider-aware even when the abstraction hides provider identity.

Azure Responses API adoption: Azure’s OpenAI deployments lag behind the main OpenAI API. Both Codex and OpenCode must detect Azure endpoints and conditionally fall back from Responses API to Chat Completions.


Provider trait:

#[async_trait]
pub trait LlmProvider: Send + Sync {
    async fn complete(&self, request: CompletionRequest) -> Result<CompletionStream>;
    fn model_info(&self) -> &ModelInfo;
    fn name(&self) -> &str;
}

Built-in providers: OpenAI (Responses API), Anthropic (Messages API), Ollama (local). Each implements LlmProvider directly — no litellm equivalent exists in the Rust ecosystem.

Provider registry: A HashMap<String, Box<dyn LlmProvider>> keyed by provider name, populated from config and CLI flags. Model name prefixes (anthropic/, ollama/) route to the right provider.

Model metadata: Ship a bundled models.json compiled into the binary via include_str!(). Supplement with optional remote fetch (OpenAI /models endpoint) cached at ~/.openoxide/cache/models.json with 5-minute TTL and ETag support.

Auth: Use the keyring crate for OS credential storage (macOS Keychain, Linux Secret Service, Windows Credential Manager). Fall back to environment variables. Support OPENAI_API_KEY, ANTHROPIC_API_KEY, and custom env_key per provider in config.

Config: Provider definitions in ~/.openoxide/config.toml:

[provider.azure]
name = "Azure OpenAI"
base_url = "https://my-instance.openai.azure.com"
api_key_env = "AZURE_OPENAI_API_KEY"
wire_api = "responses"

Retry: Per-provider RetryConfig with exponential backoff for 429 and 5xx. Use tokio-retry or a custom retry loop.

Streaming: All providers return Pin<Box<dyn Stream<Item = Result<StreamEvent>>>>. The stream abstraction normalizes provider-specific SSE formats into a common StreamEvent enum (text delta, tool call, reasoning, done).
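
Normalization can be sketched as mapping raw SSE payloads onto one event type. A hypothetical mapper for OpenAI Responses-style chunks (event-type strings and the StreamEvent shape here are assumptions for illustration):

```python
from dataclasses import dataclass

@dataclass
class StreamEvent:
    kind: str   # "text" | "tool_call" | "reasoning" | "done"
    data: str = ""

def normalize_openai(raw: dict) -> StreamEvent:
    """Map an OpenAI Responses-style SSE chunk onto the common event type (sketch)."""
    if raw.get("type") == "response.output_text.delta":
        return StreamEvent("text", raw["delta"])
    if raw.get("type") == "response.completed":
        return StreamEvent("done")
    # Everything else collapses to a reasoning-ish event in this sketch.
    return StreamEvent("reasoning", raw.get("delta", ""))
```

Each provider gets its own normalizer, and consumers only ever match on StreamEvent.kind, never on raw provider payloads.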

Token counting: Use tiktoken-rs for OpenAI models. For Anthropic and others, use character-count heuristics (chars/4) for pre-flight budget checks, with actual token counts from API response usage fields for post-hoc tracking.
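
The chars/4 pre-flight heuristic is one line:

```python
def estimate_tokens(text: str) -> int:
    """Rough pre-flight budget estimate: about 4 characters per token for English-like text."""
    return max(1, len(text) // 4)
```

It deliberately overestimates token density for whitespace-heavy code, which errs on the safe side for budget checks; the authoritative count comes from the API's usage fields after the call.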