Provider Abstraction
Source attribution: Implementation details traced from
references/aider/ at commit b9050e1d, references/codex/ at commit 4ab44e2c5, and references/opencode/ at commit 7ed4499.
Feature Definition
Every AI coding agent must talk to multiple LLM providers — OpenAI, Anthropic, Google, local models via Ollama, proxy services like OpenRouter. Each provider has different API shapes, authentication schemes, endpoint URLs, and model metadata. The provider abstraction layer exists to hide these differences behind a uniform interface so the rest of the system (chat loop, edit application, tool execution) never needs to know which provider it's talking to.
This is harder than it sounds. Providers differ not just in their REST endpoints, but in subtle ways: Anthropic uses max_tokens as a required field while OpenAI treats it as optional. Google requires different authentication flows. Some providers support streaming, others don’t. Function calling schemas vary. Rate limit headers come in different formats. The abstraction must handle all of this without leaking provider-specific logic into the coder or session layers.
The secondary challenge is model metadata. Every model has a context window size, input/output pricing, supported features (vision, function calling, reasoning tokens), and a preferred edit format. This metadata must be discoverable at runtime — ideally cached locally with a TTL to avoid hitting remote APIs on every startup.
This page focuses on provider wiring and capability metadata. For how those capabilities are consumed in per-turn prompts, see System Prompt, Streaming, and Multi-Model Orchestration. For budget-level consequences of provider context limits, see Token Budgeting.
Aider Implementation
Reference: references/aider/aider/models.py, aider/llm.py | Commit: b9050e1d
Aider delegates all provider routing to litellm, a Python library that wraps 100+ LLM providers behind a single completion() call. The model name string encodes the provider: gpt-5.2-codex routes to OpenAI, claude-4-6-opus to Anthropic, deepseek/deepseek-chat to DeepSeek, openrouter/anthropic/claude-4-haiku-5 to OpenRouter. No explicit provider selection is needed — litellm parses the model name prefix and dispatches accordingly.
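This name-based dispatch can be illustrated with a small sketch. The helper below is hypothetical, not litellm's actual internals, and the routing table is abbreviated:

```python
# Hypothetical sketch of litellm-style provider routing by model-name prefix.
def route_provider(model: str) -> str:
    """Infer the provider from a model name string."""
    prefixes = {
        "openrouter/": "openrouter",
        "deepseek/": "deepseek",
        "gemini/": "gemini",
        "ollama/": "ollama",
    }
    for prefix, provider in prefixes.items():
        if model.startswith(prefix):
            return prefix.rstrip("/") and provider
    # Bare names fall back to pattern matching on the model family.
    if model.startswith("claude-"):
        return "anthropic"
    return "openai"

print(route_provider("openrouter/anthropic/claude-4-haiku-5"))  # openrouter
print(route_provider("claude-4-6-opus"))                        # anthropic
print(route_provider("gpt-5.2-codex"))                          # openai
```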
Lazy Loading
litellm is expensive to import (~1.5 seconds). Aider defers it via a LazyLiteLLM wrapper in aider/llm.py (lines 21-45):
```python
import importlib

class LazyLiteLLM:
    _lazy_module = None

    def __getattr__(self, name):
        if name == "_lazy_module":
            return super()
        self._load_litellm()
        return getattr(self._lazy_module, name)

    def _load_litellm(self):
        if self._lazy_module is not None:
            return
        self._lazy_module = importlib.import_module("litellm")
        self._lazy_module.suppress_debug_info = True
        self._lazy_module.set_verbose = False
        self._lazy_module.drop_params = True  # silently drop unsupported params per provider


litellm = LazyLiteLLM()
```

The drop_params=True setting is critical — it tells litellm to silently ignore parameters that a specific provider doesn't support, rather than raising errors. This allows Aider to pass a superset of parameters (temperature, tools, cache_control headers) and let litellm filter per provider.
Model Aliases
aider/models.py (lines 86-111) defines friendly aliases:
```python
MODEL_ALIASES = {
    "sonnet": "claude-4-6-sonnet",
    "haiku": "claude-4-haiku-5",
    "opus": "claude-4-6-opus",
    "codex": "gpt-5.2-codex",
    "deepseek": "deepseek/deepseek-chat",
    "flash": "gemini/gemini-3-flash-preview",
    "r1": "deepseek/deepseek-reasoner",
    "grok3": "xai/grok-3-beta",
}
```

Users type --model sonnet and the alias resolves to claude-4-6-sonnet before any provider logic runs.
Model Metadata Caching
The ModelInfoManager class (lines 149-262) fetches and caches model metadata from two sources:
- BerriAI litellm database: A JSON file from raw.githubusercontent.com/BerriAI/litellm/main/model_prices_and_context_window.json, cached at ~/.aider/caches/ with a 24-hour TTL. Contains context windows, pricing, and provider mappings for thousands of models.
- OpenRouter API: For OpenRouter-specific models, falls back to web scraping openrouter.ai/<model_route> to extract context window and pricing via regex.
The metadata dict per model includes max_input_tokens, max_output_tokens, input_cost_per_token, output_cost_per_token, and litellm_provider.
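Given that shape, per-request cost accounting is a straightforward multiply. The sketch below uses made-up metadata values with the field names described above:

```python
# Sketch: computing request cost from litellm-style metadata fields.
# The metadata values here are invented for illustration.
metadata = {
    "max_input_tokens": 200_000,
    "max_output_tokens": 8192,
    "input_cost_per_token": 3e-06,
    "output_cost_per_token": 1.5e-05,
    "litellm_provider": "anthropic",
}

def request_cost(meta: dict, prompt_tokens: int, completion_tokens: int) -> float:
    """Dollar cost of one request given token counts and per-token prices."""
    return (prompt_tokens * meta["input_cost_per_token"]
            + completion_tokens * meta["output_cost_per_token"])

print(round(request_cost(metadata, 10_000, 1_000), 4))  # 0.045
```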
Environment Validation
Aider uses a two-tier validation strategy to check API keys:
Fast path (fast_validate_environment(), lines 697-726): Checks hardcoded provider-to-env-var mappings without importing litellm. Handles common providers (OpenAI, Anthropic, DeepSeek, Gemini, Groq, OpenRouter, Fireworks) via a simple keymap dict and model name pattern matching against OPENAI_MODELS and ANTHROPIC_MODELS lists.
Fallback path (validate_environment(), lines 728-765): Imports litellm and calls litellm.validate_environment(model). Handles edge cases like AWS Bedrock with AWS_PROFILE and non-standard providers.
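The fast path amounts to a dictionary lookup keyed on model-name prefixes. A simplified sketch follows; the keymap and return shape are illustrative, not Aider's exact code:

```python
# Sketch of a fast-path keymap check in the spirit of fast_validate_environment().
KEYMAP = {
    "deepseek/": "DEEPSEEK_API_KEY",
    "gemini/": "GEMINI_API_KEY",
    "groq/": "GROQ_API_KEY",
    "openrouter/": "OPENROUTER_API_KEY",
}

def fast_validate(model: str, env: dict):
    """Return a validation result without importing litellm, or None to fall back."""
    for prefix, var in KEYMAP.items():
        if model.startswith(prefix):
            missing = [] if env.get(var) else [var]
            return {"keys_in_environment": [var], "missing_keys": missing}
    return None  # unknown provider: defer to litellm.validate_environment()

print(fast_validate("deepseek/deepseek-chat", {"DEEPSEEK_API_KEY": "sk-x"}))
```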
API Dispatch
The actual API call happens in Model.send_completion() (lines 970-1022):
```python
def send_completion(self, messages, functions, stream, temperature=None):
    kwargs = dict(model=self.name, stream=stream)

    if self.use_temperature is not False:
        kwargs["temperature"] = temperature or float(self.use_temperature)

    if functions is not None:
        kwargs["tools"] = [dict(type="function", function=functions[0])]
        kwargs["tool_choice"] = {"type": "function", "function": {"name": functions[0]["name"]}}

    if self.extra_params:
        kwargs.update(self.extra_params)

    # Ollama-specific: calculate context size
    if self.is_ollama() and "num_ctx" not in kwargs:
        kwargs["num_ctx"] = int(self.token_count(messages) * 1.25) + 8192

    kwargs["timeout"] = kwargs.get("timeout", 600)
    kwargs["messages"] = messages

    # hash_object is computed from the request kwargs earlier in the full method (elided here)
    res = litellm.completion(**kwargs)
    return hash_object, res
```

The key insight: litellm.completion(**kwargs) is the single dispatch point. The model string (gpt-5.2-codex, claude-4-6-opus, deepseek/deepseek-chat) determines which provider API gets called. Provider-specific parameters like Anthropic's cache_control headers or Ollama's num_ctx are merged via extra_params from the model settings YAML.
Model Settings YAML
aider/resources/model-settings.yml provides per-model configuration:
```yaml
- name: claude-4-6-sonnet-20260115
  edit_format: diff
  weak_model_name: claude-4-haiku-5-20260115
  use_repo_map: true
  cache_control: true
  extra_params:
    extra_headers:
      anthropic-beta: prompt-caching-2024-07-31,pdfs-2024-09-25
    max_tokens: 8192
  editor_model_name: claude-4-6-sonnet-20260115
  editor_edit_format: editor-diff
```

The same model accessible via different providers gets separate entries:

```yaml
- name: anthropic/claude-4-6-sonnet-20260115      # Direct Anthropic
- name: bedrock/anthropic.claude-4-6-sonnet-v1:0  # AWS Bedrock
- name: vertex_ai/claude-4-6-sonnet@20260115      # Google Vertex AI
- name: openrouter/anthropic/claude-4-6-sonnet    # OpenRouter proxy
```

For unknown models, apply_generic_model_settings() (lines 421-583) applies heuristics based on name patterns — reasoning models get use_temperature=False and streaming=False, Claude models get cache_control=True, etc.
Fuzzy Model Matching
When users type a partial model name, fuzzy_match_models() (lines 1212-1254) first checks exact substring containment, then falls back to difflib.get_close_matches() at an 80% similarity cutoff (difflib scores by sequence-matching ratio, not true Levenshtein distance).
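The two-stage strategy is easy to reproduce with the standard library; the model list below is illustrative:

```python
import difflib

# Illustrative reproduction of the two-stage matching strategy:
# exact substring containment first, then difflib at a 0.8 cutoff.
KNOWN_MODELS = ["claude-4-6-sonnet", "claude-4-6-opus", "gpt-5.2-codex", "deepseek/deepseek-chat"]

def fuzzy_match(partial: str, models: list) -> list:
    contained = [m for m in models if partial in m]
    if contained:
        return contained
    return difflib.get_close_matches(partial, models, n=3, cutoff=0.8)

print(fuzzy_match("opus", KNOWN_MODELS))              # ['claude-4-6-opus']
print(fuzzy_match("claude-4-6-sonet", KNOWN_MODELS))  # close match despite the typo
```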
Supported Providers
| Provider | Model Prefix | Required Env Var | Example |
|---|---|---|---|
| OpenAI | (none) | OPENAI_API_KEY | gpt-5.2-codex |
| Anthropic | (none) | ANTHROPIC_API_KEY | claude-4-6-opus |
| AWS Bedrock | bedrock/ | AWS_PROFILE | bedrock/anthropic.claude-4-6-sonnet-v1:0 |
| Google Vertex | vertex_ai/ | Google credentials | vertex_ai/claude-4-6-sonnet@20260115 |
| DeepSeek | deepseek/ | DEEPSEEK_API_KEY | deepseek/deepseek-chat |
| Gemini | gemini/ | GEMINI_API_KEY | gemini/gemini-3-flash-preview |
| Groq | groq/ | GROQ_API_KEY | groq/mixtral-8x7b-32768 |
| OpenRouter | openrouter/ | OPENROUTER_API_KEY | openrouter/anthropic/claude-4-haiku-5 |
| Ollama | ollama/ | (none, local) | ollama/mistral |
| GitHub Copilot | (none) | GITHUB_COPILOT_TOKEN | Via token exchange |
Codex Implementation
Reference: references/codex/codex-rs/codex-api/, codex-rs/core/src/ | Commit: 4ab44e2c5
Codex implements provider abstraction in Rust with a clean layered architecture. Unlike Aider’s litellm delegation, Codex builds its own provider system from scratch — though it currently only supports OpenAI-compatible APIs (the Responses API, not Chat Completions).
Provider Struct
The Provider struct in codex-api/src/provider.rs (lines 43-50) encapsulates HTTP endpoint configuration:
```rust
pub struct Provider {
    pub name: String,
    pub base_url: String,
    pub query_params: Option<HashMap<String, String>>,
    pub headers: HeaderMap,
    pub retry: RetryConfig,
    pub stream_idle_timeout: Duration,
}
```

Key methods:

- url_for_path(&self, path: &str) — constructs full URLs (line 53)
- build_request(&self, method, path) — creates HTTP requests with pre-configured headers (line 77)
- is_azure_responses_endpoint(&self) — detects Azure deployments for special handling (line 88)
- websocket_url_for_path(&self, path) — converts HTTP to WS/WSS schemes (line 92)
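In Python terms, the URL helpers behave roughly like the sketch below. Method names mirror the Rust ones; the exact normalization rules are assumptions:

```python
# Python sketch of the Provider URL helpers described above (illustrative).
class Provider:
    def __init__(self, base_url: str):
        self.base_url = base_url.rstrip("/")

    def url_for_path(self, path: str) -> str:
        """Join the configured base URL with an endpoint path."""
        return f"{self.base_url}/{path.lstrip('/')}"

    def websocket_url_for_path(self, path: str) -> str:
        """Same URL with the scheme swapped: http -> ws, https -> wss."""
        url = self.url_for_path(path)
        if url.startswith("https://"):
            return "wss://" + url[len("https://"):]
        if url.startswith("http://"):
            return "ws://" + url[len("http://"):]
        return url

p = Provider("https://api.openai.com/v1")
print(p.url_for_path("/responses"))            # https://api.openai.com/v1/responses
print(p.websocket_url_for_path("responses"))   # wss://api.openai.com/v1/responses
```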
Wire API
Codex exclusively uses the OpenAI Responses API. The WireApi enum in core/src/model_provider_info.rs (lines 34-55) enforces this:
```rust
pub enum WireApi {
    #[default]
    Responses,
}
```

Attempting to use wire_api = "chat" in config produces an error message directing users to switch to wire_api = "responses". This is a deliberate design choice — the Responses API is the modern OpenAI protocol and Codex doesn't maintain backwards compatibility with Chat Completions.
ModelProviderInfo
The user-facing provider definition in core/src/model_provider_info.rs (lines 60-114):
```rust
pub struct ModelProviderInfo {
    pub name: String,
    pub base_url: Option<String>,
    pub env_key: Option<String>,
    pub env_key_instructions: Option<String>,
    pub experimental_bearer_token: Option<String>,
    pub wire_api: WireApi,
    pub query_params: Option<HashMap<String, String>>,
    pub http_headers: Option<HashMap<String, String>>,
    pub env_http_headers: Option<HashMap<String, String>>,
    pub request_max_retries: Option<u64>,
    pub stream_max_retries: Option<u64>,
    pub stream_idle_timeout_ms: Option<u64>,
    pub requires_openai_auth: bool,
    pub supports_websockets: bool,
}
```

The env_http_headers field deserializes header values from environment variables at runtime — useful for dynamic tokens or org IDs.
Built-in Providers
built_in_model_providers() (lines 271-292) ships three providers:
```rust
pub fn built_in_model_providers() -> HashMap<String, ModelProviderInfo> {
    [
        ("openai", create_openai_provider()),
        ("ollama", create_oss_provider(DEFAULT_OLLAMA_PORT, WireApi::Responses)),
        ("lmstudio", create_oss_provider(DEFAULT_LMSTUDIO_PORT, WireApi::Responses)),
    ]
    // collected into a HashMap (elided)
}
```

The OpenAI provider (lines 218-257) defaults to https://api.openai.com/v1, supports WebSockets, and reads org/project headers from environment variables. The OPENAI_BASE_URL env var overrides the base URL for custom deployments.
Custom providers are defined in ~/.codex/config.toml:
```toml
[provider.azure]
name = "Azure"
base_url = "https://xxxxx.openai.azure.com/openai"
env_key = "AZURE_OPENAI_API_KEY"
query_params = { api-version = "2025-04-01-preview" }
```

Authentication
The AuthProvider trait in codex-api/src/auth.rs (lines 10-15) is minimal:
```rust
pub trait AuthProvider: Send + Sync {
    fn bearer_token(&self) -> Option<String>;
    fn account_id(&self) -> Option<String> {
        None
    }
}
```

add_auth_headers_to_header_map() (lines 17-28) injects Authorization: Bearer <token> and optional ChatGPT-Account-ID headers into every request. The trait is intentionally cheap and non-blocking — I/O happens at higher layers.
The CoreAuthProvider implementation in core/src/api_bridge.rs (lines 284-298) resolves auth with a three-level priority chain:
1. Provider env_key — API key from environment variable (e.g., OPENAI_API_KEY)
2. Experimental bearer token — hardcoded token in provider config
3. User auth — ChatGPT OAuth or API key from the AuthManager
The AuthManager (lines 946-953) handles token lifecycle including refresh (via https://auth.openai.com/oauth/token), caching with an 8-day staleness threshold, and 401 recovery state machines.
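The staleness rule reduces to a timestamp comparison. A sketch with the 8-day threshold quoted above (everything else, including the function name, is illustrative):

```python
from datetime import datetime, timedelta, timezone

# Sketch of an 8-day staleness check in the spirit of the AuthManager's
# refresh policy; the threshold comes from the text, the rest is invented.
STALE_AFTER = timedelta(days=8)

def needs_refresh(last_refresh: datetime, now: datetime = None) -> bool:
    """True once the token is at least STALE_AFTER old."""
    now = now or datetime.now(timezone.utc)
    return now - last_refresh >= STALE_AFTER

issued = datetime(2026, 1, 1, tzinfo=timezone.utc)
print(needs_refresh(issued, now=datetime(2026, 1, 5, tzinfo=timezone.utc)))   # False
print(needs_refresh(issued, now=datetime(2026, 1, 10, tzinfo=timezone.utc)))  # True
```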
Model Metadata
The ModelsClient in codex-api/src/endpoint/models.rs fetches model info from the /models endpoint with ETag-based caching. The ModelsManager in core/src/models_manager/manager.rs coordinates remote discovery with bundled metadata:
```rust
pub struct ModelsManager {
    local_models: Vec<ModelPreset>,
    remote_models: RwLock<Vec<ModelInfo>>,
    auth_manager: Arc<AuthManager>,
    etag: RwLock<Option<String>>,
    cache_manager: ModelsCacheManager,
    provider: ModelProviderInfo,
}
```

- Cache TTL: 5 minutes
- Network fetch timeout: 5 seconds
- Cache location: ~/.codex/models_cache.json
- Refresh strategies: Online, Offline, OnlineIfUncached
Model resolution uses longest-prefix matching — gpt-4.5 matches remote gpt-4 if no exact version exists. Unknown models get conservative fallback metadata: 272k context, 10k byte truncation, no reasoning support, with used_fallback_model_metadata: true flagged.
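Longest-prefix resolution with a conservative fallback can be sketched as follows; the model names and fallback fields are illustrative, loosely following the numbers quoted above:

```python
# Illustrative longest-prefix model resolution with conservative fallback.
FALLBACK = {"context_window": 272_000, "used_fallback_model_metadata": True}

def resolve_model(requested: str, known: dict) -> dict:
    """Pick the known model whose name is the longest prefix of the request."""
    candidates = [name for name in known if requested.startswith(name)]
    if not candidates:
        return FALLBACK
    return known[max(candidates, key=len)]  # longest prefix wins

known = {"gpt-4": {"context_window": 128_000}, "gpt-4.5": {"context_window": 200_000}}
print(resolve_model("gpt-4.5-turbo", known))  # matches gpt-4.5, the longest prefix
print(resolve_model("mystery-model", known))  # falls back to conservative metadata
```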
The ModelInfo struct in protocol/src/openai_models.rs (lines 217-261) carries rich metadata: context window, truncation policy, supported reasoning levels, shell tool type, visibility, parallel tool call support, input modalities (text/image), and reasoning summary support.
Retry Configuration
Section titled “Retry Configuration”pub struct RetryConfig { pub max_attempts: u64, pub base_delay: Duration, pub retry_429: bool, pub retry_5xx: bool, pub retry_transport: bool,}Retry applies to both unary and streaming calls. 429 (rate limit) and 5xx (server error) responses are retried with exponential backoff. Transport errors (timeout, network) are retried separately.
Responses API Client
The ResponsesClient in codex-api/src/endpoint/responses.rs handles streaming:
```rust
pub struct ResponsesClient<T: HttpTransport, A: AuthProvider> {
    session: EndpointSession<T, A>,
    sse_telemetry: Option<Arc<dyn SseTelemetry>>,
}
```

Streaming uses HTTP POST to /responses with Accept: text/event-stream, optional Zstd compression, conversation ID headers for multi-turn context, and idle timeout detection.
OpenCode Implementation
Reference: references/opencode/packages/opencode/src/provider/ | Commit: 7ed4499
OpenCode builds its provider abstraction on the Vercel AI SDK, using their @ai-sdk/* family of provider packages. Each provider is an npm package that implements the LanguageModelV2 interface.
Provider Registry
provider/provider.ts (lines 84-107) defines the BUNDLED_PROVIDERS map:
```typescript
const BUNDLED_PROVIDERS = {
  "@ai-sdk/openai": (opts) => createOpenAI(opts),
  "@ai-sdk/anthropic": (opts) => createAnthropic(opts),
  "@ai-sdk/google": (opts) => createGoogleGenerativeAI(opts),
  "@ai-sdk/google-vertex": (opts) => createVertex(opts),
  "@ai-sdk/azure": (opts) => createAzure(opts),
  "@ai-sdk/amazon-bedrock": (opts) => createAmazonBedrock(opts),
  "@ai-sdk/groq": (opts) => createGroq(opts),
  "@ai-sdk/mistral": (opts) => createMistral(opts),
  "@ai-sdk/xai": (opts) => createXai(opts),
  "@ai-sdk/cerebras": (opts) => createCerebras(opts),
  "@ai-sdk/cohere": (opts) => createCohere(opts),
  "@ai-sdk/deepinfra": (opts) => createDeepinfra(opts),
  "@ai-sdk/perplexity": (opts) => createPerplexity(opts),
  "@ai-sdk/togetherai": (opts) => createTogetherai(opts),
  "@ai-sdk/openai-compatible": (opts) => createOpenAICompatible(opts),
}
```

This gives OpenCode native support for 15+ providers out of the box, each with provider-specific optimizations.
Custom Provider Loaders
CUSTOM_LOADERS (lines 116-250) apply provider-specific initialization logic:
Anthropic (lines 117-126): Injects beta headers for extended features (prompt-caching-2024-07-31, pdfs-2024-09-25, interleaved-thinking-2025-05-14).
OpenAI (lines 150-157): Uses .responses() API instead of .chat() for newer models, calling getModel() to select the right protocol.
GitHub Copilot (lines 159-177): Routes GPT-5+ models to the Responses API via shouldUseCopilotResponsesApi() (regex check on model ID).
Azure (lines 179-191): Conditional routing between .responses() and .chat() based on model detection.
Model Configuration via models.dev
OpenCode fetches model metadata from https://models.dev, an external model catalog:
```typescript
const filepath = path.join(Global.Path.cache, "models.json")
```

Models are cached locally as JSON. The ModelsDev namespace provides model discovery and configuration lookup, referenced from Config.state() in config/config.ts (line 7).
Provider-Specific Transforms
provider/transform.ts handles provider-specific parameter adaptation:
Prompt caching varies by provider (lines around 778-809):
- Anthropic: cacheControl: { type: "ephemeral" }
- OpenAI: cachePoint: { type: "default" }
- Google/Copilot: copilot_cache_control: { type: "ephemeral" } or cache_control: { type: "ephemeral" }
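A transform in this spirit is just a provider switch. The sketch below is illustrative; the "github-copilot" ID and the choice between the two Copilot header variants are assumptions:

```python
# Sketch of provider-aware cache-control injection, mirroring the
# per-provider variants listed above (not OpenCode's actual transform).
def cache_options(provider_id: str) -> dict:
    """Return the cache-control payload for a given provider ID."""
    if provider_id == "anthropic":
        return {"cacheControl": {"type": "ephemeral"}}
    if provider_id == "openai":
        return {"cachePoint": {"type": "default"}}
    if provider_id in ("google", "github-copilot"):
        return {"cache_control": {"type": "ephemeral"}}
    return {}  # provider without prompt-caching support

print(cache_options("anthropic"))  # {'cacheControl': {'type': 'ephemeral'}}
```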
Small model options (smallOptions(), lines 778-809): Provider-aware cost reduction:
```typescript
export function smallOptions(model: Provider.Model) {
  if (model.providerID === "openai") {
    if (model.api.id.includes("-codex")) {
      return { store: false, reasoningEffort: "low" }
    }
    return { store: false }
  }
  if (model.providerID === "google") {
    return { thinkingConfig: { thinkingLevel: "minimal" } }
  }
  // ... per-provider logic
}
```

Configuration Hierarchy
OpenCode loads provider/model config from multiple sources with cascading precedence (from config/config.ts, lines 68-193):
1. Remote .well-known/opencode (org defaults, lowest priority)
2. Global config (~/.config/opencode/opencode.json{,c})
3. Custom config path (OPENCODE_CONFIG env var)
4. Project config (opencode.json{,c} via findUp)
5. .opencode/ directory configs
6. Inline config (OPENCODE_CONFIG_CONTENT env var)
7. Managed config directory (enterprise, highest priority: /etc/opencode on Linux, /Library/Application Support/opencode on macOS)
Each layer can define providers with custom API URLs, keys, and model overrides.
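The cascade behaves like an ordered merge where later (higher-priority) layers win. A simplified sketch, with layer contents invented for illustration (real config merging has more nuance):

```python
# Sketch of cascading config precedence: merge layers lowest-priority first,
# letting later layers override earlier ones (one level of dict merging).
def merge_layers(layers: list) -> dict:
    merged = {}
    for layer in layers:  # lowest priority first
        for key, value in layer.items():
            if isinstance(value, dict) and isinstance(merged.get(key), dict):
                merged[key] = {**merged[key], **value}  # merge nested tables
            else:
                merged[key] = value  # scalar or new key: override
    return merged

remote = {"provider": {"openai": {"baseURL": "https://proxy.example.com/v1"}}}
project = {"provider": {"openai": {"baseURL": "https://api.openai.com/v1"}},
           "model": "gpt-5.2-codex"}
print(merge_layers([remote, project]))  # project layer wins on conflicts
```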
API Key Management
API keys come from environment variables, resolved per provider. The Auth module provides Auth.all() for collecting all configured credentials, including .well-known/opencode token exchange for enterprise deployments.
Pitfalls & Hard Lessons
litellm version drift: Aider's reliance on litellm means new provider support depends on litellm releases. When litellm introduces breaking changes (which happens regularly), Aider must pin versions and sometimes patch around regressions.
Provider-specific parameter leakage: The drop_params=True approach in Aider silently swallows parameters. This means you can accidentally send cache_control to a provider that doesn’t support it and never know it was dropped. Codex avoids this by only supporting one API shape (Responses API).
Token counting divergence: Each provider uses different tokenizers (cl100k_base for GPT, Claude's proprietary tokenizer, etc.). Aider delegates to litellm.token_counter() which handles this, but the counts are approximate. Codex uses a simple bytes-per-token heuristic (roughly 4 bytes per token) for pre-flight checks.
ETag caching failures: Codex’s model metadata cache depends on the /models endpoint returning proper ETags. If the server doesn’t support ETags, the cache becomes stale. The 5-minute TTL provides a safety net.
Rate limit header inconsistency: Different providers return rate limit information in different headers. OpenAI uses x-ratelimit-*, Anthropic uses their own format. Error classification must be provider-aware even when the abstraction hides provider identity.
Azure Responses API adoption: Azure’s OpenAI deployments lag behind the main OpenAI API. Both Codex and OpenCode must detect Azure endpoints and conditionally fall back from Responses API to Chat Completions.
OpenOxide Blueprint
Crate: openoxide-provider
Provider trait:
```rust
#[async_trait]
pub trait LlmProvider: Send + Sync {
    async fn complete(&self, request: CompletionRequest) -> Result<CompletionStream>;
    fn model_info(&self) -> &ModelInfo;
    fn name(&self) -> &str;
}
```

Built-in providers: OpenAI (Responses API), Anthropic (Messages API), Ollama (local). Each implements LlmProvider directly — no litellm equivalent exists in the Rust ecosystem.
Provider registry: A HashMap<String, Box<dyn LlmProvider>> keyed by provider name, populated from config and CLI flags. Model name prefixes (anthropic/, ollama/) route to the right provider.
Model metadata: Ship a bundled models.json compiled into the binary via include_str!(). Supplement with optional remote fetch (OpenAI /models endpoint) cached at ~/.openoxide/cache/models.json with 5-minute TTL and ETag support.
Auth: Use the keyring crate for OS credential storage (macOS Keychain, Linux Secret Service, Windows Credential Manager). Fall back to environment variables. Support OPENAI_API_KEY, ANTHROPIC_API_KEY, and custom env_key per provider in config.
Config: Provider definitions in ~/.openoxide/config.toml:
```toml
[provider.azure]
name = "Azure OpenAI"
base_url = "https://my-instance.openai.azure.com"
api_key_env = "AZURE_OPENAI_API_KEY"
wire_api = "responses"
```

Retry: Per-provider RetryConfig with exponential backoff for 429 and 5xx. Use tokio-retry or a custom retry loop.
Streaming: All providers return Pin<Box<dyn Stream<Item = Result<StreamEvent>>>>. The stream abstraction normalizes provider-specific SSE formats into a common StreamEvent enum (text delta, tool call, reasoning, done).
Token counting: Use tiktoken-rs for OpenAI models. For Anthropic and others, use character-count heuristics (chars/4) for pre-flight budget checks, with actual token counts from API response usage fields for post-hoc tracking.