CLI Interface Design
Feature Definition
The command-line interface is the primary surface through which users interact with an AI coding agent. It’s also the first place architectural decisions become visible. Every flag, subcommand, and option group is a commitment: it reveals what the tool considers important, how it thinks about workflow, and what internal subsystems exist. A well-designed CLI surface isn’t just UX; it’s a contract that shapes the internal module structure.
The hard problems in CLI design for an AI coding agent are:
- Headless vs interactive duality: Users need both a REPL-style interactive session and a scriptable one-shot mode for CI pipelines, editor integrations, and automation. These modes share state machinery but differ in how they emit output, handle input, and manage lifecycle.
- Server/client separation: Some tools run a background server and attach multiple clients (TUI, web, IDE plugin). The CLI must expose this topology: commands to start a server, commands to attach to a running one, and commands to query server state without a full UI.
- Session continuity: Sessions aren’t ephemeral. Users need to resume, fork, and export them. The CLI surface for session management defines how the persistence layer must work.
- Sandboxing controls: When an AI agent can execute shell commands, users need explicit, visible controls over what it’s allowed to do. These can’t be buried in a config file — they must be first-class CLI flags with clear semantics.
- Provider and model abstraction: Multiple LLM providers, local models, and fallback chains all need to be selectable at the command line without a config file edit, while still respecting layered config.
- Debugging and introspection: Users diagnosing unexpected behavior need to inspect internal state (resolved config, LSP diagnostics, snapshot history, ripgrep file lists) without reading source code.
Aider Implementation
Aider uses a flat argparse-based CLI defined in references/aider/aider/args.py. There are no subcommands: every option is a flag on the single aider command. Argument groups organize the flags conceptually but don’t create any subcommand hierarchy. This is a deliberate single-process design: Aider doesn’t have a background server.
Key argument groups and what they reveal:
Main model / API Keys — --model, --openai-api-key, --anthropic-api-key, --api-key, --openai-api-base, --openai-api-type, --set-env. OpenAI and Anthropic get dedicated key flags. Other providers use either the generic form, --api-key provider=<key>, or --set-env.
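The generic key flag boils down to splitting on the first `=`. The following is a minimal sketch. It assumes, based on our reading of Aider's help text, that `--api-key provider=<key>` sets an environment variable named `PROVIDER_API_KEY`. `api_key_to_env` is a hypothetical helper, not Aider's code.

```rust
/// Hypothetical helper: turn `provider=<key>` into an (ENV_VAR, value) pair.
/// Assumption: the variable name is the upper-cased provider plus `_API_KEY`.
fn api_key_to_env(arg: &str) -> Result<(String, String), String> {
    // Split on the first `=` only, so keys that contain `=` survive intact.
    let (provider, key) = arg
        .split_once('=')
        .ok_or_else(|| format!("expected provider=<key>, got {arg:?}"))?;
    let provider = provider.trim();
    if provider.is_empty() || key.is_empty() {
        return Err(format!("empty provider or key in {arg:?}"));
    }
    let var = format!("{}_API_KEY", provider.to_ascii_uppercase());
    Ok((var, key.to_string()))
}

fn main() {
    let (var, value) = api_key_to_env("gemini=abc123").unwrap();
    // A real CLI would export this before constructing the provider client.
    println!("{var}={value}");
}
```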
Model settings — --edit-format, --chat-mode, --architect, --weak-model, --editor-model. This group is where the multi-model architecture surfaces. Aider has named edit formats (diff, udiff, whole, search-replace, editor-diff, editor-whole) that correspond to different coder classes in aider/coders/. The --architect flag activates a two-step flow where one model plans and a separate editor model executes.
- --reasoning-effort / --thinking-tokens: expose model-level thinking controls.
- --max-chat-history-tokens: sets the token budget for rolling chat history before summary/truncation.
- --cache-prompts / --cache-keepalive-pings: prompt caching controls for providers that support it.
Repomap settings — --map-tokens, --map-refresh, --map-multiplier-no-files. Direct control over the repo mapping subsystem. --map-tokens is the budget; --map-refresh controls when the map is rebuilt (auto, always, files, manual); --map-multiplier-no-files scales the budget up when no files are in context (more map headroom needed).
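The two repo-map knobs can be modeled as a parsed enum plus a budget function. This sketch assumes that `--map-multiplier-no-files` simply multiplies `--map-tokens` when no files are in the chat. `effective_map_tokens` is a hypothetical name, and Aider's real rounding or caps may differ.

```rust
use std::str::FromStr;

/// The four `--map-refresh` values listed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum MapRefresh {
    Auto,
    Always,
    Files,
    Manual,
}

impl FromStr for MapRefresh {
    type Err = String;
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "auto" => Ok(Self::Auto),
            "always" => Ok(Self::Always),
            "files" => Ok(Self::Files),
            "manual" => Ok(Self::Manual),
            other => Err(format!("unknown --map-refresh value: {other}")),
        }
    }
}

/// Hypothetical budget rule: with no files in the chat the map is the model's
/// only view of the repo, so the budget is scaled by the multiplier.
fn effective_map_tokens(map_tokens: u32, multiplier_no_files: f64, files_in_chat: usize) -> u32 {
    if files_in_chat == 0 {
        (f64::from(map_tokens) * multiplier_no_files).round() as u32
    } else {
        map_tokens
    }
}

fn main() {
    let refresh: MapRefresh = "files".parse().unwrap();
    println!("{refresh:?}: {} tokens with an empty chat", effective_map_tokens(1024, 2.0, 0));
}
```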
Git settings — --auto-commits, --dirty-commits, --attribute-author, --attribute-committer, --attribute-commit-message-author, --commit-prompt, --dry-run, --watch-files. Aider’s git integration is deep. It auto-commits after each edit it applies and attributes commits to both the user and the AI. With --watch-files, it also watches the repo for coding instructions the user leaves in source comments.
Fixing and committing — --lint, --lint-cmd, --auto-lint, --test-cmd, --auto-test, --test. Built-in lint and test loop. --auto-lint (default: on) runs configured linters after every edit. --auto-test (default: off) runs tests. Both feed failures back to the model for another pass.
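The lint/test feedback loop described above can be sketched as a bounded retry loop. The callback-based shape and the `max_reflections` cap are illustrative assumptions, not Aider's code. Any such loop needs some bound so that a persistent lint error cannot make it spin forever.

```rust
/// Outcome of running a lint or test command.
#[derive(Debug, PartialEq)]
enum Check {
    Pass,
    Fail(String),
}

/// Sketch of the edit -> lint -> test -> feedback loop.
/// `edit` receives the previous failure output (if any) and applies an edit.
/// Returns how many model passes were used.
fn edit_loop(
    mut edit: impl FnMut(Option<&str>),
    mut lint: impl FnMut() -> Check,
    mut test: impl FnMut() -> Check,
    auto_lint: bool,
    auto_test: bool,
    max_reflections: usize,
) -> usize {
    let mut feedback: Option<String> = None;
    for pass in 1..=max_reflections + 1 {
        edit(feedback.as_deref());
        let mut failure = None;
        if auto_lint {
            if let Check::Fail(msg) = lint() {
                failure = Some(msg);
            }
        }
        // Tests only run once the lint is clean.
        if failure.is_none() && auto_test {
            if let Check::Fail(msg) = test() {
                failure = Some(msg);
            }
        }
        match failure {
            None => return pass,
            Some(msg) => feedback = Some(msg), // fed back to the model next pass
        }
    }
    max_reflections + 1
}

fn main() {
    let mut attempts = 0;
    let passes = edit_loop(
        |feedback| println!("edit pass, feedback: {feedback:?}"),
        || {
            attempts += 1;
            if attempts < 2 { Check::Fail("E501 line too long".into()) } else { Check::Pass }
        },
        || Check::Pass,
        true,
        false,
        3,
    );
    println!("settled after {passes} model pass(es)");
}
```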
Modes — --message/-m, --message-file/-f, --gui, --apply, --show-repo-map, --show-prompts. This group is what makes Aider scriptable. --message sends a single prompt and exits — headless one-shot mode. --show-repo-map and --show-prompts are debug inspection flags that print internal state and exit; useful for understanding what the model actually sees.
Voice settings — --voice-format (wav|mp3|webm), --voice-language. Aider supports voice input via the microphone.
The flat structure is Aider’s main constraint: adding a new major feature requires adding more global flags, which creates a long help page with everything visible at once. There’s no way to hide experimental or advanced functionality in a subcommand tree. The upside is zero discovery friction — aider --help shows everything.
Codex Implementation
Codex uses clap in Rust, parsed in codex-rs/cli/src/main.rs. It has a full subcommand tree with deep nesting and a global config override system.
Architecture Revealed by the CLI
The -c, --config <key=value> pattern is the most important design decision in Codex’s CLI. Every setting in ~/.codex/config.toml is overridable at the command line using TOML dotted path syntax: -c model="o3", -c 'sandbox_permissions=["disk-full-read-access"]', -c shell_environment_policy.inherit=all. If the value fails to parse as TOML, the raw string is used as a literal fallback. This means there’s no special-case argument for every config key, just one escape hatch that covers everything.
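The override mechanism reduces to three steps: split on the first `=`, split the key on dots, and try to parse the value before falling back to the raw string. The sketch below is dependency-free, so it recognizes only booleans, numbers, and double-quoted strings, whereas Codex uses a full TOML parser. An array such as `["disk-full-read-access"]` would therefore fall through to the raw-string case here.

```rust
/// A parsed override value. This sketch understands only a few scalar
/// literals; Codex parses the value as TOML.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Bool(bool),
    Int(i64),
    Float(f64),
    Str(String),
}

fn parse_value(raw: &str) -> Value {
    let raw = raw.trim();
    match raw {
        "true" => return Value::Bool(true),
        "false" => return Value::Bool(false),
        _ => {}
    }
    if let Ok(i) = raw.parse::<i64>() {
        return Value::Int(i);
    }
    if let Ok(f) = raw.parse::<f64>() {
        return Value::Float(f);
    }
    if raw.len() >= 2 && raw.starts_with('"') && raw.ends_with('"') {
        return Value::Str(raw[1..raw.len() - 1].to_string());
    }
    // Not a literal this sketch understands: keep it verbatim (the raw-string fallback).
    Value::Str(raw.to_string())
}

/// Split `key.path=value` into its dotted path and parsed value.
fn parse_override(arg: &str) -> Result<(Vec<String>, Value), String> {
    let (key, raw) = arg
        .split_once('=')
        .ok_or_else(|| format!("expected key=value, got {arg:?}"))?;
    let path: Vec<String> = key.trim().split('.').map(str::to_string).collect();
    if path.iter().any(|segment| segment.is_empty()) {
        return Err(format!("empty segment in key {key:?}"));
    }
    Ok((path, parse_value(raw)))
}

fn main() {
    for arg in ["model=o3", "shell_environment_policy.inherit=all", "features.example=true"] {
        println!("{arg} -> {:?}", parse_override(arg));
    }
}
```

In a POSIX shell, `-c model="o3"` reaches the program as `model=o3` because the shell strips the double quotes. The raw-string fallback is what makes that common spelling work.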
Feature flags as first-class CLI — --enable <FEATURE> and --disable <FEATURE> are available on every command. They map to -c features.<name>=true/false. Features have a “stage” (alpha/beta/stable) and can be inspected with codex features list.
Sandbox as non-negotiable surface: Three modes are exposed directly as values of the -s, --sandbox flag, not buried in config:
- read-only: agent can only read.
- workspace-write: agent can write inside the project.
- danger-full-access: no isolation.
These map to OS-level enforcement: Landlock+seccomp on Linux, Seatbelt on macOS, restricted token on Windows. The codex sandbox <platform> subcommand lets users test arbitrary commands inside these isolation layers without starting an agent session.
Approval policy as a separate axis from sandbox:
- untrusted: run only “safe” commands (ls, cat, sed) without asking; escalate otherwise.
- on-failure: run everything, ask if a command fails.
- on-request: model decides when to ask.
- never: never ask; failures go straight back to the model.
--full-auto combines on-request approval + workspace-write sandbox. --dangerously-bypass-approvals-and-sandbox is for CI environments that are externally sandboxed.
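Keeping sandbox and approval as two independent enums can be sketched as follows, with the shortcut flags expanding into a pair. Several pieces are our assumptions based on the descriptions above, not Codex's source: the enum and function names, the fallback defaults, and the semantics given to `--dangerously-bypass-approvals-and-sandbox` (no sandbox, never ask).

```rust
use std::str::FromStr;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SandboxMode { ReadOnly, WorkspaceWrite, DangerFullAccess }

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ApprovalPolicy { Untrusted, OnFailure, OnRequest, Never }

impl FromStr for SandboxMode {
    type Err = String;
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "read-only" => Ok(Self::ReadOnly),
            "workspace-write" => Ok(Self::WorkspaceWrite),
            "danger-full-access" => Ok(Self::DangerFullAccess),
            other => Err(format!("unknown sandbox mode: {other}")),
        }
    }
}

impl FromStr for ApprovalPolicy {
    type Err = String;
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "untrusted" => Ok(Self::Untrusted),
            "on-failure" => Ok(Self::OnFailure),
            "on-request" => Ok(Self::OnRequest),
            "never" => Ok(Self::Never),
            other => Err(format!("unknown approval policy: {other}")),
        }
    }
}

/// Resolve both axes. Shortcut flags expand into a (sandbox, approval) pair;
/// the fallback defaults are placeholders, not Codex's real defaults.
fn resolve(
    sandbox: Option<SandboxMode>,
    approval: Option<ApprovalPolicy>,
    full_auto: bool,
    bypass: bool,
) -> (SandboxMode, ApprovalPolicy) {
    if bypass {
        // Only sensible when the environment is sandboxed externally (e.g. CI).
        return (SandboxMode::DangerFullAccess, ApprovalPolicy::Never);
    }
    if full_auto {
        return (SandboxMode::WorkspaceWrite, ApprovalPolicy::OnRequest);
    }
    (
        sandbox.unwrap_or(SandboxMode::ReadOnly),
        approval.unwrap_or(ApprovalPolicy::Untrusted),
    )
}

/// Whether to prompt the user about a command under a given policy.
fn should_ask(policy: ApprovalPolicy, known_safe: bool, failed: bool, model_requested: bool) -> bool {
    match policy {
        ApprovalPolicy::Untrusted => !known_safe,
        ApprovalPolicy::OnFailure => failed,
        ApprovalPolicy::OnRequest => model_requested,
        ApprovalPolicy::Never => false,
    }
}

fn main() {
    let sandbox: SandboxMode = "workspace-write".parse().unwrap();
    let (s, a) = resolve(Some(sandbox), None, false, false);
    println!("{s:?} / {a:?} / ask before an unknown command: {}", should_ask(a, false, false, false));
}
```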
Non-interactive exec mode:
```text
codex exec [PROMPT]
  --json                       JSONL event stream (machine-readable)
  --output-schema <FILE>       JSON Schema for structured final response
  --ephemeral                  don't persist session to disk
  --skip-git-repo-check        run outside a git repo
  -o, --output-last-message    write final agent message to file
```

The --json flag emits the same internal event stream the TUI renders, but as JSONL to stdout. --output-schema supplies a JSON Schema that the model’s final response must conform to, so scripts can consume the result as structured data.
Review mode as a first-class command: codex review is not a prompt template — it’s a distinct CLI entry that accepts --uncommitted (staged+unstaged+untracked), --base <BRANCH> (full branch diff), or --commit <SHA> (single commit review). This is a separate agent mode with different context construction.
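Because the three review targets are mutually exclusive, they map naturally onto one enum that is validated once at parse time. The git invocations below are a hypothetical way to collect each diff, not a description of how `codex review` actually builds its context. Rejecting the no-flag case is also an assumption.

```rust
/// The three mutually exclusive review targets described above.
#[derive(Debug, Clone, PartialEq, Eq)]
enum ReviewTarget {
    Uncommitted,
    Base(String),
    Commit(String),
}

impl ReviewTarget {
    /// Fold the three flags into one target, rejecting ambiguous combinations.
    fn from_flags(uncommitted: bool, base: Option<String>, commit: Option<String>) -> Result<Self, String> {
        match (uncommitted, base, commit) {
            (true, None, None) => Ok(Self::Uncommitted),
            (false, Some(b), None) => Ok(Self::Base(b)),
            (false, None, Some(c)) => Ok(Self::Commit(c)),
            (false, None, None) => Err("choose one of --uncommitted, --base, --commit".into()),
            _ => Err("--uncommitted, --base and --commit are mutually exclusive".into()),
        }
    }

    /// Hypothetical git arguments used to collect the diff for the review prompt.
    fn git_args(&self) -> Vec<String> {
        match self {
            // Staged + unstaged changes against HEAD; untracked files would need
            // a separate `git ls-files --others --exclude-standard` pass.
            Self::Uncommitted => vec!["diff".into(), "HEAD".into()],
            // Three-dot form: changes on this branch since it diverged from `base`.
            Self::Base(b) => vec!["diff".into(), format!("{b}...HEAD")],
            Self::Commit(sha) => vec!["show".into(), sha.clone()],
        }
    }
}

fn main() {
    let target = ReviewTarget::from_flags(false, Some("main".to_string()), None).unwrap();
    println!("git {}", target.git_args().join(" "));
}
```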
Session management:
- codex resume [SESSION_ID]: resume an interactive session; --last skips the picker; --all shows all sessions regardless of CWD.
- codex fork [SESSION_ID]: branch a new session from a prior one’s state; same flags.
- codex apply <TASK_ID>: apply the latest diff from an agent task to the working tree via git apply. This is the “bring cloud changes local” primitive.
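The `--last` and `--all` semantics reduce to a filter plus a sort. This sketch uses a hypothetical `SessionMeta` record holding an id, a working directory, and a last-update timestamp. How Codex actually stores and orders sessions is not shown here.

```rust
use std::path::{Path, PathBuf};

#[derive(Debug, Clone)]
struct SessionMeta {
    id: String,
    cwd: PathBuf,
    updated_at: u64, // seconds since the Unix epoch
}

/// Sessions the picker would show: only those started in `cwd`
/// unless `--all` is given, newest first.
fn candidates<'a>(sessions: &'a [SessionMeta], cwd: &Path, all: bool) -> Vec<&'a SessionMeta> {
    let mut out: Vec<&SessionMeta> = sessions
        .iter()
        .filter(|s| all || s.cwd.as_path() == cwd)
        .collect();
    out.sort_by(|a, b| b.updated_at.cmp(&a.updated_at));
    out
}

/// `--last` skips the picker and takes the newest candidate.
fn pick_last<'a>(sessions: &'a [SessionMeta], cwd: &Path, all: bool) -> Option<&'a SessionMeta> {
    candidates(sessions, cwd, all).into_iter().next()
}

fn main() {
    let sessions = vec![
        SessionMeta { id: "a".into(), cwd: PathBuf::from("/repo"), updated_at: 10 },
        SessionMeta { id: "b".into(), cwd: PathBuf::from("/other"), updated_at: 30 },
    ];
    println!("{:?}", pick_last(&sessions, Path::new("/repo"), false).map(|s| &s.id));
}
```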
App server / IDE integration:
```text
codex app-server --listen stdio:// | ws://IP:PORT
codex app-server generate-ts            # TypeScript bindings for the protocol
codex app-server generate-json-schema   # JSON Schema for the protocol
```

The generate-ts and generate-json-schema subcommands produce typed protocol definitions from the Rust types. They exist specifically to keep the VSCode extension in sync. --analytics-default-enabled is a flag for first-party integrators (e.g., the VSCode extension) to opt into analytics by default.
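Parsing the `--listen` value into a typed transport up front keeps the server code free of string matching. This is a minimal sketch that assumes exactly the two forms shown: `stdio://`, and `ws://IP:PORT` with a literal IP. Codex's real parser may accept more, and `parse_listen` is a hypothetical name.

```rust
use std::net::SocketAddr;

#[derive(Debug, PartialEq)]
enum Listen {
    Stdio,
    WebSocket(SocketAddr),
}

fn parse_listen(spec: &str) -> Result<Listen, String> {
    if spec == "stdio://" {
        return Ok(Listen::Stdio);
    }
    if let Some(rest) = spec.strip_prefix("ws://") {
        // SocketAddr parsing accepts only IP literals such as 127.0.0.1:4500.
        return rest
            .parse::<SocketAddr>()
            .map(Listen::WebSocket)
            .map_err(|e| format!("bad ws address {rest:?}: {e}"));
    }
    Err(format!("unsupported listen spec {spec:?} (expected stdio:// or ws://IP:PORT)"))
}

fn main() {
    for spec in ["stdio://", "ws://127.0.0.1:4500", "ws://localhost:4500"] {
        println!("{spec} -> {:?}", parse_listen(spec));
    }
}
```

Rejecting `ws://localhost:4500` follows from `SocketAddr` accepting only IP literals. A real server would likely resolve hostnames through `std::net::ToSocketAddrs` instead.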
Codex as MCP server: codex mcp-server runs Codex itself on stdio as an MCP server. Other tools (editors, agents) can call Codex as a tool provider. This is the inverse of Codex managing external MCP servers.
--no-alt-screen: Runs the TUI in inline mode, preserving scrollback history instead of using an alternate screen buffer. Necessary in Zellij, which follows strict xterm semantics and disables scrollback in alternate screen buffers.
OpenCode Implementation
OpenCode uses yargs (TypeScript), parsed in packages/opencode/src/index.ts. The command tree is wide and shallow: most commands are top-level, with only mcp, debug, auth, agent, session, and github having subcommands.
Architecture Revealed by the CLI
Every mode is a client to the same server: opencode tui, opencode run, opencode attach, and opencode web all connect to the same Hono HTTP/SSE/WebSocket server. The TUI starts the server internally; opencode serve starts it without a TUI; opencode attach connects to a server that’s already running somewhere. The --port, --hostname, --cors, and --mdns flags appear on every server-starting command because they control that embedded server.
mDNS for LAN discovery: --mdns broadcasts the server on the local network as opencode.local (or a custom --mdns-domain). This lets the opencode attach command find a running instance without hardcoding a port.
The run command as the headless interface:
```text
opencode run [message..]
  --format default | json   # json = raw JSONL event stream
  --file                    # attach files
  --title                   # session title
  --attach <url>            # attach to existing server instead of starting one
  --variant                 # model reasoning variant (high, max, minimal)
  --thinking                # show thinking blocks in output
```

--format json emits the same SSE event bus data the TUI consumes, as JSONL. This is the machine-readable interface for CI, scripts, and editor plugins.
ACP (Agent Client Protocol): opencode acp starts an ACP server — a distinct protocol layer separate from the main HTTP/SSE server. ACP adds --cwd as a flag (sets the working directory for the ACP server instance), which the main commands don’t expose because they derive CWD from the project context.
MCP with OAuth lifecycle: The mcp subcommand tree exposes that OpenCode handles the full OAuth flow for OAuth-enabled MCP servers:
```text
opencode mcp add             # add a server (stdio or remote)
opencode mcp list            # list servers + connection status
opencode mcp auth [name]     # run OAuth login for a server
opencode mcp auth list       # list OAuth-capable servers + auth state
opencode mcp logout [name]   # remove OAuth tokens
opencode mcp debug <name>    # debug OAuth connection issues
```

This is not just stdio MCP management: it includes remote StreamableHTTP/SSE transports with OAuth PKCE flows.
Debug tree mirrors internal module boundaries closely:

```text
opencode debug config          # resolved config (all layers merged)
opencode debug paths           # data/config/cache/state global paths
opencode debug scrap           # list all known projects
opencode debug skill           # list all available skills
opencode debug lsp ...         # LSP diagnostics, workspace symbols, document symbols
opencode debug rg ...          # ripgrep tree, files list, pattern search
opencode debug file ...        # VFS read, status, list, search, tree
opencode debug snapshot ...    # snapshot track, patch, diff
opencode debug agent <name>    # show agent config details
opencode debug wait            # wait indefinitely (for process attachment)
```

The lsp, rg, file, snapshot, agent, and skill namespaces line up with internal modules such as src/lsp/, src/tool/ (ripgrep), src/snapshot/, and src/agent/. This makes the debug tree the clearest map of OpenCode’s internal subsystem boundaries.
Stats with project and model filters:
```text
opencode stats
  --days N          last N days (default: all time)
  --tools N         top N tools (default: all)
  --models [N]      show model stats (default: hidden)
  --project <name>  filter by project; empty string = current project
```

Persistent token/cost tracking per project and model, not just per session.
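The `--days` and `--project` filters can be sketched as a fold over stored usage records, including the rule that an empty string means the current project. The following are assumptions for illustration, not OpenCode's storage schema: `UsageRecord`, the day-number timestamp, and grouping per model.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone)]
struct UsageRecord {
    project: String,
    model: String,
    day: u32, // hypothetical day number, e.g. days since the Unix epoch
    input_tokens: u64,
    output_tokens: u64,
    cost_usd: f64,
}

#[derive(Debug, Default, Clone, PartialEq)]
struct Totals {
    input_tokens: u64,
    output_tokens: u64,
    cost_usd: f64,
}

/// Aggregate per model, honoring `--days N` and `--project <name>`,
/// where an empty project name means the current project.
fn stats_by_model(
    records: &[UsageRecord],
    today: u32,
    days: Option<u32>,
    project: Option<&str>,
    current_project: &str,
) -> BTreeMap<String, Totals> {
    let project = match project {
        Some("") => Some(current_project),
        other => other,
    };
    let mut out: BTreeMap<String, Totals> = BTreeMap::new();
    for r in records {
        if let Some(n) = days {
            if today.saturating_sub(r.day) >= n {
                continue; // older than the last N days
            }
        }
        if let Some(p) = project {
            if r.project != p {
                continue;
            }
        }
        let t = out.entry(r.model.clone()).or_default();
        t.input_tokens += r.input_tokens;
        t.output_tokens += r.output_tokens;
        t.cost_usd += r.cost_usd;
    }
    out
}

fn main() {
    let records = vec![UsageRecord {
        project: "app".into(), model: "m1".into(), day: 100,
        input_tokens: 10, output_tokens: 5, cost_usd: 0.5,
    }];
    for (model, t) in stats_by_model(&records, 100, Some(7), Some(""), "app") {
        println!("{model}: {} in / {} out / ${:.2}", t.input_tokens, t.output_tokens, t.cost_usd);
    }
}
```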
Session import/export: opencode export [sessionID] dumps to JSON; opencode import <file or URL> restores from JSON or a share URL. The share URL format implies a hosted sharing service built into the protocol.
GitHub agent: opencode github install / opencode github run — a named agent configuration specifically for GitHub PR-level automation. opencode pr <number> is a convenience that fetches a PR branch and launches the TUI pointed at it.
Auth with provider URL: opencode auth login [url] — the optional url argument supports custom provider endpoints (useful for self-hosted Anthropic proxies, corporate gateways, or the OpenCode hosted service).
Pitfalls and Hard Lessons
Flat CLI doesn’t scale: Aider’s single-level argparse produces a help page with ~100 flags organized only by argument-group headings. Past a few dozen significant knobs, discoverability collapses, and users learn flags from docs or other users, not --help.
Subcommand nesting has its own tax: Codex’s codex exec resume and codex app-server generate-ts are three-level commands. Most shells have good completion support, but discoverability still requires codex exec --help to find what exec accepts. There’s a friction cliff at depth 3.
Headless mode must emit structured events from day one: Both Codex and OpenCode expose a --json / --format json flag that emits the internal event stream. Adding this retroactively means the format wasn’t designed for machines — it was designed for terminals and then bolted on. OpenOxide should design the event format first, then render it for the terminal.
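A dependency-free sketch can show the event type designed first, with the terminal view and the JSONL view as two renderers over it. The event names and the hand-rolled JSON escaping are illustrative assumptions. A real implementation would more likely derive `serde::Serialize` on the event enum.

```rust
/// Events are defined once; renderers decide how they look.
#[derive(Debug, Clone)]
enum Event {
    TextDelta { delta: String },
    ToolCall { id: String, name: String },
    Done,
}

/// Minimal JSON string escaping so the sketch needs no dependencies.
fn json_str(s: &str) -> String {
    let mut out = String::from("\"");
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Machine view: one JSON object per line (JSONL).
fn render_jsonl(ev: &Event) -> String {
    match ev {
        Event::TextDelta { delta } => format!("{{\"type\":\"text_delta\",\"delta\":{}}}", json_str(delta)),
        Event::ToolCall { id, name } => {
            format!("{{\"type\":\"tool_call\",\"id\":{},\"name\":{}}}", json_str(id), json_str(name))
        }
        Event::Done => "{\"type\":\"done\"}".to_string(),
    }
}

/// Human view of the same events.
fn render_text(ev: &Event) -> String {
    match ev {
        Event::TextDelta { delta } => delta.clone(),
        Event::ToolCall { name, .. } => format!("\n[tool: {name}]\n"),
        Event::Done => "\n".to_string(),
    }
}

fn main() {
    let events = [
        Event::TextDelta { delta: "Hello".into() },
        Event::ToolCall { id: "t1".into(), name: "read_file".into() },
        Event::Done,
    ];
    // Hypothetical switch standing in for a parsed `--format json` flag.
    let json = std::env::args().any(|a| a == "--format=json");
    for ev in &events {
        if json { println!("{}", render_jsonl(ev)); } else { print!("{}", render_text(ev)); }
    }
}
```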
Sandbox mode must be unhideable: Codex’s explicit -s, --sandbox <MODE> flag that shows on every command makes the security posture visible. Tools that bury sandbox config in a settings file get security surprises when users run in unexpected environments (CI, shared machines, remote sessions).
Config override at the CLI is mandatory: Power users always need to change a single setting without editing a file. Codex’s -c key=value pattern (TOML dotted path) handles every case with one mechanism. OpenCode uses per-command yargs flags that must be kept in sync with the config schema manually.
Session picker UX: Both Codex (codex resume) and OpenCode (opencode session list) have session management. Codex’s --last shortcut matters because the picker costs too many keystrokes for the common case of “just continue where I left off.”
mDNS is fragile on corporate networks: OpenCode’s --mdns discovery relies on multicast DNS, which is often blocked by enterprise network policies, VPNs, and Docker’s default bridge networking. It’s a good feature on a home LAN; it silently fails in most professional environments. Any network discovery system needs a fallback (explicit URL, service file, port file).
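A discovery chain that never depends on multicast alone might look like the sketch below. An explicit URL wins, then a port file left by a running server, then a default. The port-file format (a bare port number), the loopback address, and the function name are hypothetical choices.

```rust
use std::fs;
use std::path::Path;

/// Where the server URL came from, useful for diagnostics.
#[derive(Debug, PartialEq)]
enum Source {
    Flag,
    PortFile,
    Default,
}

/// Resolve a server URL without relying on multicast:
/// 1. an explicit `--attach <url>` flag,
/// 2. a port file written by a running server (hypothetical format: a bare port),
/// 3. a fixed default.
/// mDNS, if used at all, would be an optional extra step, never the only one.
fn resolve_server(flag: Option<&str>, port_file: &Path, default_url: &str) -> (String, Source) {
    if let Some(url) = flag {
        return (url.to_string(), Source::Flag);
    }
    if let Ok(text) = fs::read_to_string(port_file) {
        if let Ok(port) = text.trim().parse::<u16>() {
            return (format!("http://127.0.0.1:{port}"), Source::PortFile);
        }
    }
    (default_url.to_string(), Source::Default)
}

fn main() {
    let missing = Path::new("/nonexistent/openoxide.port");
    println!("{:?}", resolve_server(None, missing, "http://127.0.0.1:4096"));
}
```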
Feature flag proliferation: Codex’s features list command shows features with stage labels (alpha/beta/stable). Without visible stage labels, users can’t distinguish “this flag is stable” from “this flag may disappear next release.” The stage label system should be part of the CLI display, not just internal metadata.
OpenOxide Blueprint
OpenOxide’s CLI will use clap (Rust) with the derive macro pattern for compile-time checked argument definitions. The structure should prioritize scriptability and composability over interactive UX. The TUI handles interactive concerns; the CLI handles automation.
```toml
[dependencies]
clap = { version = "4", features = ["derive", "env", "color"] }
```

Subcommand Tree (Proposed)
```text
openoxide [OPTIONS] [PROMPT]        # start interactive TUI (default)
openoxide serve                     # headless server only
openoxide run [message..]           # non-interactive single-turn
  --format text|json                # json = JSONL event stream
  --session <ID>                    # continue a specific session
  --file <FILE>...                  # attach files
openoxide attach <url>              # connect to running server
openoxide session                   # session management
  openoxide session list            # list sessions
  openoxide session resume [ID]     # resume (--last shortcut)
  openoxide session fork [ID]       # fork
  openoxide session export [ID]     # dump JSON
  openoxide session import <file>   # restore from JSON
openoxide mcp                       # MCP server management
  openoxide mcp add                 # add server
  openoxide mcp list                # list servers + status
  openoxide mcp remove <name>       # remove server
openoxide auth                      # provider credentials
  openoxide auth login              # interactive login
  openoxide auth logout             # remove credentials
  openoxide auth list               # list configured providers
openoxide models [provider]         # list available models
openoxide stats                     # token + cost statistics
openoxide debug                     # internal diagnostics
  openoxide debug config            # resolved config dump
  openoxide debug paths             # data/config/cache locations
  openoxide debug snapshot <hash>   # inspect snapshot state
  openoxide debug lsp <file>        # LSP diagnostics for a file
  openoxide debug map               # print repo map and exit
  openoxide debug prompts           # print system prompt and exit
openoxide upgrade                   # self-update
openoxide completion <shell>        # shell completions
```

Key Design Decisions
Use Codex’s -c key=value pattern for config overrides. Don’t replicate every config field as a CLI flag. One generic override mechanism handles power users; named flags handle the common cases (model, sandbox, session).
```rust
/// Override config key (TOML dotted path format, e.g. -c model="claude-opus-4")
#[arg(short = 'c', long = "config", value_name = "key=value")]
pub config_overrides: Vec<String>,
```

Sandbox as a mandatory visible flag. Every command that can execute shell actions gets -s, --sandbox:
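```rust
#[arg(short = 's', long, value_enum, default_value = "workspace-write")]
pub sandbox: SandboxMode,
```

```rust
/// clap's ValueEnum derive exposes these as read-only, workspace-write, danger-full-access.
#[derive(Clone, Copy, Debug, PartialEq, Eq, clap::ValueEnum)]
pub enum SandboxMode {
    ReadOnly,
    WorkspaceWrite,
    DangerFullAccess,
}
```

Map WorkspaceWrite to Landlock on Linux (following Codex’s codex-rs/linux-sandbox/ model), Seatbelt on macOS, and file ACLs or AppContainer on Windows.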
JSON event output from the first line of code. The internal event bus emits typed events. The --format json flag on run is not a hack — it’s the primary machine interface. Design event types as serde-serializable enums first:
```rust
use serde::Serialize;
use serde_json::Value;

#[derive(Serialize)]
#[serde(tag = "type")]
pub enum AgentEvent {
    MessageStart { session_id: String },
    TextDelta { delta: String },
    ToolCall { id: String, name: String, input: Value },
    ToolResult { id: String, output: String },
    TokenUsage { input: u32, output: u32, cost_usd: f64 },
    Done,
}
```

Debug subcommands as module health checks. Each internal subsystem gets a debug subcommand that directly queries its state without a full agent session. debug lsp <file> calls into crate::lsp directly. debug map runs the repo mapper and prints output. This is the equivalent of OpenCode’s opencode debug lsp diagnostics <file>.
Feature flags with stage labels visible in CLI. Use a features list equivalent that shows:
```text
FEATURE          STAGE   ENABLED  DESCRIPTION
streaming-edits  stable  true     Stream file edits as patches
tree-sitter-map  beta    false    Use tree-sitter for repo mapping
lsp-inline       alpha   false    Inline LSP completions
```

Stage labels keep users from unknowingly relying on alpha features in automation.
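The sketch below always renders the stage column and can optionally refuse alpha features without an explicit opt-in. The `Feature` struct and the `check_enable` guard are hypothetical. The guard is one way to act on the stage label, not something the source specifies.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Stage {
    Alpha,
    Beta,
    Stable,
}

impl Stage {
    fn label(self) -> &'static str {
        match self {
            Stage::Alpha => "alpha",
            Stage::Beta => "beta",
            Stage::Stable => "stable",
        }
    }
}

struct Feature {
    name: &'static str,
    stage: Stage,
    enabled: bool,
    description: &'static str,
}

/// Render the `features list` table; the STAGE column is never optional.
fn render_features(features: &[Feature]) -> String {
    let w = features.iter().map(|f| f.name.len()).max().unwrap_or(0).max("FEATURE".len());
    let mut out = format!("{:<w$}  {:<6}  {:<7}  {}\n", "FEATURE", "STAGE", "ENABLED", "DESCRIPTION", w = w);
    for f in features {
        out.push_str(&format!(
            "{:<w$}  {:<6}  {:<7}  {}\n",
            f.name, f.stage.label(), f.enabled, f.description, w = w
        ));
    }
    out
}

/// Hypothetical automation guard: alpha features need an explicit opt-in.
fn check_enable(f: &Feature, allow_alpha: bool) -> Result<(), String> {
    if f.stage == Stage::Alpha && !allow_alpha {
        Err(format!("feature {} is alpha; opt in explicitly to enable it", f.name))
    } else {
        Ok(())
    }
}

fn main() {
    let features = [
        Feature { name: "streaming-edits", stage: Stage::Stable, enabled: true, description: "Stream file edits as patches" },
        Feature { name: "lsp-inline", stage: Stage::Alpha, enabled: false, description: "Inline LSP completions" },
    ];
    print!("{}", render_features(&features));
    println!("{:?}", check_enable(&features[1], false));
}
```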
--last shortcut for session operations. The most common case (continue last session) should be one flag:
```text
openoxide session resume --last
openoxide --continue   # global alias for the same thing
```

Crates for Session/State
- clap with the derive feature for the full command tree.
- directories for XDG-compliant data/config/cache paths.
- serde + serde_json for JSON event serialization.
- tokio for the async server runtime that openoxide serve starts.