CLI Interface Design

The command-line interface is the primary surface through which users interact with an AI coding agent. It’s also the first place architectural decisions become visible. Every flag, subcommand, and option group is a commitment: it reveals what the tool considers important, how it thinks about workflow, and what internal subsystems exist. A well-designed CLI surface isn’t just UX — it’s a contract that shapes the internal module structure.

The hard problems in CLI design for an AI coding agent are:

  • Headless vs interactive duality: Users need both a REPL-style interactive session and a scriptable one-shot mode for CI pipelines, editor integrations, and automation. These modes share state machinery but differ in how they emit output, handle input, and manage lifecycle.
  • Server/client separation: Modern AI tools run a background server and attach multiple clients (TUI, web, IDE plugin). The CLI must expose this topology: commands to start a server, commands to attach to a running one, and commands to query server state without a full UI.
  • Session continuity: Sessions aren’t ephemeral. Users need to resume, fork, and export them. The CLI surface for session management defines how the persistence layer must work.
  • Sandboxing controls: When an AI agent can execute shell commands, users need explicit, visible controls over what it’s allowed to do. These can’t be buried in a config file — they must be first-class CLI flags with clear semantics.
  • Provider and model abstraction: Multiple LLM providers, local models, and fallback chains all need to be selectable at the command line without a config file edit, while still respecting layered config.
  • Debugging and introspection: Users diagnosing unexpected behavior need to inspect internal state (resolved config, LSP diagnostics, snapshot history, ripgrep file lists) without reading source code.

Aider uses a flat argparse-based CLI defined in references/aider/aider/args.py. There are no subcommands — every option is a flag on the single aider command. Argument groups organize the flags conceptually but don’t create any subcommand hierarchy. This is a deliberate single-process design: Aider doesn’t have a background server.

Key argument groups and what they reveal:

Main model / API Keys: --model, --openai-api-key, --anthropic-api-key, --api-key, --openai-api-base, --openai-api-type, --set-env. Every supported provider is explicit. The --api-key flag is the generic form: --api-key provider=keyvalue.

Model settings: --edit-format, --chat-mode, --architect, --weak-model, --editor-model. This group is where the multi-model architecture surfaces. Aider has named edit formats (diff, udiff, whole, search-replace, editor-diff, editor-whole) that correspond to different coder classes in aider/coders/. The --architect flag activates a two-step flow where one model plans and a separate editor model executes.

  • --reasoning-effort / --thinking-tokens — expose model-level thinking controls.
  • --max-chat-history-tokens — sets the token budget for rolling chat history before summary/truncation.
  • --cache-prompts / --cache-keepalive-pings — prompt caching controls for providers that support it.

Repomap settings: --map-tokens, --map-refresh, --map-multiplier-no-files. Direct control over the repo mapping subsystem. --map-tokens is the token budget; --map-refresh controls when the map is rebuilt (auto, always, files, manual); --map-multiplier-no-files scales the budget up when no files are in context, because the map then needs more headroom.

Git settings: --auto-commits, --dirty-commits, --attribute-author, --attribute-committer, --attribute-commit-message-author, --commit-prompt, --dry-run, --watch-files. Aider’s git integration is deep: it auto-commits after every accepted change, attributes commits to both the user and the AI, and can watch the filesystem for external changes to pull into context.

Fixing and committing: --lint, --lint-cmd, --auto-lint, --test-cmd, --auto-test, --test. Built-in lint and test loop. --auto-lint (default: on) runs the configured linters after every edit. --auto-test (default: off) runs the configured test command after every edit. Both feed failures back to the model for another pass.

Modes: --message/-m, --message-file/-f, --gui, --apply, --show-repo-map, --show-prompts. This group is what makes Aider scriptable. --message sends a single prompt and exits, which is Aider's headless one-shot mode. --show-repo-map and --show-prompts are debug inspection flags that print internal state and exit; they are useful for understanding what the model actually sees.

Voice settings: --voice-format (wav|mp3|webm), --voice-language. Aider supports voice input via the microphone.

The flat structure is Aider’s main constraint: adding a new major feature requires adding more global flags, which creates a long help page with everything visible at once. There’s no way to hide experimental or advanced functionality in a subcommand tree. The upside is zero discovery friction — aider --help shows everything.


Codex uses clap in Rust, parsed in codex-rs/cli/src/main.rs. It has a full subcommand tree with deep nesting and a global config override system.

The -c, --config <key=value> pattern is the most important design decision in Codex’s CLI. Every setting in ~/.codex/config.toml is overridable at the command line using TOML dotted path syntax: -c model="o3", -c 'sandbox_permissions=["disk-full-read-access"]', -c shell_environment_policy.inherit=all. If TOML parse fails, the raw string is used as a literal fallback. This means there’s no special-case argument for every config key — just one escape hatch that covers everything.
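The parse-then-fall-back behaviour described above can be sketched roughly as follows. This is not Codex's code: the `OverrideValue` type and `parse_override` function are hypothetical names, and the value parser is a deliberately tiny stand-in for a real TOML parser (it only recognises booleans, integers, and double-quoted strings). The point is the shape: split on the first `=`, split the key on dots, try a typed parse, and keep the raw text when that fails.

```rust
/// Simplified stand-in for a TOML value (assumption: a real tool would
/// pass the right-hand side to a TOML parser instead).
#[derive(Debug, Clone, PartialEq)]
pub enum OverrideValue {
    Bool(bool),
    Integer(i64),
    String(String),
}

/// Split `key=value` on the first `=` and interpret both halves.
/// Returns `None` when there is no `=` or any key segment is empty.
pub fn parse_override(raw: &str) -> Option<(Vec<String>, OverrideValue)> {
    let (key, value) = raw.split_once('=')?;
    // Dotted path: `a.b` becomes ["a", "b"].
    let path: Vec<String> = key.trim().split('.').map(|s| s.trim().to_string()).collect();
    if path.iter().any(|segment| segment.is_empty()) {
        return None;
    }
    Some((path, parse_value(value.trim())))
}

fn parse_value(value: &str) -> OverrideValue {
    if let Ok(b) = value.parse::<bool>() {
        return OverrideValue::Bool(b);
    }
    if let Ok(n) = value.parse::<i64>() {
        return OverrideValue::Integer(n);
    }
    // Quoted string: strip the surrounding double quotes.
    if value.len() >= 2 && value.starts_with('"') && value.ends_with('"') {
        return OverrideValue::String(value[1..value.len() - 1].to_string());
    }
    // Fallback: anything that is not a recognised typed value is kept verbatim.
    OverrideValue::String(value.to_string())
}
```

With this sketch, `model="o3"` and `shell_environment_policy.inherit=all` both end up as strings: the first because it is quoted, the second through the literal fallback.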

Feature flags as first-class CLI: --enable <FEATURE> and --disable <FEATURE> are available on every command. They map to -c features.<name>=true/false. Features have a “stage” (alpha/beta/stable) and can be inspected with codex features list.

Sandbox as non-negotiable surface: Three modes are exposed directly as CLI flags, not buried in config:

  • read-only — agent can only read.
  • workspace-write — agent can write inside the project.
  • danger-full-access — no isolation.

These map to OS-level enforcement: Landlock+seccomp on Linux, Seatbelt on macOS, restricted token on Windows. The codex sandbox <platform> subcommand lets users test arbitrary commands inside these isolation layers without starting an agent session.

Approval policy as a separate axis from sandbox:

  • untrusted — run only “safe” commands (ls, cat, sed) without asking; escalate otherwise.
  • on-failure — run everything, ask if a command fails.
  • on-request — model decides when to ask.
  • never — never ask; failures go straight back to the model.

--full-auto combines on-request approval + workspace-write sandbox. --dangerously-bypass-approvals-and-sandbox is for CI environments that are externally sandboxed.
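Treating sandbox and approval as two independent axes, with presets layered on top, might look like the sketch below. All type and function names here are hypothetical, and three things are assumptions rather than documented Codex behaviour: the defaults used when no flag is given, the rule that a preset wins over individual flags, and the mapping of the bypass flag to "no sandbox, never ask".

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SandboxMode {
    ReadOnly,
    WorkspaceWrite,
    DangerFullAccess,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ApprovalPolicy {
    Untrusted,
    OnFailure,
    OnRequest,
    Never,
}

/// The effective policy: one value per axis.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ExecPolicy {
    pub sandbox: SandboxMode,
    pub approval: ApprovalPolicy,
}

/// Hypothetical bundle of the flags as parsed from the command line.
#[derive(Debug, Clone, Copy, Default)]
pub struct PolicyFlags {
    pub sandbox: Option<SandboxMode>,
    pub approval: Option<ApprovalPolicy>,
    pub full_auto: bool,
    pub bypass_all: bool,
}

pub fn resolve_policy(flags: PolicyFlags) -> ExecPolicy {
    // Assumption: presets take precedence over the individual flags.
    if flags.bypass_all {
        return ExecPolicy {
            sandbox: SandboxMode::DangerFullAccess,
            approval: ApprovalPolicy::Never,
        };
    }
    if flags.full_auto {
        return ExecPolicy {
            sandbox: SandboxMode::WorkspaceWrite,
            approval: ApprovalPolicy::OnRequest,
        };
    }
    // Assumption: the most restrictive value on each axis is the default.
    ExecPolicy {
        sandbox: flags.sandbox.unwrap_or(SandboxMode::ReadOnly),
        approval: flags.approval.unwrap_or(ApprovalPolicy::Untrusted),
    }
}
```

Keeping the axes separate means a combination such as workspace-write with never stays expressible without adding another named preset.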

Non-interactive exec mode:

codex exec [PROMPT]
    --json                       JSONL event stream (machine-readable)
    --output-schema <FILE>       JSON Schema for structured final response
    --ephemeral                  don't persist session to disk
    --skip-git-repo-check        run outside a git repo
    -o, --output-last-message    write final agent message to file

The --json flag emits the same internal event stream the TUI renders, but as JSONL to stdout. --output-schema forces structured JSON output from the model via a JSON Schema constraint — the model’s final response must conform or it retries.

Review mode as a first-class command: codex review is not a prompt template — it’s a distinct CLI entry that accepts --uncommitted (staged+unstaged+untracked), --base <BRANCH> (full branch diff), or --commit <SHA> (single commit review). This is a separate agent mode with different context construction.

Session management:

  • codex resume [SESSION_ID] — resume interactive session; --last skips picker; --all shows all sessions regardless of CWD.
  • codex fork [SESSION_ID] — branch a new session from a prior one’s state; same flags.
  • codex apply <TASK_ID> — apply the latest diff from an agent session as git apply to the working tree. This is the “bring cloud changes local” primitive.

App server / IDE integration:

codex app-server --listen stdio:// | ws://IP:PORT
codex app-server generate-ts # TypeScript bindings for the protocol
codex app-server generate-json-schema # JSON Schema for the protocol

The generate-ts and generate-json-schema subcommands produce typed protocol definitions from the Rust types — they exist specifically to keep the VSCode extension in sync. --analytics-default-enabled is a flag for first-party integrators (e.g., the VSCode extension) to opt into analytics by default.

Codex as MCP server: codex mcp-server runs Codex itself on stdio as an MCP server. Other tools (editors, agents) can call Codex as a tool provider. This is the inverse of Codex managing external MCP servers.

--no-alt-screen: Runs the TUI in inline mode, preserving scrollback history instead of using an alternate screen buffer. Necessary in Zellij, which follows strict xterm semantics and disables scrollback in alternate screen buffers.


OpenCode uses yargs (TypeScript), parsed in packages/opencode/src/index.ts. The command tree is wide and shallow: most commands are top-level, with only mcp, debug, auth, agent, session, and github having subcommands.

Every mode is a client to the same server: opencode tui, opencode run, opencode attach, and opencode web all connect to the same Hono HTTP/SSE/WebSocket server. The TUI starts the server internally; opencode serve starts it without a TUI; opencode attach connects to a server that’s already running somewhere. The --port, --hostname, --cors, and --mdns flags appear on every server-starting command because they control that embedded server.

mDNS for LAN discovery: --mdns broadcasts the server on the local network as opencode.local (or a custom --mdns-domain). This lets the opencode attach command find a running instance without hardcoding a port.

The run command as the headless interface:

opencode run [message..]
    --format default | json    # json = raw JSONL event stream
    --file                     # attach files
    --title                    # session title
    --attach <url>             # attach to existing server instead of starting one
    --variant                  # model reasoning variant (high, max, minimal)
    --thinking                 # show thinking blocks in output

--format json emits the same SSE event bus data the TUI consumes, as JSONL. This is the machine-readable interface for CI, scripts, and editor plugins.

ACP (Agent Client Protocol): opencode acp starts an ACP server — a distinct protocol layer separate from the main HTTP/SSE server. ACP adds --cwd as a flag (sets the working directory for the ACP server instance), which the main commands don’t expose because they derive CWD from the project context.

MCP with OAuth lifecycle: The mcp subcommand tree exposes that OpenCode handles the full OAuth flow for OAuth-enabled MCP servers:

opencode mcp add # add a server (stdio or remote)
opencode mcp list # list servers + connection status
opencode mcp auth [name] # run OAuth login for a server
opencode mcp auth list # list OAuth-capable servers + auth state
opencode mcp logout [name] # remove OAuth tokens
opencode mcp debug <name> # debug OAuth connection issues

This is not just stdio MCP management — it includes remote StreamableHTTP/SSE transports with OAuth PKCE flows.

Debug tree mirrors internal module boundaries exactly:

opencode debug config # resolved config (all layers merged)
opencode debug paths # data/config/cache/state global paths
opencode debug scrap # list all known projects
opencode debug skill # list all available skills
opencode debug lsp ... # LSP diagnostics, workspace symbols, document symbols
opencode debug rg ... # ripgrep tree, files list, pattern search
opencode debug file ... # VFS read, status, list, search, tree
opencode debug snapshot ... # snapshot track, patch, diff
opencode debug agent <name> # show agent config details
opencode debug wait # wait indefinitely (for process attachment)

Most of these namespaces correspond directly to internal modules: lsp to src/lsp/, rg to src/tool/ (ripgrep), snapshot to src/snapshot/, and agent to src/agent/. This makes the debug tree the clearest map of OpenCode’s internal subsystem boundaries.

Stats with project and model filters:

opencode stats
    --days N            last N days (default: all time)
    --tools N           top N tools (default: all)
    --models [N]        show model stats (default: hidden)
    --project <name>    filter by project; empty string = current project

Persistent token/cost tracking per project and model, not just session.

Session import/export: opencode export [sessionID] dumps to JSON; opencode import <file or URL> restores from JSON or a share URL. The share URL format implies a hosted sharing service built into the protocol.

GitHub agent: opencode github install / opencode github run — a named agent configuration specifically for GitHub PR-level automation. opencode pr <number> is a convenience that fetches a PR branch and launches the TUI pointed at it.

Auth with provider URL: opencode auth login [url] — the optional url argument supports custom provider endpoints (useful for self-hosted Anthropic proxies, corporate gateways, or the OpenCode hosted service).


Flat CLI doesn’t scale: Aider’s single-level argparse produces a help page with ~100 flags organized only by argument groups. Once feature count crosses ~30 significant knobs, discoverability collapses. Users learn flags from docs or other users, not --help.

Subcommand nesting has its own tax: Codex’s codex exec resume and codex app-server generate-ts are three-level commands. Most shells have good completion support, but discoverability still requires codex exec --help to find what exec accepts. There’s a friction cliff at depth 3.

Headless mode must emit structured events from day one: Both Codex and OpenCode expose a --json / --format json flag that emits the internal event stream. Adding this retroactively means the format wasn’t designed for machines — it was designed for terminals and then bolted on. OpenOxide should design the event format first, then render it for the terminal.

Sandbox mode must be unhideable: Codex’s explicit -s, --sandbox <MODE> flag that shows on every command makes the security posture visible. Tools that bury sandbox config in a settings file get security surprises when users run in unexpected environments (CI, shared machines, remote sessions).

Config override at the CLI is mandatory: Power users always need to change a single setting without editing a file. Codex’s -c key=value pattern (TOML dotted path) handles every case with one mechanism. OpenCode uses per-command yargs flags that must be kept in sync with the config schema manually.

Session picker UX: Both Codex (codex resume) and OpenCode (opencode session list) have session management. The --last shortcut in Codex is heavily used in practice — the picker is too many keystrokes for the common case of “just continue where I left off.”

mDNS is fragile on corporate networks: OpenCode’s --mdns discovery relies on multicast DNS, which is often blocked by enterprise network policies, VPNs, and Docker’s default bridge networking. It’s a good feature on a home LAN; it silently fails in most professional environments. Any network discovery system needs a fallback (explicit URL, service file, port file).
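One way to structure such a fallback chain is a pure resolution function that tries sources in a fixed order. This is a sketch under assumptions: the names `resolve_server` and `Source` are hypothetical, the port file is assumed to contain just a port number for a server on localhost, and the mDNS lookup is assumed to have already run (or failed) before this function is called.

```rust
/// Where the server address came from; useful for logs and error messages.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Source {
    ExplicitUrl,
    PortFile,
    Mdns,
}

/// Try the most explicit source first and multicast discovery last.
pub fn resolve_server(
    explicit_url: Option<&str>,
    port_file_contents: Option<&str>,
    mdns_result: Option<&str>,
) -> Option<(String, Source)> {
    // 1. A URL given on the command line always wins.
    if let Some(url) = explicit_url.map(str::trim).filter(|u| !u.is_empty()) {
        return Some((url.to_string(), Source::ExplicitUrl));
    }
    // 2. A port file written by a local server (assumed format: a bare port number).
    if let Some(port) = port_file_contents.and_then(|c| c.trim().parse::<u16>().ok()) {
        return Some((format!("http://127.0.0.1:{}", port), Source::PortFile));
    }
    // 3. mDNS, which may be unavailable on restricted networks.
    mdns_result
        .map(str::trim)
        .filter(|u| !u.is_empty())
        .map(|u| (u.to_string(), Source::Mdns))
}
```

Returning the `Source` alongside the URL lets the CLI say which mechanism was used, which helps when discovery fails silently.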

Feature flag proliferation: Codex’s features list command shows features with stage labels (alpha/beta/stable). Without visible stage labels, users can’t distinguish “this flag is stable” from “this flag may disappear next release.” The stage label system should be part of the CLI display, not just internal metadata.


OpenOxide’s CLI will use clap (Rust) with the derive macro pattern for compile-time checked argument definitions. The structure should prioritize scriptability and composability over interactive UX — the TUI handles interactive concerns; the CLI handles automation.

[dependencies]
clap = { version = "4", features = ["derive", "env", "color"] }

openoxide [OPTIONS] [PROMPT]           # start interactive TUI (default)
openoxide serve                        # headless server only
openoxide run [message..]              # non-interactive single-turn
    --format text|json                 # json = JSONL event stream
    --session <ID>                     # continue a specific session
    --file <FILE>...                   # attach files
openoxide attach <url>                 # connect to running server
openoxide session                      # session management
    openoxide session list             # list sessions
    openoxide session resume [ID]      # resume (--last shortcut)
    openoxide session fork [ID]        # fork
    openoxide session export [ID]      # dump JSON
    openoxide session import <file>    # restore from JSON
openoxide mcp                          # MCP server management
    openoxide mcp add                  # add server
    openoxide mcp list                 # list servers + status
    openoxide mcp remove <name>        # remove server
openoxide auth                         # provider credentials
    openoxide auth login               # interactive login
    openoxide auth logout              # remove credentials
    openoxide auth list                # list configured providers
openoxide models [provider]            # list available models
openoxide stats                        # token + cost statistics
openoxide debug                        # internal diagnostics
    openoxide debug config             # resolved config dump
    openoxide debug paths              # data/config/cache locations
    openoxide debug snapshot <hash>    # inspect snapshot state
    openoxide debug lsp <file>         # LSP diagnostics for a file
    openoxide debug map                # print repo map and exit
    openoxide debug prompts            # print system prompt and exit
openoxide upgrade                      # self-update
openoxide completion <shell>           # shell completions

Use Codex’s -c key=value pattern for config overrides. Don’t replicate every config field as a CLI flag. One generic override mechanism handles power users; named flags handle the common cases (model, sandbox, session).

/// Override config key (TOML dotted path format, e.g. -c model="claude-opus-4")
#[arg(short = 'c', long = "config", value_name = "key=value")]
pub config_overrides: Vec<String>,
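Once the overrides are collected, each one has to be written into the merged config tree. The sketch below shows one way to do that with a dotted path; `Node` and `set_path` are hypothetical names, and the tree stores only string leaves to keep the example dependency-free (a real implementation would more likely operate on a TOML value type).

```rust
use std::collections::BTreeMap;

/// Minimal config tree: either a string leaf or a nested table.
#[derive(Debug, Clone, PartialEq)]
pub enum Node {
    Leaf(String),
    Table(BTreeMap<String, Node>),
}

/// Set `path` (already split on '.') to `value`, creating intermediate
/// tables as needed. A leaf that sits where a table is required is
/// replaced, so the override always wins over the existing value.
pub fn set_path(root: &mut BTreeMap<String, Node>, path: &[&str], value: &str) {
    let (last, parents) = match path.split_last() {
        Some(parts) => parts,
        None => return, // empty path: nothing to set
    };
    let mut table = root;
    for segment in parents {
        let entry = table
            .entry(segment.to_string())
            .or_insert_with(|| Node::Table(BTreeMap::new()));
        if let Node::Leaf(_) = entry {
            *entry = Node::Table(BTreeMap::new());
        }
        table = match entry {
            Node::Table(t) => t,
            Node::Leaf(_) => unreachable!("leaf was replaced by a table above"),
        };
    }
    table.insert(last.to_string(), Node::Leaf(value.to_string()));
}
```

Applying overrides after all config layers are merged keeps the precedence rule simple: the command line is the last writer.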

Sandbox as a mandatory visible flag. Every command that can execute shell actions gets -s, --sandbox:

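// Field on the clap `Args` struct of every command that can run shell actions.
#[arg(short = 's', long, value_enum, default_value = "workspace-write")]
pub sandbox: SandboxMode,

// `ValueEnum` gives clap the kebab-case names:
// read-only, workspace-write, danger-full-access.
#[derive(Clone, Copy, Debug, clap::ValueEnum)]
pub enum SandboxMode {
    ReadOnly,
    WorkspaceWrite,
    DangerFullAccess,
}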

Map WorkspaceWrite to Landlock on Linux (following Codex’s codex-rs/linux-sandbox/ model), Seatbelt on macOS, and file ACLs or AppContainer on Windows.

JSON event output from the first line of code. The internal event bus emits typed events. The --format json flag on run is not a hack — it’s the primary machine interface. Design event types as serde-serializable enums first:

use serde::Serialize;
use serde_json::Value;

// Internally tagged: each event serializes as {"type": "<Variant>", ...fields}.
#[derive(Serialize)]
#[serde(tag = "type")]
pub enum AgentEvent {
    MessageStart { session_id: String },
    TextDelta { delta: String },
    ToolCall { id: String, name: String, input: Value },
    ToolResult { id: String, output: String },
    TokenUsage { input: u32, output: u32, cost_usd: f64 },
    Done,
}
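To illustrate "events first, terminal second", the sketch below renders one event value two ways: as a JSONL line and as terminal text. It uses a reduced, hypothetical subset of the event enum and hand-written JSON output so that it runs without dependencies; in the actual design the serde derive would produce the JSON, and `to_jsonl` / `to_terminal` are illustrative names only.

```rust
/// Reduced subset of the event enum, for illustration only.
#[derive(Debug, Clone, PartialEq)]
pub enum AgentEvent {
    TextDelta { delta: String },
    ToolCall { id: String, name: String },
    Done,
}

/// Escape a string for embedding inside a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Machine rendering: one JSON object per line, tagged with "type".
pub fn to_jsonl(event: &AgentEvent) -> String {
    match event {
        AgentEvent::TextDelta { delta } => {
            format!("{{\"type\":\"TextDelta\",\"delta\":\"{}\"}}", json_escape(delta))
        }
        AgentEvent::ToolCall { id, name } => format!(
            "{{\"type\":\"ToolCall\",\"id\":\"{}\",\"name\":\"{}\"}}",
            json_escape(id),
            json_escape(name)
        ),
        AgentEvent::Done => "{\"type\":\"Done\"}".to_string(),
    }
}

/// Human rendering: the same events, reduced to what a terminal user needs.
pub fn to_terminal(event: &AgentEvent) -> String {
    match event {
        AgentEvent::TextDelta { delta } => delta.clone(),
        AgentEvent::ToolCall { name, .. } => format!("[tool] {}", name),
        AgentEvent::Done => String::new(),
    }
}
```

Because both renderers consume the same enum, a new event variant causes a compile error in each `match` until both outputs handle it.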

Debug subcommands as module health checks. Each internal subsystem gets a debug subcommand that directly queries its state without a full agent session. debug lsp <file> calls into crate::lsp directly. debug map runs the repo mapper and prints output. This is the equivalent of OpenCode’s opencode debug lsp diagnostics <file>.

Feature flags with stage labels visible in CLI. Use a features list equivalent that shows:

FEATURE            STAGE     ENABLED    DESCRIPTION
streaming-edits    stable    true       Stream file edits as patches
tree-sitter-map    beta      false      Use tree-sitter for repo mapping
lsp-inline         alpha     false      Inline LSP completions

Stage labels prevent users from relying on alpha features in automation.
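A rough sketch of how such a listing and a matching automation check could be produced is shown below. The `Feature` struct, the column widths, and the `unstable_enabled` helper are all assumptions for illustration; the feature names are the example rows from the table above, not a real registry.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Stage {
    Alpha,
    Beta,
    Stable,
}

impl Stage {
    fn label(self) -> &'static str {
        match self {
            Stage::Alpha => "alpha",
            Stage::Beta => "beta",
            Stage::Stable => "stable",
        }
    }
}

pub struct Feature {
    pub name: &'static str,
    pub stage: Stage,
    pub enabled: bool,
    pub description: &'static str,
}

/// Render the feature table with left-aligned, fixed-width columns.
pub fn render_features(features: &[Feature]) -> String {
    let mut out = format!("{:<18}{:<8}{:<9}{}\n", "FEATURE", "STAGE", "ENABLED", "DESCRIPTION");
    for f in features {
        out.push_str(&format!(
            "{:<18}{:<8}{:<9}{}\n",
            f.name,
            f.stage.label(),
            f.enabled,
            f.description
        ));
    }
    out
}

/// Names of enabled features that are not yet stable, e.g. to print a
/// warning when running non-interactively.
pub fn unstable_enabled(features: &[Feature]) -> Vec<&'static str> {
    features
        .iter()
        .filter(|f| f.enabled && f.stage != Stage::Stable)
        .map(|f| f.name)
        .collect()
}
```

A non-interactive command could call `unstable_enabled` at startup and warn, or refuse to run, when the list is non-empty.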

--last shortcut for session operations. The most common case (continue last session) should be one flag:

openoxide session resume --last
openoxide --continue # global alias for the same thing
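The resolution logic behind these two forms can be kept in one small function. This is a sketch with hypothetical names (`SessionMeta`, `Resolution`, `resolve_session`). Scoping --last to the current working directory is an assumption, loosely modeled on Codex needing --all to show sessions from other directories.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SessionMeta {
    pub id: String,
    pub cwd: String,
    /// Last activity, e.g. seconds since the Unix epoch.
    pub updated_at: u64,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Resolution<'a> {
    Resume(&'a SessionMeta),
    ShowPicker,
    NotFound,
}

/// Decide what `session resume [ID] [--last]` should do.
pub fn resolve_session<'a>(
    sessions: &'a [SessionMeta],
    id: Option<&str>,
    last: bool,
    cwd: &str,
) -> Resolution<'a> {
    // An explicit ID always wins, regardless of directory.
    if let Some(id) = id {
        return match sessions.iter().find(|s| s.id == id) {
            Some(s) => Resolution::Resume(s),
            None => Resolution::NotFound,
        };
    }
    // --last: most recently updated session for this directory (assumption).
    if last {
        return match sessions
            .iter()
            .filter(|s| s.cwd == cwd)
            .max_by_key(|s| s.updated_at)
        {
            Some(s) => Resolution::Resume(s),
            None => Resolution::NotFound,
        };
    }
    // Neither given: fall back to the interactive picker.
    Resolution::ShowPicker
}
```

With this shape, the global --continue alias can call the same function with `id = None` and `last = true`.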

Crates this blueprint relies on:

  • clap with derive feature for the full command tree.
  • directories for XDG-compliant data/config/cache paths.
  • serde + serde_json for JSON event serialization.
  • tokio for the async server runtime that openoxide serve starts.