
Command Execution

Source attribution: Implementation details traced from references/aider/ at commit b9050e1d, references/codex/ at commit 4ab44e2c5, and references/opencode/ at commit 7ed4499.

Every AI coding agent needs to run shell commands — tests, linters, build tools, git operations, arbitrary user commands. This sounds simple, but production implementations reveal deep complexity:

  1. Output capture vs. pass-through: The agent needs command output to feed back to the LLM, but the user also wants to see it in real time. Interactive commands (those expecting a TTY) break when stdout is piped.
  2. Timeout management: A while true loop or a hung network request can block the agent forever. Timeouts must kill the process and all its children, not just the parent shell.
  3. Output truncation: A find / command can produce gigabytes of output. The system must cap output to prevent OOM while preserving enough context for the LLM to act on errors.
  4. Environment isolation: Commands should run in a controlled environment. Leaking OPENAI_API_KEY into a subprocess that runs user code is a security risk.
  5. Signal handling: When the agent process dies, child processes must be cleaned up. Orphaned processes waste resources and can hold file locks.

The tension is between safety (strict timeouts, sandboxed environment, output caps) and usability (users expect commands to “just work” like in a normal terminal).


Reference: references/aider/aider/run_cmd.py, aider/commands.py | Commit: b9050e1d

Aider uses a dual-mode execution strategy — PTY mode for interactive terminals, subprocess mode for everything else.

run_cmd_pexpect() in aider/run_cmd.py (lines 89-132) uses the pexpect library for TTY-aware execution:

def run_cmd_pexpect(cmd, shell, encoding):
    child = pexpect.spawn(shell, args=["-i", "-c", cmd])
    output = []
    child.interact(output_filter=lambda data: output_callback(data, output))
    child.close()
    return child.exitstatus, "".join(output)

This mode is only available on Unix (non-Windows) when stdin is a TTY. It preserves terminal features — colors, progress bars, interactive prompts — that break in pipe mode. The -i flag starts an interactive shell session, and child.interact() passes I/O between the user’s terminal and the child process while capturing output via the filter callback.

run_cmd_subprocess() (lines 42-86) is the cross-platform fallback using subprocess.Popen():

def run_cmd_subprocess(cmd, verbose=False, error_print=None, encoding=None):
    shell = os.environ.get("SHELL", "/bin/sh")
    process = subprocess.Popen(
        cmd,
        shell=True,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        encoding=encoding or sys.stdout.encoding,
        errors="replace",
        bufsize=0,
    )
    output = []
    while True:
        chunk = process.stdout.read(1)
        if not chunk:
            break
        print(chunk, end="", flush=True)
        output.append(chunk)

Key design decisions:

  • Single-character reads (read(1)): Provides real-time output streaming. Each character is immediately printed and buffered. This is slower than bulk reads but ensures the user sees output as it’s produced.
  • bufsize=0: Disables Python’s output buffering for immediate reads.
  • stderr=subprocess.STDOUT: Merges stderr into stdout so all output is captured in order.
  • errors="replace": Replaces malformed UTF-8 with replacement characters instead of crashing.
  • No timeout: Aider relies on user interruption (Ctrl-C) for runaway commands.

On Windows, Aider detects the parent process to determine the shell. It inspects psutil.Process().parent() and wraps commands in powershell.exe -Command "..." or cmd.exe /c "..." accordingly.
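This shell-selection logic can be sketched in a few lines of Python. `detect_shell` is a hypothetical helper, and the Windows branch is simplified: Aider's real code inspects `psutil.Process().parent()` rather than `COMSPEC`.

```python
import os
import sys

def detect_shell() -> list[str]:
    """Pick the shell wrapper for a command (simplified, Aider-style sketch)."""
    if sys.platform == "win32":
        # Aider walks the parent process via psutil to pick PowerShell vs CMD;
        # falling back to COMSPEC is a simplification for this sketch.
        return [os.environ.get("COMSPEC", "cmd.exe"), "/c"]
    # Unix: honor $SHELL, fall back to /bin/sh.
    return [os.environ.get("SHELL", "/bin/sh"), "-c"]
```

The detected shell is typically cached for the session rather than re-detected per command.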

Aider exposes shell execution through several slash commands:

/run <cmd> (commands.py, lines 1007-1047): Executes arbitrary commands. After execution, prompts the user whether to add the output to the chat context. Token-counts the output before offering. If add_on_nonzero_exit=True (used by auto-test), non-zero exits automatically add output to chat.

/test [cmd] (commands.py, lines 987-1005): Runs the configured test command (--test-cmd) or a provided command. Non-zero exit triggers the LLM to fix the test failure.

/lint [file] (commands.py, lines 356-409): Runs the configured lint command (--lint-cmd) against specific files. Uses Linter.lint() from aider/linter.py which shells out and parses output for line numbers. The py_lint() fallback uses flake8 for Python files.

/git <args> (commands.py, lines 961-985): Raw git wrapper. Sets GIT_EDITOR=true to suppress editor pop-ups. Output is excluded from chat context.

Commands inherit the parent process environment. No isolation, no filtering. The shell is detected from $SHELL on Unix and from the parent process name on Windows.


Reference: references/codex/codex-rs/core/src/exec.rs, core/src/spawn.rs, utils/pty/src/pty.rs | Commit: 4ab44e2c5

Codex implements the most sophisticated command execution system of the three, with sandbox-aware routing, PTY support, configurable environment isolation, and strict output limits.

The execution flow passes through multiple layers:

ExecParams
  -> process_exec_tool_call()
  -> SandboxManager.transform()
  -> execute_exec_env()
  -> spawn_child_async()

Each layer adds a concern: process_exec_tool_call() validates the command, SandboxManager.transform() applies sandbox policy (see platform-isolation page), execute_exec_env() builds the environment, and spawn_child_async() creates the process.

exec.rs (lines 40-120) defines timeout semantics:

const DEFAULT_EXEC_COMMAND_TIMEOUT_MS: u64 = 10_000; // 10 seconds
const EXEC_TIMEOUT_EXIT_CODE: i32 = 124; // Unix convention

pub enum ExecExpiration {
    Timeout(Duration),
    DefaultTimeout,
    Cancellation(CancellationToken),
}

The 10-second default is aggressive — designed for quick tool calls (compile, test, lint), not long-running builds. Users can override per command. Timeout produces exit code 124, matching the Unix timeout command convention. The CancellationToken variant allows async cancellation from the session layer.
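The timeout-to-124 mapping is easy to express in any language; here is a minimal Python sketch (illustrative, not Codex's Rust implementation — `run_with_timeout` is a hypothetical helper):

```python
import subprocess

EXEC_TIMEOUT_EXIT_CODE = 124  # same convention as the Unix `timeout` command

def run_with_timeout(cmd: list[str], timeout_s: float = 10.0) -> tuple[int, str]:
    """Run a command; map a timeout to exit code 124 and keep any partial output."""
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
        return proc.returncode, proc.stdout
    except subprocess.TimeoutExpired as exc:
        partial = exc.stdout or ""
        if isinstance(partial, bytes):
            partial = partial.decode(errors="replace")
        return EXEC_TIMEOUT_EXIT_CODE, partial
```

Returning partial output on timeout matters: the LLM can often diagnose a hang from whatever the command printed before it was killed.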

const READ_CHUNK_SIZE: usize = 8_192; // 8 KB per async read
const EXEC_OUTPUT_MAX_BYTES: usize = 1024 * 1024; // 1 MiB hard cap
const MAX_EXEC_OUTPUT_DELTAS_PER_CALL: usize = 10_000; // Event stream cap

Output is read in 8 KB chunks via async reads. If total output exceeds 1 MiB, it’s truncated. The event stream delta cap prevents flooding the TUI with too many incremental updates from a single command.
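The chunked-read-with-cap loop can be sketched as follows; `read_capped` is a hypothetical helper, and the constants mirror the Codex values above:

```python
MAX_OUTPUT_BYTES = 1024 * 1024  # 1 MiB hard cap, as in Codex
READ_CHUNK_SIZE = 8_192         # 8 KB per read

def read_capped(stream) -> tuple[bytes, bool]:
    """Accumulate chunks until EOF or the cap; returns (output, truncated)."""
    chunks, total = [], 0
    while True:
        chunk = stream.read(READ_CHUNK_SIZE)
        if not chunk:
            return b"".join(chunks), False
        if total + len(chunk) >= MAX_OUTPUT_BYTES:
            # Keep exactly up to the cap, then flag truncation for the LLM.
            chunks.append(chunk[: MAX_OUTPUT_BYTES - total])
            return b"".join(chunks), True
        chunks.append(chunk)
        total += len(chunk)
```

Surfacing the `truncated` flag to the model (rather than silently cutting) lets it know that re-running with a narrower command may be needed.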

spawn.rs (lines 50-125) handles low-level process creation with two I/O policies:

enum StdioPolicy {
    RedirectForShellTool, // Capture stdout/stderr for LLM
    Inherit,              // Pass through to parent terminal
}

Unix-specific setup (lines 86-104):

  • pre_exec() hook runs detach_from_tty() when redirecting I/O, preventing the child from receiving terminal signals meant for the parent.
  • Parent death signal (Linux only, lines 96-101): set_parent_death_signal(parent_pid) ensures the child receives SIGTERM if the parent process dies unexpectedly. This prevents orphaned processes from accumulating.
  • kill_on_drop(true) (line 124): Ensures the child is killed when the tokio::process::Child handle is dropped.
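A Unix-only Python sketch of the same two ideas — spawn into a fresh process group, then signal the whole group on cleanup (`spawn_in_group` and `kill_tree` are hypothetical helpers; Codex does this in Rust via pre_exec and kill_on_drop):

```python
import os
import signal
import subprocess

def spawn_in_group(cmd: list[str]) -> subprocess.Popen:
    """Start a command in its own session/process group (Unix only)."""
    return subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        start_new_session=True,  # setsid(): new group, detached from our TTY
    )

def kill_tree(proc: subprocess.Popen) -> None:
    """Signal the entire group -- children included -- not just the parent shell."""
    try:
        os.killpg(os.getpgid(proc.pid), signal.SIGTERM)
    except ProcessLookupError:
        pass  # already exited
```

Without the process group, killing the shell leaves grandchildren (compilers, test runners) orphaned; `killpg` reaches the whole tree in one call.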

Environment security (lines 74-79):

cmd.env_clear();
cmd.envs(env);

env_clear() removes ALL inherited environment variables. The environment is rebuilt from scratch using the exec env system. When network is restricted by sandbox policy, CODEX_SANDBOX_NETWORK_DISABLED=1 is set.

exec_env.rs (lines 20-93) implements a six-step environment construction pipeline:

  1. Inherit strategy: All (inherit everything), None (start empty), or Core (only essential vars)
  2. Default excludes: Filter variables matching *KEY*, *SECRET*, *TOKEN* patterns
  3. Custom excludes: User-defined exclusion patterns from config
  4. Overrides: Explicit key-value pairs from config
  5. Include-only filter: Whitelist mode — if specified, only these variables survive
  6. Thread ID injection: Adds CODEX_THREAD_ID for session context

The core variables that survive the Core strategy:

const CORE_VARS: &[&str] = &[
    "HOME", "LOGNAME", "PATH", "SHELL", "USER", "USERNAME", "TMPDIR", "TEMP", "TMP",
];
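The pipeline above can be sketched in Python (`build_env` is a hypothetical helper; the real implementation is Rust and additionally handles custom excludes, include-only filters, and thread ID injection):

```python
import fnmatch

CORE_VARS = ["HOME", "LOGNAME", "PATH", "SHELL", "USER", "USERNAME", "TMPDIR", "TEMP", "TMP"]
DEFAULT_EXCLUDES = ["*KEY*", "*SECRET*", "*TOKEN*"]  # credential-shaped names

def build_env(parent: dict, strategy: str = "core", overrides: dict = None) -> dict:
    """Sketch of the environment pipeline: inherit, filter, then override."""
    if strategy == "all":
        env = dict(parent)
    elif strategy == "core":
        env = {k: v for k, v in parent.items() if k in CORE_VARS}
    else:  # "none"
        env = {}
    # Drop anything that looks like a credential, even under "all".
    env = {k: v for k, v in env.items()
           if not any(fnmatch.fnmatch(k, pat) for pat in DEFAULT_EXCLUDES)}
    env.update(overrides or {})
    return env
```

The ordering matters: overrides are applied after the exclude filter, so config can deliberately reintroduce a variable the default patterns would strip.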

utils/pty/src/pty.rs (lines 58-174) provides async PTY spawning using the portable_pty crate:

pub async fn spawn_process(
    program: &str,
    args: &[String],
    cwd: &Path,
    env: &HashMap<String, String>,
    arg0: &Option<String>,
) -> Result<SpawnedProcess> {
    let pty_system = platform_native_pty_system();
    let pair = pty_system.openpty(PtySize {
        rows: 24,
        cols: 80,
        pixel_width: 0,
        pixel_height: 0,
    })?;
    let mut command_builder = CommandBuilder::new(arg0.as_deref().unwrap_or(program));
    command_builder.cwd(cwd);
    command_builder.env_clear();
    for (key, value) in env {
        command_builder.env(key, value);
    }
    let mut child = pair.slave.spawn_command(command_builder)?;

Output reading runs in a spawn_blocking task with 8 KB buffers, broadcasting via tokio::sync::broadcast to multiple subscribers (TUI display, output accumulator):

let reader_handle = tokio::task::spawn_blocking(move || {
    let mut buf = [0u8; 8_192];
    loop {
        match reader.read(&mut buf) {
            Ok(0) => break,
            Ok(n) => { let _ = output_tx_clone.send(buf[..n].to_vec()); }
            Err(ref e) if e.kind() == ErrorKind::WouldBlock => {
                std::thread::sleep(Duration::from_millis(5));
                continue;
            }
            Err(_) => break,
        }
    }
});

Input goes through an mpsc channel, and exit status is tracked via AtomicBool + StdMutex<Option<i32>>.

  • Unix: portable_pty::native_pty_system() for PTY allocation.
  • Windows: Custom ConPtySystem using Windows ConPTY API. The codex_windows_sandbox crate provides sandboxed execution with restricted tokens and ACLs, supporting Elevated and Legacy sandbox levels.

exec_policy.rs (lines 38-85) maintains a list of 85 dangerous command patterns that trigger approval requests:

  • Shell escapes: bash -lc, sh -c, zsh -lc
  • Interpreters: python3 -c, perl -e, ruby -e
  • Privilege escalation: sudo, env
  • System utilities: git, node
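Prefix matching against such a list can be sketched as follows (an illustrative excerpt, not the actual 85-pattern list or Codex's matcher; `needs_approval` is a hypothetical helper):

```python
import shlex

# A small excerpt of the kinds of prefixes flagged for approval (illustrative).
DANGEROUS_PREFIXES = [
    ["bash", "-lc"], ["sh", "-c"], ["zsh", "-lc"],
    ["python3", "-c"], ["perl", "-e"], ["ruby", "-e"],
    ["sudo"], ["env"], ["git"], ["node"],
]

def needs_approval(command: str) -> bool:
    """True if the tokenized command starts with a flagged prefix."""
    tokens = shlex.split(command)
    return any(tokens[: len(p)] == p for p in DANGEROUS_PREFIXES)
```

Matching on tokens rather than raw strings avoids false positives like a file named `sudoku.txt` appearing mid-command.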

Reference: references/opencode/packages/opencode/src/tool/bash.ts | Commit: 7ed4499

OpenCode implements command execution as a tool with tree-sitter-based permission analysis.

The bash tool in tool/bash.ts (lines 55-269) accepts:

z.object({
  command: z.string(),
  timeout: z.number().optional(),
  workdir: z.string().optional(),
  description: z.string(),
})

const DEFAULT_TIMEOUT = Flag.OPENCODE_EXPERIMENTAL_BASH_DEFAULT_TIMEOUT_MS || 2 * 60 * 1000

2 minutes by default — significantly more generous than Codex’s 10 seconds. Configurable via environment flag. The timeout includes a 100ms grace period before force-killing.

Before execution, OpenCode parses the command with tree-sitter to extract permission-relevant information (lines 84-145). The parser extracts path arguments from dangerous commands (cd, rm, cp, mv, mkdir, touch, chmod, chown, cat) and resolves them via realpath for granular permission requests. This allows the permission system to ask “Allow write to /etc/hosts?” rather than “Allow bash command?”:

await ctx.ask({
  permission: "external_directory",
  patterns: globs,
})
await ctx.ask({
  permission: "bash",
  patterns: Array.from(patterns),
  always: Array.from(always),
})

Windows Git Bash paths are normalized (Unix /c/Users to Windows C:\Users).
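The normalization is a small string transform; here is a Python sketch (`normalize_gitbash_path` is a hypothetical name; OpenCode does this in TypeScript):

```python
import re

def normalize_gitbash_path(path: str) -> str:
    """Convert a Git Bash path like /c/Users/me to Windows form C:\\Users\\me."""
    m = re.match(r"^/([a-zA-Z])(/.*)?$", path)
    if not m:
        return path  # already a Windows path, or a relative path: leave as-is
    drive = m.group(1).upper()
    rest = (m.group(2) or "").replace("/", "\\")
    return f"{drive}:{rest}" if rest else f"{drive}:\\"
```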

The tool uses Node.js child_process.spawn() (lines 167-176):

  • Shell: Detected via Shell.acceptable() utility
  • stdin: Set to "ignore" — prevents commands from hanging waiting for input
  • stdout/stderr: Piped for capture
  • detached: true on Unix: Creates a new process group for clean tree killing
  • Plugin environment: Plugin.trigger("shell.env", ...) hook allows plugins to inject environment variables

Two levels of output management (lines 178-201):

Full output: Accumulated without truncation, returned to the agent for LLM context.

Metadata output: Truncated to 30,000 characters (MAX_METADATA_LENGTH) for UI display. Updated in real time via ctx.metadata():

let output = ""
const append = (chunk: Buffer) => {
  output += chunk.toString()
  ctx.metadata({
    metadata: {
      output: output.length > MAX_METADATA_LENGTH
        ? output.slice(0, MAX_METADATA_LENGTH) + "\n\n..."
        : output,
      description: params.description,
    },
  })
}

Both stdout and stderr feed into the same append callback.

Shell.killTree(proc, { exited: () => exited }) kills the process and all children using the process group ID. Cancellation integrates with the abort signal:

ctx.abort.addEventListener("abort", () => {
  aborted = true
  void kill()
})

Cleanup waits for either the exit event or error event before resolving, ensuring all resources are freed:

await new Promise<void>((resolve, reject) => {
  proc.once("exit", () => { exited = true; cleanup(); resolve() })
  proc.once("error", (error) => { exited = true; cleanup(); reject(error) })
})

The tool returns full output for the LLM and truncated metadata for the UI:

return {
  title: params.description,
  metadata: { output: truncated, exit: proc.exitCode, description: params.description },
  output, // Full, untruncated
}

When timeout or abort occurs, metadata annotations are appended in <bash_metadata> tags.


Claude Code is closed-source, so execution details are inferred from public documentation and observed behavior.

Claude Code exposes command execution across four deployment channels, each with different constraints:

| Channel | Execution Context | Permission Handling | Output Handling |
| --- | --- | --- | --- |
| Interactive CLI | User's machine, full PTY | 5 permission modes, per-command approval | Real-time terminal output + LLM context |
| IDE (VS Code/JetBrains) | User's machine, piped I/O | 4 modes (no dontAsk in VS Code), IDE approval dialogs | IDE terminal panel + LLM context |
| CI/CD (-p mode) | Runner/container, piped I/O | acceptEdits + --allowedTools pre-configured | Captured for output format (text/json/stream-json) |
| Cloud/Web | Anthropic VM, piped I/O | acceptEdits (default for remote) | Captured, accessible via web UI |

The -p (print) flag transforms the CLI into a batch execution engine:

# Basic one-shot
claude -p "Run tests and fix failures" --allowedTools "Bash,Read,Edit"

# Structured output with JSON schema
claude -p "Extract function names from auth.py" \
  --output-format json \
  --json-schema '{"type":"object","properties":{"functions":{"type":"array","items":{"type":"string"}}}}'

# Streaming with token-level events
claude -p "Explain recursion" \
  --output-format stream-json --verbose --include-partial-messages

Output formats:

  • text (default): Plain text response
  • json: Structured JSON with result, session_id, structured_output, and usage metadata
  • stream-json: Newline-delimited JSON, each line an event object. Filter type == "stream_event" with event.delta.type == "text_delta" for streaming text.
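Consuming stream-json then looks like the following Python sketch (the exact event shape, including the delta's text field, is assumed from the documented filter, not verified against the wire format):

```python
import json

def extract_text(ndjson: str) -> str:
    """Collect streamed text from newline-delimited stream-json events (sketch)."""
    parts = []
    for line in ndjson.splitlines():
        if not line.strip():
            continue
        event = json.loads(line)
        # Keep only text deltas, as the filter above describes.
        if (event.get("type") == "stream_event"
                and event.get("event", {}).get("delta", {}).get("type") == "text_delta"):
            parts.append(event["event"]["delta"].get("text", ""))
    return "".join(parts)
```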

--allowedTools uses the permission rule syntax with glob patterns:

# Prefix matching with trailing space+asterisk
--allowedTools "Bash(git diff *),Bash(git log *),Bash(git status *),Bash(git commit *)"

The space before * is significant: Bash(git diff *) matches git diff HEAD but Bash(git diff*) would also match git diff-index. This is the same word-boundary-aware pattern syntax documented in the permission system (Run 2).
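The word-boundary semantics can be modeled with a simple prefix check (a simplified sketch of the rule syntax, not Claude Code's actual matcher; `rule_matches` is a hypothetical helper):

```python
def rule_matches(rule: str, command: str) -> bool:
    """Word-boundary-aware prefix match for Bash(...) rule bodies (sketch)."""
    if rule.endswith(" *"):
        # Trailing " *": the prefix must end at a word boundary.
        prefix = rule[:-2]
        return command == prefix or command.startswith(prefix + " ")
    if rule.endswith("*"):
        # No space: plain prefix match, so "git diff*" also catches "git diff-index".
        return command.startswith(rule[:-1])
    return command == rule
```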

User-invoked skills (/commit, /review) and built-in interactive commands are NOT available in -p mode. Tasks must be described in natural language.

The anthropics/claude-code-action@v1 action wraps claude -p with GitHub context injection:

- uses: anthropics/claude-code-action@v1
  with:
    anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
    prompt: "/review"
    claude_args: "--max-turns 5 --model claude-sonnet-4-6"

Key properties:

  • Auto-detects interactive (responds to @claude mentions) vs automation (runs with prompt) mode
  • Claude GitHub App provides repository access (Contents, Issues, Pull requests read/write)
  • Supports AWS Bedrock (OIDC) and Google Vertex AI (Workload Identity Federation)
  • claude_args passes any CLI flag through (model, max-turns, allowed-tools, mcp-config)

Raw CLI invocation in .gitlab-ci.yml jobs:

script:
  - /bin/gitlab-mcp-server || true
  - claude -p "${AI_FLOW_INPUT:-'Review this MR'}"
    --permission-mode acceptEdits
    --allowedTools "Bash Read Edit Write mcp__gitlab"

Key differences from GitHub:

  • No dedicated action wrapper — raw CLI installation and invocation
  • GitLab MCP server (/bin/gitlab-mcp-server) provides GitLab API tools
  • Trigger variables (AI_FLOW_INPUT, AI_FLOW_CONTEXT, AI_FLOW_EVENT) pass context from webhooks
  • Beta status, maintained by GitLab

Cloud sessions (web and Desktop remote) run on Anthropic-managed VMs with a universal dev image:

Pre-installed runtimes: Python 3.x, Node.js LTS, Ruby 3.1/3.2/3.3, PHP 8.4, Java (OpenJDK), Go, Rust, C++ (GCC + Clang)

Pre-installed services: PostgreSQL 16, Redis 7.0

Network constraints:

  • All git operations route through a git proxy with scoped credentials (push restricted to current branch)
  • All HTTP/HTTPS outbound through a security proxy (abuse prevention, rate limiting)
  • Three access levels: No internet, Limited (domain allowlist covering npm/PyPI/crates.io/etc.), Full
  • Bun does not work correctly with the security proxy

Dependency management uses SessionStart hooks:

#!/bin/bash
# Only run in remote environments
if [ "$CLAUDE_CODE_REMOTE" != "true" ]; then exit 0; fi
npm install
pip install -r requirements.txt
exit 0

Hooks fire on every session start/resume. CLAUDE_ENV_FILE env var points to a file for persisting environment variables for subsequent Bash commands.

Two flags control system prompt composition in -p mode:

  • --append-system-prompt "...": Add instructions while keeping default behavior
  • --system-prompt "...": Fully replace the default prompt

Common pattern for CI/CD: pipe context through stdin and customize the role:

gh pr diff "$1" | claude -p \
  --append-system-prompt "You are a security engineer. Review for vulnerabilities." \
  --output-format json

Single-char reads are slow: Aider’s read(1) approach provides instant output visibility but is orders of magnitude slower than bulk reads for large outputs. Codex’s 8 KB chunks with broadcast channels are a better balance.

Process group cleanup is essential: Without process group management, a make -j8 command killed by timeout leaves 8 orphaned compiler processes. Both Codex (kill_on_drop(true) + process groups) and OpenCode (Shell.killTree() + detached: true) handle this. Aider relies on pexpect’s internal handling.

10-second timeouts break real workflows: Codex’s aggressive 10-second default causes frustration for legitimate long-running commands (builds, large test suites). OpenCode’s 2-minute default is more practical. The ideal approach is adaptive — short defaults with per-command overrides.

Environment clearing breaks commands: Codex’s env_clear() prevents API key leakage but can break commands that depend on inherited environment variables (locale settings, custom tool configurations, SSH agent sockets). The six-step pipeline with CORE_VARS mitigates this — clear everything, then explicitly add back essentials.

PTY size matters: Codex allocates 80x24 PTYs regardless of the user’s actual terminal size. Commands that detect terminal width (like ls with columns, git log --graph) produce different output than expected. The PTY size should match the user’s terminal or use a generous default (200 columns).

Windows is different in every way: Shell detection (PowerShell vs CMD vs Git Bash), process termination (no SIGTERM, use TerminateProcess()), PTY support (ConPTY is newer and less battle-tested), and environment variable handling (case-insensitive) all require platform-specific code paths.

Non-interactive mode hides interactive features: Claude Code’s -p mode disables skills (/commit), built-in commands, and interactive approval flows. This is by design but trips up CI/CD pipelines that try to invoke skills. The workaround is describing the task in natural language instead of using slash commands.

CI/CD permission surface is narrower than it appears: --allowedTools in CI/CD grants specific tool patterns but the word-boundary semantics of the glob syntax can be surprising. Bash(git diff *) allows git diff HEAD but NOT git diff-index (note the space before *). Misconfigured patterns cause permission denials that are hard to debug in ephemeral runners.

Cloud execution proxy breaks some tools: The security proxy in cloud/web sessions intercepts all HTTP/HTTPS traffic. Tools that use non-standard HTTP patterns, custom certificate pinning, or specific proxy-incompatible behaviors (Bun is the documented example) fail silently or with cryptic errors. The limited domain allowlist also blocks tools that need to reach non-allowlisted registries.

SessionStart hooks fire everywhere: Cloud dependency management hooks (SessionStart) run on ALL session starts, not just remote sessions. There’s no hook-level scope to restrict to remote-only execution. Scripts must check CLAUDE_CODE_REMOTE env var manually.


Core struct:

pub struct CommandExecutor {
    sandbox: Arc<dyn SandboxPolicy>,
    env_builder: ExecEnvBuilder,
    default_timeout: Duration,
    max_output_bytes: usize,
}

pub struct ExecResult {
    pub exit_code: i32,
    pub stdout: String,
    pub stderr: String,
    pub timed_out: bool,
    pub truncated: bool,
}

Dual-mode execution: Use portable-pty for interactive commands (when the TUI is in alternate screen mode), tokio::process::Command with piped I/O for tool calls. Selection based on whether the command was user-initiated (/run) or LLM-initiated (tool call).

Output streaming: Use tokio::sync::broadcast for multi-subscriber output distribution (TUI display + LLM context accumulation). Read in 8 KB chunks. Hard cap at 1 MiB with truncation indicator.

Timeout: Default 120 seconds (matching OpenCode’s practical default). Per-command overrides via tool parameters. Use tokio::time::timeout wrapping the process future. On timeout, send SIGTERM, wait 5 seconds, then SIGKILL.
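The SIGTERM-then-SIGKILL escalation is language-agnostic; a Python sketch of the sequence (the recommendation targets Rust, and `graceful_kill` is a hypothetical helper):

```python
import subprocess

def graceful_kill(proc: subprocess.Popen, grace_s: float = 5.0) -> int:
    """SIGTERM first, then SIGKILL after a grace period; returns the exit status."""
    proc.terminate()  # SIGTERM: give the process a chance to clean up
    try:
        return proc.wait(timeout=grace_s)
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL: no more waiting
        return proc.wait()
```

The grace period is what lets well-behaved tools flush logs and remove lock files before the hard kill lands.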

Process cleanup: Set kill_on_drop(true) on tokio::process::Child. Create process groups with pre_exec() on Unix. On Linux, set PR_SET_PDEATHSIG via prctl to SIGTERM the child if the parent dies.

Environment builder:

pub struct ExecEnvBuilder {
    inherit_strategy: InheritStrategy,
    exclude_patterns: Vec<Glob>,
    overrides: HashMap<String, String>,
    core_vars: &'static [&'static str],
}

pub enum InheritStrategy {
    All,
    Core, // Only PATH, HOME, SHELL, USER, TMPDIR
    None,
}

Default to Core for LLM-initiated commands, All for user-initiated commands. Always filter *KEY*, *SECRET*, *TOKEN* patterns from inherited vars.

Shell detection: Check $SHELL on Unix, detect parent process on Windows. Fall back to /bin/sh (Unix) or cmd.exe (Windows). Store detected shell for session lifetime.

Permission integration: Parse commands with tree-sitter-bash to extract paths and program names. Route through the permission system before execution. Cache approvals per command pattern for the session.

Non-interactive mode (-p flag): OpenOxide CLI should support a -p / --print flag that disables the TUI and runs a single prompt to completion. Output formats: text (default), json (with result, session_id, usage metadata), stream-json (NDJSON events). Add --json-schema for structured output validation. Session continuation via --continue / --resume <session_id>.

pub enum OutputFormat {
    Text,
    Json { schema: Option<JsonSchema> },
    StreamJson { verbose: bool, include_partial: bool },
}

pub struct NonInteractiveResult {
    pub result: String,
    pub session_id: SessionId,
    pub structured_output: Option<serde_json::Value>, // When json-schema used
    pub usage: UsageMetadata,
}

CI-friendly execution: Default to --permission-mode acceptEdits in non-interactive mode. Support --max-turns for cost control. Exit codes: 0 (success), 1 (error), 124 (timeout, matching Unix convention). Support --mcp-config for loading MCP servers in CI contexts.

System prompt composition: Support --append-system-prompt (add to default) and --system-prompt (replace default). Enable piping context via stdin: cat diff.patch | openoxide -p "Review this".

Cloud-deployable execution: Design CommandExecutor to work in container environments where PTY may not be available. Fall back to piped I/O gracefully. Support OPENOXIDE_REMOTE env var for scripts that need to detect remote execution context.