
Agent Swarm

An agent swarm is a set of agents running concurrently, each with its own context window and tool execution environment, coordinated by an orchestrator that spawns them, monitors their progress, and aggregates their results. This is distinct from subagent spawning (covered in agents/subagents.md), which focuses on the mechanics of spawning a single child agent. The swarm problem is about running many agents in parallel, preventing them from conflicting with each other, and efficiently collecting results.

The core challenge is coordination without centralized state. Each agent in a swarm reads and writes files independently. Without explicit synchronization, two agents editing the same file produce a conflict that neither knows how to resolve. Without a wait mechanism, the orchestrator either polls (burning CPU) or blocks (losing parallelism). Without a depth limit, swarms recurse infinitely as agents spawn their own subagents.


Aider reference: commit b9050e1d5faf8096eae7a46a9ecc05a86231384b

Aider has no swarm capability. The only multi-agent pattern in Aider is the sequential Architect→Editor pipeline described in agents/subagents.md. Architect produces a plan, the user confirms, Editor applies the plan. These run in the same thread, one after the other.

There is no spawn_agent tool, no parallel execution, and no mechanism for the orchestrator to wait on multiple concurrent agents. All LLM interactions run on a single thread in the Python runtime.

This is a deliberate simplicity choice. Sequential execution prevents file conflicts by design and eliminates an entire class of coordination bugs. Aider’s focus is on single-user, single-session, single-thread reliability.


Codex reference: commit 4ab44e2c5cc54ed47e47a6729dfd8aa5a3dc2476

Codex has the most complete multi-agent implementation of the three reference tools. A full swarm toolkit is exposed as LLM tools: spawn_agent, send_input, resume_agent, wait, and close_agent. These are implemented in codex-rs/core/src/tools/handlers/multi_agents.rs.

spawn_agent — launch a new child agent, returns its ThreadId
send_input — send a follow-up message to a running agent
resume_agent — resume a previously checkpointed agent from its rollout
wait — block until one or more agents reach a terminal state
close_agent — terminate an agent

The orchestrator (typically a parent agent) uses these tools to build whatever coordination pattern it needs: fan-out, pipeline, race, or gather.

File: codex-rs/core/src/agent/control.rs:40

pub(crate) async fn spawn_agent(
    &self,
    config: AgentConfig,
    initial_message: String,
    session_source: SessionSource,
) -> Result<ThreadId, CodexErr> {
    let reservation = self.state.reserve_spawn_slot(config.agent_max_threads)?;
    let thread_id = self.manager.upgrade()
        .ok_or(CodexErr::ThreadManagerGone)?
        .spawn_thread(config, initial_message, session_source, reservation)
        .await?;
    Ok(thread_id)
}

The call returns a ThreadId immediately — the agent starts running in the background in a tokio task. The caller can immediately spawn more agents or do other work.

File: codex-rs/core/src/agent/guards.rs

pub(crate) struct Guards {
    threads_set: Mutex<HashSet<ThreadId>>,
    total_count: AtomicUsize,
}

Before a spawn is committed, reserve_spawn_slot() runs an atomic compare-exchange:

pub(crate) fn reserve_spawn_slot(
    self: &Arc<Self>,
    max_threads: Option<usize>,
) -> Result<SpawnReservation> {
    if let Some(max_threads) = max_threads {
        if !self.try_increment_spawned(max_threads) {
            return Err(CodexErr::AgentLimitReached { max_threads });
        }
    }
    Ok(SpawnReservation { state: Arc::clone(self), active: true })
}

fn try_increment_spawned(&self, limit: usize) -> bool {
    let current = self.total_count.load(Ordering::Relaxed);
    if current >= limit {
        return false;
    }
    self.total_count.compare_exchange_weak(
        current,
        current + 1,
        Ordering::AcqRel,
        Ordering::Relaxed,
    ).is_ok()
}

SpawnReservation is a RAII guard — if the spawn fails after the slot was reserved, dropping the reservation decrements the counter. The default agent max threads is defined in AgentControl config as DEFAULT_AGENT_MAX_THREADS = Some(6).

The Guards instance is shared across all AgentControl clones within the same session. This means the limit is session-scoped: all agents spawned by a parent session share the same concurrency budget.

File: codex-rs/core/src/agent/guards.rs:24

pub(crate) const MAX_THREAD_SPAWN_DEPTH: i32 = 1;

pub(crate) fn exceeds_thread_spawn_depth_limit(depth: i32) -> bool {
    depth > MAX_THREAD_SPAWN_DEPTH
}

MAX_THREAD_SPAWN_DEPTH = 1 means the hierarchy is at most two levels deep: the root session and its direct children. Grandchildren cannot spawn more agents. The depth is tracked in SessionSource::SubAgent(SubAgentSource::ThreadSpawn { depth, .. }) and incremented at each spawn level.

This prevents runaway recursion. A depth-one limit is a strong constraint — it means swarms are flat, not hierarchical. If an agent in the swarm needs to do its own sub-delegation, it cannot. This trades flexibility for predictability.
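Composed with the spawn path, the rule is that a parent at depth d spawns children at depth d + 1, and the check rejects anything past the limit. A minimal sketch, assuming the depth check is applied at spawn time (`can_spawn_child` is a hypothetical helper, not Codex's API; the constant mirrors the source):

```rust
// Constant mirrors the Codex source; the spawn-side helper is illustrative.
const MAX_THREAD_SPAWN_DEPTH: i32 = 1;

fn exceeds_thread_spawn_depth_limit(depth: i32) -> bool {
    depth > MAX_THREAD_SPAWN_DEPTH
}

// Hypothetical: a parent at `parent_depth` would create children at
// `parent_depth + 1`, so the spawn is rejected when that exceeds the limit.
fn can_spawn_child(parent_depth: i32) -> bool {
    !exceeds_thread_spawn_depth_limit(parent_depth + 1)
}
```

With a limit of 1, the root (depth 0) may spawn, but its children (depth 1) may not, which is exactly the flat two-level hierarchy described above.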

File: codex-rs/core/src/tools/handlers/multi_agents.rs

The wait tool is the key coordination primitive for swarms. It takes a list of thread IDs and blocks until one of them reaches a terminal state:

#[derive(Debug, Deserialize)]
struct WaitArgs {
    ids: Vec<String>,        // Thread IDs to wait on
    timeout_ms: Option<i64>,
}

Timeout constraints are enforced server-side:

pub(crate) const MIN_WAIT_TIMEOUT_MS: i64 = 10_000; // 10 seconds
pub(crate) const DEFAULT_WAIT_TIMEOUT_MS: i64 = 30_000; // 30 seconds
pub(crate) const MAX_WAIT_TIMEOUT_MS: i64 = 300_000; // 5 minutes

The minimum of 10 seconds is intentional. The comment in the source notes: “Very short timeouts encourage busy-polling loops in the orchestrator prompt and can cause high CPU usage.”
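Assuming requests outside the range are clamped rather than rejected, the enforcement might look like this (the constants are from the source; `clamp_wait_timeout` is a hypothetical helper, not the actual Codex function):

```rust
// Constants from the Codex source; the clamping helper is illustrative.
pub const MIN_WAIT_TIMEOUT_MS: i64 = 10_000;     // 10 seconds
pub const DEFAULT_WAIT_TIMEOUT_MS: i64 = 30_000; // 30 seconds
pub const MAX_WAIT_TIMEOUT_MS: i64 = 300_000;    // 5 minutes

// Hypothetical: fall back to the default when unspecified, then clamp
// into the server-enforced range.
fn clamp_wait_timeout(requested: Option<i64>) -> i64 {
    requested
        .unwrap_or(DEFAULT_WAIT_TIMEOUT_MS)
        .clamp(MIN_WAIT_TIMEOUT_MS, MAX_WAIT_TIMEOUT_MS)
}
```

A 1-second request becomes 10 seconds, a 10-minute request becomes 5 minutes, and anything in range passes through unchanged.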

The wait implementation uses FuturesUnordered to subscribe to status watch channels for all requested agents in parallel:

// Subscribe to status for each agent
let mut status_rxs = Vec::with_capacity(receiver_thread_ids.len());
for id in &receiver_thread_ids {
    match session.services.agent_control.subscribe_status(*id).await {
        Ok(rx) => {
            let status = rx.borrow().clone();
            if is_final(&status) {
                initial_final_statuses.push((*id, status));
            }
            status_rxs.push((*id, rx));
        }
        Err(_) => { /* agent not found, treat as done */ }
    }
}

// Race across all status watchers
let mut futures = FuturesUnordered::new();
for (id, rx) in status_rxs.into_iter() {
    futures.push(wait_for_final_status(session.clone(), id, rx));
}
let deadline = Instant::now() + Duration::from_millis(timeout_ms as u64);
loop {
    match timeout_at(deadline, futures.next()).await {
        Ok(Some(Some(result))) => {
            results.push(result);
            break; // Returns on FIRST completed agent
        }
        Ok(Some(None)) => continue,
        Ok(None) | Err(_) => break,
    }
}

Each wait_for_final_status() future subscribes to the tokio watch::Receiver<AgentStatus> for its agent. When the status transitions to a terminal state (Completed, Errored, Shutdown), the future resolves. The parent agent receives the first completion and can inspect the result.

This is an OR-wait, not an AND-wait: the wait tool returns when the first agent finishes, not when all finish. To wait for all agents, the orchestrator calls wait in a loop, processing one result at a time.
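The gather-all pattern can be modeled with plain threads and a channel: each worker reports when it finishes, and the collector receives one completion at a time, just as an orchestrator looping on wait would. A std-only sketch (a simulation of the pattern, not the async Codex implementation):

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

// Simulate N agents finishing at different times; gather every result by
// receiving one completion at a time, OR-wait style.
fn gather_all(delays_ms: &[u64]) -> Vec<usize> {
    let (tx, rx) = mpsc::channel();
    for (id, &delay) in delays_ms.iter().enumerate() {
        let tx = tx.clone();
        thread::spawn(move || {
            thread::sleep(Duration::from_millis(delay)); // stand-in for agent work
            tx.send(id).unwrap(); // "terminal status" notification
        });
    }
    drop(tx); // recv() returns Err once every worker has reported and exited

    let mut done = Vec::new();
    while let Ok(id) = rx.recv() { // each recv() plays the role of one wait call
        done.push(id);
    }
    done
}
```

The collector sees completions in finish order, not spawn order, which is why the orchestrating LLM must track which IDs are still pending.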

File: codex-rs/core/src/agent/status.rs

Status transitions are driven by the event stream:

pub(crate) fn agent_status_from_event(msg: &EventMsg) -> Option<AgentStatus> {
    match msg {
        EventMsg::TurnStarted(_) => Some(AgentStatus::Running),
        EventMsg::TurnComplete(e) => Some(AgentStatus::Completed(e.last_agent_message.clone())),
        EventMsg::TurnAborted(e) => Some(AgentStatus::Errored(format!("{:?}", e.reason))),
        EventMsg::Error(e) => Some(AgentStatus::Errored(e.message.clone())),
        EventMsg::ShutdownComplete => Some(AgentStatus::Shutdown),
        _ => None,
    }
}

AgentStatus values: PendingInit, Running, Completed(String), Errored(String), Shutdown, NotFound.

The Completed variant carries the last assistant message from the agent’s turn. This is the agent’s “return value” — the orchestrator reads it via the wait result to understand what the child agent accomplished.

When an agent spawns children, Codex emits dedicated events for observability:

CollabAgentSpawnBeginEvent { thread_id, task_description }
CollabAgentSpawnEndEvent { thread_id, initial_status }
CollabWaitingBeginEvent { ids, timeout_ms }

These appear in the TUI as nested status indicators under the parent turn, giving the user visibility into which child agents are running, completed, or errored.

Codex does not implement file-level locking or conflict detection for concurrent agents. Each agent runs in its own session context and makes independent file modifications. If two agents edit the same file simultaneously, the last writer wins.

The practical mitigation is prompt design: the orchestrator is expected to partition work such that agents do not overlap on files. The MAX_THREAD_SPAWN_DEPTH = 1 limit reduces the risk by constraining the hierarchy to one level where coordination is more tractable.


OpenCode reference: commit 7ed449974864361bad2c1f1405769fd2c2fcdf42

OpenCode’s subagent mechanism (the task tool in packages/opencode/src/tool/task.ts) is sequential per invocation — each task tool call creates one child session and awaits its completion before the parent turn continues. There is no native parallel spawn-and-wait mechanism.

The general built-in agent has this description (packages/opencode/src/agent/agent.ts:112):

"Use this agent to execute multiple units of work in parallel."

This is a prompt-level affordance, not an implementation feature. The description instructs the LLM to invoke multiple task tool calls, and the Vercel AI SDK will dispatch concurrent tool calls if the LLM emits them in a single response. Whether the LLM actually does this depends on the model and the turn.

The plan agent also references parallel execution in its system prompt, describing a Phase 1 that launches “up to 3 explore agents IN PARALLEL.” Again, this is a prompting strategy: the LLM is expected to emit 3 task invocations in one response.

When the LLM emits multiple tool calls in a single response turn, OpenCode’s session processor (packages/opencode/src/session/prompt.ts) collects and dispatches them; the Vercel AI SDK’s fullStream handles concurrent tool dispatch at the API level. Because each task tool call creates a child session, spawns its agent loop, and awaits its session.completed bus event, multiple concurrent task calls run as concurrent async operations within the event loop.

The limitation is that there is no explicit orchestration layer: no wait tool, no status polling API, and no per-child cancellation. The parent either awaits all children or none.

Child sessions created by the task tool have their own SQLite session record with parentID set to the invoking session’s ID. Each child has its own message history and runs its own agent loop independently. File operations go through the same apply_patch / write / multiedit tools, but each child session has its own permission grant history.

No file-level coordination exists. Concurrent child sessions can overwrite each other’s file changes. This is expected to be managed by the orchestrating LLM through work partitioning.


Claude Code is closed-source, so architecture is inferred from public documentation. Claude Code has two distinct multi-agent mechanisms: subagents (documented in Run 3, see Subagents) and agent teams (documented here). Subagents are spawned within a session and report back. Agent teams are fully independent sessions with shared coordination.

Agent teams are experimental (disabled by default, CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1). They represent a fundamentally different coordination model from Codex’s spawn/wait primitives or OpenCode’s prompt-level parallelism.

Components:

| Component | Role | Storage |
| --- | --- | --- |
| Team lead | Main Claude Code session. Creates team, spawns teammates, coordinates work, synthesizes results. | Session transcript |
| Teammates | Separate Claude Code instances with independent context windows. Each loads project context (CLAUDE.md, MCP, skills) + spawn prompt. Lead’s conversation history does NOT carry over. | Session transcript |
| Task list | Shared work items with three states (pending, in_progress, completed) and dependency tracking. Tasks auto-unblock when dependencies complete. | ~/.claude/tasks/{team-name}/ |
| Mailbox | Direct messaging system. message (one-to-one) and broadcast (one-to-all). Automatic delivery, no polling. | In-memory |

Team configuration stored at ~/.claude/teams/{team-name}/config.json with a members array (name, agent ID, agent type). Teammates can read this to discover other team members.

Agent teams coordinate via a shared task list pattern, fundamentally different from Codex’s spawn/wait/status pattern:

| Mechanism | Codex Swarm | Claude Code Agent Teams |
| --- | --- | --- |
| Work distribution | Orchestrator spawns agents with specific tasks | Lead creates task list, teammates self-claim or get assigned |
| Coordination primitive | wait tool (blocks until agent completes) | Task list + direct messaging |
| Result collection | wait returns agent’s last message | Teammates report via messages, lead synthesizes |
| Inter-agent communication | None (agents only communicate with parent) | Direct teammate-to-teammate messaging |
| Depth | 1-level (hardcoded) | Flat (no nested teams, no teammate sub-teams) |
| Concurrency control | Atomic Guards counter, DEFAULT_AGENT_MAX_THREADS = 6 | Team size set at creation, no global cap documented |

Tasks have three states: pending, in_progress, completed. Tasks can depend on other tasks: a pending task with unresolved dependencies cannot be claimed until those dependencies complete. Task claiming uses file locking to prevent race conditions when multiple teammates try to claim simultaneously.
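Claude Code's actual locking mechanism is not public. One portable way to get atomic claiming, sketched here as an assumption, is exclusive lock-file creation: `create_new` maps to O_CREAT|O_EXCL, so exactly one claimant can win (`try_claim_task` and the lock-file naming are hypothetical):

```rust
use std::fs::OpenOptions;
use std::path::Path;

// Sketch of atomic task claiming via exclusive lock-file creation.
// This is an assumed mechanism, not Claude Code's documented one.
fn try_claim_task(lock_dir: &Path, task_id: &str) -> bool {
    let lock_path = lock_dir.join(format!("{task_id}.lock"));
    OpenOptions::new()
        .write(true)
        .create_new(true) // fails if the lock file already exists
        .open(lock_path)
        .is_ok()
}
```

Whichever teammate creates the lock file first owns the task; everyone else's claim attempt fails and they move on to the next unblocked task.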

Work assignment follows two patterns:

  • Lead assigns: Lead explicitly assigns tasks to specific teammates
  • Self-claim: After finishing a task, teammates pick up the next unassigned, unblocked task

Two specialized hooks enforce quality during team execution:

| Hook | Trigger | Exit Code 2 Effect |
| --- | --- | --- |
| TeammateIdle | Teammate is about to go idle | Send feedback, keep teammate working |
| TaskCompleted | Task is being marked complete | Prevent completion, send feedback |

These hooks enable automated quality checks: CI validation before task completion, code review requirements, test coverage thresholds.

Teammates can be required to plan before implementing:

  1. Teammate enters read-only plan mode
  2. Teammate completes plan, sends approval request to lead
  3. Lead reviews and approves or rejects with feedback
  4. If rejected: teammate revises and resubmits
  5. If approved: teammate exits plan mode, begins implementation

The lead makes approval decisions autonomously. Influence via prompt: “only approve plans that include test coverage.”

| Mode | Mechanism | Terminal Requirement |
| --- | --- | --- |
| In-process (default) | All teammates in main terminal. Shift+Down cycles. | Any terminal |
| Split panes | Each teammate in own pane. Click to interact. | tmux or iTerm2 |
| Auto | Split panes if in tmux session, in-process otherwise | Depends on environment |

Configure via "teammateMode" in settings.json or --teammate-mode CLI flag.

Permissions:

  • All teammates start with the lead’s permission settings
  • If lead uses --dangerously-skip-permissions, all teammates inherit it
  • Individual modes can be changed after spawning
  • Per-teammate modes cannot be set at spawn time

Known limitations:

  • No session resumption: /resume and /rewind do not restore in-process teammates
  • Task status lag: Teammates sometimes fail to mark tasks as completed
  • Slow shutdown: Teammates finish current request/tool call before stopping
  • One team per session: Must clean up before starting new team
  • No nested teams: Teammates cannot spawn their own teams
  • Fixed lead: Cannot promote a teammate or transfer leadership
  • Permissions fixed at spawn: Start with lead’s mode, adjustable after but not before
  • Split pane restrictions: Not supported in VS Code terminal, Windows Terminal, Ghostty

OpenCode’s strategy of telling the LLM to “run agents in parallel” via system prompt is unreliable. Models that have not been trained on multi-agent orchestration patterns tend to serialize their task invocations even when instructed to run in parallel. The only reliable parallelism guarantee comes from tool-level concurrency enforcement, as in Codex.

Depth Limits Must Be Enforced in Code, Not Prompt


Codex’s MAX_THREAD_SPAWN_DEPTH = 1 is enforced at the spawn_agent handler level. If it were only a prompt instruction (“do not spawn grandchildren”), models would violate it under adversarial or confused conditions. Hard enforcement in the spawn handler prevents infinite recursion regardless of what the LLM requests.

Codex’s wait tool returns on the first completion. To gather results from N agents, the orchestrator must call wait N times in a loop. This is more token-efficient than waiting for all agents (the orchestrator can react early if one agent fails) but it requires the LLM to maintain a list of pending thread IDs across multiple turns. Models that lose track of this list leave ghost agents running until timeout.

Session-Level Concurrency Budget, Not Global


Codex’s Guards counter is scoped to a session, not to the process. If a user has multiple top-level sessions open simultaneously, each gets its own DEFAULT_AGENT_MAX_THREADS = 6 budget. System-wide agent count can exceed 6. There is no global cap.

Result Aggregation Requires Context Window Awareness


The orchestrator must receive and integrate results from potentially many child agents. Each Completed status carries only last_agent_message — the final assistant message from the child’s last turn. If the child’s work product is large (a generated file, a detailed analysis), it must be written to disk and the orchestrator must read it. Passing large results through the status message string is not supported and would hit context limits anyway.
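The resulting convention: the child writes its large work product to disk and reports only the location in its final message. A hypothetical sketch of the child's side (the filename and message format are illustrative, not part of any of the reference tools):

```rust
use std::fs;
use std::io;
use std::path::Path;

// Hypothetical child-agent convention: persist the large work product,
// return a short final message that only names its location.
fn finish_with_artifact(out_dir: &Path, body: &str) -> io::Result<String> {
    let path = out_dir.join("analysis.md"); // illustrative filename
    fs::write(&path, body)?;
    Ok(format!("Analysis written to {}", path.display()))
}
```

The short returned string becomes the Completed status message; the orchestrator then reads the file itself rather than pulling the full content through the status channel.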

Cancellation Does Not Roll Back File Changes


If an agent in a swarm is cancelled (via close_agent or CancellationToken), its file modifications are not reverted. The ghost commit system (git/ghost-commits.md) provides per-turn snapshots, but partial-turn cancellation leaves the filesystem in whatever state the agent reached before cancellation. Rollback requires manual intervention.

None of the three reference implementations lock files before writing. The assumption is that the orchestrator partitions work correctly. In practice, agents frequently race on shared files (package.json, configuration files, shared utilities) because the LLM’s work partitioning is imperfect. The solution is either to use workspace-level subagents (each in a git worktree) or to accept that file conflicts will occasionally require human resolution.

Shared Task List Coordination Has Overhead


Claude Code’s agent team task list with file-locking prevents race conditions but adds filesystem I/O overhead per claim. In Codex’s model, the orchestrator directly assigns work via spawn_agent — no shared state to contend over. For small swarms (2-4 agents), direct assignment is simpler. Shared task lists become valuable at larger scales where the orchestrator cannot efficiently track all assignments.

Direct Messaging Creates Token Amplification


Agent teams allow teammates to message each other directly. Each message consumes tokens in both the sender’s and receiver’s context windows. A team of N agents with frequent broadcast messages faces O(N) token amplification per broadcast. Codex’s model, where agents only communicate with the parent, keeps communication costs linear.

The TaskCompleted hook fires when a task is being marked complete, not during execution. If a teammate spends significant tokens on an approach that fails the quality gate, those tokens are wasted. Pre-execution plan approval (Claude Code’s plan approval flow) is more token-efficient for risky tasks but adds latency.

Unlike Codex’s swarm (tokio tasks within a single process), Claude Code’s agent teams are full Claude Code session processes. Each teammate is a separate Claude instance with its own context window. This means: separate API connections, separate rate limit consumption, separate compaction timelines. Token cost scales linearly with team size, not just compute cost.


Layer 1 is the spawn layer — async agent creation with atomic concurrency guards, identical to Codex’s Guards design. Layer 2 is the coordination layer — explicit wait primitive with configurable OR/AND semantics, timeout enforcement, and status broadcasting.

pub enum SwarmTool {
    SpawnAgent {
        task: String,
        model: Option<String>,
        max_tokens: Option<usize>,
    },
    WaitAgent {
        ids: Vec<AgentId>,
        mode: WaitMode, // Any (first completion) | All (all complete)
        timeout_ms: u64,
    },
    SendInput {
        id: AgentId,
        message: String,
    },
    CloseAgent {
        id: AgentId,
    },
}

pub enum WaitMode {
    Any, // Return when first agent completes (Codex behavior)
    All, // Return when all agents complete
}

WaitMode::All is an addition over Codex’s OR-only semantics. It simplifies the common “fan-out, gather-all” pattern by eliminating the LLM’s need to maintain a pending ID list.

pub struct AgentGuard {
    active: Arc<AtomicUsize>,
    max: usize,
}

impl AgentGuard {
    pub fn try_acquire(&self) -> Option<AgentSlot> {
        let current = self.active.load(Ordering::Relaxed);
        if current >= self.max {
            return None;
        }
        match self.active.compare_exchange_weak(
            current, current + 1,
            Ordering::AcqRel, Ordering::Relaxed,
        ) {
            Ok(_) => Some(AgentSlot { guard: Arc::clone(&self.active) }),
            Err(_) => None, // Retry at call site if needed
        }
    }
}

pub struct AgentSlot {
    guard: Arc<AtomicUsize>,
}

impl Drop for AgentSlot {
    fn drop(&mut self) {
        self.guard.fetch_sub(1, Ordering::AcqRel);
    }
}
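In use, slots are acquired up to the cap and capacity is freed when a slot is dropped. The following restates the guard as a compilable, self-contained sketch (it adds a hypothetical `new` constructor and uses the strong compare_exchange so a single attempt suffices, unlike the weak variant above):

```rust
use std::sync::Arc;
use std::sync::atomic::{AtomicUsize, Ordering};

// Self-contained restatement of the guard pattern for demonstration.
pub struct AgentGuard {
    active: Arc<AtomicUsize>,
    max: usize,
}

pub struct AgentSlot {
    guard: Arc<AtomicUsize>,
}

impl AgentGuard {
    // Hypothetical constructor, added for the demo.
    pub fn new(max: usize) -> Self {
        AgentGuard { active: Arc::new(AtomicUsize::new(0)), max }
    }

    pub fn try_acquire(&self) -> Option<AgentSlot> {
        let current = self.active.load(Ordering::Relaxed);
        if current >= self.max {
            return None;
        }
        // Strong compare_exchange: no spurious failures on a single attempt.
        self.active
            .compare_exchange(current, current + 1, Ordering::AcqRel, Ordering::Relaxed)
            .ok()
            .map(|_| AgentSlot { guard: Arc::clone(&self.active) })
    }
}

impl Drop for AgentSlot {
    fn drop(&mut self) {
        self.guard.fetch_sub(1, Ordering::AcqRel); // RAII release of the slot
    }
}
```

With max = 2, two acquisitions succeed, a third fails, and dropping any held slot makes a new acquisition succeed again.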

Codex’s MAX_THREAD_SPAWN_DEPTH = 1 is the right call for an initial implementation. Start with depth 1, expose it as a config field, and increase it only after swarm stability is proven:

pub struct SwarmConfig {
    pub max_concurrent_agents: usize, // default 6
    pub max_spawn_depth: u32,         // default 1
    pub wait_timeout_min_ms: u64,     // default 10_000
    pub wait_timeout_max_ms: u64,     // default 300_000
}
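The defaults listed in the comments could be wired in with a Default impl; a sketch (the struct is restated so the snippet stands alone):

```rust
// Restated config struct with the defaults from the text above.
pub struct SwarmConfig {
    pub max_concurrent_agents: usize,
    pub max_spawn_depth: u32,
    pub wait_timeout_min_ms: u64,
    pub wait_timeout_max_ms: u64,
}

impl Default for SwarmConfig {
    fn default() -> Self {
        SwarmConfig {
            max_concurrent_agents: 6, // mirrors DEFAULT_AGENT_MAX_THREADS
            max_spawn_depth: 1,       // mirrors MAX_THREAD_SPAWN_DEPTH
            wait_timeout_min_ms: 10_000,
            wait_timeout_max_ms: 300_000,
        }
    }
}
```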

Use tokio watch channels as Codex does. Each spawned agent has a watch::Sender<AgentStatus>. The parent subscribes via watch::Receiver through the agent registry. WaitMode::All collects receivers for all requested IDs and uses FuturesUnordered to drive all of them concurrently:

pub async fn wait_all(
    receivers: Vec<(AgentId, watch::Receiver<AgentStatus>)>,
    deadline: Instant,
) -> Vec<(AgentId, AgentStatus)> {
    let mut futs = FuturesUnordered::new();
    for (id, rx) in receivers {
        futs.push(wait_for_terminal(id, rx));
    }
    let mut results = vec![];
    loop {
        match timeout_at(deadline, futs.next()).await {
            Ok(Some(r)) => results.push(r),
            Ok(None) | Err(_) => break,
        }
    }
    results
}

[dependencies]
tokio = { version = "1", features = ["full"] }
uuid = { version = "1", features = ["v4"] }

No external crates beyond what the agent loop already uses.

  • Flat depth, not hierarchical: Start with max_spawn_depth = 1. Hierarchical swarms add coordination complexity that is not well-studied in practice.
  • AND-wait in addition to OR-wait: The gather-all pattern is common enough to be a first-class operation, not a loop the LLM must implement.
  • Timeout floor at 10 seconds: Prevents models from polling at sub-second intervals, which wastes tokens and causes thrashing.
  • Result via last message: The agent’s final assistant message is the result. Large work products go to disk; the message reports their location.
  • No file locking: Trust work partitioning. Add file-locking only if real-world usage shows frequent conflicts.

Claude Code’s agent teams represent a higher-level orchestration model built on top of the subagent/swarm primitives. If OpenOxide adds team support, it would be a separate layer:

pub struct AgentTeam {
    lead: SessionId,
    members: Vec<TeamMember>,
    task_store: Arc<TaskStore>,
    mailbox: Arc<Mailbox>,
    config_path: PathBuf, // ~/.openoxide/teams/{name}/config.json
}

pub struct TeamMember {
    name: String,
    agent_id: AgentId,
    session: SessionHandle, // Handle to independent openoxide process/session
}

pub struct TaskStore {
    tasks: RwLock<Vec<TeamTask>>,
    lock_dir: PathBuf, // File-based locking for concurrent claims
}

pub struct TeamTask {
    id: TaskId,
    status: TaskStatus, // Pending, InProgress(AgentId), Completed
    depends_on: Vec<TaskId>,
    assigned_to: Option<AgentId>,
}

pub struct Mailbox {
    channels: HashMap<AgentId, mpsc::Sender<TeamMessage>>,
}

pub enum TeamMessage {
    Direct { from: AgentId, content: String },
    Broadcast { from: AgentId, content: String },
    Shutdown,
    PlanApproval { plan: String },
    PlanDecision { approved: bool, feedback: Option<String> },
}

Key design decisions for teams layer:

  • Teams are process-level, swarms are task-level. The existing swarm (SpawnAgent/WaitAgent) operates within a single session as tokio tasks. Teams would spawn separate openoxide processes, each with its own context window.
  • Shared task store with file locking. Follow Claude Code’s approach. Use flock() on Unix for atomic task claiming. Store tasks in ~/.openoxide/tasks/{team-name}/.
  • Quality gate hooks. Add TeammateIdle and TaskCompleted as new hook events alongside the existing 13 OpenOxide events.
  • Plan approval flow. Reuse the existing plan mode infrastructure. Add a message type for plan submission/approval between team members.
  • Display modes deferred. In-process display (cycling between teammates in one terminal) requires TUI refactoring. Start with split-pane mode only (tmux integration or separate terminals).