Agent Swarm
Feature Definition
An agent swarm is a set of agents running concurrently, each with its own context window and tool execution environment, coordinated by an orchestrator that spawns them, monitors their progress, and aggregates their results. This is distinct from subagent spawning (covered in agents/subagents.md), which focuses on the mechanics of spawning a single child agent. The swarm problem is about running many agents in parallel, preventing them from conflicting with each other, and efficiently collecting results.
The core challenge is coordination without centralized state. Each agent in a swarm reads and writes files independently. Without explicit synchronization, two agents editing the same file produce a conflict that neither knows how to resolve. Without a wait mechanism, the orchestrator either polls (burning CPU) or blocks (losing parallelism). Without a depth limit, swarms recurse infinitely as agents spawn their own subagents.
Aider Implementation
Aider reference: commit b9050e1d5faf8096eae7a46a9ecc05a86231384b
Aider has no swarm capability. The only multi-agent pattern in Aider is the sequential Architect→Editor pipeline described in agents/subagents.md. Architect produces a plan, the user confirms, Editor applies the plan. These run in the same thread, one after the other.
There is no `spawn_agent` tool, no parallel execution, and no mechanism for the orchestrator to wait on multiple concurrent agents. All LLM interactions run on a single thread in the Python runtime.
This is a deliberate simplicity choice. Sequential execution prevents file conflicts by design and eliminates an entire class of coordination bugs. Aider’s focus is on single-user, single-session, single-thread reliability.
Codex Implementation
Codex reference: commit 4ab44e2c5cc54ed47e47a6729dfd8aa5a3dc2476
Codex has the most complete multi-agent implementation of the three reference tools. A full swarm toolkit is exposed as LLM tools: spawn_agent, send_input, resume_agent, wait, and close_agent. These are implemented in codex-rs/core/src/tools/handlers/multi_agents.rs.
Tool Surface
- `spawn_agent` — launch a new child agent; returns its `ThreadId`
- `send_input` — send a follow-up message to a running agent
- `resume_agent` — resume a previously checkpointed agent from its rollout
- `wait` — block until one or more agents reach a terminal state
- `close_agent` — terminate an agent

The orchestrator (typically a parent agent) uses these tools to build whatever coordination pattern it needs: fan-out, pipeline, race, or gather.
Spawn: AgentControl::spawn_agent()
File: `codex-rs/core/src/agent/control.rs:40`
```rust
pub(crate) async fn spawn_agent(
    &self,
    config: AgentConfig,
    initial_message: String,
    session_source: SessionSource,
) -> Result<ThreadId, CodexErr> {
    let reservation = self.state.reserve_spawn_slot(config.agent_max_threads)?;
    let thread_id = self
        .manager
        .upgrade()
        .ok_or(CodexErr::ThreadManagerGone)?
        .spawn_thread(config, initial_message, session_source, reservation)
        .await?;
    Ok(thread_id)
}
```

The call returns a `ThreadId` immediately — the agent starts running in the background in a tokio task. The caller can immediately spawn more agents or do other work.
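The fire-and-forget shape of `spawn_agent` can be illustrated with a std-only sketch (all names here are hypothetical; Codex uses tokio tasks rather than OS threads): the spawn call hands back an identifier at once, while the agent's work proceeds in the background.

```rust
use std::collections::HashMap;
use std::sync::mpsc;
use std::thread;

type ThreadId = u64;

struct Orchestrator {
    next_id: ThreadId,
    // One receiver per background agent, keyed by its id.
    results: HashMap<ThreadId, mpsc::Receiver<String>>,
}

impl Orchestrator {
    fn new() -> Self {
        Orchestrator { next_id: 0, results: HashMap::new() }
    }

    /// Returns a ThreadId immediately; the "agent" runs in the background.
    fn spawn_agent(&mut self, task: String) -> ThreadId {
        let id = self.next_id;
        self.next_id += 1;
        let (tx, rx) = mpsc::channel();
        thread::spawn(move || {
            // Stand-in for the agent loop doing real work.
            let _ = tx.send(format!("done: {task}"));
        });
        self.results.insert(id, rx);
        id
    }

    /// Block until the given agent reports its final message.
    fn wait_one(&mut self, id: ThreadId) -> Option<String> {
        self.results.remove(&id)?.recv().ok()
    }
}

fn main() {
    let mut orch = Orchestrator::new();
    // Both spawns return before either agent finishes.
    let a = orch.spawn_agent("lint".into());
    let b = orch.spawn_agent("test".into());
    println!("{:?}", orch.wait_one(a));
    println!("{:?}", orch.wait_one(b));
}
```

The key property is that `spawn_agent` never blocks on the child's work; only `wait_one` does.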
Concurrency Limiting: Guards
File: `codex-rs/core/src/agent/guards.rs`
```rust
pub(crate) struct Guards {
    threads_set: Mutex<HashSet<ThreadId>>,
    total_count: AtomicUsize,
}
```

Before a spawn is committed, `reserve_spawn_slot()` runs an atomic compare-exchange:
```rust
pub(crate) fn reserve_spawn_slot(
    self: &Arc<Self>,
    max_threads: Option<usize>,
) -> Result<SpawnReservation> {
    if let Some(max_threads) = max_threads {
        if !self.try_increment_spawned(max_threads) {
            return Err(CodexErr::AgentLimitReached { max_threads });
        }
    }
    Ok(SpawnReservation { state: Arc::clone(self), active: true })
}

fn try_increment_spawned(&self, limit: usize) -> bool {
    let current = self.total_count.load(Ordering::Relaxed);
    if current >= limit {
        return false;
    }
    self.total_count.compare_exchange_weak(
        current,
        current + 1,
        Ordering::AcqRel,
        Ordering::Relaxed,
    ).is_ok()
}
```

`SpawnReservation` is a RAII guard — if the spawn fails after the slot was reserved, dropping the reservation decrements the counter. The default agent max threads is defined in the `AgentControl` config as `DEFAULT_AGENT_MAX_THREADS = Some(6)`.
The Guards instance is shared across all AgentControl clones within the same session. This means the limit is session-scoped: all agents spawned by a parent session share the same concurrency budget.
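The reservation pattern can be reproduced as a runnable std-only sketch (hypothetical names mirroring the Guards/SpawnReservation shape): a compare-exchange loop bumps an atomic counter, and the guard's `Drop` releases the slot even on failure paths.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

pub struct SpawnSlots {
    count: AtomicUsize,
    max: usize,
}

pub struct SlotReservation {
    slots: Arc<SpawnSlots>,
}

impl SpawnSlots {
    pub fn new(max: usize) -> Arc<Self> {
        Arc::new(SpawnSlots { count: AtomicUsize::new(0), max })
    }

    pub fn in_use(&self) -> usize {
        self.count.load(Ordering::Relaxed)
    }
}

/// Try to reserve a spawn slot; None when the session budget is exhausted.
fn reserve(slots: &Arc<SpawnSlots>) -> Option<SlotReservation> {
    let mut current = slots.count.load(Ordering::Relaxed);
    loop {
        if current >= slots.max {
            return None;
        }
        match slots.count.compare_exchange_weak(
            current, current + 1, Ordering::AcqRel, Ordering::Relaxed,
        ) {
            Ok(_) => return Some(SlotReservation { slots: Arc::clone(slots) }),
            Err(observed) => current = observed, // lost a race; retry
        }
    }
}

impl Drop for SlotReservation {
    fn drop(&mut self) {
        // Releasing on drop covers both normal shutdown and failed spawns.
        self.slots.count.fetch_sub(1, Ordering::AcqRel);
    }
}

fn main() {
    let slots = SpawnSlots::new(2);
    let a = reserve(&slots).unwrap();
    let _b = reserve(&slots).unwrap();
    assert!(reserve(&slots).is_none()); // budget of 2 exhausted
    drop(a);
    assert!(reserve(&slots).is_some()); // slot freed by Drop
}
```

Because release happens in `Drop`, the counter stays accurate no matter which error path unwinds a failed spawn.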
Recursion Depth Limit
File: `codex-rs/core/src/agent/guards.rs:24`
```rust
pub(crate) const MAX_THREAD_SPAWN_DEPTH: i32 = 1;

pub(crate) fn exceeds_thread_spawn_depth_limit(depth: i32) -> bool {
    depth > MAX_THREAD_SPAWN_DEPTH
}
```

`MAX_THREAD_SPAWN_DEPTH = 1` means the hierarchy is at most two levels deep: the root session and its direct children. Grandchildren cannot spawn more agents. The depth is tracked in `SessionSource::SubAgent(SubAgentSource::ThreadSpawn { depth, .. })` and incremented at each spawn level.
This prevents runaway recursion. A depth-one limit is a strong constraint — it means swarms are flat, not hierarchical. If an agent in the swarm needs to do its own sub-delegation, it cannot. This trades flexibility for predictability.
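The enforcement point can be sketched as a check in the spawn path (hypothetical names; only the constant and the `>` comparison are from the source):

```rust
const MAX_THREAD_SPAWN_DEPTH: i32 = 1;

#[derive(Debug, PartialEq)]
enum SpawnError {
    DepthLimitReached { depth: i32 },
}

/// Depth 0 is the root session; each spawned child is one level deeper.
/// Rejecting at the handler makes the limit hold regardless of what the LLM asks for.
fn check_spawn_depth(parent_depth: i32) -> Result<i32, SpawnError> {
    let child_depth = parent_depth + 1;
    if child_depth > MAX_THREAD_SPAWN_DEPTH {
        return Err(SpawnError::DepthLimitReached { depth: child_depth });
    }
    Ok(child_depth)
}

fn main() {
    assert_eq!(check_spawn_depth(0), Ok(1)); // root may spawn children
    assert!(check_spawn_depth(1).is_err()); // children may not spawn grandchildren
}
```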
Parallel Wait: wait Tool
File: `codex-rs/core/src/tools/handlers/multi_agents.rs`
The wait tool is the key coordination primitive for swarms. It takes a list of thread IDs and blocks until one of them reaches a terminal state:
```rust
#[derive(Debug, Deserialize)]
struct WaitArgs {
    ids: Vec<String>, // Thread IDs to wait on
    timeout_ms: Option<i64>,
}
```

Timeout constraints are enforced server-side:
```rust
pub(crate) const MIN_WAIT_TIMEOUT_MS: i64 = 10_000;      // 10 seconds
pub(crate) const DEFAULT_WAIT_TIMEOUT_MS: i64 = 30_000;  // 30 seconds
pub(crate) const MAX_WAIT_TIMEOUT_MS: i64 = 300_000;     // 5 minutes
```

The minimum of 10 seconds is intentional. The comment in the source notes: “Very short timeouts encourage busy-polling loops in the orchestrator prompt and can cause high CPU usage.”
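A plausible normalization of the requested timeout (the clamping logic is an assumption; only the three constants are from the source):

```rust
const MIN_WAIT_TIMEOUT_MS: i64 = 10_000;      // 10 seconds
const DEFAULT_WAIT_TIMEOUT_MS: i64 = 30_000;  // 30 seconds
const MAX_WAIT_TIMEOUT_MS: i64 = 300_000;     // 5 minutes

/// Apply the default when absent, then clamp into the allowed window.
fn effective_timeout_ms(requested: Option<i64>) -> i64 {
    requested
        .unwrap_or(DEFAULT_WAIT_TIMEOUT_MS)
        .clamp(MIN_WAIT_TIMEOUT_MS, MAX_WAIT_TIMEOUT_MS)
}

fn main() {
    assert_eq!(effective_timeout_ms(None), 30_000);
    assert_eq!(effective_timeout_ms(Some(500)), 10_000); // floor blocks busy-polling
    assert_eq!(effective_timeout_ms(Some(3_600_000)), 300_000); // ceiling bounds a hung wait
}
```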
The wait implementation uses FuturesUnordered to subscribe to status watch channels for all requested agents in parallel:
```rust
// Subscribe to status for each agent
let mut status_rxs = Vec::with_capacity(receiver_thread_ids.len());
for id in &receiver_thread_ids {
    match session.services.agent_control.subscribe_status(*id).await {
        Ok(rx) => {
            let status = rx.borrow().clone();
            if is_final(&status) {
                initial_final_statuses.push((*id, status));
            }
            status_rxs.push((*id, rx));
        }
        Err(_) => { /* agent not found, treat as done */ }
    }
}

// Race across all status watchers
let mut futures = FuturesUnordered::new();
for (id, rx) in status_rxs.into_iter() {
    futures.push(wait_for_final_status(session.clone(), id, rx));
}

let deadline = Instant::now() + Duration::from_millis(timeout_ms as u64);
loop {
    match timeout_at(deadline, futures.next()).await {
        Ok(Some(Some(result))) => {
            results.push(result);
            break; // Returns on FIRST completed agent
        }
        Ok(Some(None)) => continue,
        Ok(None) | Err(_) => break,
    }
}
```

Each `wait_for_final_status()` future subscribes to the tokio `watch::Receiver<AgentStatus>` for its agent. When the status transitions to a terminal state (`Completed`, `Errored`, `Shutdown`), the future resolves. The parent agent receives the first completion and can inspect the result.
This is an OR-wait, not an AND-wait: the wait tool returns when the first agent finishes, not when all finish. To wait for all agents, the orchestrator calls wait in a loop, processing one result at a time.
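The gather-all loop the orchestrator must build on top of OR-wait can be sketched synchronously (hypothetical `wait_any` standing in for a `wait` tool call; the pending-set bookkeeping is the part the LLM has to carry across turns):

```rust
use std::collections::{HashMap, HashSet, VecDeque};

type ThreadId = u32;

/// Stand-in for one `wait` call: returns the next agent to finish, if any.
fn wait_any(completions: &mut VecDeque<(ThreadId, String)>) -> Option<(ThreadId, String)> {
    completions.pop_front()
}

/// Gather results from all pending agents by calling wait_any in a loop,
/// removing each completed id from the pending set as it arrives.
fn gather_all(
    mut pending: HashSet<ThreadId>,
    completions: &mut VecDeque<(ThreadId, String)>,
) -> HashMap<ThreadId, String> {
    let mut results = HashMap::new();
    while !pending.is_empty() {
        let Some((id, msg)) = wait_any(completions) else {
            break; // timeout: remaining ids are ghost agents
        };
        if pending.remove(&id) {
            results.insert(id, msg);
        }
    }
    results
}

fn main() {
    let mut completions: VecDeque<_> = vec![
        (2, "done: docs".to_string()),
        (1, "done: tests".to_string()),
    ].into();
    let pending: HashSet<ThreadId> = [1, 2].into_iter().collect();
    let results = gather_all(pending, &mut completions);
    assert_eq!(results.len(), 2);
}
```

Losing the pending set between turns is exactly the failure mode described in the pitfalls section: agents that finish are never collected.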
Agent Status Propagation
File: `codex-rs/core/src/agent/status.rs`
Status transitions are driven by the event stream:
```rust
pub(crate) fn agent_status_from_event(msg: &EventMsg) -> Option<AgentStatus> {
    match msg {
        EventMsg::TurnStarted(_) => Some(AgentStatus::Running),
        EventMsg::TurnComplete(e) => Some(AgentStatus::Completed(e.last_agent_message.clone())),
        EventMsg::TurnAborted(e) => Some(AgentStatus::Errored(format!("{:?}", e.reason))),
        EventMsg::Error(e) => Some(AgentStatus::Errored(e.message.clone())),
        EventMsg::ShutdownComplete => Some(AgentStatus::Shutdown),
        _ => None,
    }
}
```

`AgentStatus` values: `PendingInit`, `Running`, `Completed(String)`, `Errored(String)`, `Shutdown`, `NotFound`.
The Completed variant carries the last assistant message from the agent’s turn. This is the agent’s “return value” — the orchestrator reads it via the wait result to understand what the child agent accomplished.
Swarm Event Lifecycle
When an agent spawns children, Codex emits dedicated events for observability:
```rust
CollabAgentSpawnBeginEvent { thread_id, task_description }
CollabAgentSpawnEndEvent { thread_id, initial_status }
CollabWaitingBeginEvent { ids, timeout_ms }
```

These appear in the TUI as nested status indicators under the parent turn, giving the user visibility into which child agents are running, completed, or errored.
File Conflict Handling
Codex does not implement file-level locking or conflict detection for concurrent agents. Each agent runs in its own session context and makes independent file modifications. If two agents edit the same file simultaneously, the last writer wins.
The practical mitigation is prompt design: the orchestrator is expected to partition work such that agents do not overlap on files. The MAX_THREAD_SPAWN_DEPTH = 1 limit reduces the risk by constraining the hierarchy to one level where coordination is more tractable.
OpenCode Implementation
OpenCode reference: commit 7ed449974864361bad2c1f1405769fd2c2fcdf42
OpenCode’s subagent mechanism (the task tool in packages/opencode/src/tool/task.ts) is sequential per invocation — each task tool call creates one child session and awaits its completion before the parent turn continues. There is no native parallel spawn-and-wait mechanism.
The general Agent’s Claimed Parallelism
The `general` built-in agent has this description (`packages/opencode/src/agent/agent.ts:112`):
> "Use this agent to execute multiple units of work in parallel."

This is a prompt-level affordance, not an implementation feature. The description instructs the LLM to invoke multiple `task` tool calls, and the Vercel AI SDK will dispatch concurrent tool calls if the LLM emits them in a single response. Whether the LLM actually does this depends on the model and the turn.
The plan agent also references parallel execution in its system prompt, describing a Phase 1 that launches “up to 3 explore agents IN PARALLEL.” Again, this is a prompting strategy: the LLM is expected to emit 3 task invocations in one response.
Concurrent Tool Dispatch
When the LLM emits multiple tool calls in a single response turn, OpenCode’s session processor (packages/opencode/src/session/prompt.ts) collects them and processes them. The Vercel AI SDK’s fullStream handles concurrent tool dispatch at the API level. In practice, because task tool execution involves creating a child session, spawning its agent loop, and awaiting its session.completed bus event, multiple concurrent task calls would run as concurrent async operations within the event loop.
The limitation is that there is no explicit orchestration layer: no wait tool, no status polling API, and no per-child cancellation. The parent either awaits all children or none.
Session Isolation
Child sessions created by the task tool have their own SQLite session record with `parentID` set to the invoking session’s ID. Each child has its own message history and runs its own agent loop independently. File operations go through the same `apply_patch` / `write` / `multiedit` tools, but each child session has its own permission grant history.
No file-level coordination exists. Concurrent child sessions can overwrite each other’s file changes. This is expected to be managed by the orchestrating LLM through work partitioning.
Claude Code Implementation
Claude Code is closed-source, so its architecture is inferred from public documentation. Claude Code has two distinct multi-agent mechanisms: subagents (documented in Run 3, see Subagents) and agent teams (documented here). Subagents are spawned within a session and report back. Agent teams are fully independent sessions with shared coordination.
Agent Teams Architecture
Agent teams are experimental (disabled by default; enabled with `CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1`). They represent a fundamentally different coordination model from Codex’s spawn/wait primitives or OpenCode’s prompt-level parallelism.
Components:
| Component | Role | Storage |
|---|---|---|
| Team lead | Main Claude Code session. Creates team, spawns teammates, coordinates work, synthesizes results. | Session transcript |
| Teammates | Separate Claude Code instances with independent context windows. Each loads project context (CLAUDE.md, MCP, skills) + spawn prompt. Lead’s conversation history does NOT carry over. | Session transcript |
| Task list | Shared work items with three states (pending, in_progress, completed) and dependency tracking. Tasks auto-unblock when dependencies complete. | ~/.claude/tasks/{team-name}/ |
| Mailbox | Direct messaging system. message (one-to-one) and broadcast (one-to-all). Automatic delivery, no polling. | In-memory |
Team configuration is stored at `~/.claude/teams/{team-name}/config.json` with a members array (name, agent ID, agent type). Teammates can read this to discover other team members.
Coordination Model
Agent teams coordinate via a shared task list pattern, fundamentally different from Codex’s spawn/wait/status pattern:
| Mechanism | Codex Swarm | Claude Code Agent Teams |
|---|---|---|
| Work distribution | Orchestrator spawns agents with specific tasks | Lead creates task list, teammates self-claim or get assigned |
| Coordination primitive | wait tool (blocks until agent completes) | Task list + direct messaging |
| Result collection | wait returns agent’s last message | Teammates report via messages, lead synthesizes |
| Inter-agent communication | None (agents only communicate with parent) | Direct teammate-to-teammate messaging |
| Depth | 1-level (hardcoded) | Flat (no nested teams, no teammate sub-teams) |
| Concurrency control | Atomic Guards counter, DEFAULT_AGENT_MAX_THREADS = 6 | Team size set at creation, no global cap documented |
Task List Mechanics
Tasks have three states: pending, in_progress, completed. Tasks can depend on other tasks: a pending task with unresolved dependencies cannot be claimed until those dependencies complete. Task claiming uses file locking to prevent race conditions when multiple teammates try to claim simultaneously.
Work assignment follows two patterns:
- Lead assigns: Lead explicitly assigns tasks to specific teammates
- Self-claim: After finishing a task, teammates pick up the next unassigned, unblocked task
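A portable sketch of atomic task claiming (the actual mechanism is documented only as “file locking”; exclusive file creation via `create_new` is used here as a stand-in because `flock` needs platform-specific code — all names are hypothetical):

```rust
use std::fs::OpenOptions;
use std::io::Write;
use std::path::Path;

/// Claim a task by atomically creating a marker file.
/// Exactly one claimant succeeds; everyone else sees AlreadyExists.
fn try_claim_task(task_dir: &Path, task_id: &str, teammate: &str) -> bool {
    let marker = task_dir.join(format!("{task_id}.claim"));
    match OpenOptions::new().write(true).create_new(true).open(&marker) {
        Ok(mut f) => {
            // Record who holds the claim for later inspection.
            let _ = writeln!(f, "{teammate}");
            true
        }
        Err(_) => false, // another teammate already holds the claim
    }
}

fn main() {
    let dir = std::env::temp_dir().join("team-tasks-demo");
    std::fs::create_dir_all(&dir).unwrap();
    let _ = std::fs::remove_file(dir.join("task-1.claim"));
    assert!(try_claim_task(&dir, "task-1", "alice")); // first claim wins
    assert!(!try_claim_task(&dir, "task-1", "bob")); // concurrent claim rejected
}
```

Exclusive creation pushes the race down to the filesystem, which guarantees at-most-one winner without any in-process coordination.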
Quality Gate Hooks
Two specialized hooks enforce quality during team execution:
| Hook | Trigger | Exit Code 2 Effect |
|---|---|---|
| TeammateIdle | Teammate is about to go idle | Send feedback, keep teammate working |
| TaskCompleted | Task is being marked complete | Prevent completion, send feedback |
These hooks enable automated quality checks: CI validation before task completion, code review requirements, test coverage thresholds.
Plan Approval Flow
Teammates can be required to plan before implementing:
- Teammate enters read-only plan mode
- Teammate completes plan, sends approval request to lead
- Lead reviews and approves or rejects with feedback
- If rejected: teammate revises and resubmits
- If approved: teammate exits plan mode, begins implementation
The lead makes approval decisions autonomously; its criteria can be steered via prompt, e.g. “only approve plans that include test coverage.”
Display Modes
Section titled “Display Modes”| Mode | Mechanism | Terminal Requirement |
|---|---|---|
| In-process (default) | All teammates in main terminal. Shift+Down cycles. | Any terminal |
| Split panes | Each teammate in own pane. Click to interact. | tmux or iTerm2 |
| Auto | Split panes if in tmux session, in-process otherwise | — |
Configure via `"teammateMode"` in `settings.json` or the `--teammate-mode` CLI flag.
Permission Model
Section titled “Permission Model”- All teammates start with the lead’s permission settings
- If lead uses `--dangerously-skip-permissions`, all teammates inherit it
- Individual modes can be changed after spawning
- Per-teammate modes cannot be set at spawn time
Limitations
Section titled “Limitations”- No session resumption:
/resumeand/rewinddo not restore in-process teammates - Task status lag: Teammates sometimes fail to mark tasks as completed
- Slow shutdown: Teammates finish current request/tool call before stopping
- One team per session: Must clean up before starting new team
- No nested teams: Teammates cannot spawn their own teams
- Fixed lead: Cannot promote a teammate or transfer leadership
- Permissions fixed at spawn: Start with lead’s mode, adjustable after but not before
- Split pane restrictions: Not supported in VS Code terminal, Windows Terminal, Ghostty
Pitfalls & Hard Lessons
Prompt-Level Parallelism Is Fragile
OpenCode’s strategy of telling the LLM to “run agents in parallel” via system prompt is unreliable. Models that have not been trained on multi-agent orchestration patterns tend to serialize their task invocations even when instructed to run in parallel. The only reliable parallelism guarantee comes from tool-level concurrency enforcement, as in Codex.
Depth Limits Must Be Enforced in Code, Not Prompt
Codex’s `MAX_THREAD_SPAWN_DEPTH = 1` is enforced at the spawn_agent handler level. If it were only a prompt instruction (“do not spawn grandchildren”), models would violate it under adversarial or confused conditions. Hard enforcement in the spawn handler prevents infinite recursion regardless of what the LLM requests.
OR-Wait Semantics Create Batch Complexity
Codex’s wait tool returns on the first completion. To gather results from N agents, the orchestrator must call wait N times in a loop. This is more token-efficient than waiting for all agents (the orchestrator can react early if one agent fails) but it requires the LLM to maintain a list of pending thread IDs across multiple turns. Models that lose track of this list leave ghost agents running until timeout.
Session-Level Concurrency Budget, Not Global
Codex’s Guards counter is scoped to a session, not to the process. If a user has multiple top-level sessions open simultaneously, each gets its own `DEFAULT_AGENT_MAX_THREADS = 6` budget. System-wide agent count can exceed 6. There is no global cap.
Result Aggregation Requires Context Window Awareness
The orchestrator must receive and integrate results from potentially many child agents. Each `Completed` status carries only `last_agent_message` — the final assistant message from the child’s last turn. If the child’s work product is large (a generated file, a detailed analysis), it must be written to disk and the orchestrator must read it. Passing large results through the status message string is not supported and would hit context limits anyway.
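The disk-handoff convention can be sketched: the child persists its work product and its final message carries only the location (file name and message format here are hypothetical illustrations, not a Codex convention):

```rust
use std::fs;
use std::path::PathBuf;

/// Child side: persist a large work product, return a short status message.
fn child_finish(work_product: &str) -> String {
    let path: PathBuf = std::env::temp_dir().join("swarm-result-analysis.md");
    fs::write(&path, work_product).expect("write result");
    // The Completed status carries only this short message, not the content.
    format!("Analysis written to {}", path.display())
}

/// Orchestrator side: recover the path from the status message and read the file.
fn orchestrator_collect(last_message: &str) -> Option<String> {
    let path = last_message.strip_prefix("Analysis written to ")?;
    fs::read_to_string(path).ok()
}

fn main() {
    let msg = child_finish("# Findings\nlarge body of analysis...");
    let body = orchestrator_collect(&msg).unwrap();
    assert!(body.starts_with("# Findings"));
}
```

The status message stays tiny regardless of how large the work product grows, which is what keeps the orchestrator's context budget intact.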
Cancellation Does Not Roll Back File Changes
If an agent in a swarm is cancelled (via `close_agent` or `CancellationToken`), its file modifications are not reverted. The ghost commit system (git/ghost-commits.md) provides per-turn snapshots, but partial-turn cancellation leaves the filesystem in whatever state the agent reached before cancellation. Rollback requires manual intervention.
No File-Level Locking Anywhere
None of the three reference implementations lock files before writing. The assumption is that the orchestrator partitions work correctly. In practice, agents frequently race on shared files (package.json, configuration files, shared utilities) because the LLM’s work partitioning is imperfect. The solution is to either use workspace-level subagents (each in a git worktree) or to accept that file conflicts will occasionally require human resolution.
Shared Task List Coordination Has Overhead
Claude Code’s agent team task list with file-locking prevents race conditions but adds filesystem I/O overhead per claim. In Codex’s model, the orchestrator directly assigns work via `spawn_agent` — no shared state to contend over. For small swarms (2-4 agents), direct assignment is simpler. Shared task lists become valuable at larger scales where the orchestrator cannot efficiently track all assignments.
Direct Messaging Creates Token Amplification
Agent teams allow teammates to message each other directly. Each message consumes tokens in both the sender’s and receiver’s context windows. A team of N agents with frequent broadcast messages faces O(N) token amplification per broadcast. Codex’s model, where agents only communicate with the parent, keeps communication costs linear.
Quality Gate Hooks Are Post-Hoc
The TeammateIdle and TaskCompleted hooks fire when a task is being marked complete, not during execution. If a teammate spends significant tokens on an approach that fails the quality gate, those tokens are wasted. Pre-execution plan approval (Claude Code’s plan approval flow) is more token-efficient for risky tasks but adds latency.
Agent Teams Are Sessions, Not Threads
Unlike Codex’s swarm (tokio tasks within a single process), Claude Code’s agent teams are full Claude Code session processes. Each teammate is a separate Claude instance with its own context window. This means: separate API connections, separate rate limit consumption, separate compaction timelines. Token cost scales linearly with team size, not just compute cost.
OpenOxide Blueprint
Architecture: Two-Layer Swarm
Layer 1 is the spawn layer — async agent creation with atomic concurrency guards, identical to Codex’s Guards design. Layer 2 is the coordination layer — explicit wait primitive with configurable OR/AND semantics, timeout enforcement, and status broadcasting.
Swarm Tool Set
Section titled “Swarm Tool Set”pub enum SwarmTool { SpawnAgent { task: String, model: Option<String>, max_tokens: Option<usize>, }, WaitAgent { ids: Vec<AgentId>, mode: WaitMode, // Any (first completion) | All (all complete) timeout_ms: u64, }, SendInput { id: AgentId, message: String, }, CloseAgent { id: AgentId, },}
```rust
pub enum WaitMode {
    Any, // Return when first agent completes (Codex behavior)
    All, // Return when all agents complete
}
```

`WaitMode::All` is an addition over Codex’s OR-only semantics. It simplifies the common “fan-out, gather-all” pattern by eliminating the LLM’s need to maintain a pending ID list.
Concurrency Guard
```rust
pub struct AgentGuard {
    active: Arc<AtomicUsize>,
    max: usize,
}
```
```rust
impl AgentGuard {
    pub fn try_acquire(&self) -> Option<AgentSlot> {
        let mut current = self.active.load(Ordering::Relaxed);
        loop {
            if current >= self.max {
                return None;
            }
            match self.active.compare_exchange_weak(
                current,
                current + 1,
                Ordering::AcqRel,
                Ordering::Relaxed,
            ) {
                Ok(_) => return Some(AgentSlot { guard: Arc::clone(&self.active) }),
                // Spurious failure or lost race: retry with the observed value
                // rather than reporting a full budget to the caller.
                Err(observed) => current = observed,
            }
        }
    }
}
```
```rust
pub struct AgentSlot {
    guard: Arc<AtomicUsize>,
}
```
```rust
impl Drop for AgentSlot {
    fn drop(&mut self) {
        self.guard.fetch_sub(1, Ordering::AcqRel);
    }
}
```

Depth Limit
Codex’s `MAX_THREAD_SPAWN_DEPTH = 1` is the right call for an initial implementation. Start with depth 1, expose it as a config field, and increase it only after swarm stability is proven:
```rust
pub struct SwarmConfig {
    pub max_concurrent_agents: usize, // default 6
    pub max_spawn_depth: u32,         // default 1
    pub wait_timeout_min_ms: u64,     // default 10_000
    pub wait_timeout_max_ms: u64,     // default 300_000
}
```

Status Broadcasting
Use tokio watch channels as Codex does. Each spawned agent has a `watch::Sender<AgentStatus>`. The parent subscribes via `watch::Receiver` through the agent registry. `WaitMode::All` collects receivers for all requested IDs and uses `FuturesUnordered` to drive all of them concurrently:
```rust
pub async fn wait_all(
    receivers: Vec<(AgentId, watch::Receiver<AgentStatus>)>,
    deadline: Instant,
) -> Vec<(AgentId, AgentStatus)> {
    let mut futs = FuturesUnordered::new();
    for (id, rx) in receivers {
        futs.push(wait_for_terminal(id, rx));
    }
    let mut results = vec![];
    loop {
        match timeout_at(deadline, futs.next()).await {
            Ok(Some(r)) => results.push(r),
            Ok(None) | Err(_) => break,
        }
    }
    results
}
```

Crates
```toml
[dependencies]
tokio = { version = "1", features = ["full"] }
uuid = { version = "1", features = ["v4"] }
```

No external crates beyond what the agent loop already uses.
Key Design Decisions
Section titled “Key Design Decisions”- Flat depth, not hierarchical: Start with
max_spawn_depth = 1. Hierarchical swarms add coordination complexity that is not well-studied in practice. - AND-wait in addition to OR-wait: The gather-all pattern is common enough to be a first-class operation, not a loop the LLM must implement.
- Timeout floor at 10 seconds: Prevents models from polling at sub-second intervals, which wastes tokens and causes thrashing.
- Result via last message: The agent’s final assistant message is the result. Large work products go to disk; the message reports their location.
- No file locking: Trust work partitioning. Add file-locking only if real-world usage shows frequent conflicts.
Agent Team Layer (Future)
Claude Code’s agent teams represent a higher-level orchestration model built on top of the subagent/swarm primitives. If OpenOxide adds team support, it would be a separate layer:
```rust
pub struct AgentTeam {
    lead: SessionId,
    members: Vec<TeamMember>,
    task_store: Arc<TaskStore>,
    mailbox: Arc<Mailbox>,
    config_path: PathBuf, // ~/.openoxide/teams/{name}/config.json
}

pub struct TeamMember {
    name: String,
    agent_id: AgentId,
    session: SessionHandle, // Handle to independent openoxide process/session
}

pub struct TaskStore {
    tasks: RwLock<Vec<TeamTask>>,
    lock_dir: PathBuf, // File-based locking for concurrent claims
}

pub struct TeamTask {
    id: TaskId,
    status: TaskStatus, // Pending, InProgress(AgentId), Completed
    depends_on: Vec<TaskId>,
    assigned_to: Option<AgentId>,
}

pub struct Mailbox {
    channels: HashMap<AgentId, mpsc::Sender<TeamMessage>>,
}

pub enum TeamMessage {
    Direct { from: AgentId, content: String },
    Broadcast { from: AgentId, content: String },
    Shutdown,
    PlanApproval { plan: String },
    PlanDecision { approved: bool, feedback: Option<String> },
}
```

Key design decisions for teams layer:
- Teams are process-level, swarms are task-level. The existing swarm (`SpawnAgent`/`WaitAgent`) operates within a single session as tokio tasks. Teams would spawn separate `openoxide` processes, each with its own context window.
- Shared task store with file locking. Follow Claude Code’s approach. Use `flock()` on Unix for atomic task claiming. Store tasks in `~/.openoxide/tasks/{team-name}/`.
- Quality gate hooks. Add `TeammateIdle` and `TaskCompleted` as new hook events alongside the existing 13 OpenOxide events.
- Plan approval flow. Reuse the existing plan mode infrastructure. Add a message type for plan submission/approval between team members.
- Display modes deferred. In-process display (cycling between teammates in one terminal) requires TUI refactoring. Start with split-pane mode only (tmux integration or separate terminals).