The Agent Orchestration Landscape: Where Callipso Fits in March 2026
We mapped every AI coding agent orchestrator we could find — terminal multiplexers, IDE replacements, cloud platforms, voice controllers — and figured out where we stand.
Six months ago, "AI coding agent" meant one thing: a single assistant in your editor. Today it means five Claude Code sessions rewriting your backend in parallel while you review diffs over coffee. The tooling around this shift is exploding — and the design choices diverge sharply.
We mapped every orchestrator we could find. Terminal multiplexers written in Go. Desktop apps that replace your IDE entirely. Cloud platforms running hundreds of agents headlessly. Voice controllers on iOS. We looked at what they build, what they skip, and what trade-offs they make.
This is not a product comparison — we are obviously biased. It is a landscape survey from the perspective of builders who had to make their own architectural bets. We will be honest about where those bets are ahead and where they are behind.
The Core Problem
The workflow is straightforward: you have three features to build, a bug to fix, and tests to write. You want five AI agents running simultaneously, each in its own git worktree, each making progress while you review.
The naive approach — five terminal tabs, manual branch management, copy-pasting prompts — breaks down fast. You lose track of which agent is doing what. Branches collide. You send a prompt to the wrong session. The cognitive load of managing the agents exceeds the work the agents are doing.
Every tool in this space exists to solve that problem. They disagree on how.
Three Philosophies
After studying the landscape, we see three distinct approaches. Each reflects a different bet about how developers will work with AI agents.
Philosophy 1: Replace the IDE
Intent (Augment Code), OpenAI's Codex App, and to some extent Conductor (Melty Labs) take the boldest position: build an entirely new environment designed around agent orchestration from the ground up.
Intent introduces a coordinator-implementor-verifier pipeline. A planning agent uses Augment's Context Engine to understand your task and propose a plan as a living specification. You review and approve. Then implementor agents execute in isolated git worktrees while a verifier agent checks results against the spec. When requirements change, updates propagate to all active agents. It is architecturally coherent — and it explicitly markets itself as an IDE replacement.
OpenAI's Codex App takes a similar approach. Threads of agents run in parallel, each with isolated code copies, managed through a purpose-built desktop surface. As of early March 2026, it runs on both macOS and Windows. Multi-agent workflows spawn specialized sub-agents, with Codex handling orchestration across threads.
Conductor from Melty Labs provides a dashboard of all agents ("who is working on what") with real-time code review, trace logging, and automated PR creation. Each agent gets its own worktree. On the free tier, the only cost is what you pay the AI services themselves.
The upside of this approach is deep integration. When you control the entire environment, specs can update automatically, agents share context natively, and the UI reflects agent state in real time.
The downside is lock-in. You abandon your existing workflow. Your Cursor keybindings, your Warp shortcuts, your iTerm2 profiles — gone. The switching cost is high when most developers have spent years customizing their environment.
Philosophy 2: Multiply Terminals
Claude Squad, Plural, and the Composio Agent Orchestrator take the opposite approach. They are lightweight multiplexers — spawn N agents, each in a tmux pane or its own worktree, manage them through a terminal UI or framework.
Claude Squad is the most popular, written in Go, with 14,000+ GitHub stars. It supports five different AI CLIs: Claude Code, Aider, Codex, OpenCode, and Amp. Each agent runs in a tmux session with its own git worktree. The TUI shows all sessions at a glance. Single binary, works anywhere tmux runs.
Superset launched in March 2026 as an open-source terminal for running 10+ parallel AI agents. Each session gets its own git worktree. A persistent daemon survives crashes and app restarts, so long-running tasks continue even when you close the window. It works with any CLI agent — Claude Code, Codex, Gemini CLI, OpenCode. Apache 2.0 license, zero telemetry.
Agent Deck is a Go + Bubble Tea TUI with a distinctive feature: Conductors. These are persistent Claude Code sessions that monitor and orchestrate all other sessions — watching for agents that need help, auto-responding when confident, escalating to you when they cannot. Socket pooling shares MCP processes across sessions via Unix sockets, reducing MCP memory usage by 85-90%. Optional Telegram and Slack integration for remote control.
ittybitty takes the simplest approach: specify a task and it spawns a Claude instance in a tmux virtual terminal. Agents can recursively spawn sub-agents. Minimal surface area — just a shell script and tmux.
myclaude treats Claude Code as an orchestration layer rather than the sole agent. It coordinates multiple AI backends — Codex, Gemini, OpenCode — from within Claude Code, keeping you in a single terminal context instead of switching between tools.
Plural focuses on forking sessions and broadcasting prompts across branches and repos. It can import issues from GitHub, Asana, and Linear, then dispatch them to parallel agents.
Composio Agent Orchestrator is a framework rather than a UI. It manages fleets of agents with a Planner layer (task decomposition) and an Executor layer (tool interaction). Each agent gets its own worktree, branch, and PR. When CI fails, the agent fixes it. When a reviewer requests changes, the agent addresses them.
The upside is low adoption friction. Claude Squad is a single Go binary. Superset is a single install with a daemon. No Electron, no server, no extension to install.
The downside is that these tools live entirely inside the terminal. If you are working in VS Code or Cursor, you context-switch between your IDE and the TUI. There is no awareness of which terminal belongs to which IDE window. The orchestrator cannot see your IDE terminals — it only knows about the tmux sessions it created.
Philosophy 3: Work With Everything
This is where Callipso sits. Instead of replacing your tools or living inside a terminal multiplexer, it floats as a transparent overlay on top of whatever you are already using.
The core idea: discover terminals across every IDE and terminal app on your machine and provide a unified control surface. Say "fix the auth bug, please one" and the transcribed text routes to terminal 1, regardless of whether it lives in Cursor, VS Code, Warp, or iTerm2.
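As a rough illustration of the spoken suffix, the sketch below splits a transcript such as "fix the auth bug, please one" into a prompt and a terminal index. It is not Callipso's parser: the function name `parse_routed_command`, the number-word vocabulary, and the rule that the suffix must come at the very end of the transcript are all assumptions made for the example.

```python
import re

# Hypothetical vocabulary: spoken number words mapped to 1-based terminal indices.
NUMBER_WORDS = {
    "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9,
}

# A trailing "please <word>", with an optional comma before it and
# optional punctuation after it. Anything else is treated as plain prompt text.
_SUFFIX = re.compile(r"[,\s]*\bplease\s+(\w+)[.!?\s]*$", re.IGNORECASE)


def parse_routed_command(transcript: str):
    """Return (prompt, terminal_index); index is None when no suffix is found."""
    match = _SUFFIX.search(transcript)
    if not match:
        return transcript.strip(), None
    index = NUMBER_WORDS.get(match.group(1).lower())
    if index is None:
        # "please <something>" that is not a known number word stays in the prompt.
        return transcript.strip(), None
    return transcript[:match.start()].strip(), index
```

A transcript without a recognizable suffix comes back unchanged with `None` as the index, which leaves room for the hotkey-based destination selection described later.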
No new IDE to learn. No tmux dependency. Your existing workflow stays intact. You get a command layer on top.
The Landscape in One Table
| Tool | Type | Agent Support | IDE Awareness | Voice | Worktrees | Cloud/Headless |
|---|---|---|---|---|---|---|
| Callipso | Desktop overlay | Claude Code | 7 platforms | Native STT | Yes | No |
| Intent | Desktop app | Claude Code, Codex, OpenCode | Own environment | No | Yes | No |
| Codex App | Desktop app | Codex only | Own environment | No | Yes | No |
| Conductor | Desktop app | Claude Code | Own UI | No | Yes | No |
| Warp Oz | Cloud platform | Any (via Skills) | Warp only | No | Cloud-based | Yes |
| Claude Squad | TUI (Go) | 5 CLIs | tmux only | No | Yes | No |
| Superset | Terminal + daemon | Any CLI agent | Own terminal | No | Yes | No |
| Agent Deck | TUI (Go) | Claude Code + more | tmux only | No | Yes | No |
| ittybitty | Shell script | Claude Code | tmux only | No | No | No |
| myclaude | Framework | Multi-backend | Claude Code CLI | No | No | No |
| Plural | TUI | Claude Code | tmux only | No | Yes | No |
| Composio | Framework | Any MCP client | Via MCP | No | Yes | Yes |
| VoxHerd | iOS app | Claude Code, Codex, Gemini CLI | By project | Native (on-device) | No | No |
Where Callipso Is Ahead
Cross-Platform Terminal Discovery
This is our strongest differentiator. Callipso discovers terminals across seven platforms simultaneously:
IDE Adapters (via extensions on ports 3001-3003):
├── VS Code
├── Cursor
└── Windsurf
Terminal App Adapters (via native macOS APIs):
├── Warp (SQLite polling + CGEvent injection)
├── iTerm2 (event-driven API + AppleScript)
└── Terminal.app (AppleScript automation)
Built-in:
└── Embedded Terminal (xterm.js + node-pty)
Every competitor assumes a homogeneous environment. Claude Squad only sees tmux sessions. Intent only sees its own workspace. Codex App only runs its own agents. Warp Oz only runs in Warp. Superset runs agents in its own terminal. Agent Deck manages its own tmux panes.
Real developer setups are not homogeneous. You might use Cursor for the frontend, Warp for infrastructure scripts, and iTerm2 for quick debugging. Callipso sees all three, assigns a global terminal index, and routes to any of them through one interface. Nobody else does this.
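To make the global index concrete, here is a minimal sketch of one possible assignment policy. The text only says that a global terminal index is assigned; the policy shown (existing terminals keep their index, new ones take the lowest free slot) and the names `Terminal` and `assign_global_indices` are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple

Key = Tuple[str, str]  # (app, app-local terminal id)


@dataclass(frozen=True)
class Terminal:
    app: str       # e.g. "cursor", "warp", "iterm2"
    local_id: str  # identifier that is only unique within that app


def assign_global_indices(existing: Dict[Key, int],
                          discovered: Iterable[Terminal]) -> Dict[Key, int]:
    """Give every discovered terminal a 1-based global index.

    Terminals seen before keep their index, closed terminals release
    theirs, and new terminals take the lowest free index.
    """
    keys = [(t.app, t.local_id) for t in discovered]
    live = set(keys)
    result = {k: i for k, i in existing.items() if k in live}
    used = set(result.values())
    next_index = 1
    for key in keys:
        if key in result:
            continue
        while next_index in used:
            next_index += 1
        result[key] = next_index
        used.add(next_index)
    return result
```

Keeping indices stable across polls matters for voice routing: "terminal 1" should mean the same terminal a minute later, even if another terminal was closed in between.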
The Adapter Problem: Ghostty vs iTerm2
The reason nobody else attempts cross-app terminal discovery is that every terminal application exposes a different level of programmatic control — or none at all. There is no standard API for "list your terminals." Each application requires its own adapter, and the engineering ranges from trivial to absurd depending on what the application gives you to work with.
The two extremes in our adapter set are iTerm2 (full API) and Ghostty (no API). The contrast illustrates why this problem is hard and why a generic approach does not exist.
iTerm2: The Gold Standard
iTerm2 ships a Python API with async event monitors. Callipso runs a long-lived Python daemon that maintains a persistent connection to iTerm2 via ~/.iterm2-socket:
Electron App (Node.js)
→ iTerm2Adapter (TypeScript)
→ iTerm2DaemonClient (process manager)
→ Python Daemon (iterm2_daemon.py)
→ iTerm2 Python API (iterm2 module)
→ iTerm2 Application via Unix socket
The daemon subscribes to native event monitors — FocusMonitor, SessionTerminationMonitor, EachSessionOnceMonitor, LayoutChangeMonitor. Events arrive immediately when sessions are created, destroyed, or focused. No polling lag. The daemon queries session variables (CWD, job name, PID, TTY) directly. Terminal IDs are native UUIDs in the format iterm2:{window}:{tab}:{session}, stable across focus changes and reorders.
Sending text is a single API call: session.async_send_text(text). Focusing a session: session.async_activate(). Creating a tab with a specific working directory: window.async_create_tab(profile_customizations). No focus stealing, no clipboard manipulation, no keystroke simulation.
If the Python daemon fails to spawn (wrong Python version, missing module), the adapter gracefully degrades to it2api, iTerm2's built-in JSON-RPC CLI tool. Same operations, polling-based instead of event-driven. The fallback chain ensures iTerm2 terminals are always discoverable.
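The fallback chain can be sketched as an ordered list of discovery backends that are tried until one succeeds. This is a simplified illustration, not the adapter's code: `discover_with_fallback` and the backend callables are hypothetical, and a backend is assumed to signal failure by raising an exception.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# A backend is a (name, discover) pair; discover() returns terminal IDs or raises.
Backend = Tuple[str, Callable[[], List[str]]]


def discover_with_fallback(
    backends: Sequence[Backend],
) -> Tuple[Optional[str], List[str], List[str]]:
    """Try each backend in order and return (backend_name, sessions, errors).

    Errors from skipped backends are collected so the caller can log why
    the preferred backend was not used.
    """
    errors: List[str] = []
    for name, discover in backends:
        try:
            return name, discover(), errors
        except Exception as exc:  # broad on purpose: any failure means "try the next one"
            errors.append(f"{name}: {exc}")
    return None, [], errors
```

With the event-driven daemon listed first and a polling backend second, a daemon that fails to spawn costs latency but not visibility.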
Ghostty: Engineering Without an API
Ghostty, created by HashiCorp co-founder Mitchell Hashimoto, has no scripting API, no extension system, no socket, no RPC interface. Yet it is one of the fastest-growing terminal emulators — 2ms key-to-screen latency — and developers are adopting it rapidly. We had to support it.
The solution stitches together four macOS primitives:
GhosttyAdapter (AppAdapter interface)
→ GhosttyManager
├─ GhosttyPoller
│ ├─ Process tree scanning (ps + pgrep)
│ ├─ Accessibility window enumeration (System Events)
│ ├─ AX event watcher (compiled helper binary)
│ └─ Shell-to-tab matching (CWD heuristics)
│
└─ GhosttyClient
├─ ghostty-bridge (kernel TTY injection)
├─ Clipboard paste fallback (Cmd+V)
└─ AXPress tab switching (no focus steal)
Discovery: pgrep -x ghostty finds the main process. ps -e -o pid=,ppid=,tty=,command= builds the process tree. We walk descendants to find shell processes with real TTYs — each TTY is one terminal tab. Then the Accessibility API queries Ghostty's window hierarchy for tab counts, selected tab indices, and radio button labels (tab titles). The hard part is matching shells to visual tab positions: a three-pass heuristic compares CWD basenames against AX tab titles, then foreground process names, then falls back to PID order.
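The three-pass matching can be sketched roughly as follows. The pass order comes from the description above; everything else is an assumption for illustration, including the `Shell` record, the function name `match_shells_to_tabs`, and the use of case-insensitive substring matching against tab titles.

```python
import os
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Shell:
    pid: int
    tty: str
    cwd: str
    foreground: str  # name of the foreground process, e.g. "claude" or "zsh"


def match_shells_to_tabs(shells: List[Shell], tab_titles: List[str]) -> Dict[int, Shell]:
    """Return {tab_position: shell} using three passes of decreasing confidence."""
    assignment: Dict[int, Shell] = {}
    remaining = sorted(shells, key=lambda s: s.pid)

    def run_pass(key: Callable[[Shell], str]) -> None:
        for position, title in enumerate(tab_titles):
            if position in assignment:
                continue
            for shell in remaining:
                needle = key(shell)
                if needle and needle.lower() in title.lower():
                    assignment[position] = shell
                    remaining.remove(shell)
                    break  # stop iterating right after mutating the list

    # Pass 1: the CWD basename appears in the tab title.
    run_pass(lambda s: os.path.basename(s.cwd.rstrip("/")))
    # Pass 2: the foreground process name appears in the tab title.
    run_pass(lambda s: s.foreground)
    # Pass 3: whatever is left is paired up in PID order.
    for position in range(len(tab_titles)):
        if position not in assignment and remaining:
            assignment[position] = remaining.pop(0)
    return assignment
```

The last pass is a guess by construction: two tabs with generic titles and shells in unrelated directories can be swapped, which is one reason TTY names rather than tab positions serve as identity.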
Control: A compiled ghostty-bridge binary writes directly to TTY devices using kernel injection, so text appears in the terminal without Ghostty needing focus. For tab switching, the bridge sends AXPress on specific radio button elements identified by window position and tab index. Without the bridge, everything falls back to clipboard paste (Cmd+V) and keystroke simulation (Cmd+N, where N is the tab number), which steals focus.
Event detection: A compiled ax-watcher helper binary observes Accessibility notifications (AXCreated, AXUIElementDestroyed, AXFocusedWindowChanged, AXSelectedChildrenChanged) and emits JSON events to stdout. The poller debounces these into immediate re-polls — sub-50ms detection latency, comparable to iTerm2's event-driven model.
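The debouncing step can be illustrated with a small pure function that, given event arrival times, computes when a trailing-edge debounce would trigger a re-poll. The 20 ms quiet window in the usage note and the name `debounce` are assumptions; the text only states that detection latency stays under 50 ms.

```python
from typing import Iterable, List, Optional


def debounce(event_times_ms: Iterable[float], quiet_ms: float) -> List[float]:
    """Return the times at which a trailing-edge debounce would fire.

    Each event restarts a timer of length quiet_ms. A re-poll fires when
    the timer runs out without another event arriving first.
    """
    fire_times: List[float] = []
    pending: Optional[float] = None  # scheduled fire time for the current burst
    for t in sorted(event_times_ms):
        if pending is not None and t >= pending:
            fire_times.append(pending)  # the previous burst ended before this event
        pending = t + quiet_ms
    if pending is not None:
        fire_times.append(pending)
    return fire_times
```

With a 20 ms window, a burst of events at 0, 5, and 12 ms collapses into a single re-poll at 32 ms, so opening a window that emits several notifications in quick succession costs one poll rather than three.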
The contrast:
| Capability | iTerm2 | Ghostty |
|---|---|---|
| Discovery | One API call: app.terminal_windows | Process tree + AX enumeration |
| Identity | Native UUID per session | TTY device name (ttys057) |
| Send text | session.async_send_text() | Kernel TTY injection or clipboard paste |
| Focus tab | session.async_activate() | AXPress or Cmd+N keystroke (N = tab number) |
| Create tab | window.async_create_tab(cwd) | Cmd+T keystroke, then cd |
| Event detection | Native async monitors | Compiled AX watcher binary |
| Focus stealing | Never | Only in fallback mode |
| Split pane awareness | Full (sessions within tabs) | None |
| Latency (discovery) | under 10ms (event-driven) | ~100-300ms (AX query) |
Both adapters implement the same AppAdapter interface — poll(), sendText(), focusTerminal(), createTerminal(). The TerminalStoreManager does not know or care which adapter produced a terminal. A Ghostty terminal and an iTerm2 terminal are both TerminalEntity objects in the normalized store, routable by UUID.
This is why the adapter pattern matters. The complexity of controlling seven different terminal applications is contained within seven adapter implementations. The rest of the system — the store, the session manager, the voice router, the overlay UI — operates on a clean, uniform data model. Adding support for a new terminal app means writing one adapter, not modifying the core.
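A minimal sketch of the pattern, written in Python for brevity rather than in the TypeScript the adapters use. The four operations mirror the interface named above; the `FakeAdapter`, the simplified `TerminalStore`, and the UUID derivation via `uuid5` are hypothetical stand-ins, not Callipso's implementation.

```python
import uuid
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class TerminalEntity:
    uuid: str       # store-level identifier, independent of the source app
    app: str        # which adapter produced this terminal
    native_id: str  # app-specific identity, e.g. a session ID or a TTY name


class AppAdapter(ABC):
    app: str

    @abstractmethod
    def poll(self) -> List[str]:
        """Return the native IDs of the terminals currently open."""

    @abstractmethod
    def send_text(self, native_id: str, text: str) -> None: ...

    @abstractmethod
    def focus_terminal(self, native_id: str) -> None: ...

    @abstractmethod
    def create_terminal(self, cwd: str) -> str: ...


class FakeAdapter(AppAdapter):
    """Records calls instead of driving a real application."""

    def __init__(self, app: str, native_ids: List[str]) -> None:
        self.app = app
        self.native_ids = list(native_ids)
        self.sent: List[Tuple[str, str]] = []
        self.focused: List[str] = []

    def poll(self) -> List[str]:
        return list(self.native_ids)

    def send_text(self, native_id: str, text: str) -> None:
        self.sent.append((native_id, text))

    def focus_terminal(self, native_id: str) -> None:
        self.focused.append(native_id)

    def create_terminal(self, cwd: str) -> str:
        native_id = f"new-{len(self.native_ids) + 1}"
        self.native_ids.append(native_id)
        return native_id


class TerminalStore:
    """Normalizes terminals from all adapters and routes by store UUID."""

    def __init__(self, adapters: List[AppAdapter]) -> None:
        self._adapters = {a.app: a for a in adapters}
        self._entities: Dict[str, TerminalEntity] = {}

    def refresh(self) -> List[TerminalEntity]:
        entities: Dict[str, TerminalEntity] = {}
        for adapter in self._adapters.values():
            for native_id in adapter.poll():
                # Deterministic UUID so the same terminal keeps its ID across polls.
                key = str(uuid.uuid5(uuid.NAMESPACE_URL, f"{adapter.app}:{native_id}"))
                entities[key] = TerminalEntity(key, adapter.app, native_id)
        self._entities = entities
        return list(entities.values())

    def send_text(self, terminal_uuid: str, text: str) -> None:
        entity = self._entities[terminal_uuid]
        self._adapters[entity.app].send_text(entity.native_id, text)
```

The store never branches on the application name; it looks up the owning adapter and delegates, which is what keeps per-application complexity out of the core.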
Voice Routing by Terminal Index
The voice routing model is a two-stage state machine:
Stage 1: Clipboard Detection
STT engine transcribes speech → text lands in clipboard
→ Callipso detects the change, stores text with 30-second expiry
Stage 2: Destination Selection
User presses Cmd+Opt+F1 (or F2, F3, F4)
→ Callipso routes stored text to that terminal index
State Machine:
IDLE → TEXT_WAITING (clipboard stored)
→ DESTINATION_WAITING (hotkey pressed)
→ ROUTE_READY (both exist) → IDLE (text sent)
The state machine handles timing edge cases: speaking before pressing the hotkey, pressing the hotkey before speaking, or pressing a new hotkey before the previous route completes. A sticky destination mode keeps the target terminal active for 20 seconds, so consecutive voice commands go to the same place without re-pressing the hotkey.
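The state machine can be sketched as follows, with the clock passed in explicitly so the timing rules are easy to follow. The state names, the 30-second text expiry, and the 20-second sticky window come from the description above; the class and method names are hypothetical, and two behaviors are assumptions: a pending hotkey press never expires, and the sticky window restarts after each routed message.

```python
from enum import Enum
from typing import List, Optional, Tuple

TEXT_TTL_S = 30.0    # stored text expires after 30 seconds
STICKY_TTL_S = 20.0  # sticky destination lasts 20 seconds


class State(Enum):
    IDLE = "IDLE"
    TEXT_WAITING = "TEXT_WAITING"
    DESTINATION_WAITING = "DESTINATION_WAITING"
    ROUTE_READY = "ROUTE_READY"  # transient: passed through inside _route()


class VoiceRouter:
    def __init__(self) -> None:
        self._text: Optional[str] = None
        self._text_expires = 0.0
        self._pending: Optional[int] = None  # hotkey pressed, no text yet
        self._sticky: Optional[int] = None   # last routed destination
        self._sticky_expires = 0.0
        self.sent: List[Tuple[int, str]] = []  # (terminal_index, text) deliveries

    def _live_text(self, now: float) -> Optional[str]:
        if self._text is not None and now >= self._text_expires:
            self._text = None
        return self._text

    def _live_sticky(self, now: float) -> Optional[int]:
        if self._sticky is not None and now >= self._sticky_expires:
            self._sticky = None
        return self._sticky

    def state(self, now: float) -> State:
        if self._live_text(now) is not None:
            return State.TEXT_WAITING
        if self._pending is not None:
            return State.DESTINATION_WAITING
        return State.IDLE

    def on_clipboard_text(self, text: str, now: float) -> None:
        target = self._pending if self._pending is not None else self._live_sticky(now)
        if target is not None:
            self._route(target, text, now)
        else:
            self._text, self._text_expires = text, now + TEXT_TTL_S

    def on_hotkey(self, terminal_index: int, now: float) -> None:
        text = self._live_text(now)
        if text is not None:
            self._route(terminal_index, text, now)
        else:
            self._pending = terminal_index  # a newer hotkey replaces an older one

    def _route(self, terminal_index: int, text: str, now: float) -> None:
        # ROUTE_READY: both halves exist, so deliver and return to IDLE.
        self.sent.append((terminal_index, text))
        self._text = None
        self._pending = None
        self._sticky, self._sticky_expires = terminal_index, now + STICKY_TTL_S
```

Both orderings end in the same place: text followed by a hotkey and a hotkey followed by text each produce exactly one delivery, and a second utterance inside the sticky window goes to the same terminal without another key press.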
VoxHerd is the only other tool doing voice seriously — an iOS app that routes spoken commands to agents via WebSocket with on-device recognition. But it routes by project, not by terminal index. You cannot say "please three" to target a specific terminal in a specific IDE window. Warp supports Wispr Flow for voice input, but it is single-terminal — no cross-IDE routing.
Non-Invasive Overlay Model
This is a design philosophy, not a feature. Callipso does not replace your IDE. It does not ask you to learn a new environment. It does not require abandoning tool configurations you have built over years.
The overlay floats on top of your desktop with click-through transparency. Terminal list, task status, recording controls — all visible at a glance without switching windows. You interact with your existing tools exactly as before. Callipso adds a control layer; it does not replace anything.
Intent explicitly says "the IDE is dead." We disagree. The IDE is not dead — it is just missing a conductor.
Session-Isolated Routing
When multiple Claude Code sessions run in the same Cursor window, they share the same CLAUDE_CODE_SSE_PORT. Naive routing based on the port sends your prompt to the wrong session.
Callipso solves this with session_id — a UUID generated per Claude Code invocation via the hooks system. The SessionStore maintains bidirectional maps between SessionId and TerminalUUID. Routing uses session_id for disambiguation, eliminating cross-pollination entirely.
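A minimal sketch of the bidirectional mapping, assuming one active session per terminal at a time. The method names and the rebinding policy (a new session in a terminal displaces the old one) are illustrative assumptions, not a description of the actual SessionStore.

```python
from typing import Dict, Optional


class SessionStore:
    """Keeps session_id -> terminal and terminal -> session_id in sync."""

    def __init__(self) -> None:
        self._terminal_by_session: Dict[str, str] = {}
        self._session_by_terminal: Dict[str, str] = {}

    def bind(self, session_id: str, terminal_uuid: str) -> None:
        # Drop stale pairings on both sides so the two maps stay exact inverses.
        old_terminal = self._terminal_by_session.pop(session_id, None)
        if old_terminal is not None:
            self._session_by_terminal.pop(old_terminal, None)
        old_session = self._session_by_terminal.pop(terminal_uuid, None)
        if old_session is not None:
            self._terminal_by_session.pop(old_session, None)
        self._terminal_by_session[session_id] = terminal_uuid
        self._session_by_terminal[terminal_uuid] = session_id

    def terminal_for(self, session_id: str) -> Optional[str]:
        return self._terminal_by_session.get(session_id)

    def session_for(self, terminal_uuid: str) -> Optional[str]:
        return self._session_by_terminal.get(terminal_uuid)

    def unbind_session(self, session_id: str) -> None:
        terminal_uuid = self._terminal_by_session.pop(session_id, None)
        if terminal_uuid is not None:
            self._session_by_terminal.pop(terminal_uuid, None)
```

Because lookups key on the session ID rather than the shared port, two sessions in the same editor window resolve to different terminals even though their port is identical.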
This is subtle infrastructure, but it is the difference between "parallel agents work sometimes" and "parallel agents work reliably." Tools that rely on tmux or their own terminal implementation sidestep this problem by design. Tools that need to work across existing IDE terminals — which only Callipso does — must solve it explicitly.
Real-Time 3D Codebase Visualization
The Space tab renders your codebase as a 3D star field using Rust, WebAssembly, and WebGPU. AI agents appear as ships navigating between files. When an agent reads a file, the ship moves toward it. When it writes, the node pulses.
No other orchestrator has visual debugging at this level. It is experimental, it is optional, and it is the feature that makes people stop and watch. The "Space Tab: Behind the Scenes" post covers the Rust/WASM architecture in detail.
Where Callipso Is Behind
We would rather be honest about our gaps than have users discover them.
Limited CLI Support
This is the most actionable gap. Callipso's deep session integration currently works only with Claude Code. We detect Claude Code sessions via hooks, register session_id mappings, and track task state per session.
Claude Squad supports five CLIs: Claude Code, Aider, Codex, OpenCode, and Amp. VoxHerd supports Claude Code, Codex, and Gemini CLI. Composio works with any MCP-compatible client.
Our adapter architecture is extensible — adding Gemini CLI or OpenCode detection is engineering work, not a design problem. But today, the gap exists. If you run a mixed fleet of agents across different CLIs, Callipso does not orchestrate all of them yet.
No Autonomous Planning Layer
Intent's coordinator-implementor-verifier pipeline is genuinely powerful for large tasks. You write a spec, the coordinator decomposes it, specialist agents execute in parallel, a verifier checks results. The spec updates as agents learn. It is a self-correcting system.
Composio's Planner-Executor architecture does something similar: high-level decomposition with correction loops when tools fail.
Callipso is a routing orchestrator. You decide what goes where. The system routes your voice, manages sessions, tracks state — but it does not decide how to decompose a large task into parallel subtasks. That planning remains with the human (or with the AI agent's own sub-agent spawning).
Both models have merit. Autonomous planning is more powerful for large, well-specified tasks. Human-in-the-loop routing is more flexible for exploratory work where the developer is actively steering. We chose the latter, and we think it is the right default for how most developers work today. But the planning layer is where Intent and Composio are ahead.
macOS Only
Callipso uses macOS-specific APIs extensively:
- AppleScript for iTerm2 and Terminal.app control
- CGEvent for Warp text injection
- NSWorkspace for Claude Desktop monitoring
- Accessibility APIs for terminal discovery
The Codex App already runs on macOS and Windows. Claude Squad is cross-platform (Go + tmux). Warp Oz is cloud-based. Our addressable market is limited to macOS developers — which is a large audience (most AI-first developers are on Macs) but not the full market.
No Cloud or Headless Mode
Warp's Oz runs agents in persistent cloud environments, triggered by cron jobs, webhooks, GitHub Actions, or Slack messages. Composio can manage 30+ agents headlessly. These tools enable team-scale automation where agents run overnight, process issue backlogs, and create PRs without a human at the keyboard.
Callipso requires an active desktop session. It is a developer productivity tool, not a CI/CD agent farm. For solo developers and small teams working interactively, this is not a limitation. For organizations wanting to scale agent execution across a team, it is.
No Automated CI Feedback Loop
Conductor and Composio close the loop: agent pushes code, CI fails, agent reads the failure log, agent fixes and re-pushes. This cycle can repeat autonomously until tests pass.
Callipso has git worktree support, commit buttons, and merge-to-main UI. But the CI-failure-to-fix cycle is still manual. You see the failure, you tell the agent to fix it. The routing is fast, but the human is still in the loop for error recovery.
The Cloud Orchestrators
Two tools deserve separate mention because they operate at a different scale.
Warp Oz is a cloud platform for running coding agents at enterprise scale. Agents run in persistent cloud environments with cron, webhook, and API triggers. Every agent run generates a shareable audit trail. Multi-repository support enables cross-repo changes. Self-hosted option for enterprises. This is not a developer tool — it is infrastructure.
GitHub Copilot Agent can be assigned GitHub issues directly. It spins up an ephemeral dev environment via GitHub Actions, implements the fix, and creates a PR. No local environment needed. This is the zero-setup end of the spectrum.
These tools solve a different problem than Callipso. They are for teams that want agents running autonomously at scale. Callipso is for developers who want to run agents interactively across their existing tools.
VS Code's Built-In Multi-Agent Mode
Worth noting: VS Code shipped multi-agent orchestration in February 2026. You can run multiple Copilot agents in the same editor, each working on different files, with a coordination layer that prevents conflicts.
This is native, free, and deeply integrated. For developers who live entirely in VS Code and use only Copilot, it may be sufficient. The limitation is scope: it only works within VS Code, only with Copilot, and only within a single editor window. Cross-IDE, cross-terminal, and cross-CLI orchestration are out of scope.
The Positioning Map
Autonomous ──────────────────────────── Human-in-Loop
│ │
│ Intent │
Replace │ Codex App │
IDE │ Conductor │
│ │
───────────────┼───────────────────────────────────────┼──────
│ │
Single │ Warp Oz Claude Squad │ Callipso
Platform │ GitHub Agent Plural │
│ Composio │
│ │
Multi-IDE ──────────────────────────── Multi-IDE
Callipso occupies a unique quadrant: multi-IDE support with human-in-the-loop voice routing. Most competitors are either autonomous but confined to their own environment (Intent), or human-controlled but single-environment (Claude Squad, Conductor).
The question is whether that quadrant grows. We believe it does — because developers use multiple tools, and the tools that meet them where they are will win over the tools that ask them to move.
What We Are Watching
Three trends will shape this space over the next year:
Agent-to-agent communication. Claude Code shipped Agent Teams in February 2026 — teammates that communicate directly with each other via shared task lists and mailboxes. When agents coordinate without human mediation, the orchestrator's role shifts from routing to monitoring. The tools that adapt to this will win.
MCP as a standard. Model Context Protocol is becoming the lingua franca for agent-tool communication. Composio already supports any MCP client. As more CLIs implement MCP, the adapter-per-platform model (which Callipso uses today) may give way to a single MCP integration that works everywhere.
Enterprise vs. individual. The landscape is splitting. Warp Oz and GitHub Agent target teams with governance, audit trails, and cloud execution. Callipso, Claude Squad, and VoxHerd target individual developers who want more control. These are different markets with different needs, and tools that try to serve both will struggle.
Our Bet
We believe the orchestration layer should be additive, not a replacement. Developers should not have to abandon their IDE to gain multi-agent capabilities. The orchestrator should meet them where they are (in Cursor, in Warp, in VS Code, in iTerm2) and provide a thin, powerful control surface on top.
The tools that ask you to replace everything are betting that agent-first development is so different it needs a new paradigm. Maybe they are right eventually. But in March 2026, developers are still working in IDEs they have customized for years, still using terminals with specific configurations, still switching between tools that each do one thing well.
An overlay that connects all of them — with voice routing, session isolation, and real-time visibility — that is the bet we are making. Talk to your terminals by name. Watch your agents work. Keep everything else exactly as it is.