engineering · voice-routing · performance · announcement

Why Every Voice App Wastes 200ms in Your Browser

We built a Chrome extension that delivers voice-transcribed text via WebSocket in 1-5ms — bypassing the clipboard paste that every other STT tool depends on.

Callipso Team · March 12, 2026 · 4 min read


Every desktop speech-to-text app that delivers text to your browser hits the same wall. The transcription finishes. The text is ready. Then the app writes it to the clipboard and simulates Cmd+V. That keystroke-to-paste round trip costs ~200ms. Every single time.

We just shipped a Chrome extension that eliminates it entirely.

The Clipboard Tax

Here is the standard delivery path for voice text reaching a browser text field:

STT Engine → clipboard.writeText() → osascript Cmd+V → OS paste event → DOM update

That osascript call alone takes 80-150ms. Add the clipboard write, the OS event loop, and the browser's handling of the paste event, and you are looking at ~200ms of dead time after inference has already finished. With a fast STT engine that transcribes in 237ms, the delivery overhead is nearly as long as the inference itself.
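Putting numbers on that comparison, with t_inf = 237ms for inference (the 8-second clip cited later in this post) and t_clip ≈ 200ms for clipboard delivery:

```latex
\frac{t_{\text{clip}}}{t_{\text{inf}}} \approx \frac{200\ \text{ms}}{237\ \text{ms}} \approx 0.84,
\qquad
\frac{t_{\text{clip}}}{t_{\text{inf}} + t_{\text{clip}}} \approx \frac{200\ \text{ms}}{437\ \text{ms}} \approx 0.46
```

In other words, delivery adds about 84% on top of inference, and roughly 46% of the total wait happens after the text already exists.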

This is not a Callipso-specific problem. It is a platform limitation. Any macOS app that delivers text to a browser via the clipboard pays this tax. There is no way around it — unless you skip the clipboard entirely.

The Fast Path

Callipso Voice Bridge is a companion Chrome extension that receives transcribed text directly from the Callipso desktop app over a local WebSocket connection and inserts it into the focused element via document.execCommand('insertText').

STT Engine → WebSocket (ws://localhost:3000) → service worker → content script → execCommand('insertText')

No clipboard. No simulated keystrokes. No OS paste event. The text goes from our Electron process to the browser's DOM in 1-5ms.
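The post does not publish its wire protocol, so what follows is only a minimal sketch of the service-worker hop, under stated assumptions: the desktop app sends JSON frames shaped like { type: "insert", id, text } and expects { type: "ack", id, ok } back. Those message shapes and the names parseInsert and relayToActiveTab are hypothetical. TabsApi is a locally declared, typed subset of the promise-returning chrome.tabs.query and chrome.tabs.sendMessage calls available to Manifest V3 extensions, so the relay logic can run outside Chrome.

```typescript
// Hypothetical wire format; field names are assumptions, not Callipso's protocol.
interface InsertMessage {
  type: "insert";
  id: string; // lets the desktop app match the ack to this request
  text: string;
}

interface AckMessage {
  type: "ack";
  id: string;
  ok: boolean;
}

// Typed subset of the promise-returning chrome.tabs calls (Manifest V3),
// declared locally so this logic can run outside Chrome.
interface TabsApi {
  query(queryInfo: { active: boolean; lastFocusedWindow: boolean }): Promise<Array<{ id?: number }>>;
  sendMessage(tabId: number, message: unknown): Promise<unknown>;
}

// Parse one WebSocket frame; anything that is not a well-formed insert yields null.
function parseInsert(raw: string): InsertMessage | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof data !== "object" || data === null) return null;
  const { type, id, text } = data as Record<string, unknown>;
  if (type !== "insert" || typeof id !== "string" || typeof text !== "string") {
    return null;
  }
  return { type: "insert", id, text };
}

// Forward an insert to the content script in the active tab and build the ack
// to send back over the socket. Malformed frames are ignored (null).
async function relayToActiveTab(raw: string, tabs: TabsApi): Promise<AckMessage | null> {
  const msg = parseInsert(raw);
  if (msg === null) return null;
  const [tab] = await tabs.query({ active: true, lastFocusedWindow: true });
  const tabId = tab?.id;
  if (tabId === undefined) {
    return { type: "ack", id: msg.id, ok: false }; // no active tab to deliver to
  }
  try {
    // Assumes the content script replies with { ok: boolean }.
    const reply = (await tabs.sendMessage(tabId, msg)) as { ok?: boolean } | undefined;
    return { type: "ack", id: msg.id, ok: reply?.ok === true };
  } catch {
    // sendMessage rejects when no content script is listening in that tab.
    return { type: "ack", id: msg.id, ok: false };
  }
}
```

In the extension, this would be called from the service worker's WebSocket onmessage handler, passing chrome.tabs as the TabsApi and sending any non-null ack back over the same socket.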

The architecture is minimal:

| Component | Role |
|---|---|
| ChromeExtensionBridge (Electron) | WebSocket server on the /chrome path; single-connection model |
| Service worker (extension) | WebSocket client; relays messages to the active tab |
| Content script (extension) | execCommand('insertText') with shadow DOM traversal |
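The table names a single-connection model for the Electron bridge but not what happens when a second connection arrives. Here is a minimal sketch assuming a "newest connection wins" policy. BridgeSocket is a hypothetical interface covering only the send and close methods the bridge needs (a server-side socket from a Node WebSocket library such as the ws package should be structurally compatible), SingleConnectionBridge is an illustrative name, and the JSON insert message shape is an assumption.

```typescript
// Hypothetical minimal socket surface the bridge needs.
interface BridgeSocket {
  send(data: string): void;
  close(): void;
}

// Single-connection model, assuming "newest connection wins": attaching a new
// extension connection closes and replaces the previous one.
class SingleConnectionBridge {
  private current: BridgeSocket | null = null;

  attach(socket: BridgeSocket): void {
    if (this.current !== null && this.current !== socket) {
      this.current.close();
    }
    this.current = socket;
  }

  // Called from the socket's close handler. Close events from sockets that
  // were already replaced are ignored so they cannot drop the live one.
  detach(socket: BridgeSocket): void {
    if (this.current === socket) {
      this.current = null;
    }
  }

  isConnected(): boolean {
    return this.current !== null;
  }

  // Returns false when no extension is connected, so the caller can go
  // straight to the clipboard path instead of waiting for an ack.
  sendInsert(id: string, text: string): boolean {
    if (this.current === null) return false;
    this.current.send(JSON.stringify({ type: "insert", id, text }));
    return true;
  }
}
```

Ignoring stale detach calls matters because closing the replaced socket is asynchronous: its close event fires after the new socket has already been attached.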

If the extension is not connected or the insertion fails, the system automatically falls back to the standard Cmd+V path. The desktop app waits at most 500ms for the extension's acknowledgment before falling back, so voice delivery is never blocked for longer than that.
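The fallback rule amounts to racing the extension's acknowledgment against a timer. A minimal sketch, assuming two hypothetical hooks: sendViaBridge resolves true once the extension acknowledges a successful insert (false on a reported failure or when nothing is connected), and pasteViaClipboard stands in for the clipboard-plus-Cmd+V path. The names are illustrative and the 500ms default mirrors the description above; this is not Callipso's actual code.

```typescript
type DeliveryPath = "bridge" | "paste";

// Hypothetical hooks standing in for the two delivery paths.
interface DeliveryHooks {
  // Resolves true once the extension acks a successful insert; false if the
  // insert failed or no extension is connected.
  sendViaBridge(text: string): Promise<boolean>;
  // Stand-in for clipboard.writeText plus the simulated Cmd+V.
  pasteViaClipboard(text: string): Promise<void>;
}

// A timer that resolves to false, plus a way to cancel it after an early ack.
function ackTimer(ms: number): { expired: Promise<boolean>; cancel: () => void } {
  let handle: ReturnType<typeof setTimeout> | undefined;
  const expired = new Promise<boolean>((resolve) => {
    handle = setTimeout(() => resolve(false), ms);
  });
  return { expired, cancel: () => clearTimeout(handle) };
}

async function deliverVoiceText(
  text: string,
  hooks: DeliveryHooks,
  ackTimeoutMs = 500,
): Promise<DeliveryPath> {
  const timer = ackTimer(ackTimeoutMs);
  // Whichever settles first wins: the extension's ack or the timer.
  // A rejected bridge send (e.g. socket closed) counts as "not acked".
  const acked = await Promise.race([
    hooks.sendViaBridge(text).catch(() => false),
    timer.expired,
  ]);
  timer.cancel(); // no-op if the timer already fired
  if (acked) return "bridge";
  await hooks.pasteViaClipboard(text);
  return "paste";
}
```

One gap this sketch leaves open: if the extension inserts the text but its ack arrives after the timer fires, the text is pasted a second time. A production version would need some way to prevent that duplicate.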

Does Anyone Else Do This?

We searched and found no other tool that bridges a desktop voice app to the browser over WebSocket. The landscape:

| Tool | Approach | Limitation |
|---|---|---|
| Voice In | Browser Web Speech API | STT runs in-browser; lower quality than local models |
| Voicy | Browser-based STT | Same; no desktop engine integration |
| Clipboard Inserter | Polls the clipboard and auto-pastes | Crude; pollutes the clipboard; no WebSocket |
| native-inserter | Chrome Native Messaging | Built for Japanese text extraction, not voice |

The browser-based STT extensions are limited to the Web Speech API or their own cloud models. They cannot tap into local CoreML or Parakeet inference running on your Neural Engine. Callipso runs STT locally with models that transcribe 8 seconds of speech in 237ms — then the extension handles the last mile.

What You Get

40x faster delivery. 1-5ms instead of ~200ms. The text appears the instant inference completes.
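The 40x figure is the conservative end of the range: it compares ~200ms against the slowest 5ms bridge delivery. Adding the 237ms inference time from the clipboard section gives the end-to-end picture:

```latex
\frac{200\ \text{ms}}{5\ \text{ms}} = 40,
\qquad
\frac{200\ \text{ms}}{1\ \text{ms}} = 200,
\qquad
\underbrace{237 + 5}_{\text{bridge}} = 242\ \text{ms}
\ \text{vs.}\
\underbrace{237 + 200}_{\text{clipboard}} = 437\ \text{ms}
```

End to end, that is roughly 45% less waiting per utterance (195ms out of 437ms).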

No clipboard pollution. Your clipboard stays exactly as it was. Copy something, dictate into a text field, paste — your original clipboard content is still there.

Works everywhere. GitHub issues, Google Docs, Gmail compose, Slack, Notion, Monaco editors, shadow DOM components. The content script traverses shadow roots and falls back gracefully for standard inputs and textareas.
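The post does not show the content script, so here is a minimal sketch of the two behaviors it describes, under stated assumptions. Focus is followed through open shadow roots: document.activeElement stops at the shadow host, and a closed root's host reports shadowRoot as null, so traversal stops there. Insertion tries document.execCommand('insertText', false, text) first; the fallback for input and textarea elements using setRangeText plus a synthetic input event is our guess at what "falls back gracefully" means. FocusScope, Focusable, and DocLike are hypothetical structural stand-ins for Document, ShadowRoot, and HTMLInputElement/HTMLTextAreaElement, declared so the logic can run outside a browser; in a real content script you would pass document.

```typescript
// Hypothetical structural stand-ins for the DOM types this sketch touches.
interface FocusScope {
  activeElement: Focusable | null;
}

interface Focusable {
  tagName: string;
  shadowRoot: FocusScope | null; // null for plain elements and closed shadow roots
  value?: string;
  selectionStart?: number | null;
  selectionEnd?: number | null;
  setRangeText?(
    replacement: string,
    start: number,
    end: number,
    selectionMode?: "select" | "start" | "end" | "preserve",
  ): void;
  dispatchEvent?(event: Event): boolean;
}

interface DocLike extends FocusScope {
  execCommand(commandId: string, showUI?: boolean, value?: string): boolean;
}

// Follow focus down through open shadow roots: the outer scope only reports
// the shadow host, so keep descending while the host's root has a focused element.
function deepActiveElement(scope: FocusScope): Focusable | null {
  let el = scope.activeElement;
  while (el !== null && el.shadowRoot !== null && el.shadowRoot.activeElement !== null) {
    el = el.shadowRoot.activeElement;
  }
  return el;
}

// Returns true if the text was inserted by either path.
function insertText(doc: DocLike, text: string): boolean {
  const target = deepActiveElement(doc);
  if (target === null) return false; // nothing focused: caller falls back to paste

  // Primary path described in the post: insert at the current selection.
  // execCommand returns false when the command cannot run.
  if (doc.execCommand("insertText", false, text)) return true;

  // Assumed fallback for plain form fields: splice at the caret.
  const tag = target.tagName.toUpperCase();
  if ((tag === "INPUT" || tag === "TEXTAREA") && target.setRangeText) {
    const end = target.selectionEnd ?? target.value?.length ?? 0;
    const start = target.selectionStart ?? end;
    target.setRangeText(text, start, end, "end"); // caret lands after the new text
    // setRangeText does not fire `input`, so notify framework listeners explicitly.
    target.dispatchEvent?.(new Event("input", { bubbles: true }));
    return true;
  }
  return false;
}
```

execCommand is deprecated but still supported in Chrome, which is why a sketch like this keeps a second path for ordinary form fields rather than relying on it alone.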

Zero data off-machine. The WebSocket connection is ws://localhost:3000. No cloud relay, no analytics, no tracking. The extension's only job is to insert text it receives from your own desktop.

How Can I Get It?

Callipso Voice Bridge is live on the Chrome Web Store. Install it, launch Callipso, and voice text will route directly to your browser — no clipboard, no paste delay.

The 200ms clipboard tax was always an annoyance. Now it is optional.
