Why Every Voice App Wastes 200ms in Your Browser
We built a Chrome extension that delivers voice-transcribed text via WebSocket in 1-5ms — bypassing the clipboard paste that every other STT tool depends on.
Every speech-to-text tool that delivers text to your browser — Voice In, Voicy, Speechnotes, all of them — hits the same wall. The transcription finishes. The text is ready. Then the app writes it to the clipboard and simulates Cmd+V. That keystroke-to-paste roundtrip costs ~200ms. Every single time.
We just shipped a Chrome extension that eliminates it entirely.
The Clipboard Tax
Here is the standard delivery path for voice text reaching a browser text field:
STT Engine → clipboard.writeText() → osascript Cmd+V → OS paste event → DOM update
That osascript call alone takes 80-150ms. Add clipboard write, the OS event loop, and the browser processing the paste event, and you are looking at ~200ms of dead time after inference is already complete. On a fast STT engine that transcribes in 237ms, this means the delivery overhead is nearly as long as the inference itself.
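The arithmetic behind that comparison can be checked with a few lines of TypeScript. This is only a sketch using the approximate figures quoted above (237ms inference, ~200ms clipboard delivery); the function and variable names are illustrative and not part of any Callipso API.

```typescript
// Approximate figures quoted in the text above.
const inferenceMs = 237;         // STT inference time for the example engine
const clipboardDeliveryMs = 200; // clipboard write + simulated Cmd+V + paste handling

// Fraction of end-to-end latency spent on delivery rather than inference.
function deliveryShare(inference: number, delivery: number): number {
  return delivery / (inference + delivery);
}

const totalMs = inferenceMs + clipboardDeliveryMs;             // 437
const share = deliveryShare(inferenceMs, clipboardDeliveryMs); // about 0.458
const overheadVsInference = clipboardDeliveryMs / inferenceMs; // about 0.844

console.log(`end-to-end: ${totalMs}ms`);
console.log(`delivery share of total: ${(share * 100).toFixed(1)}%`);
console.log(`delivery relative to inference: ${(overheadVsInference * 100).toFixed(1)}%`);
```

With these numbers, delivery accounts for roughly 46% of the time between the end of speech and text appearing on screen.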
This is not specific to Callipso, our desktop voice app. It is a platform limitation. Any macOS app that delivers text to a browser via the clipboard pays this tax. There is no way around it unless you skip the clipboard entirely.
The Fast Path
Callipso Voice Bridge is a companion Chrome extension that receives transcribed text directly from the Callipso desktop app over a local WebSocket connection and inserts it into the focused element via document.execCommand('insertText').
STT Engine → WebSocket (ws://localhost:3000) → content script → execCommand('insertText')
No clipboard. No simulated keystrokes. No OS paste event. The text goes from our Electron process to the browser's DOM in 1-5ms.
The architecture is minimal:
| Component | Role |
|---|---|
| ChromeExtensionBridge (Electron) | WebSocket server on /chrome path, single-connection model |
| Service worker (extension) | WebSocket client, relays messages to active tab |
| Content script (extension) | execCommand('insertText') with shadow DOM traversal |
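To make the content-script row concrete, here is a minimal sketch of how shadow DOM traversal and insertion with a fallback could look. It is not the extension's actual source: `deepActiveElement`, `spliceIntoField`, and `insertText` are hypothetical names, and the small interfaces exist only so the logic type-checks and runs outside a browser. In a real content script, `document` would be passed as the root and `document.execCommand("insertText", false, text)` as the `exec` callback.

```typescript
// Minimal structural types (assumptions) that browser Document, ShadowRoot,
// and Element objects also satisfy.
interface FocusRoot { activeElement: FocusNode | null; }
interface FocusNode { shadowRoot?: FocusRoot | null; }
interface EditableField {
  value: string;
  selectionStart: number | null;
  selectionEnd: number | null;
}

// Follow activeElement through nested open shadow roots to the element
// that actually holds focus. Closed shadow roots expose no shadowRoot
// property to page scripts, so traversal stops at their host.
function deepActiveElement(root: FocusRoot): FocusNode | null {
  let el = root.activeElement;
  while (el && el.shadowRoot && el.shadowRoot.activeElement) {
    el = el.shadowRoot.activeElement;
  }
  return el;
}

// Fallback for plain inputs and textareas: replace the current selection
// with the text and move the caret to the end of the inserted text.
function spliceIntoField(field: EditableField, text: string): void {
  const start = field.selectionStart ?? field.value.length;
  const end = field.selectionEnd ?? start;
  field.value = field.value.slice(0, start) + text + field.value.slice(end);
  const caret = start + text.length;
  field.selectionStart = caret;
  field.selectionEnd = caret;
}

// Try the execCommand path first, then the direct field edit.
function insertText(
  text: string,
  exec: (text: string) => boolean, // e.g. (t) => document.execCommand("insertText", false, t)
  field: EditableField | null,
): "execCommand" | "field" | "failed" {
  if (exec(text)) return "execCommand";
  if (field) {
    spliceIntoField(field, text);
    return "field";
  }
  return "failed";
}
```

Returning a status rather than throwing lets the caller report success or failure back to the desktop app, which can then decide whether to use the clipboard path instead.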
If the extension is not connected or the insertion fails, the system falls back to the standard Cmd+V path automatically. A 500ms ack timeout caps how long the app waits for the extension, so voice delivery is never blocked indefinitely.
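The fallback logic can be sketched as a race between the extension's acknowledgment and a timer. The code below is an illustration under assumptions, not Callipso's implementation: `deliver`, `withTimeout`, `sendToExtension`, and `pasteViaClipboard` are hypothetical names, and the only value taken from the text is the 500ms default.

```typescript
type Delivery = "fast-path" | "clipboard-fallback";

// Resolve to false if `ack` has not settled within `ms` milliseconds,
// or if it rejects.
function withTimeout(ack: Promise<boolean>, ms: number): Promise<boolean> {
  return new Promise((resolve) => {
    const timer = setTimeout(() => resolve(false), ms);
    ack.then(
      (ok) => { clearTimeout(timer); resolve(ok); },
      () => { clearTimeout(timer); resolve(false); },
    );
  });
}

// Try the extension first; fall back to the clipboard path when the
// extension is not connected, reports failure, or does not ack in time.
async function deliver(
  text: string,
  sendToExtension: ((text: string) => Promise<boolean>) | null, // null = not connected
  pasteViaClipboard: (text: string) => Promise<void>,
  ackTimeoutMs = 500,
): Promise<Delivery> {
  if (sendToExtension) {
    const acked = await withTimeout(sendToExtension(text), ackTimeoutMs);
    if (acked) return "fast-path";
  }
  await pasteViaClipboard(text);
  return "clipboard-fallback";
}
```

In this sketch the worst case is the timeout plus the normal clipboard cost, which is why a short timeout matters.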
Does Anyone Else Do This?
We searched and found no other tool that bridges a desktop app to the browser for voice text over WebSocket. The landscape:
| Tool | Approach | Limitation |
|---|---|---|
| Voice In | Browser Web Speech API | STT runs in-browser, lower quality than local models |
| Voicy | Browser-based STT | Same — no desktop engine integration |
| Clipboard Inserter | Polls clipboard, auto-pastes | Crude, pollutes clipboard, no WebSocket |
| native-inserter | Chrome Native Messaging | Built for Japanese text extraction, not voice |
The browser-based STT extensions are limited to the Web Speech API or their own cloud models. They cannot tap into local CoreML or Parakeet inference running on your Neural Engine. Callipso runs STT locally with models that transcribe 8 seconds of speech in 237ms — then the extension handles the last mile.
What You Get
40x to 200x faster delivery. 1-5ms instead of ~200ms. The text appears the instant inference completes.
No clipboard pollution. Your clipboard stays exactly as it was. Copy something, dictate into a text field, paste — your original clipboard content is still there.
Works everywhere. GitHub issues, Google Docs, Gmail compose, Slack, Notion, Monaco editors, shadow DOM components. The content script traverses shadow roots and falls back gracefully for standard inputs and textareas.
Zero data off-machine. The WebSocket connection is ws://localhost:3000. No cloud relay, no analytics, no tracking. The extension's only job is to insert text it receives from your own desktop.
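One way a client could enforce the localhost-only property is to validate the endpoint before connecting. The guard below is a hypothetical sketch, not something the article states the extension does; `isLoopbackWsUrl` and `LOOPBACK_HOSTS` are illustrative names.

```typescript
// Hypothetical guard: accept only WebSocket URLs that point at the local machine.
const LOOPBACK_HOSTS = new Set(["localhost", "127.0.0.1", "[::1]"]);

function isLoopbackWsUrl(raw: string): boolean {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return false; // not a parseable URL
  }
  const isWs = url.protocol === "ws:" || url.protocol === "wss:";
  return isWs && LOOPBACK_HOSTS.has(url.hostname);
}

console.log(isLoopbackWsUrl("ws://localhost:3000"));      // true
console.log(isLoopbackWsUrl("wss://example.com/socket")); // false
```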
How Can I Get It?
Callipso Voice Bridge is live on the Chrome Web Store. Install it, launch Callipso, and voice text will route directly to your browser — no clipboard, no paste delay.
The 200ms clipboard tax was always an annoyance. Now it is optional.