Local speech-to-text with speaker diarization. No cloud, no API keys, 3x faster than Whisper.
You want to transcribe meetings, podcasts, or voice notes β but:
- OpenAI Whisper API sends audio to the cloud (privacy risk, costs money)
- Whisper.cpp is fast but requires complex setup and manual compilation
- Proprietary tools lock you into subscriptions
WhisperBridge gives you enterprise-grade local transcription with a single pip install and one command.
- ποΈ Transcribe any audio format β mp3, wav, m4a, ogg, webm, flac
- π₯ Speaker diarization β knows who's talking (turns, segments)
- π Multilingual β 99 languages with auto-detection
- β‘ 3x faster than OpenAI Whisper β CTranslate2 optimized, runs on CPU or GPU
- π Word-level timestamps β precise text-audio alignment
- π§ Hotword boosting β improve accuracy on technical terms, names, product names
- π Multiple output formats β SRT subtitles, VTT, JSON, plain text
pip install whisperbridgeOr from source:
git clone https://github.com/AmSach/WhisperBridge.git
cd WhisperBridge
pip install -e .Requirements:
- Python 3.8+
- For GPU acceleration: CUDA 11.8+ (auto-detected)
# Transcribe a single file
whisperbridge transcribe meeting.mp3
# Speaker diarization (who said what)
whisperbridge transcribe interview.wav --diarize
# Specific language, faster model
whisperbridge transcribe lecture.m4a --lang en --model small
# Custom hotwords for better accuracy
whisperbridge transcribe tech-talk.mp3 --hotwords "Zo Computer,Nexus,GhostPilot"
# Output as subtitles
whisperbridge transcribe video.webm --format srt --output ./subs/whisperbridge transcribe <file> [options]
Options:
--model [tiny|small|medium|large] Whisper model size (default: medium)
--lang <code> Language code, e.g. en, fr, de (auto-detect if omitted)
--diarize Enable speaker diarization
--hotwords <words> Comma-separated words to boost
--format [txt|srt|vtt|json] Output format (default: txt)
--output <path> Output file/directory
--device [cpu|cuda] Compute device (auto-detect)
whisperbridge batch <folder> [options]
Transcribe all audio files in a folder with the same settings.
whisperbridge serve --port 8080
Start a local HTTP API server for transcription.
curl -X POST -F "audio=@recording.mp3" http://localhost:8080/transcribe
| Model | Speed (GPU) | Speed (CPU) | Accuracy |
|---|---|---|---|
| tiny | 30x real-time | 4x real-time | 85% |
| small | 15x real-time | 2x real-time | 92% |
| medium | 8x real-time | 0.8x real-time | 95% |
| large | 4x real-time | 0.4x real-time | 97% |
Tested on: NVIDIA RTX 3090, AMD Ryzen 5950X, Intel i9-13900K
whisperbridge/
βββ whisperbridge/ # Main package
β βββ __init__.py
β βββ cli.py # Click-based CLI
β βββ transcriber.py # Core transcription engine
β βββ diarizer.py # Speaker diarization (pyannote)
β βββ formats.py # Output format writers
βββ tests/
β βββ test_transcriber.py
β βββ test_cli.py
βββ requirements.txt
βββ setup.py
βββ README.md
Store API keys or model paths in ~/.whisperbridge.yaml:
model: small
device: cuda
output_format: txt
hotwords: []| Feature | WhisperBridge | Whisper.cpp | OpenAI API |
|---|---|---|---|
| Local processing | β | β | β |
| Speaker diarization | β | β | β |
| Hotword boosting | β | β | β |
| Python integration | β | β (C++) | β |
| No API key needed | β | β | β |
| Subtitle formats | β | β | β |
| One-line install | β | β | β |
MIT License β free for personal and commercial use.