Whisper Large V3
Specifications
- Input: audio
- Output: text
- Context window: n/a
- Released: Nov 2023
Performance
- Speed: n/a
- TTFT: n/a
- Latency: n/a
- Intelligence: n/a
Pricing
- Input: €0.00 per 1M tokens
- Output: €0.00 per 1M tokens
About this model
OpenAI Whisper Large V3 is a state-of-the-art automatic speech recognition (ASR) model with 1550M parameters and support for 99 languages. It achieves a 10-20% reduction in word error rate (WER) compared to Large V2 and was trained on 1M hours of weakly labeled audio plus 4M hours of pseudo-labeled audio. Compared to V2, it uses 128 Mel frequency bins (up from 80), is more robust to accents and background noise, and adds Cantonese language support. The model supports speech transcription and speech-to-English translation, with sentence-level and word-level timestamps. With torch.compile, inference can be sped up by as much as 4.5x. It is well suited to accessibility tools, multilingual transcription, and enterprise ASR applications.
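The WER figure quoted above can be made concrete with a short sketch. WER is conventionally the word-level edit distance between a reference transcript and the model's hypothesis, divided by the number of reference words. The function names `word_error_rate` and `relative_reduction`, the lowercase-and-split normalization, and the sample sentences are assumptions made for illustration. They are not part of Whisper or of any published evaluation, and the sketch reads the 10-20% figure as a relative reduction.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Normalization here (lowercasing, whitespace split) is a simplifying
    assumption; real ASR evaluations usually apply a fuller text normalizer.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(old_wer: float, new_wer: float) -> float:
    """Fraction by which new_wer improves on old_wer (0.2 means 20% fewer errors)."""
    if old_wer <= 0:
        raise ValueError("old_wer must be positive")
    return (old_wer - new_wer) / old_wer


if __name__ == "__main__":
    # Made-up transcripts, not real model output.
    reference = "the quick brown fox jumps over the lazy dog"
    hypothesis = "the quick brown fox jumped over lazy dog"
    print(f"WER: {word_error_rate(reference, hypothesis):.3f}")

    # Hypothetical WER values chosen only to illustrate a relative reduction.
    print(f"Relative reduction: {relative_reduction(0.10, 0.08):.0%}")
```

Under this reading, a model that moves from a WER of 0.10 to 0.08 on the same test set has a 20% relative reduction, even though the absolute change is only two percentage points.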
Technical specifications
- Capabilities: speech transcription, speech-to-English translation
- Input modalities: audio
- Output modalities: text
- Reasoning: No
Knowledge horizon
Released Nov 2023; 31 months since release.