DeepSeek V4 Flash
Specifications
- Input
- Output
- Context window: 1M tokens
- Released: Apr 2026
Performance
- Speed: 93 t/s
- TTFT: 1.3 s
- Latency: 333 ms
- Intelligence: n/a
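To make the speed and TTFT figures concrete, here is a minimal sketch that estimates how long a streamed response might take. It assumes the first token arrives after the listed TTFT and the remaining output streams at a constant 93 tokens per second; real throughput varies with load and prompt length. The function name `estimate_response_seconds` is hypothetical, and the 333 ms latency figure is left out because the page does not define what it measures.

```python
# Figures taken from the Performance list; treated here as constants.
TTFT_SECONDS = 1.3        # time to first token
TOKENS_PER_SECOND = 93.0  # listed output speed


def estimate_response_seconds(output_tokens: int) -> float:
    """Rough estimate: wait for the first token, then stream at constant speed.

    Assumption (not from the page): generation speed is constant and
    independent of prompt length.
    """
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    return TTFT_SECONDS + output_tokens / TOKENS_PER_SECOND


if __name__ == "__main__":
    for n in (93, 930, 4650):
        print(f"{n} output tokens: about {estimate_response_seconds(n):.1f} s")
```

Under these assumptions, a 930-token answer takes roughly 1.3 s of waiting plus 10 s of streaming.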
Pricing
- Input: €0.15 per 1M tokens
- Output: €0.30 per 1M tokens
- Cache hit: €0.04
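The listed rates can be turned into a per-request cost estimate. The sketch below is illustrative only: it assumes the €0.04 cache-hit price applies per 1M cached input tokens and replaces the normal input rate for those tokens, which the page does not state explicitly. The function name `estimate_cost_eur` and its parameters are hypothetical.

```python
# Rates from the Pricing list, in euros per 1M tokens.
INPUT_EUR_PER_M = 0.15
OUTPUT_EUR_PER_M = 0.30
# Assumption: the cache-hit price is also per 1M (cached input) tokens.
CACHE_HIT_EUR_PER_M = 0.04


def estimate_cost_eur(input_tokens: int, output_tokens: int,
                      cached_input_tokens: int = 0) -> float:
    """Estimate the cost of one request in euros.

    cached_input_tokens is the part of input_tokens served from cache;
    it is billed at the cache-hit rate instead of the input rate.
    """
    if min(input_tokens, output_tokens, cached_input_tokens) < 0:
        raise ValueError("token counts must be non-negative")
    if cached_input_tokens > input_tokens:
        raise ValueError("cached_input_tokens cannot exceed input_tokens")
    uncached = input_tokens - cached_input_tokens
    total = (uncached * INPUT_EUR_PER_M
             + cached_input_tokens * CACHE_HIT_EUR_PER_M
             + output_tokens * OUTPUT_EUR_PER_M)
    return total / 1_000_000


if __name__ == "__main__":
    # 1M input tokens and 200k output tokens, without and with caching.
    print(f"{estimate_cost_eur(1_000_000, 200_000):.3f} EUR")
    print(f"{estimate_cost_eur(1_000_000, 200_000, 500_000):.3f} EUR")
```

With half the prompt served from cache, the input portion of the bill in this example drops from €0.15 to €0.095.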
About this model
DeepSeek V4 Flash is a 284B-parameter Mixture-of-Experts (MoE) chat model from DeepSeek AI. Its hybrid attention architecture combines compressed sparse attention with heavily compressed attention and supports a 1-million-token context window. The model scores 88.7% EM on MMLU, 69.5% Pass@1 on HumanEval, and 44.7% EM on LongBench‑V2, benchmarks that cover language understanding, coding, and long‑context tasks respectively. It is released under the MIT License.
Technical specifications
- Capabilities
- Input modalities
- Output modalities
- Reasoning: Hybrid (default on)