Hermes 4 405B
Specifications
- Input
- Output
- Context window: 128K tokens
- Released: Aug 2025
Performance
- Speed: 41 tokens/s
- Time to first token (TTFT): 224 ms
- Latency: 228 ms
- Intelligence: n/a
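The speed and TTFT figures can be combined into a rough wall-clock estimate for a streamed reply: wait for the first token, then stream the remaining output at the measured rate. The sketch below assumes a constant decoding speed and ignores network overhead and queueing. The function name `estimate_response_time` is hypothetical, and measured figures will vary by provider and load. The separate "Latency" figure is not used because the page does not define it.

```python
# Figures from the Performance list (provider-measured; actual values vary with load).
TTFT_S = 0.224      # time to first token, in seconds
SPEED_TPS = 41.0    # output tokens per second

def estimate_response_time(output_tokens: int,
                           ttft_s: float = TTFT_S,
                           speed_tps: float = SPEED_TPS) -> float:
    """Hypothetical helper: rough seconds until a reply of `output_tokens` finishes.

    Model: time to first token, plus all output tokens streamed at a constant rate.
    """
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    return ttft_s + output_tokens / speed_tps

if __name__ == "__main__":
    # A 500-token reply: 0.224 s + 500 / 41 s
    print(f"{estimate_response_time(500):.2f} s")  # prints 12.42 s
```

Under these assumptions, output length dominates: at 41 tokens/s, a 500-token answer takes about 12 seconds, while TTFT contributes only about a quarter of a second.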
Pricing
- Input: €0.95 per 1M tokens
- Output: €2.85 per 1M tokens
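Input and output tokens are priced separately, so the cost of a request is each token count multiplied by its own per-million rate. The sketch below uses `decimal.Decimal` to avoid floating-point rounding in currency amounts. The function name `request_cost_eur` is hypothetical, and the sketch assumes billing is strictly linear in token counts, with no minimums or caching discounts.

```python
from decimal import Decimal

# Prices from the Pricing list, in EUR per 1M tokens.
INPUT_EUR_PER_M = Decimal("0.95")
OUTPUT_EUR_PER_M = Decimal("2.85")
PER_MILLION = Decimal(1_000_000)

def request_cost_eur(input_tokens: int, output_tokens: int) -> Decimal:
    """Hypothetical helper: linear cost of one request in EUR."""
    return (Decimal(input_tokens) * INPUT_EUR_PER_M
            + Decimal(output_tokens) * OUTPUT_EUR_PER_M) / PER_MILLION

if __name__ == "__main__":
    # A 10,000-token prompt with a 2,000-token reply.
    print(request_cost_eur(10_000, 2_000))
```

For that example, the prompt costs 0.0095 EUR and the reply 0.0057 EUR, for a total of 0.0152 EUR. Because output tokens cost three times as much as input tokens, long reasoning traces affect the bill more than long prompts.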
About this model
NousResearch Hermes 4 405B is the flagship hybrid-mode reasoning model, built on Meta's Llama-3.1-405B architecture. It was post-trained on a ~60B-token corpus that includes explicit <think> deliberation segments, and it delivers frontier-level performance in math, code, STEM, logic, and creative tasks. It achieves state-of-the-art results on RefusalBench, reflecting helpful, uncensored responses aligned to user values. It also supports advanced function calling, structured JSON outputs, and tool use, with extreme steerability and reduced refusal rates.
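When reasoning is enabled, client code usually needs to separate the deliberation from the user-facing answer. The sketch below assumes that the raw completion text contains one or more complete <think>...</think> spans followed by the answer. It does not cover how to enable reasoning, which depends on the provider and chat template. The names `split_thinking` and `SplitResponse` are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Assumption: deliberation appears as complete <think>...</think> spans in the raw text.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

class SplitResponse(NamedTuple):
    reasoning: Optional[str]  # None when the model answered without deliberating
    answer: str               # text left after removing all <think> spans

def split_thinking(text: str) -> SplitResponse:
    """Hypothetical helper: separate <think> deliberation from the final answer."""
    segments = [m.strip() for m in THINK_RE.findall(text)]
    answer = THINK_RE.sub("", text).strip()
    reasoning = "\n\n".join(segments) if segments else None
    return SplitResponse(reasoning, answer)

if __name__ == "__main__":
    raw = "<think>17 is only divisible by 1 and itself.</think>Yes, 17 is prime."
    result = split_thinking(raw)
    print("reasoning:", result.reasoning)
    print("answer:", result.answer)
```

The non-greedy pattern with `re.DOTALL` keeps multi-line traces and multiple spans separate. An unclosed <think> tag, such as one in a reply truncated by a token limit, is left in the answer text unchanged, so callers may want to check for that case.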
Technical specifications
- Capabilities
- Input modalities
- Output modalities
- Reasoning: hybrid (off by default)