Hermes 4 405B

by NousResearch

Specifications

Input
Output
Context window: 128K tokens
Released: Aug 2025

Performance

Speed: 41 t/s
TTFT: 258 ms
Latency: 132 ms
Intelligence: —
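
As a rough reading of these figures, total response time can be sketched as TTFT plus output length divided by speed. This assumes "Speed" means sustained output tokens per second and "TTFT" is the delay before the first token arrives; the page does not define either term, and the function name below is illustrative only.

```python
def estimate_response_seconds(output_tokens: int,
                              tokens_per_second: float = 41.0,
                              ttft_ms: float = 258.0) -> float:
    """Estimate wall-clock seconds for one streamed response.

    Defaults come from the Performance figures above. Treating speed as
    constant over the whole response is an assumption, not a guarantee.
    """
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    # Time to first token, then steady-state generation of the rest.
    return ttft_ms / 1000.0 + output_tokens / tokens_per_second


if __name__ == "__main__":
    # A 500-token answer at the listed speed and TTFT.
    print(f"{estimate_response_seconds(500):.1f} s")  # prints "12.5 s"
```

Real-world timings will vary with load, prompt length, and provider, so this is best used for order-of-magnitude planning.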

Pricing

Input: €0.95
Output: €2.85
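
The listing does not state the billing unit for these prices. The sketch below assumes they are quoted per 1 million tokens, which is a common convention but not confirmed here; the constant and function names are illustrative.

```python
from decimal import Decimal

# Prices from the listing above. ASSUMPTION: quoted per 1,000,000 tokens.
INPUT_EUR_PER_MTOK = Decimal("0.95")
OUTPUT_EUR_PER_MTOK = Decimal("2.85")
TOKENS_PER_MTOK = Decimal(1_000_000)


def estimate_cost_eur(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the cost in euros of a single request."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (input_tokens * INPUT_EUR_PER_MTOK
             + output_tokens * OUTPUT_EUR_PER_MTOK)
    return total / TOKENS_PER_MTOK


if __name__ == "__main__":
    # 10,000 prompt tokens and 2,000 completion tokens.
    print(estimate_cost_eur(10_000, 2_000))  # prints 0.0152
```

Under that assumption, output tokens cost three times as much as input tokens, so long completions dominate the bill.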

About this model

Hermes 4 405B is NousResearch's flagship hybrid-mode reasoning model, built on Meta's Llama-3.1-405B architecture. It was trained on a corpus of roughly 60B tokens that includes explicit <think> deliberation segments, and it delivers frontier-level performance in math, code, STEM, logic, and creative tasks. It achieves state-of-the-art results on RefusalBench for helpful, uncensored responses aligned to user values. The model supports function calling, structured JSON outputs, and tool use, with high steerability and reduced refusal rates.
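
When reasoning is enabled, a client usually wants to separate the deliberation from the final answer. The following is a minimal sketch of that step. It assumes each deliberation segment opens with <think> and closes with a matching </think> tag; only the opening tag is named above, so the closing tag and the function name are assumptions.

```python
import re

# ASSUMPTION: segments are delimited as <think> ... </think>.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text: str) -> tuple[list[str], str]:
    """Return (reasoning_segments, final_answer) from raw model output."""
    # Collect the contents of every deliberation segment, in order.
    reasoning = [segment.strip() for segment in THINK_RE.findall(text)]
    # Whatever remains after removing the segments is the visible answer.
    answer = THINK_RE.sub("", text).strip()
    return reasoning, answer


if __name__ == "__main__":
    sample = "<think>2 + 2 is 4.</think>The answer is 4."
    reasoning, answer = split_reasoning(sample)
    print(reasoning)  # prints ['2 + 2 is 4.']
    print(answer)     # prints The answer is 4.
```

Output with no deliberation segment passes through unchanged, which matches the default-off reasoning mode listed under Capabilities.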

Technical specifications

Capabilities
Input modalities
Output modalities
Reasoning: Hybrid (default off)

Knowledge horizon

Released: Aug 2025
Since release: 11 months

See also

Kimi K2.6

Qwen 3.6 27B

Gemma 4 31B