Nemotron 3 Super 120B A12B FP8
Specifications
- Input
- Output
- Context window
- 262K tokens
- Released
- Mar 2026
Performance
- Speed
- 1 t/s
- TTFT
- 156 ms
- Latency
- —
- Intelligence
- —
Pricing
- Input
- €0.35 per 1M tokens
- Output
- €1.04 per 1M tokens
About this model
NVIDIA Nemotron 3 Super 120B A12B FP8 is a 120B parameter (12B active) LatentMixture-of-Experts hybrid model with Mamba-2, MoE and Multi-Token Prediction layers, supporting up to 1M tokens context. It achieves 94.73% on HMMT Feb25 (with tools) and 83.73% on MMLU‑Pro, and scores 73.88% on Arena‑Hard‑V2 (Hard Prompt). The model supports configurable reasoning via an enable_thinking flag, tool use, and structured output. It is available under the NVIDIA Nemotron Open Model License.
Technical specifications
- Capabilities
- Input modalities
- Output modalities
- Reasoning
- Hybrid Default off
Knowledge horizon
Knowledge cutoff Feb 2026
Released Mar 2026
Today
Training to release 1 mo Since release 2 mo
See also
Add Model to Comparison
Search for a model to add