Nemotron 3 Super 120B A12B FP8

by NVIDIA

Specifications

Input
Output
Context window: 262K tokens
Released: Mar 2026

Performance

Speed: 2 t/s
TTFT: 272 ms
Latency: 1.0s
Intelligence: —

Pricing

Input: €0.30
Output: €0.90

About this model

NVIDIA Nemotron 3 Super 120B A12B FP8 is a 120B parameter (12B active) LatentMixture-of-Experts hybrid model with Mamba-2, MoE and Multi-Token Prediction layers, supporting up to 1M tokens context. It achieves 94.73% on HMMT Feb25 (with tools) and 83.73% on MMLU‑Pro, and scores 73.88% on Arena‑Hard‑V2 (Hard Prompt). The model supports configurable reasoning via an enable_thinking flag, tool use, and structured output. It is available under the NVIDIA Nemotron Open Model License.

Technical specifications

Capabilities
Input modalities
Output modalities
Reasoning: Hybrid Default off

Knowledge horizon

Knowledge cutoff Feb 2026

Released Mar 2026

Today

Training to release 1 mo Since release 3 mo

Nemotron 3 Super 120B A12B FP8

Specifications

Performance

Pricing

About this model

Technical specifications

Knowledge horizon

See also

Kimi K2.5

Qwen 3.6 27B

Gemma 4 31B