Command Palette
Search for a command to run

Nemotron 3 Super 120B A12B FP8

by NVIDIA

Specifications

Input
Output
Context window
262K tokens
Released
Mar 2026

Performance

Speed
1 t/s
TTFT
156 ms
Latency
Intelligence

Pricing

Input
€0.35
per 1M tokens
Output
€1.04
per 1M tokens

About this model

NVIDIA Nemotron 3 Super 120B A12B FP8 is a 120B parameter (12B active) LatentMixture-of-Experts hybrid model with Mamba-2, MoE and Multi-Token Prediction layers, supporting up to 1M tokens context. It achieves 94.73% on HMMT Feb25 (with tools) and 83.73% on MMLU‑Pro, and scores 73.88% on Arena‑Hard‑V2 (Hard Prompt). The model supports configurable reasoning via an enable_thinking flag, tool use, and structured output. It is available under the NVIDIA Nemotron Open Model License.

Technical specifications

Capabilities
Input modalities
Output modalities
Reasoning
Hybrid Default off

Knowledge horizon

Knowledge cutoff Feb 2026
Released Mar 2026
Today
Training to release 1 mo Since release 2 mo

See also

Add Model to Comparison
Search for a model to add
Command Palette
Search for a command to run