GPT-OSS 120B
Specifications
- Input
- Output
- Context window
- 131K tokens
- Released
- Aug 2025
Performance
- Speed
- 752 t/s
- TTFT
- 738 ms
- Latency
- 4.6s
- Intelligence
- —
Pricing
- Input
- €0.22 per 1M tokens
- Output
- €0.66 per 1M tokens
About this model
GPT-OSS 120B is a powerful 117B parameter Mixture-of-Experts reasoning model with 5.1B active parameters, released under Apache 2.0. Features configurable reasoning effort (low/medium/high), full chain-of-thought visibility, and runs on a single 80GB GPU thanks to MXFP4 quantization. Native support for function calling, web browsing, Python code execution, and structured outputs. Designed for agentic tasks and complex reasoning with production-grade performance. Fully customizable for specialized use cases on single H100/MI300X.
Technical specifications
- Capabilities
- Input modalities
- Output modalities
- Reasoning
- Default on high
Knowledge horizon
Knowledge cutoff Jun 2024
Released Aug 2025
Today
Training to release 14 mo Since release 10 mo
See also
Add Model to Comparison
Search for a model to add