GLM-4.6
Specifications
- Input: Text
- Output: Text
- Context window: 203K tokens
- Released: Sep 2025
Performance
- Speed: 21 t/s
- TTFT: 929 ms
- Latency: not reported
- Intelligence: not reported
Pricing
- Input: €0.46 per 1M tokens
- Output: €2.02 per 1M tokens
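Given the per-1M-token rates above, the cost of a single request can be estimated with a small helper. This is a minimal sketch; the function name and the example token counts are illustrative, not part of any official API.

```python
def estimate_cost_eur(input_tokens: int, output_tokens: int,
                      input_rate: float = 0.46,
                      output_rate: float = 2.02) -> float:
    """Estimate request cost in EUR from token counts and per-1M-token rates."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

# Example: a 100K-token prompt with a 2K-token completion
cost = estimate_cost_eur(100_000, 2_000)
print(f"€{cost:.4f}")  # prints "€0.0500"
```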
About this model
GLM-4.6 is a frontier-scale 355B-parameter Mixture-of-Experts model with a 200K-token context window and 128K-token output capability. It is MIT licensed, making it the only model in its class that enterprises can self-host and deeply customize. It ranks #1 on LiveCodeBench v6 (82.8%) and HLE, and #3 on both AIME 2025 (93.9%) and Terminal-Bench (40.5%). It reaches near parity with Claude Sonnet 4 (48.6% win rate) while dramatically outperforming other open-source baselines. Purpose-built for agentic workflows, real-world coding, and tool-augmented problem-solving, it supports native tool calling during inference for complex multi-step tasks.
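The native tool calling mentioned above is typically exercised through an OpenAI-compatible chat-completions request. The sketch below only builds the request payload; the `get_weather` tool, its parameters, and the prompt are hypothetical examples, and the actual endpoint, authentication, and response handling depend on the provider you use to serve GLM-4.6.

```python
import json

# Hypothetical tool schema for illustration; not part of any official GLM-4.6 API.
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Return the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

def build_tool_call_request(prompt: str) -> dict:
    """Build a chat-completions payload offering the model one callable tool."""
    return {
        "model": "glm-4.6",
        "messages": [{"role": "user", "content": prompt}],
        "tools": [WEATHER_TOOL],
        "tool_choice": "auto",  # let the model decide whether to call the tool
    }

payload = build_tool_call_request("What's the weather in Berlin?")
print(json.dumps(payload, indent=2))
```

When the model decides a tool is needed, the response carries a `tool_calls` entry whose arguments your code executes before sending the result back in a follow-up message.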
Technical specifications
Capabilities
- Input modalities: Text
- Output modalities: Text
- Reasoning: Hybrid (default off)
Knowledge horizon
- Knowledge cutoff: Aug 2025
- Released: Sep 2025
- Training to release: 1 mo
- Since release: 8 mo