Llama 3.1 8B Instruct
Specifications
- Input
- Output
- Context window: 131K tokens
- Released: Jul 2024
Performance
- Speed: 60 tokens/s
- TTFT (time to first token): 494 ms
- Latency: 157 ms
- Intelligence: not available
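As a rough illustration of how the speed and TTFT figures combine, the sketch below estimates the wall-clock time of a single response. It assumes generation proceeds at a constant 60 tokens/s once the first token arrives, which real deployments only approximate. The listing does not define how "Latency" relates to TTFT, so that figure is left out. The function name `estimate_response_seconds` is hypothetical.

```python
# Figures copied from the performance listing; treat them as point-in-time measurements.
TTFT_SECONDS = 0.494       # time to first token (494 ms)
TOKENS_PER_SECOND = 60.0   # output speed (60 tokens/s)


def estimate_response_seconds(output_tokens: int) -> float:
    """Estimate total response time: TTFT plus steady-state generation time.

    Assumption: constant throughput after the first token (a simplification).
    """
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    return TTFT_SECONDS + output_tokens / TOKENS_PER_SECOND


if __name__ == "__main__":
    # A 300-token answer: 0.494 s + 300 / 60 s
    print(f"{estimate_response_seconds(300):.3f} s")
```

Under these assumptions a 300-token answer takes roughly 5.5 seconds, and output length dominates the total for anything longer than a few dozen tokens.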
Pricing
- Input: €0.15 per 1M tokens
- Output: €0.15 per 1M tokens
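To make the per-token pricing concrete, here is a minimal cost estimate for one request. The two rates come from the listing above. The helper `estimate_cost_eur` is a hypothetical name, and the sketch ignores anything the listing does not mention, such as taxes, minimum charges, or rounding rules.

```python
from decimal import Decimal

# Rates copied from the pricing listing, in euros per 1M tokens.
INPUT_EUR_PER_MILLION = Decimal("0.15")
OUTPUT_EUR_PER_MILLION = Decimal("0.15")
TOKENS_PER_MILLION = Decimal(1_000_000)


def estimate_cost_eur(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the cost in euros of one request (hypothetical helper)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = Decimal(input_tokens) * INPUT_EUR_PER_MILLION / TOKENS_PER_MILLION
    output_cost = Decimal(output_tokens) * OUTPUT_EUR_PER_MILLION / TOKENS_PER_MILLION
    return input_cost + output_cost


if __name__ == "__main__":
    # 2,000 prompt tokens and 500 generated tokens.
    print(estimate_cost_eur(2_000, 500), "EUR")
```

`Decimal` is used instead of `float` so that very small per-request amounts add up without binary rounding drift when summed over many requests.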
About this model
Meta Llama 3.1 8B Instruct is an efficient, multilingual, instruction-tuned model optimized for dialogue and assistant use cases. With 8 billion parameters and a 128K-token context window, it provides strong performance across general tasks, code generation, and multilingual understanding. It supports function calling and tool use, and its architecture uses Grouped-Query Attention. It is well suited to deployments that need lower compute resources while maintaining quality across English and 7 additional languages, including German, French, Spanish, and Hindi.
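The "128K" figure here and the "131K tokens" figure in the specifications describe the same limit if 128K is read as 128 × 1024. The sketch below works through that arithmetic and shows how a prompt eats into the space left for generation, since prompt and output share one context window. The exact limit of 131,072 tokens is an assumption (a hosting provider may enforce a lower cap), and `max_output_tokens` is a hypothetical helper.

```python
# Assumed exact size behind both "128K" and "131K": 128 * 1024 tokens.
CONTEXT_WINDOW_TOKENS = 128 * 1024


def max_output_tokens(prompt_tokens: int,
                      context_window: int = CONTEXT_WINDOW_TOKENS) -> int:
    """Return how many tokens remain for generation after the prompt.

    Prompt and generated tokens share the same context window, so the
    remainder is the window size minus the prompt length, floored at zero.
    """
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    return max(context_window - prompt_tokens, 0)


if __name__ == "__main__":
    print(CONTEXT_WINDOW_TOKENS)       # 131072
    print(max_output_tokens(100_000))  # 31072
```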
Technical specifications
- Capabilities
- Input modalities
- Output modalities
- Reasoning: No
Knowledge horizon
- Knowledge cutoff: Dec 2023
- Released: Jul 2024
- Training to release: 7 months
- Since release: 23 months