Discover AI models for every task
Showing 1-24 of 73 models
by Moonshot
Moonshot AI's most powerful native multimodal agentic model. Features 1T parameters (32B activated), 256K context, vision capabilities, and advanced reasoning with agent swarm support.
by MiniMax
MiniMax M2.5 is a state-of-the-art reasoning MoE model with 229B total / 10B active parameters. Extensively trained with reinforcement learning across 200,000+ real-world environments, achieving SOTA performance in coding (80.2% SWE-Bench Verified), agentic tool use, search, and office productivity tasks. Features 197K context window, efficient MoE inference, and strong multilingual support.
by Google
Gemma 3 27B IT is a cutting-edge multimodal vision-language model with 27 billion parameters, built on Gemini technology. Trained on 14 trillion tokens, it handles both text and image inputs with a 128K context window and supports 140+ languages. Excels at visual understanding, code generation, mathematical reasoning, and multilingual tasks. Achieves 78.6 on MMLU, 82.6 on GSM8K, 85.6 on DocVQA, and 76.3 on ChartQA. Lightweight enough for laptop deployment with strong safety improvements over previous Gemma versions.
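To show how a vision-language card like this one is typically queried, here is a minimal sketch using the Hugging Face transformers image-text-to-text pipeline; the checkpoint id google/gemma-3-27b-it, the image URL, and the prompt are illustrative assumptions rather than details from this listing.

```python
# Minimal sketch: querying a Gemma 3-style vision-language model via the
# transformers "image-text-to-text" pipeline. Checkpoint id, image URL,
# and prompt are illustrative assumptions.
from transformers import pipeline

pipe = pipeline(
    "image-text-to-text",
    model="google/gemma-3-27b-it",  # assumed hub id
    device_map="auto",
)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/chart.png"},  # placeholder image
            {"type": "text", "text": "Summarize the trend shown in this chart."},
        ],
    }
]

out = pipe(text=messages, max_new_tokens=128, return_full_text=False)
print(out[0]["generated_text"])  # model's reply about the image
```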
by OpenAI
GPT-OSS 120B is a powerful 117B parameter Mixture-of-Experts reasoning model with 5.1B active parameters, released under Apache 2.0. Features configurable reasoning effort (low/medium/high), full chain-of-thought visibility, and runs on a single 80GB GPU thanks to MXFP4 quantization. Native support for function calling, web browsing, Python code execution, and structured outputs. Designed for agentic tasks and complex reasoning with production-grade performance. Fully customizable for specialized use cases on single H100/MI300X.
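As a rough illustration of the configurable reasoning effort mentioned above, the sketch below assumes the model is served behind a local OpenAI-compatible endpoint (for example via vLLM) and that the effort level is selected through the system prompt; the base URL, served model name, and "Reasoning: high" convention are assumptions, not details from this listing.

```python
# Minimal sketch: requesting high reasoning effort from a locally served
# GPT-OSS-style model through an OpenAI-compatible API. The base_url,
# model id, and "Reasoning: high" system-prompt convention are assumptions.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="gpt-oss-120b",  # assumed served model name
    messages=[
        {"role": "system", "content": "Reasoning: high"},  # assumed effort selector
        {"role": "user", "content": "Prove that the square root of 2 is irrational."},
    ],
)

print(response.choices[0].message.content)
```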
by ZAI
ZAI's frontier 744B MoE model (40B activated) with 203K context. Excels at agentic engineering, coding (SWE-bench 77.8%), reasoning, and tool use. Built with asynchronous RL and released under the MIT license.
by DeepSeek
DeepSeek R1 0528 is an upgraded 685B parameter reasoning model with significantly enhanced depth of reasoning and inference capabilities. Achieves 87.5% on AIME 2025 (up from 70%), 91.4% on AIME 2024, 73.3% on LiveCodeBench, and 1930 Codeforces rating. Features system prompt support, averages 23K thinking tokens per question for deeper analysis, and reduced hallucination rate. Released under MIT license supporting commercial use and distillation. Performance approaching O3 and Gemini 2.5 Pro levels.
by Black Forest Labs
Black Forest Labs FLUX.2 [klein] 9B is a balanced image generation model offering excellent quality-to-speed ratio. With 9 billion parameters, it provides better detail and composition than the 4B variant while remaining faster than full-size models. Ideal for production workloads requiring a balance between quality, speed, and cost. Supports both text-to-image and image-to-image generation.
by OpenAI
GPT-OSS 20B is a compact 21B parameter Mixture-of-Experts model with 3.6B active parameters, designed for lower latency and local deployment. Runs within 16GB memory with configurable reasoning effort, full chain-of-thought access, and native agentic capabilities including function calling and structured outputs. Released under Apache 2.0 license, ideal for specialized fine-tuning on consumer hardware. Companion model to GPT-OSS 120B optimized for speed while maintaining strong reasoning capabilities.
OpenAI Whisper Large V3 is a state-of-the-art automatic speech recognition model with 1550M parameters supporting 99 languages. Achieves a 10-20% WER reduction compared to V2 and was trained on 1M hours of weakly labeled plus 4M hours of pseudo-labeled audio. Features 128 Mel frequency bins (increased from 80), improved robustness to accents and background noise, and new Cantonese language support. Supports speech transcription and speech-to-English translation with sentence- and word-level timestamps. Optimized with torch.compile for a 4.5x speedup. Ideal for accessibility tools, multilingual transcription, and enterprise ASR applications.
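For a concrete sense of how a Whisper-class model is commonly used, here is a minimal transcription sketch with the Hugging Face transformers ASR pipeline; the checkpoint id openai/whisper-large-v3 and the audio filename are assumptions for illustration.

```python
# Minimal sketch: multilingual transcription with a Whisper-large-v3-style
# checkpoint via the transformers ASR pipeline. Checkpoint id and audio
# file are illustrative assumptions.
import torch
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-large-v3",  # assumed hub id
    torch_dtype=torch.float16,
    device="cuda",                    # or "cpu"
)

result = asr(
    "meeting_recording.wav",          # placeholder audio file
    return_timestamps="word",         # word-level timestamps
    generate_kwargs={"task": "transcribe"},
)

print(result["text"])
for chunk in result["chunks"]:        # (start, end) timestamps per word
    print(chunk["timestamp"], chunk["text"])
```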
by Mistral
Devstral 2 123B is Mistral AI's flagship agentic coding model, featuring 123B parameters optimized for software engineering tasks. Achieves 72.2% on SWE-bench Verified and 61.3% on SWE-bench Multilingual. Excels at codebase exploration, multi-file editing, and agentic workflows with tool use. Supports 200K context window with enhanced function calling and structured output. Designed for IDE integration via Mistral Vibe CLI. Released under modified MIT license for unrestricted commercial use.
Mistral's 12B parameter vision-language model. Capable of understanding and reasoning about images alongside text.
Mistral Small 3.2 24B Instruct is a multimodal instruction-tuned model supporting both vision and text with 24B parameters and 128K context. Major improvements over 3.1 include better instruction following (84.78%), 2x reduction in repetition errors, and robust function calling. Achieves 65.33% on Wildbench v2, 43.1% on Arena Hard v2, 92.90% on HumanEval Pass@5. Vision benchmarks: 87.4% ChartQA, 94.86% DocVQA, 62.50% MMMU. Supports up to 10 images per prompt with integrated vision-based function calling.
Mistral Small 4 is a 119B-parameter Mixture-of-Experts model (128 experts, 4 active per token, 6.5B active parameters) that unifies instruct, reasoning, and coding capabilities into a single multimodal model. It accepts text and image inputs, supports function calling, structured outputs, and configurable reasoning effort (none for fast responses, high for deep step-by-step reasoning). With a 256K context window and Apache 2.0 license, it delivers 40% lower latency and 3x higher throughput compared to Mistral Small 3.
Mistral Voxtral Small 24B is a multimodal model supporting both text and audio inputs with 24B parameters. Enables natural voice conversations and audio understanding alongside text processing. Features audio transcription, audio-based reasoning, and voice-to-text capabilities. Built on Mistral architecture with specific training for audio modalities. Ideal for voice assistants, audio analysis applications, and multimodal AI systems requiring combined text and speech processing.
by MiniMax
MiniMax M2 is a powerful MoE model with 200B total / 10B active parameters. Optimized for reasoning and coding tasks with excellent performance in multilingual scenarios. Features 128K context window and efficient inference.
MiniMax M2.1 is a state-of-the-art MoE model with 230B total / 10B active parameters, optimized for agentic coding and complex multi-step workflows. Excels at multilingual programming, tool use, and long-horizon planning. Matches Claude Sonnet 4.5 on code benchmarks and exceeds it in multilingual scenarios. Features 196K context window with FP8 efficiency. Released under Modified-MIT license for commercial use.
MiniMax M2.7 is built on the MiniMax M2 architecture with an undisclosed parameter count, optimized for advanced reasoning and agentic workflows. It excels at complex tool use, self-evolution, and professional software engineering tasks, achieving a 66.6% medal rate on MLE-Bench Lite and a 56.2% score on SWE-Bench Pro. The model also attains an Elo of 1495 on GDPval-AA, surpassing other open-weight models. Available under an Other license.
by NVIDIA
NVIDIA Nemotron 3 Super 120B A12B FP8 is a 120B parameter (12B active) Latent Mixture-of-Experts hybrid model with Mamba-2, MoE, and Multi-Token Prediction layers, supporting up to 1M tokens of context. It achieves 94.73% on HMMT Feb25 (with tools) and 83.73% on MMLU-Pro, and scores 73.88% on Arena-Hard-V2 (Hard Prompt). The model supports configurable reasoning via an enable_thinking flag, tool use, and structured output. It is available under the NVIDIA Nemotron Open Model License.
NVIDIA Nemotron 3 Nano is a highly efficient hybrid Mamba-Transformer MoE model with 30B total / 3.5B active parameters. Features 128K context window extensible to 1M tokens. Excels at agentic AI, reasoning, and tool calling tasks. Trained on 25T tokens with state-of-the-art efficiency. Supports English, German, French, Spanish, Italian, and Japanese. Open weights with commercial license.
Nemotron Nano 12B V2 is a unified reasoning and chat model with controllable inference via /think and /no_think directives. Features a hybrid architecture of Mamba-2 and MLP layers plus 6 attention layers, with 128K context. Achieves 76.25% on AIME25, 97.75% on MATH500, 70.79% on LiveCodeBench, and 66.98% on BFCL v3. Supports runtime thinking-budget control for accuracy-latency tradeoffs. Pre-trained on ~20T tokens with a September 2024 cutoff. Optimized for NVIDIA GPUs (A10G, H100, Jetson AGX Thor) with efficient Mamba-2 SSM for long-context handling. Includes native function calling and tool integration.
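A minimal sketch of the /think and /no_think control described above, assuming the directive is placed in the system turn and the checkpoint is loaded through transformers; the hub id and directive placement are assumptions for illustration.

```python
# Minimal sketch: toggling reasoning on a Nemotron-Nano-style model by
# putting a /think or /no_think directive in the system turn. The hub id
# and directive placement are assumptions based on the description above.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nvidia/NVIDIA-Nemotron-Nano-12B-v2"  # assumed hub id
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, trust_remote_code=True, device_map="auto"
)

messages = [
    {"role": "system", "content": "/think"},  # use "/no_think" to skip reasoning
    {"role": "user", "content": "How many prime numbers are there below 50?"},
]

inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```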
by Moonshot
Moonshot Kimi K2 Instruct is a 1 trillion parameter Mixture-of-Experts model with 32B activated parameters, featuring 384 experts and 128K context length. Pre-trained on 15.5T tokens with Muon optimizer at unprecedented scale achieving zero instability. Achieves SOTA on LiveCodeBench (53.7%), SWE-bench Verified (71.6%), AIME 2024 (69.6%), and MATH-500 (97.4%). Specifically designed for agentic intelligence with exceptional tool calling, code generation, and mathematical reasoning capabilities.
Moonshot Kimi K2 Thinking is the reasoning-enhanced variant of the 1 trillion parameter MoE model with 32B activated parameters. Built on the same architecture as K2 Instruct with explicit thinking mode for complex problem-solving. Features 384 experts, 128K context, and trained on 15.5T tokens with zero-instability Muon optimization. Excels at deep reasoning tasks requiring multi-step deliberation including advanced mathematics, complex coding challenges, and agentic problem-solving with tool integration.
by BAAI
BAAI BGE-M3 is a versatile multilingual embedding model supporting dense, sparse, and multi-vector retrieval in a unified architecture. Handles 100+ languages with strong cross-lingual capabilities and flexible retrieval modes for different use cases. Features hybrid retrieval combining dense embeddings for semantic similarity, sparse representations for lexical matching, and multi-vector approaches for fine-grained relevance. Ideal for multilingual search engines, hybrid retrieval systems, and complex information retrieval scenarios requiring multiple matching strategies.
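To make the three retrieval modes concrete, here is a minimal sketch using the FlagEmbedding library's BGEM3FlagModel, which can return dense, sparse, and multi-vector (ColBERT-style) outputs from a single encode call; verify the exact flag names against your installed version.

```python
# Minimal sketch: getting dense, sparse, and multi-vector representations
# from a BGE-M3-style model via FlagEmbedding. Flag names follow the
# library's documented API; verify against your installed version.
from FlagEmbedding import BGEM3FlagModel

model = BGEM3FlagModel("BAAI/bge-m3", use_fp16=True)

sentences = [
    "What is BGE-M3?",
    "BGE-M3 is a multilingual embedding model supporting hybrid retrieval.",
]

out = model.encode(
    sentences,
    return_dense=True,         # dense vectors for semantic similarity
    return_sparse=True,        # lexical token-weight representation
    return_colbert_vecs=True,  # multi-vector, fine-grained matching
)

print(out["dense_vecs"].shape)       # (2, 1024) dense embeddings
print(out["lexical_weights"][0])     # sparse token weights for the query
print(out["colbert_vecs"][0].shape)  # per-token multi-vectors
```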
BAAI BGE Multilingual Gemma2 is a multilingual dense retrieval embedding model built on Gemma 2 architecture, supporting 100+ languages for cross-lingual semantic search and retrieval. Delivers strong performance across diverse language families including English, Chinese, Spanish, Arabic, Hindi, and many more. Ideal for multilingual search systems, cross-lingual document retrieval, international content recommendation, and global knowledge bases. Trained on large-scale multilingual data with balanced language representation.