Discover AI models for every task
Showing 1-10 of 10 models
by NVIDIA
NVIDIA Nemotron 3 Nano is a highly efficient hybrid Mamba-Transformer MoE model with 30B total / 3.5B active parameters. Features 128K context window extensible to 1M tokens. Excels at agentic AI, reasoning, and tool calling tasks. Trained on 25T tokens with state-of-the-art efficiency. Supports English, German, French, Spanish, Italian, and Japanese. Open weights with commercial license.
Nemotron Nano 12B V2 is a unified reasoning and chat model with controllable inference via /think and /no_think directives. Features hybrid Mamba-2 + MLP layers + 6 Attention layers architecture with 128K context. Achieves 76.25% AIME25, 97.75% MATH500, 70.79% LiveCodeBench, 66.98% BFCL v3. Supports runtime thinking budget control for accuracy-latency tradeoffs. Pre-trained on ~20T tokens with cutoff September 2024. Optimized for NVIDIA GPUs (A10G, H100, Jetson AGX Thor) with efficient Mamba-2 SSM for long-context handling. Includes native function calling and tool integration.
NVIDIA Nemotron 3 Super 120B A12B FP8 is a 120B parameter (12B active) LatentMixture-of-Experts hybrid model with Mamba-2, MoE and Multi-Token Prediction layers, supporting up to 1M tokens context. It achieves 94.73% on HMMT Feb25 (with tools) and 83.73% on MMLU‑Pro, and scores 73.88% on Arena‑Hard‑V2 (Hard Prompt). The model supports configurable reasoning via an enable_thinking flag, tool use, and structured output. It is available under the NVIDIA Nemotron Open Model License.
by Qwen
Qwen3 30B A3B Instruct is a compact Mixture-of-Experts model with 30B total parameters and 3B activated per token, offering excellent efficiency for general-purpose tasks. Features 262K native context with extension to 1M tokens, strong multilingual capabilities, and enhanced instruction following. Balances performance and computational efficiency with support for tool calling, code generation, and logical reasoning. Ideal for deployment scenarios requiring lower resource usage while maintaining quality across diverse task types.
Qwen3 30B A3B Thinking is the reasoning-focused MoE variant with 30B total / 3B activated parameters. Features explicit thinking mode for complex problem-solving with 262K native context extending to 1M tokens. Optimized for mathematical reasoning, logical inference, and multi-step problem decomposition while maintaining computational efficiency. Provides strong reasoning capabilities at a fraction of the compute cost of larger thinking models, ideal for resource-conscious deployments requiring deep reasoning.
Qwen3 Coder 30B A3B Instruct is an efficient Mixture-of-Experts coding model with 30B total parameters and 3B activated per token. Specialized for code generation, debugging, and software engineering with excellent computational efficiency. Features 262K native context for processing large codebases, strong multi-language programming support, and optimized for practical coding tasks. Balances coding performance with lower resource requirements, ideal for development environments and real-time code assistance.
by H Company
Holo2-30B-A3B is a 30 billion parameter Mixture-of-Experts vision-language model from H Company with 3 billion active parameters per token. Supports text and image inputs with 131K context window. Optimized for chat, vision understanding, and function calling tasks.
Qwen's "thinking-optimized" 80B model designed for sustained multi-step reasoning, structured deliberation, and high-precision problem-solving across math, code, and complex planning tasks.
Qwen3 Coder 480B A35B Instruct is a specialized Mixture-of-Experts coding model with 480B total parameters and 35B activated. Optimized specifically for code generation, code understanding, debugging, and software engineering tasks. Features 262K native context for handling large codebases, strong performance on coding benchmarks including LiveCodeBench and HumanEval, and support for multiple programming languages. Excels at complex algorithmic problems, code refactoring, and technical documentation generation.
by Meta
Meta Llama 3.3 70B Instruct is a multilingual instruction-tuned model optimized for dialogue. Trained on ~15 trillion tokens with cutoff December 2023, it outperforms many open-source and closed models. Major improvements include 92.1% on IFEval (steerability), 88.4% on HumanEval (code), 77.0% on MATH, and 91.1% on MGSM (multilingual). Features 128K context, Grouped-Query Attention, and supports 8 languages including English, German, French, Spanish, Italian, Portuguese, Hindi, and Thai. Trained on 7M GPU hours with 100% renewable energy.