Discover AI models for every task
by Moonshot
Moonshot Kimi K2 Thinking is the reasoning-enhanced variant of the 1 trillion parameter MoE model with 32B activated parameters. Built on the same architecture as K2 Instruct, it adds an explicit thinking mode for complex problem-solving. Features 384 experts, a 128K context window, and pre-training on 15.5T tokens with the Muon optimizer at zero training instability. Excels at deep reasoning tasks requiring multi-step deliberation, including advanced mathematics, complex coding challenges, and agentic problem-solving with tool integration.
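A minimal sketch of how a thinking-mode model like this might be queried through an OpenAI-compatible chat endpoint; the base URL, API key, and model ID below are placeholder assumptions, not identifiers confirmed by this listing.

```python
# Minimal sketch: call a thinking-mode model via an OpenAI-compatible API.
# base_url, api_key, and the model ID are placeholders for your provider's values.
from openai import OpenAI

client = OpenAI(
    base_url="https://example-provider.com/v1",  # placeholder endpoint
    api_key="YOUR_API_KEY",                      # placeholder credential
)

response = client.chat.completions.create(
    model="moonshotai/kimi-k2-thinking",  # assumed ID; check the catalog entry
    messages=[
        {"role": "user", "content": "Prove that the sum of two odd integers is even."}
    ],
    max_tokens=4096,  # thinking models spend extra tokens on deliberation
)

print(response.choices[0].message.content)
```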
by Qwen
Qwen's "thinking-optimized" 80B model designed for sustained multi-step reasoning, structured deliberation, and high-precision problem-solving across math, code, and complex planning tasks.
Qwen3 30B A3B Thinking is the reasoning-focused MoE variant with 30B total / 3B activated parameters. Features explicit thinking mode for complex problem-solving with 262K native context extending to 1M tokens. Optimized for mathematical reasoning, logical inference, and multi-step problem decomposition while maintaining computational efficiency. Provides strong reasoning capabilities at a fraction of the compute cost of larger thinking models, ideal for resource-conscious deployments requiring deep reasoning.
Qwen3 235B A22B Thinking is the reasoning-enhanced MoE variant with 235B total / 22B activated parameters and 128 experts. Features explicit thinking mode for complex problem-solving with native 262K context extending to 1M tokens. Excels at deep reasoning tasks requiring multi-step deliberation, including advanced mathematics, logical inference, and complex coding challenges. Built on the same architecture as the Instruct version, but optimized for reasoning-heavy workloads with tool integration and agentic capabilities.
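Thinking variants interleave deliberation with the final answer; Qwen's thinking models typically emit it between <think> tags in the raw completion. A small sketch for separating the two, assuming that tag convention holds (some serving stacks instead return reasoning in a dedicated field):

```python
import re

def split_thinking(text: str) -> tuple[str, str]:
    """Separate <think>...</think> deliberation from the final answer.

    Assumes the raw completion embeds reasoning in <think> tags; some
    servers instead return it in a separate reasoning field.
    """
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if match is None:
        return "", text.strip()
    thinking = match.group(1).strip()
    answer = text[match.end():].strip()
    return thinking, answer

raw = "<think>2 is even, 3 is odd, so their sum is odd...</think>The answer is 5."
thoughts, answer = split_thinking(raw)
print(answer)  # -> The answer is 5.
```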
by Moonshot
Moonshot Kimi K2.6 is a 1 trillion parameter Mixture-of-Experts chat model with 32B activated parameters and a 256K context window. It combines a vision encoder (MoonViT) with support for image, video, and text inputs, enabling multimodal agentic interactions with tool calling and structured JSON output. The model achieves strong benchmark scores such as 91.7% on LiveCodeBench, 44.4% on HLE-Full (reasoning), and 83.0% on MMMU-Pro (vision), while offering a thinking mode for deep reasoning and an instant mode for fast responses. Available under an "Other" license.
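Since this entry advertises tool calling and structured JSON output, here is a hedged sketch of requesting JSON via the OpenAI-compatible response_format parameter; the endpoint and model ID are placeholders, and a given provider may expose structured output differently.

```python
import json
from openai import OpenAI

client = OpenAI(base_url="https://example-provider.com/v1", api_key="YOUR_API_KEY")

response = client.chat.completions.create(
    model="moonshotai/kimi-k2.6",  # assumed ID; check the catalog entry
    messages=[
        {"role": "system", "content": "Reply only with a JSON object."},
        {"role": "user", "content": "Extract the city and year: 'Founded in Berlin in 1989.'"},
    ],
    response_format={"type": "json_object"},  # ask for structured JSON output
)

data = json.loads(response.choices[0].message.content)
print(data)  # e.g. {"city": "Berlin", "year": 1989}
```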
Moonshot Kimi K2 Instruct is a 1 trillion parameter Mixture-of-Experts model with 32B activated parameters, featuring 384 experts and 128K context length. Pre-trained on 15.5T tokens with the Muon optimizer at unprecedented scale, achieving zero training instability. Achieves SOTA on LiveCodeBench (53.7%), SWE-bench Verified (71.6%), AIME 2024 (69.6%), and MATH-500 (97.4%). Specifically designed for agentic intelligence, with exceptional tool calling, code generation, and mathematical reasoning capabilities.
Moonshot AI's most powerful native multimodal agentic model. Features 1T parameters (32B activated), 256K context, vision capabilities, and advanced reasoning with agent swarm support.
by Qwen
Qwen3 Embedding 8B is a dense retrieval embedding model with 8 billion parameters, optimized for semantic search, text similarity, and feature extraction. Trained on diverse multilingual data, it provides strong cross-lingual retrieval. Supports a 262K context for embedding long documents and extensive text passages. Excels at document retrieval, semantic search, clustering, and recommendation systems. Compatible with standard embedding frameworks and optimized for production deployment with efficient inference.
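A sketch of using an embedding model like this for semantic similarity through an OpenAI-compatible embeddings endpoint; the model ID and endpoint are placeholder assumptions.

```python
import math
from openai import OpenAI

client = OpenAI(base_url="https://example-provider.com/v1", api_key="YOUR_API_KEY")

def embed(text: str) -> list[float]:
    """Fetch a single embedding vector for the given text."""
    result = client.embeddings.create(
        model="qwen/qwen3-embedding-8b",  # assumed ID; check the catalog entry
        input=text,
    )
    return result.data[0].embedding

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors; higher means more related."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

query = embed("How do I reset my password?")
doc = embed("Steps for recovering account credentials")
print(f"similarity: {cosine(query, doc):.3f}")
```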
Qwen 3.5 9B is a 9B-parameter multimodal large language model with a gated-delta mixture-of-experts architecture and a vision encoder. It supports a native context window of 262,144 tokens and operates in a default thinking mode that can be disabled. The model achieves strong results such as 82.5% on MMLU-Pro, 88.2% on C-Eval, and 78.4% on MMMU. It is released under the Apache 2.0 license.
Qwen 3.6 27B is a 27B-parameter causal language model with a vision encoder, built on the Qwen3.5 gated-delta architecture. It supports a native context window of 262,144 tokens (extendable to over 1 million) and operates in a default thinking mode that can be disabled. The model achieves strong benchmark results such as 86.2% on MMLU-Pro, 82.9% on MMMU, and 77.2% on SWE-bench Verified, demonstrating solid coding and multimodal reasoning capabilities. It is released under the Apache 2.0 license.
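Both Qwen entries above describe a default thinking mode that can be disabled. A hedged sketch of switching it off, assuming the serving stack forwards Qwen3's documented enable_thinking chat-template flag (a given provider may expose a different switch, and the model ID is a placeholder):

```python
from openai import OpenAI

client = OpenAI(base_url="https://example-provider.com/v1", api_key="YOUR_API_KEY")

response = client.chat.completions.create(
    model="qwen/qwen3.6-27b",  # assumed ID; check the catalog entry
    messages=[{"role": "user", "content": "Summarize this in one sentence: ..."}],
    # Assumption: the server forwards chat_template_kwargs to the tokenizer,
    # as vLLM does for Qwen3's enable_thinking flag.
    extra_body={"chat_template_kwargs": {"enable_thinking": False}},
)
print(response.choices[0].message.content)
```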
by Deepseek
DeepSeek R1 0528 is an upgraded 685B parameter reasoning model with significantly enhanced depth of reasoning and inference capabilities. It achieves 87.5% on AIME 2025 (up from 70%), 91.4% on AIME 2024, 73.3% on LiveCodeBench, and a 1930 Codeforces rating. It supports system prompts, averages 23K thinking tokens per question for deeper analysis, and shows a reduced hallucination rate. Released under the MIT license, permitting commercial use and distillation. Performance approaches O3 and Gemini 2.5 Pro levels.
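Since this revision adds system prompt support and averages roughly 23K thinking tokens per question, a sketch that sets a system prompt and leaves headroom for that budget; the endpoint and model ID are placeholder assumptions.

```python
from openai import OpenAI

client = OpenAI(base_url="https://example-provider.com/v1", api_key="YOUR_API_KEY")

response = client.chat.completions.create(
    model="deepseek-ai/deepseek-r1-0528",  # assumed ID; check the catalog entry
    messages=[
        {"role": "system", "content": "You are a careful competition mathematician."},
        {"role": "user", "content": "How many positive divisors does 360 have?"},
    ],
    max_tokens=32768,  # leave room for the ~23K thinking tokens the model averages
)
print(response.choices[0].message.content)
```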