Discover AI models for every task
by Moonshot
Moonshot AI's most powerful native multimodal agentic model. Features 1T parameters (32B activated), 256K context, vision capabilities, and advanced reasoning with agent swarm support.
Moonshot Kimi K2 Thinking is the reasoning-enhanced variant of the 1 trillion parameter MoE model with 32B activated parameters. Built on the same architecture as K2 Instruct, it adds an explicit thinking mode for complex problem-solving. Features 384 experts and 128K context, and was trained on 15.5T tokens using the Muon optimizer with zero training instability. Excels at deep reasoning tasks requiring multi-step deliberation, including advanced mathematics, complex coding challenges, and agentic problem-solving with tool integration.
Moonshot Kimi K2 Instruct is a 1 trillion parameter Mixture-of-Experts model with 32B activated parameters, featuring 384 experts and 128K context length. Pre-trained on 15.5T tokens using the Muon optimizer at unprecedented scale with zero training instability. Reports state-of-the-art results on LiveCodeBench (53.7%), SWE-bench Verified (71.6%), AIME 2024 (69.6%), and MATH-500 (97.4%). Designed for agentic intelligence, with strong tool calling, code generation, and mathematical reasoning capabilities.
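The "activated parameters" figures quoted for Mixture-of-Experts models in this listing can be turned into a per-token share of the full model with simple arithmetic. The sketch below is illustrative only: the function name `active_share` and the `LISTED` table are hypothetical, and the numbers are copied from the descriptions in this listing rather than measured.

```python
def active_share(total_b: float, active_b: float) -> float:
    """Return the percentage of parameters activated per token.

    Both arguments are parameter counts in billions.
    """
    if total_b <= 0 or not (0 < active_b <= total_b):
        raise ValueError("expected 0 < active_b <= total_b")
    return 100.0 * active_b / total_b


# (total, activated) parameter counts in billions, as quoted in this listing.
# The table itself is a hypothetical helper, not part of any vendor API.
LISTED = {
    "Kimi K2 Instruct": (1000, 32),
    "MiniMax M2.5": (229, 10),
    "GLM-4.5": (355, 32),
    "GLM-4.5 Air": (106, 12),
    "Qwen3.5-122B-A10B": (122, 10),
}

for name, (total, active) in LISTED.items():
    print(f"{name}: {active_share(total, active):.1f}% of parameters active per token")
```

A smaller active share generally means less compute per generated token, but the full parameter set still has to be held in memory for serving.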
by ZAI
ZAI GLM 5.1 is a 744B parameter Mixture-of-Experts language model built with the GLM‑MoE DSA architecture. It excels at agentic engineering, achieving state-of-the-art performance on benchmarks such as HLE with tools (52.3), SWE‑Bench Pro (58.4) and AIME 2026 (95.3). The model supports extensive tool use and long‑horizon reasoning, with a large context window of up to 128K tokens. It is released under the MIT license.
by Qwen
Qwen 3.5 9B is a 9B‑parameter multimodal large language model with a gated‑delta mixture‑of‑experts architecture and a vision encoder. It supports a native context window of 262,144 tokens and operates in a default thinking mode that can be disabled. The model achieves strong results such as 82.5% on MMLU‑Pro, 88.2% on C‑Eval, and 78.4% on MMMU benchmarks. It is released under the Apache 2.0 license.
Qwen 3.5 397B A17B is a 397B-parameter mixture-of-experts vision-language foundation model with a gated delta network architecture and a vision encoder. It supports a native context window of 262,144 tokens (extendable to over 1 million) and operates in a default thinking mode that can be disabled. The model achieves strong results such as 87.8% on MMLU‑Pro, 85.0% on MMMU, and 88.6% on MathVision benchmarks. It is released under the Apache 2.0 license.
by MiniMax
MiniMax M2.5 is a state-of-the-art reasoning MoE model with 229B total / 10B active parameters. Extensively trained with reinforcement learning across 200,000+ real-world environments, achieving SOTA performance in coding (80.2% SWE-Bench Verified), agentic tool use, search, and office productivity tasks. Features 197K context window, efficient MoE inference, and strong multilingual support.
ZAI's frontier 744B MoE model (40B activated) with 203K context. Excels at agentic engineering, coding (SWE-bench 77.8%), reasoning, and tool use. Trained with asynchronous RL and released under the MIT license.
Qwen2.5 72B Instruct is Alibaba's instruction-tuned large language model with 72B parameters. Excels at following complex instructions, coding, mathematical reasoning, and multilingual tasks. Features 128K context window.
by BAAI
BAAI BGE Large EN V1.5 is a state-of-the-art English dense retrieval embedding model with 1024-dimensional embeddings and a 512-token maximum sequence length. Achieves a 64.23 average across 56 tasks on the MTEB leaderboard, with 54.29 on retrieval. Pre-trained with RetroMAE and fine-tuned on large-scale contrastive learning data. V1.5 improvements include a better similarity distribution and flexible usage without query instructions. Ideal for semantic search, document retrieval, re-ranking pipelines, and sentence similarity tasks. Production-ready with 3.4M+ downloads/month.
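For semantic search with a dense embedding model like this one, documents are typically ranked by cosine similarity between the query embedding and each document embedding. The sketch below shows only that ranking step in plain Python. It assumes the vectors have already been produced by an embedding model, and uses hypothetical 4-dimensional toy vectors in place of real 1024-dimensional embeddings; the names `cosine`, `rank`, `DOCS`, and `QUERY` are illustrative.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("zero vector has no direction")
    return dot / (norm_a * norm_b)


def rank(query_vec: list[float], doc_vecs: dict[str, list[float]]) -> list[tuple[str, float]]:
    """Return (doc_id, score) pairs sorted from most to least similar."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy stand-ins for real embeddings (hypothetical values, 4 dimensions).
DOCS = {
    "doc_a": [0.9, 0.1, 0.0, 0.0],
    "doc_b": [0.0, 0.8, 0.6, 0.0],
    "doc_c": [0.7, 0.7, 0.0, 0.1],
}
QUERY = [1.0, 0.0, 0.0, 0.0]

for doc_id, score in rank(QUERY, DOCS):
    print(f"{doc_id}: {score:.3f}")
```

In a re-ranking pipeline, a step like this usually produces a shortlist that a separate, slower re-ranker then reorders.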
ZAI GLM-4.5 Air is a compact 106B parameter Mixture-of-Experts model with 12B active parameters, optimized for efficiency while maintaining strong performance. Scores 59.8 across 12 industry benchmarks while using fewer resources than the full GLM-4.5. Features a hybrid reasoning mode and 128K context, and supports intelligent agent functions and tool calling. Released under the MIT license with commercial use allowed. Ideal for deployments that need a balance between capability and computational cost.
ZAI GLM-4.5 is a 355B parameter Mixture-of-Experts foundation model with 32B active parameters, designed for intelligent agents. Features a hybrid reasoning mode with configurable thinking, enabled by default. Ranks 3rd among all proprietary and open-source models with a score of 63.2 across 12 industry benchmarks. Released under the MIT license. Offers 128K context and supports reasoning, coding, and intelligent agent functions, including OpenAI-style tool calling. Incorporates MTP (Multi-Token Prediction) layers with speculative decoding for efficient inference.
Qwen3.5-122B-A10B is Alibaba Cloud's native multimodal agent model with 122B total parameters (10B activated). Features 240K context, vision capabilities, hybrid reasoning with extended thinking, function calling, and support for 201 languages. Apache 2.0 licensed.
High-end multimodal model delivering strong vision-language reasoning with long-context support.
by Black Forest Labs
Black Forest Labs FLUX.2 [klein] 4B is a lightweight, fast image generation model optimized for speed and efficiency. With 4 billion parameters, it delivers quick image generation while maintaining good quality. Perfect for rapid prototyping, bulk generation, and applications requiring low latency. Supports both text-to-image and image-to-image generation with excellent cost-efficiency.
Black Forest Labs FLUX.2 [klein] 9B is a balanced image generation model offering excellent quality-to-speed ratio. With 9 billion parameters, it provides better detail and composition than the 4B variant while remaining faster than full-size models. Ideal for production workloads requiring a balance between quality, speed, and cost. Supports both text-to-image and image-to-image generation.