Discover AI models for every task
Showing 1-21 of 21 models
by Sentence Transformers
Sentence Transformers Paraphrase Multilingual MPNet is a multilingual sentence embedding model based on the MPNet architecture, supporting 50+ languages for cross-lingual semantic similarity and paraphrase detection. Trained on large-scale paraphrase datasets across multiple languages, enabling strong cross-lingual transfer. Ideal for multilingual paraphrase detection, semantic textual similarity, cross-lingual search, and international content deduplication. Provides balanced performance across diverse language families, with a proven track record in the sentence-transformers ecosystem.
by BAAI
BAAI BGE Multilingual Gemma2 is a multilingual dense retrieval embedding model built on the Gemma 2 architecture, supporting 100+ languages for cross-lingual semantic search and retrieval. Delivers strong performance across diverse language families including English, Chinese, Spanish, Arabic, and Hindi. Ideal for multilingual search systems, cross-lingual document retrieval, international content recommendation, and global knowledge bases. Trained on large-scale multilingual data with balanced language representation.
by intfloat
intfloat Multilingual E5 Large Instruct is an instruction-tuned multilingual embedding model combining strong cross-lingual capabilities with instruction-following for guided retrieval. Supports 100+ languages with natural language instructions to customize embedding behavior. Features enhanced zero-shot retrieval performance through instruction-based query understanding. Ideal for complex multilingual search scenarios, domain-specific retrieval tasks, and applications requiring adaptive semantic understanding across languages.
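As a rough illustration of what "natural language instructions to customize embedding behavior" can look like in practice, the sketch below prepends a one-line task description to the query before it is embedded. The `Instruct: ... Query: ...` template and the helper name `build_instructed_query` are assumptions made for this example; check the model card for the exact format before relying on it. In this style of retrieval, documents are typically embedded as-is, without an instruction.

```python
def build_instructed_query(task_description: str, query: str) -> str:
    """Combine a task instruction and a user query into one string to embed.

    The template is an assumption for illustration, not a confirmed spec.
    """
    return f"Instruct: {task_description}\nQuery: {query}"


# Hypothetical task description and query.
task = "Given a web search query, retrieve relevant passages that answer the query"
text_to_embed = build_instructed_query(task, "Wie hoch ist die Zugspitze?")
```

Changing only the task description (for example, from passage retrieval to duplicate-question detection) is what lets one model serve several retrieval styles.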
intfloat Multilingual E5 Large is a powerful multilingual dense retrieval embedding model supporting 100+ languages with strong cross-lingual capabilities. Features 1024-dimensional embeddings optimized for semantic search, document retrieval, and text similarity across diverse language families. Pre-trained on large-scale multilingual data with contrastive learning for robust cross-lingual transfer. Ideal for international search systems, multilingual document retrieval, and global content recommendation platforms requiring high-quality semantic understanding.
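To make "semantic search" over embeddings concrete, here is a minimal sketch that ranks documents by cosine similarity to a query vector. The 4-dimensional vectors are made-up placeholders standing in for real 1024-dimensional model output, and the function names are hypothetical; no model is called.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_documents(query_vec, doc_vecs):
    """Return (doc_id, score) pairs sorted from most to least similar."""
    scored = [(doc_id, cosine_similarity(query_vec, vec))
              for doc_id, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors (assumed values, not produced by any model).
query = [0.9, 0.1, 0.0, 0.2]
documents = {
    "doc_en": [0.8, 0.2, 0.1, 0.1],
    "doc_de": [0.7, 0.0, 0.1, 0.3],
    "doc_unrelated": [0.0, 0.9, 0.4, 0.0],
}
ranking = rank_documents(query, documents)
```

With a cross-lingual model, a query in one language and a document in another should land close together in this vector space, which is why the same ranking code works across languages.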
by MiniMax
MiniMax M2 is a powerful MoE model with 200B total / 10B active parameters. Optimized for reasoning and coding tasks with excellent performance in multilingual scenarios. Features 128K context window and efficient inference.
MiniMax M2.5 is a state-of-the-art reasoning MoE model with 229B total / 10B active parameters. Extensively trained with reinforcement learning across 200,000+ real-world environments, achieving SOTA performance in coding (80.2% SWE-Bench Verified), agentic tool use, search, and office productivity tasks. Features 197K context window, efficient MoE inference, and strong multilingual support.
MiniMax M2.1 is a state-of-the-art MoE model with 230B total / 10B active parameters, optimized for agentic coding and complex multi-step workflows. Excels at multilingual programming, tool use, and long-horizon planning. Matches Claude Sonnet 4.5 on code benchmarks and exceeds it in multilingual scenarios. Features 196K context window with FP8 efficiency. Released under Modified-MIT license for commercial use.
by ZAI
GLM-4.7 is Z.ai's latest large language model with enhanced reasoning capabilities. Excels at mathematical problem solving, coding, and complex logical tasks. Features improved context understanding and multilingual support.
by Mistral
Mistral Small 4 is a 119B-parameter Mixture-of-Experts model (128 experts, 4 active per token, 6.5B active parameters) that unifies instruct, reasoning, and coding capabilities into a single multimodal model. It accepts text and image inputs, supports function calling, structured outputs, and configurable reasoning effort (none for fast responses, high for deep step-by-step reasoning). With a 256K context window and Apache 2.0 license, it delivers 40% lower latency and 3x higher throughput compared to Mistral Small 3.
by BAAI
BAAI BGE-M3 is a versatile multilingual embedding model supporting dense, sparse, and multi-vector retrieval in a unified architecture. Handles 100+ languages with strong cross-lingual capabilities and flexible retrieval modes for different use cases. Features hybrid retrieval combining dense embeddings for semantic similarity, sparse representations for lexical matching, and multi-vector approaches for fine-grained relevance. Ideal for multilingual search engines, hybrid retrieval systems, and complex information retrieval scenarios requiring multiple matching strategies.
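The hybrid retrieval idea described above can be sketched as a weighted sum of a dense (semantic) score and a sparse (lexical) score. This is an illustrative sketch only: the weights, the toy vectors, and the function names are assumptions, the multi-vector mode is left out, and nothing here reflects how BGE-M3 itself computes or combines its scores.

```python
def dense_score(q_vec, d_vec):
    """Dot product of two dense vectors (assumed already L2-normalized)."""
    return sum(q * d for q, d in zip(q_vec, d_vec))


def sparse_score(q_weights, d_weights):
    """Lexical match: sum of weight products over tokens present in both."""
    return sum(w * d_weights[tok] for tok, w in q_weights.items() if tok in d_weights)


def hybrid_score(query, doc, w_dense=0.6, w_sparse=0.4):
    """Weighted fusion of the two scores; the weights are hypothetical defaults."""
    return (w_dense * dense_score(query["dense"], doc["dense"])
            + w_sparse * sparse_score(query["sparse"], doc["sparse"]))


# Toy representations (assumed values, not model output).
query = {"dense": [1.0, 0.0], "sparse": {"berlin": 0.5, "hotel": 0.5}}
doc_a = {"dense": [0.6, 0.8], "sparse": {"berlin": 0.8, "hotel": 0.4}}
doc_b = {"dense": [0.8, 0.6], "sparse": {"paris": 0.9}}

score_a = hybrid_score(query, doc_a)
score_b = hybrid_score(query, doc_b)
```

In this toy data `doc_b` wins on the dense score alone, but `doc_a` shares both query terms and comes out ahead once the lexical signal is mixed in, which is the case hybrid retrieval is meant to handle.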
by Qwen
Qwen3 Embedding 8B is a dense retrieval embedding model with 8 billion parameters, optimized for semantic search, text similarity, and feature extraction. Trained on diverse multilingual data providing strong cross-lingual retrieval capabilities. Supports 262K context for embedding long documents and extensive text passages. Excels at document retrieval, semantic search, clustering, and recommendation systems. Compatible with standard embedding frameworks and optimized for production deployment with efficient inference.
Qwen2.5 72B Instruct is Alibaba's instruction-tuned large language model with 72B parameters. Excels at following complex instructions, coding, mathematical reasoning, and multilingual tasks. Features 128K context window.
Qwen3 32B is a base foundation model with 32 billion parameters and 262K native context, designed for fine-tuning and custom adaptations. Pre-trained on diverse multilingual data covering 119 languages and dialects, providing strong general capabilities across text understanding, code, mathematics, and reasoning. Serves as the foundation for specialized models and custom fine-tuning projects requiring a powerful mid-sized base. Ideal starting point for domain-specific adaptations and research applications.
by Google
Gemma 3 27B IT is a cutting-edge multimodal vision-language model with 27 billion parameters, built on Gemini technology. Trained on 14 trillion tokens, it handles both text and image inputs with a 128K context window and supports 140+ languages. Excels at visual understanding, code generation, mathematical reasoning, and multilingual tasks. Achieves 78.6 on MMLU, 82.6 on GSM8K, 85.6 on DocVQA, and 76.3 on ChartQA. Lightweight enough for laptop deployment with strong safety improvements over previous Gemma versions.
by Meta
Meta Llama 3.3 70B Instruct is a multilingual instruction-tuned model optimized for dialogue. Trained on ~15 trillion tokens with a December 2023 knowledge cutoff, it outperforms many open-source and closed models. Reported scores include 92.1% on IFEval (steerability), 88.4% on HumanEval (code), 77.0% on MATH, and 91.1% on MGSM (multilingual). Features 128K context and Grouped-Query Attention, and supports 8 languages: English, German, French, Spanish, Italian, Portuguese, Hindi, and Thai. Trained on 7M GPU hours with 100% renewable energy.
Meta Llama 3.1 8B Instruct is an efficient multilingual instruction-tuned model optimized for dialogue and assistant use cases. With 8 billion parameters and 128K context length, it provides strong performance across general tasks, code generation, and multilingual understanding. Supports function calling and tool use with Grouped-Query Attention architecture. Ideal for deployment scenarios requiring lower compute resources while maintaining quality across English and 7 additional languages including German, French, Spanish, and Hindi.
by Qwen
Qwen3 30B A3B Instruct is a compact Mixture-of-Experts model with 30B total parameters and 3B activated per token, offering excellent efficiency for general-purpose tasks. Features 262K native context with extension to 1M tokens, strong multilingual capabilities, and enhanced instruction following. Balances performance and computational efficiency with support for tool calling, code generation, and logical reasoning. Ideal for deployment scenarios requiring lower resource usage while maintaining quality across diverse task types.
by OpenAI
OpenAI Whisper Large V3 is a state-of-the-art automatic speech recognition model with 1550M parameters supporting 99 languages. Achieves a 10-20% WER reduction compared to V2, and was trained on 1M hours of weakly labeled and 4M hours of pseudo-labeled audio. Features 128 Mel frequency bins (increased from 80), improved robustness to accents and background noise, and new Cantonese language support. Supports speech transcription and speech-to-English translation with sentence and word-level timestamps. Optimized with torch.compile for 4.5x speedup. Ideal for accessibility tools, multilingual transcription, and enterprise ASR applications.
High-end multimodal model delivering strong vision-language reasoning with long-context support.
by Qwen
Qwen3 235B A22B Instruct is a Mixture-of-Experts model with 235B total parameters and 22B activated, featuring 128 experts with 8 activated per token. Native 262K context is extensible to 1M tokens via Dual Chunk Attention. Achieves state-of-the-art results: 83.0 on MMLU-Pro, 70.3 on AIME25, 41.8 on ARC-AGI, 79.2 on Arena-Hard v2, 51.8 on LiveCodeBench, and 70.9 on BFCL-v3. Runs in non-thinking mode, focused on direct task execution with enhanced instruction following, logical reasoning, and long-tail knowledge across multiple languages. Because only 22B of the 235B parameters are active per token, inference is considerably cheaper than for a dense model of the same total size.
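The "128 experts with 8 activated per token" figure can be illustrated with a generic top-k router: score every expert, keep the k best, and normalize their weights. This is a minimal sketch of a common Mixture-of-Experts gating scheme, assumed here for illustration; the listing does not describe how this model's router actually works, and the logits below are random placeholders.

```python
import math
import random


def top_k_route(router_logits, k):
    """Pick the k highest-scoring experts and softmax-normalize their weights."""
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    max_logit = max(router_logits[i] for i in top)  # subtract for numerical stability
    exp = {i: math.exp(router_logits[i] - max_logit) for i in top}
    total = sum(exp.values())
    return {i: e / total for i, e in exp.items()}


random.seed(0)
# Placeholder router scores for one token over 128 experts (not real model values).
logits = [random.gauss(0.0, 1.0) for _ in range(128)]
routing = top_k_route(logits, k=8)

# Share of total parameters that are active per token, from the listed figures.
active_fraction = 22 / 235
```

Only the selected experts run for a given token, which is why roughly 9% of the parameters (22B of 235B) are active at a time even though all of them must be held in memory.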
Qwen 3.5 9B is a 9B-parameter multimodal large language model with a gated-delta mixture-of-experts architecture and a vision encoder. It supports a native context window of 262,144 tokens and operates in a default thinking mode that can be disabled. The model achieves strong results such as 82.5% on MMLU-Pro, 88.2% on C-Eval, and 78.4% on MMMU benchmarks. It is released under the Apache 2.0 license.