Language Model Leaderboard
Compare 50 leading LLMs across performance, cost, and capabilities
| Rank | Model | Provider | Score 1 | Score 2 | Context | Price (input / output) | Capabilities |
|---|---|---|---|---|---|---|---|
| 1 | GPT-5.2 (`gpt-5.2`) | OpenAI | 92.8% | 95.1% | 200K | $20.00 / $80.00 | Advanced Reasoning Vision Code Generation +2 |
| 2 | OpenAI o1-preview (`o1-preview`) | OpenAI | 92.3% | - | 128K | $15.00 / $60.00 | Advanced Reasoning Chain of Thought Science +1 |
| 3 | Gemini 3 Pro (`gemini-3-pro`) | Google | 91.5% | 93.8% | 1000K | $15.00 / $60.00 | Deep Think Vision Audio +2 |
| 4 | GPT-5.1 (`gpt-5.1`) | OpenAI | 91.2% | 94.3% | 200K | $18.00 / $72.00 | Adaptive Reasoning Vision Code Generation +1 |
| 5 | GPT-5 (`gpt-5`) | OpenAI | 90.4% | 93.5% | 200K | $15.00 / $60.00 | Advanced Reasoning Vision Code Generation +1 |
| 6 | Claude 3.5 Sonnet (`claude-3-5-sonnet-20241022`) | Anthropic | 90.4% | 92.0% | 200K | $3.00 / $15.00 | Long Context Vision Code Generation +2 |
| 7 | Gemini 2.5 Pro (`gemini-2.5-pro`) | Google | 89.8% | 91.5% | 1000K | $10.00 / $40.00 | Deep Think Vision Audio +2 |
| 8 | Grok 4.1 (`grok-4.1`) | xAI | 88.9% | 90.7% | 131K | $5.00 / $15.00 | Real-time Data Vision Less Filtered +1 |
| 9 | Claude Opus 4.5 (`claude-opus-4-5`) | Anthropic | 88.7% | 92.0% | 200K | $15.00 / $75.00 | Long Context Vision Code Generation +3 |
| 10 | Llama 3.1 405B (`llama-3.1-405b-instruct`) | Meta | 88.6% | 89.0% | 128K | $2.70 / $2.70 | Open Source Long Context Multilingual +1 |
| 11 | Claude Sonnet 4.5 (`claude-sonnet-4-5`) | Anthropic | 88.3% | 93.7% | 1000K | $3.00 / $15.00 | Long Context Vision Code Generation +2 |
| 12 | Qwen 3 72B (`qwen-3-72b-instruct`) | Alibaba | 87.8% | 89.2% | 131K | $0.90 / $0.90 | Open Source Multilingual Code +2 |
| 13 | Grok 2 (`grok-2-1212`) | xAI | 87.5% | 88.4% | 131K | $2.00 / $10.00 | Real-time Data Vision Less Filtered |
| 14 | GPT-4o (`gpt-4o-2024-11-20`) | OpenAI | 87.2% | 90.2% | 128K | $2.50 / $10.00 | Vision Audio Fast +1 |
| 15 | Claude 3 Opus (`claude-3-opus-20240229`) | Anthropic | 86.8% | 84.9% | 200K | $15.00 / $75.00 | Long Context Vision Analysis +1 |
| 16 | GPT-4 Turbo (`gpt-4-turbo-2024-04-09`) | OpenAI | 86.4% | 87.2% | 128K | $10.00 / $30.00 | Vision JSON Mode Function Calling +1 |
| 17 | Llama 3.1 70B (`llama-3.1-70b-instruct`) | Meta | 86.0% | 80.5% | 128K | $0.88 / $0.88 | Open Source Long Context Efficient +1 |
| 18 | Llama 3.3 70B (`llama-3.3-70b-instruct`) | Meta | 86.0% | 88.4% | 128K | $0.88 / $0.88 | Open Source Long Context Multilingual +1 |
| 19 | Gemini 1.5 Pro (`gemini-1.5-pro`) | Google | 85.9% | 84.1% | 2000K | $1.25 / $5.00 | Extreme Long Context Vision Audio +1 |
| 20 | DeepSeek V3.2 (`deepseek-v3.2`) | DeepSeek | 85.7% | 95.8% | 128K | $0.60 / $2.40 | Reasoning Code Math +2 |
| 21 | Gemini 3 Flash (`gemini-3-flash`) | Google | 85.4% | 87.2% | 1000K | $0.15 / $0.60 | Ultra Fast Long Context Multimodal +1 |
| 22 | Qwen 2.5 72B (`qwen-2.5-72b-instruct`) | Alibaba | 85.3% | 86.0% | 131K | $0.90 / $0.90 | Open Source Multilingual Code +1 |
| 23 | OpenAI o1-mini (`o1-mini`) | OpenAI | 85.2% | 94.6% | 128K | $3.00 / $12.00 | Reasoning Code Math +1 |
| 24 | Qwen2.5-Coder-32B (`qwen-2.5-coder-32b-instruct`) | Alibaba | 85.0% | 92.0% | 131K | $0.90 / $0.90 | Code Generation Open Source Code Completion +1 |
| 25 | Phi-4 (`phi-4`) | Microsoft | 84.8% | 82.6% | 16K | $0.10 / $0.10 | Small Model Math Reasoning +1 |
| 26 | Grok 4 Fast (`grok-4-fast`) | xAI | 84.2% | 86.4% | 2000K | $2.00 / $8.00 | Extreme Long Context Real-time Data Fast +1 |
| 27 | Qwen2 72B (`qwen-2-72b-instruct`) | Alibaba | 84.2% | 86.0% | 131K | $0.90 / $0.90 | Open Source Multilingual Code +2 |
| 28 | Mistral Large 2 (`mistral-large-2407`) | Mistral | 84.0% | 92.0% | 128K | $3.00 / $9.00 | Function Calling JSON Mode Multilingual +1 |
| 29 | GPT-4o mini (`gpt-4o-mini-2024-07-18`) | OpenAI | 82.0% | 87.2% | 128K | $0.15 / $0.60 | Multimodal Fast Affordable +1 |
| 30 | Nemotron 4 340B (`nemotron-4-340b-instruct`) | NVIDIA | 81.0% | 73.0% | 4K | $0.00 / $0.00 | Open Source RLHF Optimized Free |
| 31 | Jamba 1.5 Large (`jamba-1.5-large`) | AI21 Labs | 80.3% | 68.2% | 256K | $2.00 / $8.00 | Extremely Long Context Hybrid Architecture Multilingual |
| 32 | DeepSeek R1 (`deepseek-r1`) | DeepSeek | 79.8% | 96.3% | 64K | $0.55 / $2.19 | Reasoning Code Math +1 |
| 33 | DeepSeek Coder V2 (`deepseek-coder-v2-236b`) | DeepSeek | 79.2% | 90.2% | 128K | $0.28 / $0.42 | Code Generation MoE Architecture 128K Context +1 |
| 34 | Reka Core (`reka-core-20240501`) | Reka AI | 78.8% | 74.8% | 128K | $3.00 / $15.00 | Multimodal Vision Audio +1 |
| 35 | Amazon Nova Pro (`amazon-nova-pro-v1`) | Amazon | 78.7% | 84.0% | 300K | $0.80 / $3.20 | Long Context Vision AWS Native +1 |
| 36 | Claude 4.5 Haiku (`claude-haiku-4-5`) | Anthropic | 78.5% | 82.3% | 200K | $0.80 / $4.00 | Fast Long Context Vision +1 |
| 37 | Inflection 2.5 (Pi) (`inflection-2.5`) | Inflection | 78.0% | - | 33K | $0.00 / $0.00 | Conversational Empathetic Free |
| 38 | Mixtral 8x22B (`mixtral-8x22b-instruct`) | Mistral | 77.7% | 75.0% | 64K | $2.00 / $6.00 | Open Source MoE Architecture Multilingual +1 |
| 39 | Yi Large (`yi-large`) | 01.AI | 76.3% | 77.9% | 33K | $3.00 / $3.00 | Bilingual Code Long Context |
| 40 | Mistral Medium (`mistral-medium-2312`) | Mistral | 75.3% | 76.0% | 32K | $2.70 / $8.10 | Function Calling JSON Mode Multilingual |
| 41 | Claude 4 Haiku (`claude-haiku-4-20250514`) | Anthropic | 75.2% | 75.9% | 200K | $0.80 / $4.00 | Fast Long Context Vision +1 |
| 42 | Command R+ (`command-r-plus`) | Cohere | 75.0% | 70.0% | 128K | $3.00 / $15.00 | RAG Optimized Tool Use Multilingual +1 |
| 43 | Palmyra X 004 (`palmyra-x-004`) | Writer | 75.0% | - | 128K | $2.50 / $10.00 | Enterprise Graph RAG Knowledge Graphs |
| 44 | DBRX Instruct (`dbrx-instruct`) | Databricks | 73.7% | 70.8% | 33K | $0.75 / $2.25 | Open Source MoE Architecture Enterprise |
| 45 | Llama 3.1 8B (`llama-3.1-8b-instruct`) | Meta | 73.0% | 72.6% | 128K | $0.05 / $0.08 | Open Source Efficient Long Context +1 |
| 46 | Sonar Large Online (`sonar-large-32k-online`) | Perplexity | 72.0% | - | 33K | $1.00 / $1.00 | Real-time Search Citations Web Access |
| 47 | Gemini 2.0 Flash (`gemini-flash-2.0`) | Google | 71.9% | 74.4% | 1049K | $0.10 / $0.40 | Ultra Fast Long Context Multimodal +1 |
| 48 | Gemini 2.0 Flash Thinking (`gemini-2.0-flash-thinking-exp`) | Google | 71.9% | 74.4% | 1000K | $0.10 / $0.40 | Reasoning Thinking Mode Long Context +2 |
| 49 | GPT-3.5 Turbo (`gpt-3.5-turbo-0125`) | OpenAI | 70.0% | 76.8% | 16K | $0.50 / $1.50 | Fast Affordable Function Calling |
| 50 | Codestral (`codestral-22b`) | Mistral | 70.0% | 81.1% | 256K | $1.00 / $3.00 | Code Generation 80+ Languages Largest Context for Coding +1 |
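For readers who want to compare models on performance and cost programmatically, the following is a minimal sketch of parsing one table row into typed fields and filtering by input price. It rests on several assumptions: the names `Row`, `parse_row`, and `best_under` are hypothetical, the two percentage columns are treated as unnamed scores because the table does not say which benchmarks they report, `K` in the context column is taken to mean 1,000 tokens, and prices are handled as plain dollar figures because the table does not state their unit.

```python
from typing import List, NamedTuple, Optional


class Row(NamedTuple):
    rank: int
    model: str                 # model cell kept as one string, whatever its formatting
    provider: str
    score_1: Optional[float]   # first percentage column (benchmark not named in the table)
    score_2: Optional[float]   # second percentage column; None when the cell is "-"
    context_tokens: int        # assumes "K" means 1,000 tokens
    price_in: float            # dollars; unit (e.g. per token count) is not stated
    price_out: float
    capabilities: str


def parse_percent(cell: str) -> Optional[float]:
    """Turn '92.8%' into 92.8 and '-' into None."""
    cell = cell.strip()
    return None if cell == "-" else float(cell.rstrip("%"))


def parse_row(line: str) -> Row:
    """Parse one pipe-delimited data row of the leaderboard table."""
    cells = [c.strip() for c in line.strip().strip("|").split("|")]
    rank, model, provider, s1, s2, ctx, price, caps = cells
    price_in, price_out = (float(p.strip().lstrip("$")) for p in price.split("/"))
    return Row(
        rank=int(rank),
        model=model,
        provider=provider,
        score_1=parse_percent(s1),
        score_2=parse_percent(s2),
        context_tokens=int(ctx.rstrip("K")) * 1000,
        price_in=price_in,
        price_out=price_out,
        capabilities=caps,
    )


def best_under(rows: List[Row], max_input_price: float) -> List[Row]:
    """Rows whose input price is at most the limit, highest score_1 first."""
    affordable = [
        r for r in rows
        if r.price_in <= max_input_price and r.score_1 is not None
    ]
    return sorted(affordable, key=lambda r: r.score_1, reverse=True)


# A few rows copied from the table above.
SAMPLE = [
    "| 6 | Claude 3.5 Sonnet claude-3-5-sonnet-20241022 | Anthropic | 90.4% | 92.0% | 200K | $3.00 / $15.00 | Long Context Vision Code Generation +2 |",
    "| 20 | DeepSeek V3.2 deepseek-v3.2 | DeepSeek | 85.7% | 95.8% | 128K | $0.60 / $2.40 | Reasoning Code Math +2 |",
    "| 21 | Gemini 3 Flash gemini-3-flash | Google | 85.4% | 87.2% | 1000K | $0.15 / $0.60 | Ultra Fast Long Context Multimodal +1 |",
    "| 37 | Inflection 2.5 (Pi) inflection-2.5 | Inflection | 78.0% | - | 33K | $0.00 / $0.00 | Conversational Empathetic Free |",
]

if __name__ == "__main__":
    rows = [parse_row(line) for line in SAMPLE]
    for r in best_under(rows, max_input_price=1.00):
        print(r.rank, r.provider, r.score_1, r.price_in)
```

Keeping missing scores as `None` rather than 0 avoids ranking a model with no reported result below every model that has one; callers can decide how to treat those rows.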