# January 2026 LM Ranking

| Rank | Model | Score | Organization |
|---|---|---|---|
| 1 | gemini-3-pro | 1490 | Google |
| 2 | grok-4.1-thinking | 1477 | xAI |
| 3 | gemini-3-flash | 1471 | Google |
| 4 | claude-opus-4-5-thinking-32k | 1469 | Anthropic |
| 5 | grok-4.1 | 1466 | xAI |
| 6 | claude-opus-4-5 | 1465 | Anthropic |
| 7 | gemini-3-flash (thinking-minimal) | 1464 | Google |
| 8 | gpt-5.1-high | 1457 | OpenAI |
| 9 | gemini-2.5-pro | 1450 | Google |
| 10 | claude-sonnet-4-5-thinking-32k | 1450 | Anthropic |
| 11 | claude-opus-4-1-thinking-16k | 1448 | Anthropic |
| 12 | claude-sonnet-4-5 | 1448 | Anthropic |
| 13 | ernie-5.0 | 1446 | Baidu |
| 14 | gpt-4.5 | 1443 | OpenAI |
| 15 | claude-opus-4-1 | 1443 | Anthropic |
| 16 | glm-4.7 | 1443 | Z.ai |
| 17 | chatgpt-4o-latest | 1442 | OpenAI |
| 18 | gpt-5.2 | 1440 | OpenAI |
| 19 | gpt-5.2-high | 1440 | OpenAI |
| 20 | gpt-5-high | 1436 | OpenAI |

The LMArena Ranking is a crowdsourced leaderboard for large language models. Users chat with two anonymous models side by side and vote for the better response; model ratings are then computed with an Elo-style rating system. The leaderboard covers multiple capability dimensions, including text, vision, and code, making it one of the most widely cited LLM evaluation benchmarks. Starting from this ranking, we aggregated and cleaned the model names.
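The Elo update behind such pairwise rankings can be sketched as follows. This is a minimal illustration with a conventional K-factor and 400-point scale, not LMArena's actual implementation:

```python
# Minimal sketch of an Elo-style pairwise update, as used conceptually by
# crowdsourced LLM leaderboards. Parameters (k=32, 400-point scale) are
# illustrative defaults, not LMArena's actual settings.

def expected_score(r_a: float, r_b: float) -> float:
    """Probability that model A beats model B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def elo_update(r_a: float, r_b: float, outcome: float, k: float = 32.0):
    """outcome: 1.0 if A wins, 0.0 if B wins, 0.5 for a tie."""
    e_a = expected_score(r_a, r_b)
    r_a_new = r_a + k * (outcome - e_a)
    r_b_new = r_b + k * ((1.0 - outcome) - (1.0 - e_a))
    return r_a_new, r_b_new

# A 1490-rated model beating a 1477-rated one gains a bit less than k/2,
# since it was already the slight favorite.
print(elo_update(1490, 1477, 1.0))
```

Note the update is zero-sum: whatever rating the winner gains, the loser loses.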
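The name aggregation step can be sketched as below. The alias table and suffix rules here are hypothetical examples chosen to match patterns visible in the table above (parenthetical qualifiers, `-thinking-32k` reasoning-budget suffixes), not our actual cleaning rules:

```python
# Hypothetical sketch of model-name aggregation: collapse leaderboard
# variants of one model family into a canonical name. The alias table
# and regex rules are illustrative, not the actual cleaning pipeline.
import re

# Example alias mapping (made up for illustration).
ALIASES = {
    "chatgpt-4o-latest": "gpt-4o",
}

def canonical_name(name: str) -> str:
    name = name.strip().lower()
    # Drop parenthetical qualifiers, e.g. "(thinking-minimal)".
    name = re.sub(r"\s*\(.*?\)\s*", "", name)
    # Drop reasoning-budget suffixes like "-thinking-32k".
    name = re.sub(r"-thinking(-\d+k)?$", "", name)
    return ALIASES.get(name, name)

print(canonical_name("claude-opus-4-5-thinking-32k"))  # claude-opus-4-5
```

With rules like these, `claude-opus-4-5` and `claude-opus-4-5-thinking-32k` aggregate to one entry, which is the kind of deduplication the cleaning step performs.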