AI Model Intelligence
Independent benchmarks and enterprise AI analysis across 787+ models from 103 providers — helping technology leaders, software engineers, and data science teams select the right AI platform for their business.
Intelligence Index
Composite intelligence score across all major benchmarks. Toggle between all models, open-weights only, or proprietary only.
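How a composite score of this kind can be assembled is easiest to see in code. The sketch below is illustrative only: it assumes equal weights and scores already normalized to a 0–100 scale, which are assumptions of this example rather than the exact weighting described in our methodology.

```python
# Illustrative sketch: combine per-benchmark scores into one composite index.
# Equal weights and 0-100 normalization are assumptions, not the exact method used here.

def composite_index(scores: dict[str, float], weights: dict[str, float] | None = None) -> float:
    """Weighted average of benchmark scores, each already on a 0-100 scale."""
    weights = weights or {name: 1.0 for name in scores}
    total_weight = sum(weights[name] for name in scores)
    return sum(scores[name] * weights[name] for name in scores) / total_weight

# Placeholder scores for a single hypothetical model.
model_scores = {"MMLU-Pro": 78.2, "GPQA Diamond": 59.1, "LiveCodeBench": 64.5, "MATH-500": 88.0}
print(round(composite_index(model_scores), 1))
```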
Intelligence vs Cost
Plots Intelligence Index against price per 1M tokens. Upper-left models offer the best value.
Intelligence vs Cost — Zoomed In
Same chart with the top 10% most expensive outliers removed.
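As a rough illustration of how a zoomed view like this can be produced, the filter below drops models priced above the 90th percentile; the cutoff and the price values are placeholders, not the exact rule behind the chart.

```python
import numpy as np

# Hypothetical prices in USD per 1M tokens; values are placeholders.
prices = np.array([0.5, 1.2, 3.0, 8.0, 15.0, 40.0, 75.0, 120.0])
cutoff = np.percentile(prices, 90)   # 90th-percentile price threshold
zoomed = prices[prices <= cutoff]    # keep all but the most expensive ~10%
print(cutoff, zoomed)
```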
Image & Video Leaderboard
Elo ratings from head-to-head comparisons of image and video generation models.
| Model | Elo | 95% CI | Appearances |
|---|---|---|---|
| HappyHorse-1.0 | 1,409 | ±8 | 7,796 |
| Dreamina Seedance 2.0 720p | 1,357 | ±10 | 4,633 |
| grok-imagine-video | 1,332 | ±9 | 6,267 |
| GenFlare 2.0 | 1,325 | ±8 | 7,634 |
| PixVerse V6 | 1,321 | ±7 | 8,154 |
| Kling 3.0 Omni 1080p (Pro) | 1,299 | ±10 | 4,712 |
| SkyReels V4 | 1,297 | ±9 | 5,338 |
| Kling 2.5 Turbo 1080p | 1,296 | ±11 | 4,094 |
| PixVerse V5.6 | 1,291 | ±10 | 4,853 |
|  | 1,289 | ±10 | 4,841 |
| Vidu Q3 Pro | 1,288 | ±9 | 5,895 |
|  | 1,288 | ±10 | 4,726 |
|  | 1,286 | ±10 | 4,561 |
| Riverflow 2.0 | 1,283 | ±9 | 6,663 |
| Hailuo 02 0616 | 1,282 | ±13 | 2,600 |
| Kling 2.6 Standard (January) | 1,281 | ±10 | 4,725 |
| Kling 3.0 1080p (Pro) | 1,277 | ±9 | 5,382 |
| PixVerse V5.5 | 1,275 | ±10 | 4,338 |
| GenFlare | 1,273 | ±11 | 3,663 |
| PixVerse V5 | 1,271 | ±10 | 5,014 |
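For readers unfamiliar with how arena-style Elo ratings are derived, the sketch below shows a standard Elo update from a single head-to-head vote. The K-factor and starting ratings are illustrative assumptions, not the exact parameters behind this leaderboard.

```python
# Standard Elo update for one pairwise comparison (illustrative parameters).

def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

def elo_update(rating_a: float, rating_b: float, a_won: bool, k: float = 32.0) -> tuple[float, float]:
    """Return updated (rating_a, rating_b) after one head-to-head vote."""
    exp_a = expected_score(rating_a, rating_b)
    score_a = 1.0 if a_won else 0.0
    new_a = rating_a + k * (score_a - exp_a)
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - exp_a))
    return new_a, new_b

print(elo_update(1300, 1280, a_won=True))  # winner gains, loser loses the same amount
```

Confidence intervals of the kind shown above are usually obtained by bootstrapping over many such votes, which is why models with fewer appearances tend to show wider intervals.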
Frontier Intelligence Over Time
Tracks the highest-scoring model each month by Intelligence Index.
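A frontier line of this kind is essentially a running maximum over time. A minimal pandas sketch, with made-up example data, might look like this; the column names and values are placeholders.

```python
import pandas as pd

# Placeholder data: one row per model with its release month and Intelligence Index.
df = pd.DataFrame({
    "month": pd.to_datetime(["2024-01", "2024-02", "2024-02", "2024-04"]),
    "model": ["A", "B", "C", "D"],
    "intelligence_index": [52.0, 58.5, 55.0, 63.2],
})

monthly_best = df.sort_values("month").groupby("month")["intelligence_index"].max()
frontier = monthly_best.cummax()  # best score seen up to each month
print(monthly_best)
print(frontier)
```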
Intelligence Evaluations
The hardest subset of GPQA, filtered for questions where experts agree and non-experts struggle.
Pricing: Input vs Output
Compares input and output token pricing (per 1M tokens) for the most affordable models.
Cost Efficiency
Intelligence Index plotted against cost per 1M tokens.
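To make the per-1M-token pricing concrete, here is a small sketch that blends input and output prices into a per-request cost and a naive intelligence-per-dollar ratio. The prices, token counts, and score are illustrative assumptions, not quotes from the charts above.

```python
# Illustrative cost arithmetic; all numbers below are placeholders.

def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Cost in USD for one request, given per-1M-token prices."""
    return (input_tokens / 1_000_000) * input_price_per_m \
         + (output_tokens / 1_000_000) * output_price_per_m

cost = request_cost(3_000, 1_000, input_price_per_m=2.50, output_price_per_m=10.00)
intelligence_index = 60.0                     # hypothetical composite score
print(round(cost, 4))                         # cost of one request
print(round(intelligence_index / cost, 1))    # crude intelligence-per-dollar ratio
```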
Key Findings
Category Overview
Latest Updates
Text-to-Image
Compare AI image generation models used in marketing, e-commerce, and creative design workflows.
Performance
Analyze speed, latency, and throughput metrics critical for real-time enterprise applications.
Coding
Compare models on code generation, debugging, and software engineering tasks.
Independent AI Model Benchmarks for Enterprise, Education, and Research
Cognion provides independent, data-driven rankings of AI models across intelligence, coding, math, performance, pricing, and multimodal generation. Our benchmarks aggregate scores from industry-standard evaluations including MMLU-Pro, GPQA Diamond, HLE, LiveCodeBench, SWE-bench, MATH-500, and AIME — giving technology leaders, software architects, and data science teams a single source of truth for AI platform selection. Explore our intelligence rankings for general reasoning, coding benchmarks for software engineering, and math evaluations for quantitative applications.
Enterprise teams in financial services, healthcare, legal technology, consulting, and e-commerce use our rankings to compare models from OpenAI, Anthropic, Google, Meta, Mistral, and dozens of other providers. Whether you are evaluating GPT-4o for customer support automation, Claude for contract analysis, Gemini for multimodal data processing, or Llama for on-premises deployment, our model directory and pricing comparisons help you make informed decisions within budget.
In education, AI benchmarks inform curriculum design for computer science programs, data science bootcamps, and K-12 STEM initiatives. Researchers and graduate students use our trend analysis to track the pace of AI capability growth, while creative professionals explore our text-to-image, text-to-video, and text-to-speech rankings for media production workflows. For latency-sensitive applications, our performance benchmarks cover throughput, time-to-first-token, and output speed across providers.
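For teams running their own latency checks, the sketch below shows one common way to measure time-to-first-token and output tokens per second from a streaming response. The stream here is a stand-in iterable; swap in your provider's streaming iterator, as none of this is tied to a specific SDK.

```python
import time

def measure_streaming_latency(stream):
    """Measure time-to-first-token (seconds) and output speed (tokens/sec) for a token stream."""
    start = time.perf_counter()
    first_token_at = None
    tokens = 0
    for _chunk in stream:                 # each chunk is assumed to be one token or piece
        if first_token_at is None:
            first_token_at = time.perf_counter()
        tokens += 1
    elapsed = time.perf_counter() - start
    ttft = (first_token_at - start) if first_token_at is not None else None
    tokens_per_second = tokens / elapsed if elapsed > 0 else 0.0
    return ttft, tokens_per_second

# Example with a stand-in generator; replace with a real streaming response.
ttft, tps = measure_streaming_latency(iter(["Hello", " world", "!"]))
print(ttft, tps)
```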
Read our methodology to understand how we collect, normalize, and score benchmark data. For programmatic access, our API documentation provides endpoints for model data, leaderboards, and historical trends. Have questions? Visit our FAQ or contact us for enterprise evaluation services.
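Programmatic access might look something like the snippet below; the endpoint URL, parameters, and response fields are hypothetical placeholders, so consult the API documentation for the real interface.

```python
import requests

# Hypothetical endpoint and fields for illustration only; see the API docs for the real ones.
resp = requests.get(
    "https://api.example.com/v1/models",   # placeholder URL, not a real endpoint
    params={"category": "language", "sort": "intelligence_index"},
    timeout=30,
)
resp.raise_for_status()
for model in resp.json().get("models", []):
    print(model.get("name"), model.get("intelligence_index"))
```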