AI Model Intelligence

Independent benchmarks and enterprise AI analysis across 842+ models from 108 providers — helping technology leaders, software engineers, and data science teams select the right AI platform for their business.

Total Models: 842
Language Models: 514
Media Models: 328
Avg Intelligence: 23.3
Providers: 108

Intelligence Index

Composite intelligence score across all major benchmarks. Toggle between all models, open-weights only, or proprietary only.

Intelligence vs Cost

Plots Intelligence Index against price per 1M tokens. Upper-left models offer the best value.

Intelligence vs Cost — Zoomed In

Same chart with the top 10% most expensive outliers removed.
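The zoomed-in view can be approximated by sorting models by price and dropping the most expensive tenth. The sketch below is only an illustration: the `price_per_1m` field, the sample data, and the integer rounding rule are assumptions, not the site's documented method.

```python
def drop_most_expensive(models, percent=10):
    """Return models sorted by price with the priciest `percent` removed.

    Assumption: each model is a dict with a hypothetical 'price_per_1m' key.
    Integer arithmetic is used so the cut-off is exact (rounds down).
    """
    ranked = sorted(models, key=lambda m: m["price_per_1m"])
    drop = len(ranked) * percent // 100
    return ranked[: len(ranked) - drop]


# Hypothetical data: 20 models priced from $1 to $20 per 1M tokens.
models = [{"name": f"model-{i}", "price_per_1m": float(i)} for i in range(1, 21)]
trimmed = drop_most_expensive(models)
```

With 20 sample models, the two most expensive are removed and 18 remain on the chart.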

Image & Video Leaderboard

ELO ratings from head-to-head comparisons of image and video generation models.

| Model | ELO | 95% CI | Appearances |
| --- | --- | --- | --- |
| HappyHorse-1.0 | 1,397 | -11/11 | 7,978 |
| Dreamina Seedance 2.0 720p | 1,348 | -11/11 | 4,811 |
| grok-imagine-video | 1,327 | -10/10 | 6,629 |
| PixVerse V6 | 1,324 | -10/10 | 8,740 |
| GenFlare 2.0 | 1,312 | -9/9 | 7,866 |
| Riverflow 2.0 | 1,289 | -10/10 | 6,976 |
| Vidu Q3 Pro | 1,288 | -9/9 | 6,335 |
| SkyReels V4 | 1,287 | -10/10 | 6,161 |
| Kling 2.5 Turbo 1080p | 1,285 | -10/10 | 4,439 |
| Kling 3.0 1080p (Pro) | 1,283 | -10/10 | 5,710 |
| Kling 3.0 Omni 1080p (Pro) | 1,281 | -10/10 | 5,147 |
| PixVerse V5.6 | 1,280 | -10/10 | 5,261 |
| Veo 3.1 Fast Preview | 1,277 | -10/10 | 4,922 |
| Veo 3.1 Preview | 1,275 | -10/10 | 5,118 |
| Hailuo 02 0616 | 1,274 | -10/10 | 2,868 |
| Veo 3.1 Fast | 1,271 | -10/10 | 5,183 |
| PixVerse V5.5 | 1,270 | -10/10 | 4,756 |
| Kling 2.6 Standard (January) | 1,270 | -10/10 | 5,140 |
| Kling 3.0 720p (Standard) | 1,265 | -10/10 | 5,614 |
| Runway Gen-4.5 | 1,264 | -10/10 | 4,426 |
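For readers unfamiliar with how head-to-head comparisons turn into ratings, here is a minimal sketch of the classic Elo update. The page does not say which rating procedure it uses, so the K-factor of 32, the 400-point scale, and the function names are assumptions chosen for illustration.

```python
def expected_score(rating_a, rating_b):
    """Probability that A beats B under the standard Elo logistic curve."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def update(rating_a, rating_b, score_a, k=32):
    """Return new (A, B) ratings after one comparison.

    score_a is 1.0 if A won, 0.0 if A lost, 0.5 for a tie.
    k=32 is an assumed step size, not a value taken from the leaderboard.
    """
    e_a = expected_score(rating_a, rating_b)
    delta = k * (score_a - e_a)
    return rating_a + delta, rating_b - delta


# Two models with equal ratings: the winner gains what the loser gives up.
new_a, new_b = update(1300, 1300, score_a=1.0)

# Win probability implied by a 49-point gap (1,397 vs 1,348).
p_top = expected_score(1397, 1348)
```

Under these assumptions, a 49-point gap corresponds to the higher-rated model being preferred in roughly 57% of comparisons.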

Frontier Intelligence Over Time

Tracks the highest-scoring model each month by Intelligence Index.
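One way to read this chart is as a running maximum: for each month, the best model released up to that point. The sketch below assumes that interpretation. The record fields (`name`, `month`, `score`) and the sample values are hypothetical.

```python
def frontier_over_time(records):
    """Map each month that has a release to the best model available by then.

    Assumption: 'month' is a 'YYYY-MM' string, so string sorting is
    chronological, and 'score' is the Intelligence Index.
    """
    frontier = {}
    best = None
    for rec in sorted(records, key=lambda r: r["month"]):
        if best is None or rec["score"] > best["score"]:
            best = rec
        # Overwritten for every record in the same month, so the final
        # value reflects all releases in that month.
        frontier[rec["month"]] = best["name"]
    return frontier


# Hypothetical records for illustration only.
records = [
    {"name": "model-a", "month": "2025-01", "score": 20.0},
    {"name": "model-b", "month": "2025-02", "score": 18.5},
    {"name": "model-c", "month": "2025-03", "score": 24.0},
]
frontier = frontier_over_time(records)
```

Here `model-a` stays at the frontier in February because `model-b` scores lower, and `model-c` takes over in March.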

Intelligence Evaluations

GPQA Diamond: the hardest subset of GPQA, filtered for questions where experts agree and non-experts struggle.


Output Speed


Tokens generated per second. Higher throughput means faster responses.

Pricing: Input vs Output


Compares input and output token pricing (per 1M tokens) for the most affordable models.
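Because prices are quoted per 1M tokens, the cost of a single request depends on how many input and output tokens it uses. A minimal sketch of that arithmetic follows. The function name and the example prices are made up for illustration.

```python
def request_cost(input_tokens, output_tokens, input_price_per_1m, output_price_per_1m):
    """Cost in dollars of one request, given prices quoted per 1M tokens."""
    total = input_tokens * input_price_per_1m + output_tokens * output_price_per_1m
    return total / 1_000_000


# Hypothetical prices: $1.00 per 1M input tokens, $4.00 per 1M output tokens.
cost = request_cost(
    input_tokens=2_000,
    output_tokens=500,
    input_price_per_1m=1.00,
    output_price_per_1m=4.00,
)
```

In this example the 500 output tokens cost as much as the 2,000 input tokens, which is why both prices matter when comparing models.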

Cost Efficiency

Intelligence Index plotted against cost per 1M tokens.

Last Updated: May 3, 2026

Independent AI Model Benchmarks for Enterprise, Education, and Research

Cognion provides independent, data-driven rankings of AI models across intelligence, coding, math, performance, pricing, and multimodal generation. Our benchmarks aggregate scores from industry-standard evaluations including MMLU-Pro, GPQA Diamond, HLE, LiveCodeBench, SWE-bench, MATH-500, and AIME — giving technology leaders, software architects, and data science teams a single source of truth for AI platform selection. Explore our intelligence rankings for general reasoning, coding benchmarks for software engineering, and math evaluations for quantitative applications.

Enterprise teams in financial services, healthcare, legal technology, consulting, and e-commerce use our rankings to compare models from OpenAI, Anthropic, Google, Meta, Mistral, and dozens of other providers. Whether you are evaluating GPT-4o for customer support automation, Claude for contract analysis, Gemini for multimodal data processing, or Llama for on-premises deployment, our model directory and pricing comparisons help you make informed decisions within budget.

In education, AI benchmarks inform curriculum design for computer science programs, data science bootcamps, and K-12 STEM initiatives. Researchers and graduate students use our trend analysis to track the pace of AI capability growth, while creative professionals explore our text-to-image, text-to-video, and text-to-speech rankings for media production workflows. For latency-sensitive applications, our performance benchmarks cover throughput, time-to-first-token, and output speed across providers.

Read our methodology to understand how we collect, normalize, and score benchmark data. For programmatic access, our API documentation provides endpoints for model data, leaderboards, and historical trends. Have questions? Visit our FAQ or contact us for enterprise evaluation services.