AI Model Intelligence

Independent benchmarks and enterprise AI analysis across 842+ models from 108 providers — helping technology leaders, software engineers, and data science teams select the right AI platform for their business.


842

Total Models

514

Language Models

328

Media Models

23.3

Avg Intelligence

108

Providers

Intelligence Index

Composite intelligence score across all major benchmarks. Toggle between all models, open-weights only, or proprietary only.

Intelligence vs Cost

Plots Intelligence Index against price per 1M tokens. Models in the upper left (high intelligence, low price) offer the best value.
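One way to make "best value" concrete is to keep only the models that no other model beats on both price and Intelligence Index. The sketch below is an illustration under assumptions, not the site's method: the `Model` type, the `pareto_frontier` helper, and all of the data are hypothetical.

```python
from typing import NamedTuple


class Model(NamedTuple):
    name: str
    index: float  # Intelligence Index (higher is better)
    price: float  # USD per 1M tokens (lower is better)


def pareto_frontier(models: list[Model]) -> list[Model]:
    """Return the models that no other model beats on both price and index."""
    frontier = []
    best_index = float("-inf")
    # Walk from cheapest to most expensive; on a price tie, higher index first.
    for m in sorted(models, key=lambda m: (m.price, -m.index)):
        if m.index > best_index:
            frontier.append(m)
            best_index = m.index
    return frontier


# Hypothetical data, for illustration only.
models = [
    Model("model-a", 20.0, 0.50),
    Model("model-b", 35.0, 2.00),
    Model("model-c", 30.0, 4.00),  # costs more than model-b and scores lower
    Model("model-d", 55.0, 10.00),
]
print([m.name for m in pareto_frontier(models)])
# ['model-a', 'model-b', 'model-d']
```

Models left off the frontier sit below and to the right of at least one alternative on the chart.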

Intelligence vs Cost — Zoomed In

The same chart with the most expensive 10% of models removed.

Image & Video Leaderboard

Elo ratings from head-to-head comparisons of image and video generation models.

Model	Elo	95% CI	Appearances
HappyHorse-1.0	1,397	-11/11	7,978
Dreamina Seedance 2.0 720p	1,348	-11/11	4,811
grok-imagine-video	1,327	-10/10	6,629
PixVerse V6	1,324	-10/10	8,740
GenFlare 2.0	1,312	-9/9	7,866
Riverflow 2.0	1,289	-10/10	6,976
Vidu Q3 Pro	1,288	-9/9	6,335
SkyReels V4	1,287	-10/10	6,161
Kling 2.5 Turbo 1080p	1,285	-10/10	4,439
Kling 3.0 1080p (Pro)	1,283	-10/10	5,710
Kling 3.0 Omni 1080p (Pro)	1,281	-10/10	5,147
PixVerse V5.6	1,280	-10/10	5,261
Veo 3.1 Fast Preview	1,277	-10/10	4,922
Veo 3.1 Preview	1,275	-10/10	5,118
Hailuo 02 0616	1,274	-10/10	2,868
Veo 3.1 Fast	1,271	-10/10	5,183
PixVerse V5.5	1,270	-10/10	4,756
Kling 2.6 Standard (January)	1,270	-10/10	5,140
Kling 3.0 720p (Standard)	1,265	-10/10	5,614
Runway Gen-4.5	1,264	-10/10	4,426
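To read a rating gap as a win probability, the sketch below applies the conventional Elo formula on a 400-point logistic scale. That scale is an assumption, since the page does not state its rating parameters. The `expected_score` and `intervals_overlap` helpers are hypothetical names. The ratings and intervals are taken from the table.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Conventional Elo expected score of A against B (assumes a 400-point scale)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def intervals_overlap(rating_a: float, ci_a: float,
                      rating_b: float, ci_b: float) -> bool:
    """True if two symmetric confidence intervals share at least one point."""
    return abs(rating_a - rating_b) <= ci_a + ci_b


# Top two rows: 1,397 (+/-11) vs 1,348 (+/-11).
print(f"{expected_score(1397, 1348):.2f}")         # 0.57
print(intervals_overlap(1397, 11, 1348, 11))       # False

# Kling 3.0 1080p (Pro) vs Kling 3.0 Omni 1080p (Pro): 1,283 vs 1,281 (+/-10 each).
print(intervals_overlap(1283, 10, 1281, 10))       # True
```

Under that assumption, a 49-point lead corresponds to winning roughly 57% of head-to-head comparisons. Rows whose intervals overlap should not be treated as clearly ordered.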

Frontier Intelligence Over Time

Tracks the highest-scoring model each month by Intelligence Index.
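A frontier series of this kind can be built as a running maximum over release dates. The sketch below assumes "frontier" means the best score of any model released up to and including that month, which the page does not spell out. The `frontier_by_month` helper and all of the data are hypothetical.

```python
from datetime import date


def frontier_by_month(releases):
    """Map 'YYYY-MM' to the (name, score) of the best model released so far.

    `releases` is an iterable of (release_date, name, score) tuples.
    Months without a release are left out of the result.
    """
    frontier = {}
    best = None
    for released, name, score in sorted(releases):
        if best is None or score > best[1]:
            best = (name, score)
        frontier[released.strftime("%Y-%m")] = best
    return frontier


# Hypothetical data, for illustration only.
releases = [
    (date(2026, 1, 10), "model-a", 48.0),
    (date(2026, 2, 5), "model-b", 45.0),
    (date(2026, 2, 20), "model-c", 52.5),
    (date(2026, 4, 2), "model-d", 51.0),
]
for month, (name, score) in frontier_by_month(releases).items():
    print(month, name, score)
# 2026-01 model-a 48.0
# 2026-02 model-c 52.5
# 2026-04 model-c 52.5
```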

Intelligence Evaluations

GPQA Diamond: the hardest subset of GPQA, filtered for questions where experts agree and non-experts struggle.


Output Speed


Tokens generated per second. Higher throughput means faster responses.
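As a rough guide to what a throughput figure means for response time, the sketch below divides the output length by tokens per second and adds an optional time-to-first-token. The `generation_seconds` helper is a hypothetical name. The 777 t/s value is the figure this page reports for Mercury 2 under Key Findings. The 1,000-token response length is an arbitrary example.

```python
def generation_seconds(output_tokens: int,
                       tokens_per_second: float,
                       ttft_seconds: float = 0.0) -> float:
    """Estimate wall-clock time to stream a response of `output_tokens` tokens."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return ttft_seconds + output_tokens / tokens_per_second


# 1,000 output tokens at 777 tokens per second, ignoring time-to-first-token.
print(round(generation_seconds(1000, 777), 2))  # 1.29
```

Real latency also depends on prompt length, provider load, and network overhead, so treat the result as a lower bound.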

Pricing: Input vs Output


Compares input and output token pricing (per 1M tokens) for the most affordable models.
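Because input and output tokens are priced separately, the cost of a request depends on the mix of the two. The sketch below shows the arithmetic. The `request_cost_usd` helper and the prices are hypothetical and are not taken from any model on this page.

```python
def request_cost_usd(input_tokens: int,
                     output_tokens: int,
                     input_price_per_m: float,
                     output_price_per_m: float) -> float:
    """Cost of one request, given prices in USD per 1M tokens."""
    total = input_tokens * input_price_per_m + output_tokens * output_price_per_m
    return total / 1_000_000


# Hypothetical prices: $0.25 per 1M input tokens, $1.00 per 1M output tokens.
cost = request_cost_usd(2_000, 500, 0.25, 1.00)
print(f"${cost:.4f}")  # $0.0010
```

With these example prices, 500 output tokens cost as much as 2,000 input tokens, which is why output-heavy workloads are more sensitive to the output rate.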

Cost Efficiency

Intelligence Index plotted against cost per 1M tokens.

Key Findings

Fastest Model

Mercury 2

777 t/s

Best Value

Qwen3.5 0.8B (Reasoning)

10.5 index

Top Intelligence

GPT-5.5 (xhigh)

60.2 index

Top Coder

GPT-5.5 (xhigh)

59.1 index

Category Overview

Intelligence

GPT-5.5 (xhigh)

N/A

Text-to-Speech

Journey

N/A

Text-to-Image

HunyuanImage 2.1

N/A

Image Editing

Nano Banana 2 (Gemini 3.1 Flash Image Preview)

N/A

Text-to-Video

Step-Video-T2V

N/A

Image-to-Video

Kling 2.5 Turbo 1080p

N/A

Latest Updates

Grok 4.3

xAI

4/30/2026

Mistral Medium 3.5

Mistral

4/29/2026

Granite 4.1 8B

IBM

4/29/2026

DeepSeek V4 Pro (Reasoning, Max Effort)

DeepSeek

4/24/2026

DeepSeek V4 Pro (Reasoning, High Effort)

DeepSeek

4/24/2026

Text-to-Image

Compare AI image generation models used in marketing, e-commerce, and creative design workflows.


Performance

Analyze speed, latency, and throughput metrics critical for real-time enterprise applications.


Coding

Compare models on code generation, debugging, and software engineering tasks.


Last Updated: May 3, 2026

Independent AI Model Benchmarks for Enterprise, Education, and Research

Cognion provides independent, data-driven rankings of AI models across intelligence, coding, math, performance, pricing, and multimodal generation. Our benchmarks aggregate scores from industry-standard evaluations including MMLU-Pro, GPQA Diamond, HLE, LiveCodeBench, SWE-bench, MATH-500, and AIME — giving technology leaders, software architects, and data science teams a single source of truth for AI platform selection. Explore our intelligence rankings for general reasoning, coding benchmarks for software engineering, and math evaluations for quantitative applications.

Enterprise teams in financial services, healthcare, legal technology, consulting, and e-commerce use our rankings to compare models from OpenAI, Anthropic, Google, Meta, Mistral, and dozens of other providers. Whether you are evaluating GPT-4o for customer support automation, Claude for contract analysis, Gemini for multimodal data processing, or Llama for on-premises deployment, our model directory and pricing comparisons help you make informed decisions within budget.

In education, AI benchmarks inform curriculum design for computer science programs, data science bootcamps, and K-12 STEM initiatives. Researchers and graduate students use our trend analysis to track the pace of AI capability growth, while creative professionals explore our text-to-image, text-to-video, and text-to-speech rankings for media production workflows. For latency-sensitive applications, our performance benchmarks cover throughput, time-to-first-token, and output speed across providers.

Read our methodology to understand how we collect, normalize, and score benchmark data. For programmatic access, our API documentation provides endpoints for model data, leaderboards, and historical trends. Have questions? Visit our FAQ or contact us for enterprise evaluation services.