AI Model Intelligence

Independent benchmarks and enterprise AI analysis across 787 models from 103 providers, helping technology leaders, software engineers, and data science teams select the right AI platform for their business.

Total Models: 787
Language Models: 478
Media Models: 309
Average Intelligence Index: 21.8
Providers: 103

Intelligence Index

Composite intelligence score across all major benchmarks. Toggle between all models, open-weights only, or proprietary only.
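
For illustration, a composite score of this kind can be built by normalizing each benchmark result onto a common scale and averaging. The sketch below assumes equal weights and min-max normalization; the benchmark values are made up, and the actual weighting is described in our methodology.

```python
# Hypothetical composite-score sketch: min-max normalize each benchmark,
# then take an equal-weight average. Values below are illustrative only.
from statistics import mean

def normalize(score: float, lo: float, hi: float) -> float:
    """Map a raw benchmark score onto [0, 1]."""
    return (score - lo) / (hi - lo)

def intelligence_index(scores: dict[str, float],
                       bounds: dict[str, tuple[float, float]]) -> float:
    """Equal-weight average of normalized benchmark scores, scaled to 0-100."""
    normed = [normalize(s, *bounds[name]) for name, s in scores.items()]
    return 100 * mean(normed)

# Example: one model's raw scores on three benchmarks (invented numbers).
scores = {"MMLU-Pro": 0.72, "GPQA Diamond": 0.55, "LiveCodeBench": 0.48}
bounds = {k: (0.0, 1.0) for k in scores}  # accuracy-style benchmarks in [0, 1]
print(f"Intelligence Index: {intelligence_index(scores, bounds):.1f}")
```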

Intelligence vs Cost

Plots Intelligence Index against price per 1M tokens. Upper-left models offer the best value.
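
One way to read "upper-left is best" programmatically is to compute the Pareto frontier: a model belongs on it if no other model is both cheaper and higher scoring. The sketch below uses invented names and numbers purely for illustration.

```python
# Pareto-frontier sketch for the intelligence-vs-cost chart: keep models
# that no other model dominates (cheaper AND higher intelligence).
# All model names and numbers below are illustrative.
models = [
    ("model-a", 3.00, 62.0),   # (name, $ per 1M tokens, intelligence index)
    ("model-b", 0.40, 48.0),
    ("model-c", 5.50, 60.0),   # dominated by model-a: pricier and lower scoring
    ("model-d", 0.15, 30.0),
]

def pareto_frontier(rows):
    frontier = []
    for name, cost, iq in rows:
        dominated = any(c <= cost and q >= iq and (c, q) != (cost, iq)
                        for _, c, q in rows)
        if not dominated:
            frontier.append((name, cost, iq))
    # Sort cheapest-first, i.e. left to right along the chart's x-axis.
    return sorted(frontier, key=lambda r: r[1])

for name, cost, iq in pareto_frontier(models):
    print(f"{name}: ${cost:.2f}/1M tokens, index {iq}")
```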

Intelligence vs Cost — Zoomed In

Same chart with the top 10% most expensive outliers removed.
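
The zoomed view can be reproduced with a simple percentile cutoff on price. A minimal sketch, assuming a 90th-percentile threshold and illustrative prices:

```python
# Percentile-filter sketch for the zoomed chart: drop the top 10% most
# expensive models before plotting. Prices below are illustrative.
import numpy as np

prices = np.array([0.15, 0.40, 0.80, 1.20, 3.00,
                   5.50, 12.00, 30.00, 75.00, 150.00])
cutoff = np.percentile(prices, 90)      # 90th-percentile price
zoomed = prices[prices <= cutoff]       # keep the affordable 90%
print(f"cutoff ${cutoff:.2f}; kept {zoomed.size} of {prices.size} models")
```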

Image & Video Leaderboard

Elo ratings from head-to-head comparisons of image and video generation models. A sketch of the standard rating update appears after the table.

Model                           Elo     95% CI   Appearances
HappyHorse-1.0                  1,409   ±8       7,796
Dreamina Seedance 2.0 720p      1,357   ±10      4,633
grok-imagine-video              1,332   ±9       6,267
GenFlare 2.0                    1,325   ±8       7,634
PixVerse V6                     1,321   ±7       8,154
Kling 3.0 Omni 1080p (Pro)      1,299   ±10      4,712
SkyReels V4                     1,297   ±9       5,338
Kling 2.5 Turbo 1080p           1,296   ±11      4,094
PixVerse V5.6                   1,291   ±10      4,853
Veo 3.1 Fast                    1,289   ±10      4,841
Vidu Q3 Pro                     1,288   ±9       5,895
Veo 3.1 Preview                 1,288   ±10      4,726
Veo 3.1 Fast Preview            1,286   ±10      4,561
Riverflow 2.0                   1,283   ±9       6,663
Hailuo 02 0616                  1,282   ±13      2,600
Kling 2.6 Standard (January)    1,281   ±10      4,725
Kling 3.0 1080p (Pro)           1,277   ±9       5,382
PixVerse V5.5                   1,275   ±10      4,338
GenFlare                        1,273   ±11      3,663
PixVerse V5                     1,271   ±10      5,014
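
For readers unfamiliar with how these ratings move, the sketch below shows the standard Elo update after a single head-to-head comparison. The K-factor and ratings are illustrative; see our methodology for how the rankings are computed in practice.

```python
# Standard Elo update for one pairwise comparison. K and the ratings
# below are illustrative placeholders.
def elo_update(r_winner: float, r_loser: float,
               k: float = 32.0) -> tuple[float, float]:
    """Return updated (winner, loser) ratings after one head-to-head vote."""
    expected_win = 1.0 / (1.0 + 10 ** ((r_loser - r_winner) / 400.0))
    delta = k * (1.0 - expected_win)   # winner gains what the loser gives up
    return r_winner + delta, r_loser - delta

# Example: a 1,409-rated model beats a 1,357-rated one.
new_hi, new_lo = elo_update(1409, 1357)
print(f"{new_hi:.1f}, {new_lo:.1f}")
```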

Frontier Intelligence Over Time

Tracks the highest-scoring model each month by Intelligence Index.
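
As a sketch of how such a frontier series can be derived from a model catalog, the snippet below picks the highest-scoring model per month with pandas; the dates, names, and scores are illustrative.

```python
# Monthly-frontier sketch: highest Intelligence Index per month.
# Dates, model names, and scores below are illustrative.
import pandas as pd

df = pd.DataFrame({
    "month": ["2026-01", "2026-01", "2026-02", "2026-02"],
    "model": ["model-a", "model-b", "model-b", "model-c"],
    "score": [58.0, 61.0, 61.0, 64.5],
})
frontier = df.loc[df.groupby("month")["score"].idxmax()]
print(frontier.to_string(index=False))
```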

Intelligence Evaluations

GPQA Diamond: the hardest subset of GPQA, restricted to questions that domain experts answer correctly and skilled non-experts get wrong.


Output Speed


Tokens generated per second. Higher throughput means faster responses.
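
Throughput like this can be measured client-side by timing a streaming response and counting tokens as they arrive. A minimal sketch, with a simulated stream standing in for a real streaming client:

```python
# Throughput-measurement sketch: time a (simulated) token stream and
# report tokens per second. Replace fake_stream with a real streaming client.
import time

def fake_stream(n_tokens: int = 200, delay_s: float = 0.005):
    """Stand-in for a model's streaming response (illustrative)."""
    for _ in range(n_tokens):
        time.sleep(delay_s)
        yield "tok"

start = time.perf_counter()
n = sum(1 for _ in fake_stream())
elapsed = time.perf_counter() - start
print(f"{n / elapsed:.0f} tokens/s over {elapsed:.2f}s")
```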

Pricing: Input vs Output


Compares input and output token pricing (per 1M tokens) for the most affordable models.
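
Because input and output tokens are billed at different rates, the effective cost of a request depends on the prompt/response mix. A small sketch, with illustrative prices:

```python
# Blended-cost sketch: per-request cost from separate input/output rates.
# Prices ($ per 1M tokens) and token counts below are illustrative.
def request_cost(in_tokens: int, out_tokens: int,
                 in_price: float, out_price: float) -> float:
    return (in_tokens * in_price + out_tokens * out_price) / 1_000_000

# Example: 2,000-token prompt, 500-token response at $0.50 in / $1.50 out.
print(f"${request_cost(2_000, 500, 0.50, 1.50):.6f} per request")
```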

Cost Efficiency

Intelligence Index plotted against cost per 1M tokens.
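
A complementary way to read this chart is as intelligence per dollar. The sketch below ranks models by that ratio; the names and numbers are invented for illustration.

```python
# "Intelligence per dollar" sketch: rank models by index / price.
# Names and numbers below are illustrative.
models = {"model-a": (62.0, 3.00), "model-b": (48.0, 0.40), "model-d": (30.0, 0.15)}
ranked = sorted(models.items(), key=lambda kv: kv[1][0] / kv[1][1], reverse=True)
for name, (iq, price) in ranked:
    print(f"{name}: {iq / price:.0f} index points per $ (per 1M tokens)")
```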

Last Updated: Apr 8, 2026

Independent AI Model Benchmarks for Enterprise, Education, and Research

Cognion provides independent, data-driven rankings of AI models across intelligence, coding, math, performance, pricing, and multimodal generation. Our benchmarks aggregate scores from industry-standard evaluations including MMLU-Pro, GPQA Diamond, HLE, LiveCodeBench, SWE-bench, MATH-500, and AIME — giving technology leaders, software architects, and data science teams a single source of truth for AI platform selection. Explore our intelligence rankings for general reasoning, coding benchmarks for software engineering, and math evaluations for quantitative applications.

Enterprise teams in financial services, healthcare, legal technology, consulting, and e-commerce use our rankings to compare models from OpenAI, Anthropic, Google, Meta, Mistral, and dozens of other providers. Whether you are evaluating GPT-4o for customer support automation, Claude for contract analysis, Gemini for multimodal data processing, or Llama for on-premises deployment, our model directory and pricing comparisons help you make informed decisions within budget.

In education, AI benchmarks inform curriculum design for computer science programs, data science bootcamps, and K-12 STEM initiatives. Researchers and graduate students use our trend analysis to track the pace of AI capability growth, while creative professionals explore our text-to-image, text-to-video, and text-to-speech rankings for media production workflows. For latency-sensitive applications, our performance benchmarks cover throughput, time-to-first-token, and output speed across providers.

Read our methodology to understand how we collect, normalize, and score benchmark data. For programmatic access, our API documentation provides endpoints for model data, leaderboards, and historical trends. Have questions? Visit our FAQ or contact us for enterprise evaluation services.
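
As a hypothetical example of programmatic access, a leaderboard fetch might look like the snippet below. The base URL, path, and response fields are placeholders, not our actual API; consult the API documentation for the real endpoints.

```python
# Hypothetical example only: the base URL, path, and JSON fields are
# placeholders, not Cognion's actual API. Check the API docs for real routes.
import requests

resp = requests.get("https://api.example.com/v1/leaderboards/intelligence",
                    params={"limit": 5}, timeout=10)
resp.raise_for_status()
for row in resp.json().get("models", []):
    print(row.get("name"), row.get("intelligence_index"))
```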