Artificial Analysis AI Model Benchmark Scraper
Scrapes LLM benchmark scores, pricing, and performance data from Artificial Analysis — the leading independent evaluator of AI models.
What this actor does
Extracts structured data for ~370 AI language models from Artificial Analysis, including:
- Benchmark scores: Quality index, MMLU-Pro, GPQA Diamond, HumanEval, LiveCodeBench, MATH-500, MMMU-Pro, and more
- Pricing: Input, output, and blended cost per million tokens
- Performance: Median throughput (tokens/sec) and time-to-first-token latency
- Provider info: All hosting providers, cheapest provider by blended price
- Model metadata: Creator/lab, release date, parameter count, context window, license, open-weight status
All data is extracted in a single request to the /models page, which serves the full model dataset inline as a React Server Component payload. No per-model crawling needed.
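The blended price listed above combines the input and output prices into one figure. The sketch below assumes a 3:1 input-to-output token weighting; this README does not state the weighting, but the example record in the Output section (15 input, 75 output, 30 blended) is consistent with it. The function name `blended_price` and its parameters are hypothetical.

```python
def blended_price(input_price: float, output_price: float,
                  input_weight: float = 3.0, output_weight: float = 1.0) -> float:
    """Weighted average of input and output price per million tokens.

    The 3:1 default weighting is an assumption, not something this README states.
    """
    total_weight = input_weight + output_weight
    return (input_weight * input_price + output_weight * output_price) / total_weight


# Prices taken from the example record in the Output section.
print(blended_price(15, 75))  # 30.0
```

If your workload is more output-heavy than the assumed ratio, pass different weights to get a figure closer to your real cost.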
Use cases
- Model selection: Compare cost-vs-quality trade-offs across providers
- Price monitoring: Track pricing changes across OpenAI, Anthropic, Google, Meta, and 40+ hosting providers
- Research and benchmarking: Import baseline scores into your own evaluation pipeline
- Cost optimization: Find the cheapest or fastest provider for a given quality target
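As a rough illustration of the cost optimization use case, the sketch below picks the cheapest model that meets a quality target. It assumes the dataset items have already been loaded into a list of dicts using the field names from the Output section; the function name `cheapest_for_quality` and the sample values are hypothetical.

```python
from typing import Optional


def cheapest_for_quality(records: list[dict], min_quality: float) -> Optional[dict]:
    """Return the lowest-blended-price record whose quality index meets the target."""
    candidates = [
        r for r in records
        # Skip records with missing scores or prices; many fields can be null.
        if r.get("aa_quality_index") is not None
        and r.get("price_blended_usd_per_million") is not None
        and r["aa_quality_index"] >= min_quality
    ]
    return min(candidates, key=lambda r: r["price_blended_usd_per_million"], default=None)


# Made-up records for illustration only; these are not real scraped values.
sample = [
    {"model_slug": "model-a", "aa_quality_index": 57.4, "price_blended_usd_per_million": 30},
    {"model_slug": "model-b", "aa_quality_index": 52.0, "price_blended_usd_per_million": 4.5},
    {"model_slug": "model-c", "aa_quality_index": 38.0, "price_blended_usd_per_million": 0.6},
    {"model_slug": "model-d", "aa_quality_index": None, "price_blended_usd_per_million": 0.1},
]

best = cheapest_for_quality(sample, min_quality=50)
print(best["model_slug"] if best else "no match")  # model-b
```

Swapping the sort key for `throughput_tokens_per_second` (and `min` for `max`) would give the fastest model at the same quality target instead.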
Input
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `maxItems` | integer | Yes | 10 | Maximum number of model records to return. Set to a large number (e.g. 500) to retrieve all models. |
Output
Each dataset item represents one AI model. Example record:
{
"model_slug": "claude-4-opus",
"model_name": "Claude 4 Opus",
"provider": "Anthropic",
"release_date": "2025-05-22",
"parameter_count": null,
"context_window_tokens": 200000,
"aa_quality_index": 57.4,
"mmlu_pro_score": 0.812,
"gpqa_diamond_score": 0.738,
"humaneval_score": 0.921,
"math_score": 84.1,
"chatbot_arena_elo": null,
"aider_polyglot_score": null,
"livecodebench_score": 0.703,
"mmmu_score": null,
"benchmark_breakdown": "{\"agentic_index\":45.2,\"coding_index\":68.1,...}",
"price_input_usd_per_million": 15,
"price_output_usd_per_million": 75,
"price_blended_usd_per_million": 30,
"throughput_tokens_per_second": 58.3,
"latency_first_token_ms": 1204,
"hosting_providers": "[\"Anthropic\",\"Amazon Bedrock\",\"Google Vertex AI\"]",
"cheapest_provider": "Amazon Bedrock",
"fastest_provider": null,
"license": "proprietary",
"is_open_weight": false,
"profile_url": "https://artificialanalysis.ai/models/claude-4-opus",
"scraped_at": "2026-05-31T08:00:00.000Z"
}
Notes on specific fields:
- `chatbot_arena_elo` and `aider_polyglot_score` are always `null`. These metrics are not tracked by Artificial Analysis and would require separate scrapers for Chatbot Arena and Aider.chat.
- `benchmark_breakdown` is a JSON string containing additional sub-benchmarks (agentic_index, coding_index, math_index, HLE, AIME-2025, IFBench, SciCode, LCR, Omniscience).
- `hosting_providers` is a JSON string array of all providers offering this model.
- `fastest_provider` is always `null`. A per-provider throughput breakdown is not available on the listing page.
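Because `benchmark_breakdown` and `hosting_providers` arrive as JSON strings rather than nested values, downstream code needs to decode them before use. A minimal sketch, assuming each dataset item is a plain dict; the helper name `decode_record` is hypothetical, and replacing malformed values with `None` is a design choice of this sketch rather than actor behavior.

```python
import json


def decode_record(record: dict) -> dict:
    """Return a copy of the record with its JSON-string fields parsed."""
    decoded = dict(record)
    for field in ("benchmark_breakdown", "hosting_providers"):
        raw = decoded.get(field)
        if isinstance(raw, str):
            try:
                decoded[field] = json.loads(raw)
            except json.JSONDecodeError:
                # Sketch-level choice: treat unparseable values as missing.
                decoded[field] = None
    return decoded


# Trimmed-down record using values from the example above.
sample = {
    "model_slug": "claude-4-opus",
    "benchmark_breakdown": json.dumps({"agentic_index": 45.2, "coding_index": 68.1}),
    "hosting_providers": json.dumps(["Anthropic", "Amazon Bedrock", "Google Vertex AI"]),
}

record = decode_record(sample)
print(record["benchmark_breakdown"]["coding_index"])  # 68.1
print(len(record["hosting_providers"]))               # 3
```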
Notes
- The actor makes a single HTTP request to https://artificialanalysis.ai/models. No proxy required.
- The full dataset (~370 models) is available in one request. Use `maxItems: 500` to get everything.
- Prices and benchmarks on Artificial Analysis update frequently, so run the actor periodically for up-to-date data.
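When the actor is run periodically, two snapshots can be compared to spot price changes. The sketch below assumes each snapshot is a list of dataset items keyed by `model_slug`; the function name `price_changes` and the sample values are hypothetical.

```python
def price_changes(previous: list[dict], current: list[dict],
                  field: str = "price_blended_usd_per_million") -> list[tuple[str, float, float]]:
    """List (model_slug, old_price, new_price) for models whose price changed."""
    old_prices = {r["model_slug"]: r.get(field) for r in previous}
    changes = []
    for r in current:
        slug = r["model_slug"]
        before, after = old_prices.get(slug), r.get(field)
        # Ignore new models and null prices; only report real changes.
        if before is not None and after is not None and before != after:
            changes.append((slug, before, after))
    return changes


# Made-up snapshots for illustration only.
last_week = [
    {"model_slug": "model-a", "price_blended_usd_per_million": 30},
    {"model_slug": "model-b", "price_blended_usd_per_million": 4.5},
]
today = [
    {"model_slug": "model-a", "price_blended_usd_per_million": 30},
    {"model_slug": "model-b", "price_blended_usd_per_million": 3.0},
    {"model_slug": "model-c", "price_blended_usd_per_million": 0.6},
]

print(price_changes(last_week, today))  # [('model-b', 4.5, 3.0)]
```

Passing `field="price_input_usd_per_million"` or `field="price_output_usd_per_million"` tracks the input or output price instead of the blended one.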