OrbTop

Artificial Analysis AI Model Benchmark Scraper

Scrapes LLM benchmark scores, pricing, and performance data from Artificial Analysis, an independent site that evaluates and compares AI models.

What this actor does

Extracts structured data for ~370 AI language models from Artificial Analysis, including:

  • Benchmark scores: Quality index, MMLU-Pro, GPQA Diamond, HumanEval, LiveCodeBench, MATH-500, MMMU-Pro, and more
  • Pricing: Input, output, and blended cost per million tokens
  • Performance: Median throughput (tokens/sec) and time-to-first-token latency
  • Provider info: All hosting providers, cheapest provider by blended price
  • Model metadata: Creator/lab, release date, parameter count, context window, license, open-weight status

All data comes from a single request to the /models page, which embeds the full model dataset inline as a React Server Component (RSC) payload, so no per-model crawling is needed.
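A minimal sketch of the single-request approach, assuming the page is rendered by the Next.js App Router. That framework inlines RSC data as `self.__next_f.push([1, "..."])` script calls. Whether Artificial Analysis uses exactly this framing is an assumption, and the helper names `fetch_models_page` and `extract_rsc_payload` are hypothetical, not the actor's code. The sketch only reassembles the raw payload. Locating the model objects inside it depends on the site's internal structure and is not shown.

```python
import json
import re
import urllib.request

MODELS_URL = "https://artificialanalysis.ai/models"

# Assumption: Next.js App Router framing, where each RSC chunk is pushed as
# self.__next_f.push([1, "<JSON-escaped string>"]) inside a <script> tag.
PUSH_RE = re.compile(r'self\.__next_f\.push\(\[1,\s*("(?:[^"\\]|\\.)*")\]\)')


def fetch_models_page(url: str = MODELS_URL) -> str:
    """Download the listing page HTML in one plain HTTP request (no proxy)."""
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.read().decode("utf-8")


def extract_rsc_payload(html: str) -> str:
    """Concatenate every string chunk pushed into self.__next_f, in page order."""
    # Each captured group is a JSON string literal; json.loads unescapes it.
    return "".join(json.loads(m.group(1)) for m in PUSH_RE.finditer(html))
```

Usage, under the same assumption: `payload = extract_rsc_payload(fetch_models_page())`. The `[0]` bootstrap push that Next.js emits first is skipped because the pattern only matches `[1, ...]` chunks.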

Use cases

  • Model selection: Compare cost-vs-quality trade-offs across providers
  • Price monitoring: Track pricing changes across OpenAI, Anthropic, Google, Meta, and 40+ hosting providers
  • Research and benchmarking: Import baseline scores into your own evaluation pipeline
  • Cost optimization: Find the cheapest model (and its cheapest provider) or the highest-throughput model for a given quality target
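For the model-selection and cost-optimization cases, a sketch like the following works on dataset items loaded as a list of dicts (for example, from a JSON export of the dataset). The function names are hypothetical, and `aa_quality_index` is assumed to be the quality target. "Fastest" is judged per model from `throughput_tokens_per_second` because `fastest_provider` is always null.

```python
from typing import Optional


def _eligible(records: list[dict], min_quality: float, field: str) -> list[dict]:
    """Records with a known quality index >= min_quality and a non-null `field`."""
    return [
        r for r in records
        if r.get("aa_quality_index") is not None
        and r["aa_quality_index"] >= min_quality
        and r.get(field) is not None
    ]


def cheapest_for_quality(records: list[dict], min_quality: float) -> Optional[dict]:
    """Model with the lowest blended price at or above the quality target."""
    pool = _eligible(records, min_quality, "price_blended_usd_per_million")
    return min(pool, key=lambda r: r["price_blended_usd_per_million"], default=None)


def fastest_for_quality(records: list[dict], min_quality: float) -> Optional[dict]:
    """Model with the highest median throughput at or above the quality target."""
    pool = _eligible(records, min_quality, "throughput_tokens_per_second")
    return max(pool, key=lambda r: r["throughput_tokens_per_second"], default=None)
```

Records with null scores or prices are skipped rather than treated as zero. Otherwise a model with missing pricing would wrongly appear to be the cheapest.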

Input

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| maxItems | integer | Yes | 10 | Maximum number of model records to return. Set to a large number (e.g. 500) to retrieve all models. |

Output

Each dataset item represents one AI model. Example record:

{
  "model_slug": "claude-4-opus",
  "model_name": "Claude 4 Opus",
  "provider": "Anthropic",
  "release_date": "2025-05-22",
  "parameter_count": null,
  "context_window_tokens": 200000,
  "aa_quality_index": 57.4,
  "mmlu_pro_score": 0.812,
  "gpqa_diamond_score": 0.738,
  "humaneval_score": 0.921,
  "math_score": 84.1,
  "chatbot_arena_elo": null,
  "aider_polyglot_score": null,
  "livecodebench_score": 0.703,
  "mmmu_score": null,
  "benchmark_breakdown": "{\"agentic_index\":45.2,\"coding_index\":68.1,...}",
  "price_input_usd_per_million": 15,
  "price_output_usd_per_million": 75,
  "price_blended_usd_per_million": 30,
  "throughput_tokens_per_second": 58.3,
  "latency_first_token_ms": 1204,
  "hosting_providers": "[\"Anthropic\",\"Amazon Bedrock\",\"Google Vertex AI\"]",
  "cheapest_provider": "Amazon Bedrock",
  "fastest_provider": null,
  "license": "proprietary",
  "is_open_weight": false,
  "profile_url": "https://artificialanalysis.ai/models/claude-4-opus",
  "scraped_at": "2026-05-31T08:00:00.000Z"
}

Notes on specific fields:

  • chatbot_arena_elo and aider_polyglot_score are always null. Artificial Analysis does not track these metrics, so they would have to be collected separately from Chatbot Arena and Aider.chat.
  • benchmark_breakdown is a JSON string containing additional sub-benchmarks (agentic_index, coding_index, math_index, HLE, AIME-2025, IFBench, SciCode, LCR, Omniscience).
  • hosting_providers is a JSON-encoded string containing an array of every provider that offers this model.
  • fastest_provider is always null, because per-provider throughput figures are not available on the listing page.
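Because benchmark_breakdown and hosting_providers arrive as JSON-encoded strings, consumers need to decode them before use. A minimal sketch follows. The helper name `decode_record` is hypothetical.

```python
import json


def decode_record(record: dict) -> dict:
    """Return a copy of a dataset item with its JSON-string fields parsed."""
    out = dict(record)  # shallow copy; the input record is left unchanged
    for field in ("benchmark_breakdown", "hosting_providers"):
        value = out.get(field)
        if isinstance(value, str):
            out[field] = json.loads(value)  # object -> dict, array -> list
    return out
```

Null fields are passed through untouched, so the helper is safe to map over the whole dataset.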

Notes

  • The actor makes a single HTTP request to https://artificialanalysis.ai/models; no proxy is required.
  • The full dataset (~370 models) is available in one request. Use maxItems: 500 to get everything.
  • Prices and benchmarks on Artificial Analysis change frequently, so run the actor periodically to keep the data current.
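For price monitoring across periodic runs, you can compare two dataset snapshots by model_slug. The sketch below is hypothetical: the `price_changes` helper is illustrative, and the snapshots are assumed to be lists of dicts. Models that appear in only one run are ignored.

```python
PRICE_FIELDS = (
    "price_input_usd_per_million",
    "price_output_usd_per_million",
    "price_blended_usd_per_million",
)


def price_changes(old: list[dict], new: list[dict]) -> dict[str, dict[str, tuple]]:
    """Map model_slug -> {field: (old_value, new_value)} for prices that changed."""
    old_by_slug = {r["model_slug"]: r for r in old}
    changes: dict[str, dict[str, tuple]] = {}
    for rec in new:
        prev = old_by_slug.get(rec["model_slug"])
        if prev is None:
            continue  # model is new in this run; nothing to compare against
        diff = {
            f: (prev.get(f), rec.get(f))
            for f in PRICE_FIELDS
            if prev.get(f) != rec.get(f)
        }
        if diff:
            changes[rec["model_slug"]] = diff
    return changes
```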