Olympedia Olympic Athlete Medal Database Scraper

SPORTSAI

Scrape the complete Olympedia database — the most comprehensive public record of Olympic history. Outputs one structured record per athlete-event participation: athlete name and ID, country (NOC code), games edition, sport, event name, finishing position, medal (Gold/Silver/Bronze), performance result value, and any record flag (OR, WR, etc.).

Covers every Olympic Games from Athens 1896 through the present, across all sports, disciplines, and discontinued events — over 200,000 participation records in total.

Use Cases

  • Build machine-learning training datasets on Olympic performance history
  • Research medal counts by country, athlete, or Games edition
  • Support sports journalism and analysis with structured, citable historical data
  • Track athletes across multiple Games editions
  • Cross-reference with other sports databases by athlete ID

Input

| Parameter | Type | Description | Default |
|---|---|---|---|
| maxItems | integer | Maximum number of records to return | 15 |
| editionIds | array | Optional list of edition IDs to scrape (e.g. [1, 2, 3] for 1896–1904). When empty, scrapes all editions. | [1] |

To scrape all editions (full database run), set editionIds to an empty array [] and raise maxItems to the desired cap. Edition IDs correspond to Olympedia's internal editions/{id}/result numbering (ID 1 = Athina 1896, ID 2 = Paris 1900, etc.).
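As a rough illustration, the two run modes described above could be expressed as input payloads like the ones below. The helper name build_input and its validation rules are hypothetical and not part of the actor; only the maxItems and editionIds parameters come from the table above, and the 250,000 cap is an arbitrary value chosen to sit above the stated record count.

```python
import json


def build_input(edition_ids, max_items=15):
    """Build an input payload for the actor (hypothetical helper)."""
    edition_ids = list(edition_ids)
    # Assumption: edition IDs are positive integers, as in the examples above.
    for edition_id in edition_ids:
        if isinstance(edition_id, bool) or not isinstance(edition_id, int) or edition_id < 1:
            raise ValueError("edition IDs must be positive integers")
    if isinstance(max_items, bool) or not isinstance(max_items, int) or max_items < 1:
        raise ValueError("maxItems must be a positive integer")
    return {"maxItems": max_items, "editionIds": edition_ids}


# Targeted run: the first three editions (1896-1904).
targeted = build_input([1, 2, 3], max_items=500)

# Full database run: empty editionIds plus a cap above the expected record count.
full = build_input([], max_items=250_000)

print(json.dumps(targeted))
print(json.dumps(full))
```

Passing an explicit empty list matters here: leaving editionIds out would fall back to the documented default of [1].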

Output

Each record represents one athlete's participation in one event at one Games edition:

{
  "athlete_id": "70502",
  "athlete_name": "Carl Schuhmann",
  "gender": null,
  "born": null,
  "died": null,
  "country": "GER",
  "sport": "Artistic Gymnastics",
  "games_edition": "Athina 1896",
  "event": "Horse Vault, Men",
  "result": null,
  "rank": "1",
  "medal": "Gold",
  "record_flag": null,
  "source_url": "https://www.olympedia.org/results/70002"
}

Note: gender, born, and died fields are reserved for future athlete-detail enrichment and are null in the current version.
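For the medal-count use case, records in this shape could be tallied along the following lines. This is a minimal sketch that assumes the dataset has already been loaded as a list of dictionaries; the function name medal_table and the placeholder rows are invented for illustration, and only the field names (country, medal) come from the example record.

```python
from collections import Counter, defaultdict


def medal_table(records):
    """Count medals per country (NOC code), skipping rows without a medal."""
    table = defaultdict(Counter)
    for rec in records:
        medal = rec.get("medal")
        if medal:  # null medals (non-medal finishes) are ignored
            table[rec["country"]][medal] += 1
    return table


sample = [
    # Taken from the example record above.
    {"athlete_name": "Carl Schuhmann", "country": "GER", "medal": "Gold"},
    # Placeholder rows, invented purely for illustration.
    {"athlete_name": "Placeholder A", "country": "AAA", "medal": "Silver"},
    {"athlete_name": "Placeholder B", "country": "AAA", "medal": None},
    {"athlete_name": "Placeholder C", "country": "GER", "medal": "Bronze"},
]

table = medal_table(sample)
for noc in sorted(table):
    print(noc, dict(table[noc]))
```

Since the counts are keyed on one row per record, any row that carries a medal value contributes exactly one medal to its country.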

Notes

  • Olympedia's robots.txt sets a crawl-delay of 10 seconds. The actor respects this and caps concurrency at 3; expect approximately 12 pages per minute on a standard run.
  • For targeted research, use editionIds to restrict the crawl to specific Games. This dramatically reduces run time vs. scraping all editions.
  • Team-event rows are included as-is; team names appear in the athlete_name field without individual athletes listed.
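To get a feel for run time, the throughput figure in the notes above can be turned into a back-of-the-envelope estimate. The sketch below assumes the quoted rate of roughly 12 pages per minute holds for the whole run; the function name estimate_minutes and the page count of 600 are hypothetical, since the number of pages per edition is not stated here.

```python
import math

PAGES_PER_MINUTE = 12  # approximate throughput quoted in the notes above


def estimate_minutes(page_count, pages_per_minute=PAGES_PER_MINUTE):
    """Estimate run time in whole minutes for a given number of pages."""
    if page_count < 0 or pages_per_minute <= 0:
        raise ValueError("page_count must be >= 0 and pages_per_minute > 0")
    return math.ceil(page_count / pages_per_minute)


# Hypothetical crawl of 600 pages at about 12 pages per minute.
print(estimate_minutes(600))  # 50
```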