Olympedia Olympic Athlete & Medal Database Scraper
Scrape the complete Olympedia database — the most comprehensive public record of Olympic history. Outputs one structured record per athlete-event participation: athlete name and ID, country (NOC code), games edition, sport, event name, finishing position, medal (Gold/Silver/Bronze), performance result value, and any record flag (OR, WR, etc.).
Covers every Olympic Games from Athens 1896 through the present, across all sports, disciplines, and discontinued events — over 200,000 participation records in total.
Use Cases
- Build machine-learning training datasets on Olympic performance history
- Research medal counts by country, athlete, or Games edition
- Support sports journalism and analysis with structured, citable historical data
- Track athletes across multiple Games editions
- Cross-reference with other sports databases by athlete ID
Input
| Parameter | Type | Description | Default |
|---|---|---|---|
| `maxItems` | integer | Maximum number of records to return | `15` |
| `editionIds` | array | Optional list of edition IDs to scrape (e.g. `[1, 2, 3]` for 1896–1904). When empty, scrapes all editions. | `[1]` |
To scrape all editions (full database run), set editionIds to an empty array [] and raise maxItems to the desired cap. Edition IDs correspond to Olympedia's internal editions/{id}/result numbering (ID 1 = Athina 1896, ID 2 = Paris 1900, etc.).
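As a rough illustration of how the two parameters fit together, the Python sketch below builds an input object and maps edition IDs to the result pages they refer to. The helper names `build_input` and `edition_result_url` are hypothetical, and the base URL is an assumption taken from the olympedia.org host; the actor's internals may differ.

```python
import json

# Assumed host; the path pattern editions/{id}/result comes from the text above.
BASE_URL = "https://www.olympedia.org"


def build_input(edition_ids=(1,), max_items=15):
    """Build an input dict (hypothetical helper).

    Defaults mirror the documented defaults; pass an empty list
    to request all editions.
    """
    ids = list(edition_ids)
    if any(not isinstance(i, int) or i < 1 for i in ids):
        raise ValueError("edition IDs must be positive integers")
    if not isinstance(max_items, int) or max_items < 1:
        raise ValueError("maxItems must be a positive integer")
    return {"maxItems": max_items, "editionIds": ids}


def edition_result_url(edition_id):
    """Result page an edition ID is assumed to correspond to."""
    return f"{BASE_URL}/editions/{edition_id}/result"


if __name__ == "__main__":
    # Targeted run: the first three editions, capped at 500 records.
    run_input = build_input([1, 2, 3], max_items=500)
    print(json.dumps(run_input, indent=2))
    for edition_id in run_input["editionIds"]:
        print(edition_result_url(edition_id))
```

For a full-database run, `build_input([], max_items=...)` yields the empty `editionIds` array described above.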
Output
Each record represents one athlete's participation in one event at one Games edition:
{
  "athlete_id": "70502",
  "athlete_name": "Carl Schuhmann",
  "gender": null,
  "born": null,
  "died": null,
  "country": "GER",
  "sport": "Artistic Gymnastics",
  "games_edition": "Athina 1896",
  "event": "Horse Vault, Men",
  "result": null,
  "rank": "1",
  "medal": "Gold",
  "record_flag": null,
  "source_url": "https://www.olympedia.org/results/70002"
}
Note: gender, born, and died fields are reserved for future athlete-detail enrichment and are null in the current version.
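Records in this shape are straightforward to aggregate. The sketch below tallies medals per country with the standard library; `medal_table` is a hypothetical helper name. The first sample row is trimmed from the example record above, and the other two are made-up placeholders, not real results. Each row counts once, so a team-event row contributes a single medal for its team.

```python
from collections import Counter, defaultdict


def medal_table(records):
    """Count medals per country from participation records.

    Rows with a null or missing "medal" are skipped.
    """
    table = defaultdict(Counter)
    for rec in records:
        medal = rec.get("medal")
        if medal:
            table[rec["country"]][medal] += 1
    return {country: dict(counts) for country, counts in table.items()}


if __name__ == "__main__":
    sample = [
        # Trimmed from the example record in this README.
        {"athlete_name": "Carl Schuhmann", "country": "GER", "medal": "Gold"},
        # Placeholder rows for illustration only (not real results).
        {"athlete_name": "Example Athlete A", "country": "GER", "medal": None},
        {"athlete_name": "Example Athlete B", "country": "GRE", "medal": "Silver"},
    ]
    print(medal_table(sample))
```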
Notes
- Olympedia enforces a crawl-delay (robots.txt: 10 seconds). The actor respects this with concurrency capped at 3 — expect approximately 12 pages per minute on a standard run.
- For targeted research, use `editionIds` to restrict the crawl to specific Games. This dramatically reduces run time vs. scraping all editions.
- Team-event rows are included as-is; team names appear in the `athlete_name` field without individual athletes listed.
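For planning purposes, the throughput figure in the first note can be turned into a back-of-the-envelope run-time estimate. This is a sketch only: it assumes the roughly 12 pages per minute holds for the whole run and that you have some idea of how many pages the run will fetch. `estimate_minutes` is a hypothetical helper, and the page count in the example is a made-up number.

```python
import math

# Approximate throughput quoted in the notes above.
PAGES_PER_MINUTE = 12


def estimate_minutes(page_count, pages_per_minute=PAGES_PER_MINUTE):
    """Rough run time in whole minutes, rounded up."""
    if page_count < 0 or pages_per_minute <= 0:
        raise ValueError("page_count must be >= 0 and pages_per_minute > 0")
    return math.ceil(page_count / pages_per_minute)


if __name__ == "__main__":
    pages = 600  # hypothetical page count for one run
    print(f"about {estimate_minutes(pages)} minutes for {pages} pages")
```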