Tour de France Rankings & Stages Scraper
SPORTSAI
Scrapes official Tour de France jersey rankings (GC, points, mountains, youth, and team) plus stage metadata (start/finish cities, distance, elevation, stage type) directly from letour.fr. Covers all 21 stages — updated daily during the July race window.
What does it do?
Each run retrieves:
- Stage metadata: start/finish cities, total distance, elevation gain, and stage classification (Flat, Hilly, Mountain, Individual Time Trial, Team Time Trial)
- Jersey rankings for each stage across up to 7 classification types: General (GC yellow), Points (green), Mountains (polka-dot), Youth (white), Team, Combative, and Lanterne Rouge
- Rider details: position, bib number, name, 3-letter country code as published by letour.fr (IOC-style, e.g. SLO for Slovenia, not the ISO 3166-1 alpha-3 code SVN), team name, accumulated time, time gap to leader, and points
Data is sourced from the official site's AJAX ranking endpoints, which return server-rendered HTML for every completed stage.
Why use this instead of scraping ProCyclingStats?
ProCyclingStats is the de facto third-party cycling data source, but it sits behind Cloudflare (pre-flight testing returned HTTP 403). letour.fr serves the same official rankings as clean HTML that is reachable from datacenter IPs with no bot protection, so no residential proxy is required.
Input
| Field | Type | Default | Description |
|---|---|---|---|
| `year` | integer | 2026 | Tour de France year to scrape |
| `stageNumbers` | array of integers | all 21 stages | Filter to specific stages, e.g. `[1, 2, 21]` |
| `rankingTypes` | array of strings | all types | Which jersey rankings to scrape: `general`, `points`, `mountains`, `youth`, `team`, `combative`, `lanterne_rouge` |
| `maxItems` | integer | 0 (unlimited) | Hard cap on total records saved |
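The defaults in the table above can be expressed as a small normalization step. The sketch below is an illustration only: `normalize_input` is a hypothetical helper, not the actor's actual validation code, and it assumes an empty or missing list means "all".

```python
from typing import Any

# Ranking types and stage range exactly as listed in the input table.
RANKING_TYPES = [
    "general", "points", "mountains", "youth",
    "team", "combative", "lanterne_rouge",
]
ALL_STAGES = list(range(1, 22))  # stages 1..21


def normalize_input(raw: dict[str, Any]) -> dict[str, Any]:
    """Apply the documented defaults and reject out-of-range values.

    Hypothetical helper: the README does not show the actor's real validation.
    """
    year = int(raw.get("year", 2026))
    # Assumption: a missing or empty list means "no filter".
    stages = sorted(set(raw.get("stageNumbers") or ALL_STAGES))
    types = list(raw.get("rankingTypes") or RANKING_TYPES)
    max_items = int(raw.get("maxItems", 0))

    bad_stages = [s for s in stages if s not in ALL_STAGES]
    if bad_stages:
        raise ValueError(f"stageNumbers out of range 1-21: {bad_stages}")
    bad_types = [t for t in types if t not in RANKING_TYPES]
    if bad_types:
        raise ValueError(f"unknown rankingTypes: {bad_types}")
    if max_items < 0:
        raise ValueError("maxItems must be 0 (unlimited) or a positive integer")

    return {
        "year": year,
        "stageNumbers": stages,
        "rankingTypes": types,
        "maxItems": max_items,
    }


# Only stages and ranking types are given; year and maxItems fall back to defaults.
example = normalize_input({"stageNumbers": [21, 1, 2], "rankingTypes": ["general", "points"]})
```

Passing an empty dict yields a full run: year 2026, all 21 stages, all seven ranking types, and no record cap.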
Output dataset fields
| Field | Type | Description |
|---|---|---|
| `year` | integer | Tour de France year |
| `stage_number` | integer | Stage number (1–21) |
| `stage_type` | string | Flat / Hilly / Mountain / Individual Time Trial / Team Time Trial |
| `start_city` | string | Stage start city |
| `finish_city` | string | Stage finish city |
| `distance_km` | number | Stage distance in km |
| `elevation_gain_m` | integer | Total elevation gain in metres |
| `ranking_type` | string | general / points / mountains / youth / team / combative / lanterne_rouge |
| `position` | integer | Rider position in the ranking |
| `bib` | integer | Rider bib number |
| `rider_name` | string | Rider abbreviated name (e.g. "T. POGACAR") |
| `rider_country` | string | 3-letter country code as published by letour.fr (IOC-style, e.g. "SLO"; not ISO 3166-1 alpha-3, where Slovenia is "SVN") |
| `team_name` | string | Full team name |
| `team_code` | string | Team 3-letter code (where available) |
| `finish_time` | string | Accumulated race time for general/youth/team rankings |
| `time_gap` | string | Time gap to race leader |
| `points` | integer | Points total for points (green) and mountains rankings |
| `source` | string | Source URL of the ranking data |
Sample output (Stage 21 GC, 2025)
{
  "year": 2026,
  "stage_number": 21,
  "stage_type": "Flat",
  "start_city": "Thoiry",
  "finish_city": "Paris Champs-Élysées",
  "distance_km": 133,
  "elevation_gain_m": 1000,
  "ranking_type": "general",
  "position": 1,
  "bib": 1,
  "rider_name": "T. POGACAR",
  "rider_country": "SLO",
  "team_name": "UAE TEAM EMIRATES XRG",
  "team_code": null,
  "finish_time": "76h 00' 32''",
  "time_gap": null,
  "points": null,
  "source": "https://www.letour.fr/en/ajax/ranking/21/itg/2b9.../subtab"
}
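Because `finish_time` and `time_gap` are delivered as strings, downstream analysis usually needs them as numbers. The following is a minimal sketch, assuming every value follows the `76h 00' 32''` pattern shown in the sample; the optional leading `+` for gaps is a guess about the site's formatting, and `parse_race_time` is a hypothetical helper name.

```python
import re
from typing import Optional

# Matches strings such as "76h 00' 32''". The hours part and a leading "+"
# are optional; the "+" prefix is an assumption about how gaps are written.
_TIME_RE = re.compile(r"^\s*\+?\s*(?:(\d+)h\s*)?(\d+)'\s*(\d+)''\s*$")


def parse_race_time(value: Optional[str]) -> Optional[int]:
    """Convert a race time string to total seconds; pass None through."""
    if value is None:
        return None
    match = _TIME_RE.match(value)
    if match is None:
        raise ValueError(f"unrecognised time format: {value!r}")
    hours, minutes, seconds = match.groups()
    return int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds)


leader_seconds = parse_race_time("76h 00' 32''")  # 76 * 3600 + 0 * 60 + 32
```

Raising on unrecognised input, rather than returning `None`, keeps a format change on the site from silently producing empty columns.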
Usage tips
- During the race window (July 4–26, 2026): Run daily to capture updated standings after each stage finishes. Stage 1–20 rankings update that evening; Stage 21 (Paris) closes the tour.
- Historical data: Stage results from the 2025 Tour are accessible at the same URL paths.
- Future stages: Ranking tabs are only populated for completed stages. Running the scraper on a future stage returns 0 results for that stage — no error is thrown.
- Scaling: A full run over the five main jersey types covers 21 stages × 5 types = 105 ranking tables; at roughly 160 riders per table, that is about 16,800 rows. A single run completes in under 3 minutes at the default concurrency of 5.
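The scaling estimate is simple multiplication, so it is easy to redo for a filtered run. This sketch uses a hypothetical `estimate_rows` helper and takes the figure of about 160 riders per table from the tip above; treat the result as an upper bound, since team rankings and late-race tables hold fewer rows.

```python
def estimate_rows(
    stages: int = 21,
    ranking_types: int = 5,
    riders_per_table: int = 160,
    max_items: int = 0,
) -> tuple[int, int]:
    """Return (ranking tables, estimated rows) for a run.

    riders_per_table is an approximation; max_items=0 means no cap,
    mirroring the maxItems input field.
    """
    tables = stages * ranking_types
    rows = tables * riders_per_table
    if max_items > 0:
        rows = min(rows, max_items)
    return tables, rows


full_run = estimate_rows()                    # five main jersey types
all_types = estimate_rows(ranking_types=7)    # every supported ranking type
capped = estimate_rows(max_items=1000)        # same run with maxItems=1000
```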
Technical notes
The actor uses a 4-level hierarchical crawl:
1. `/en/overall-route` → discovers all stage links
2. `/en/stage-N` → extracts stage metadata (city names from the page title; distance, type, and elevation from `stageHeader__infos` blocks)
3. `/en/rankings/stage-N` → discovers AJAX subtab URLs (hash-keyed, rotated per year)
4. `/en/ajax/ranking/N/<type>/<hash>/subtab` → parses rider rows from the rankings table HTML
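One way to picture the four levels is as a router that maps each URL path to the handler for its level. The sketch below is illustrative, not the actor's source: the level labels and `classify_path` are hypothetical names, and the regular expressions are inferred only from the path shapes listed above.

```python
import re
from typing import Optional

# One pattern per crawl level, in the order the crawl visits them.
_ROUTES = [
    ("route", re.compile(r"^/en/overall-route$")),
    ("stage", re.compile(r"^/en/stage-(?P<stage>\d+)$")),
    ("rankings", re.compile(r"^/en/rankings/stage-(?P<stage>\d+)$")),
    (
        "subtab",
        re.compile(
            r"^/en/ajax/ranking/(?P<stage>\d+)/(?P<type>[^/]+)/(?P<hash>[^/]+)/subtab$"
        ),
    ),
]


def classify_path(path: str) -> Optional[tuple[str, dict[str, str]]]:
    """Return (level label, captured path parts), or None for unrelated paths."""
    for label, pattern in _ROUTES:
        match = pattern.match(path)
        if match is not None:
            return label, match.groupdict()
    return None


# "HASH" is a placeholder: real hashes are discovered at level 3 and rotate per year.
level, parts = classify_path("/en/ajax/ranking/21/itg/HASH/subtab")
```

Because the subtab hash changes each year, the level 4 URLs cannot be hard-coded; they have to be read from the level 3 rankings page on every run.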