OrbTop

Tour de France Rankings & Stages Scraper

Scrapes official Tour de France jersey rankings (GC, points, mountains, youth, and team) plus stage metadata (start/finish cities, distance, elevation, stage type) directly from letour.fr. Covers all 21 stages — updated daily during the July race window.

What does it do?

Each run retrieves:

  • Stage metadata: start/finish cities, total distance, elevation gain, and stage classification (Flat, Hilly, Mountain, Individual Time Trial, Team Time Trial)
  • Rankings for each stage across up to 7 classification types: General (GC yellow), Points (green), Mountains (polka-dot), Youth (white), Team, Combative, and Lanterne Rouge
  • Rider details: position, bib number, name, 3-letter country code (UCI/IOC-style, e.g. "SLO"), team name, accumulated time, time gap to leader, and points

Data is sourced from the official site's AJAX ranking endpoints, which return server-rendered HTML for every completed stage.

Why use this instead of scraping ProCyclingStats?

ProCyclingStats is the de facto third-party cycling data source, but it is protected by Cloudflare (requests returned 403 in pre-flight testing). letour.fr serves the same official rankings as clean, datacenter-friendly HTML with no bot protection, so no residential proxy is required.

Input

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| year | integer | 2026 | Tour de France year to scrape |
| stageNumbers | array of integers | all 21 stages | Filter to specific stages, e.g. [1, 2, 21] |
| rankingTypes | array of strings | all types | Which jersey rankings to scrape: general, points, mountains, youth, team, combative, lanterne_rouge |
| maxItems | integer | 0 (unlimited) | Hard cap on total records saved |
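
As an illustration of how these fields combine, the following Python sketch builds an input object with the documented defaults. The helper name `build_input` and its validation rules are assumptions made for this example and are not part of the actor; only the field names, defaults, and ranking type identifiers come from the input description above. It also assumes that passing the full list of stages or types explicitly is equivalent to omitting the field.

```python
from typing import Any, Optional

# Ranking type identifiers as listed in the input description.
RANKING_TYPES = [
    "general", "points", "mountains", "youth",
    "team", "combative", "lanterne_rouge",
]


def build_input(
    year: int = 2026,
    stage_numbers: Optional[list[int]] = None,
    ranking_types: Optional[list[str]] = None,
    max_items: int = 0,
) -> dict[str, Any]:
    """Return an input dict using the documented field names (hypothetical helper)."""
    # None means "use the documented default": all 21 stages / all ranking types.
    stages = list(range(1, 22)) if stage_numbers is None else stage_numbers
    types = list(RANKING_TYPES) if ranking_types is None else ranking_types

    bad_stages = [s for s in stages if not 1 <= s <= 21]
    if bad_stages:
        raise ValueError(f"stage numbers must be 1-21, got {bad_stages}")
    unknown = [t for t in types if t not in RANKING_TYPES]
    if unknown:
        raise ValueError(f"unknown ranking types: {unknown}")
    if max_items < 0:
        raise ValueError("maxItems must be 0 (unlimited) or a positive integer")

    return {
        "year": year,
        "stageNumbers": stages,
        "rankingTypes": types,
        "maxItems": max_items,
    }


# Example: the first two stages and the final stage, GC and points only.
run_input = build_input(stage_numbers=[1, 2, 21], ranking_types=["general", "points"])
```

Validating locally before starting a run is a design choice of this sketch: it surfaces a mistyped ranking type immediately instead of after a run that silently returns nothing for it.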

Output dataset fields

| Field | Type | Description |
| --- | --- | --- |
| year | integer | Tour de France year |
| stage_number | integer | Stage number (1–21) |
| stage_type | string | Flat / Hilly / Mountain / Individual Time Trial / Team Time Trial |
| start_city | string | Stage start city |
| finish_city | string | Stage finish city |
| distance_km | number | Stage distance in km |
| elevation_gain_m | integer | Total elevation gain in metres |
| ranking_type | string | general / points / mountains / youth / team / combative / lanterne_rouge |
| position | integer | Rider position in the ranking |
| bib | integer | Rider bib number |
| rider_name | string | Rider abbreviated name (e.g. "T. POGACAR") |
| rider_country | string | 3-letter country code (UCI/IOC-style, e.g. "SLO"; the ISO 3166-1 alpha-3 code for Slovenia would be "SVN") |
| team_name | string | Full team name |
| team_code | string | Team 3-letter code (where available) |
| finish_time | string | Accumulated race time for general/youth/team rankings |
| time_gap | string | Time gap to race leader |
| points | integer | Points total for points (green) and mountains rankings |
| source | string | Source URL of the ranking data |

Sample output (Stage 21 GC, 2025)

{
  "year": 2026,
  "stage_number": 21,
  "stage_type": "Flat",
  "start_city": "Thoiry",
  "finish_city": "Paris Champs-Élysées",
  "distance_km": 133,
  "elevation_gain_m": 1000,
  "ranking_type": "general",
  "position": 1,
  "bib": 1,
  "rider_name": "T. POGACAR",
  "rider_country": "SLO",
  "team_name": "UAE TEAM EMIRATES XRG",
  "team_code": null,
  "finish_time": "76h 00' 32''",
  "time_gap": null,
  "points": null,
  "source": "https://www.letour.fr/en/ajax/ranking/21/itg/2b9.../subtab"
}
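
Because `finish_time` is delivered as a string, downstream analysis usually needs it as a number. The sketch below converts the format seen in the sample record into seconds. The function name `time_to_seconds` is hypothetical, and treating the hours component as optional is an assumption; other formats the site may emit (for example in `time_gap`) are not covered and raise an error instead of being guessed.

```python
import re
from typing import Optional

# Matches the format seen in the sample record, e.g. "76h 00' 32''".
# The hours part is optional (an assumption for shorter durations).
_TIME_RE = re.compile(r"^\s*(?:(\d+)h\s*)?(\d+)'\s*(\d+)''\s*$")


def time_to_seconds(value: Optional[str]) -> Optional[int]:
    """Convert a string such as "76h 00' 32''" to seconds; None stays None."""
    if value is None:
        return None
    match = _TIME_RE.match(value)
    if match is None:
        raise ValueError(f"unrecognised time format: {value!r}")
    hours, minutes, seconds = match.groups()
    return int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds)


# The leader's accumulated time from the sample record above.
leader_seconds = time_to_seconds("76h 00' 32''")
```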

Usage tips

  • During the race window (July 4–26, 2026): Run daily to capture updated standings after each stage finishes. Rankings for Stages 1–20 update on the evening of each stage; Stage 21 (Paris) closes the Tour.
  • Historical data: Stage results from the 2025 Tour are accessible at the same URL paths.
  • Future stages: Ranking tabs are only populated for completed stages. Running the scraper on a future stage returns 0 results for that stage — no error is thrown.
  • Scaling: A full run over 5 jersey types covers 21 stages × 5 = 105 ranking tables; at roughly 160 riders per table that is about 16,800 rows. Such a run completes in under 3 minutes at the default concurrency of 5.
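
The scaling estimate in the last tip is plain multiplication, which the following sketch makes explicit so it can be re-run for other selections. The function name `estimate_rows` is hypothetical, and the figure of about 160 rows per table is the approximation used above; it is best read as an upper bound, since not every ranking lists one row per rider (the team ranking, for instance, lists teams).

```python
def estimate_rows(
    stages: int = 21,
    ranking_types: int = 5,
    rows_per_table: int = 160,
) -> tuple[int, int]:
    """Return (ranking tables, estimated rows) for a run."""
    tables = stages * ranking_types
    return tables, tables * rows_per_table


# The case described above: 21 stages and 5 jersey types.
tables, rows = estimate_rows()

# Requesting all 7 ranking types raises the upper bound accordingly.
tables_all, rows_all = estimate_rows(ranking_types=7)
```

Setting `maxItems` slightly above the estimate is one way to guard against a runaway run without truncating expected data.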

Technical notes

The actor uses a 4-level hierarchical crawl:

  1. /en/overall-route → discovers all stage links
  2. /en/stage-N → extracts stage metadata (city names from page title, distance/type/elevation from stageHeader__infos blocks)
  3. /en/rankings/stage-N → discovers AJAX subtab URLs (hash-keyed, rotated per year)
  4. /en/ajax/ranking/N/<type>/<hash>/subtab → parses rider rows from the rankings table HTML
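
The four URL levels above can be sketched as a few small helpers. This is an illustration under stated assumptions, not the actor's code: the function names and the `RankingUrl` type are hypothetical, the hash in the usage example is a placeholder, and the only type code taken from this document is `itg`, which appears in the sample `source` URL for the general ranking. Because the hashes rotate per year, they have to be discovered at level 3 and cannot be hard-coded.

```python
import re
from typing import NamedTuple

BASE_URL = "https://www.letour.fr"

# Level 4 path shape: /en/ajax/ranking/N/<type>/<hash>/subtab
_RANKING_RE = re.compile(r"/en/ajax/ranking/(\d+)/([^/]+)/([^/]+)/subtab$")


class RankingUrl(NamedTuple):
    stage_number: int
    type_code: str
    tab_hash: str


def route_url() -> str:
    """Level 1: page from which all stage links are discovered."""
    return f"{BASE_URL}/en/overall-route"


def stage_url(stage_number: int) -> str:
    """Level 2: stage page carrying the stage metadata."""
    return f"{BASE_URL}/en/stage-{stage_number}"


def rankings_url(stage_number: int) -> str:
    """Level 3: rankings page that exposes the AJAX subtab URLs."""
    return f"{BASE_URL}/en/rankings/stage-{stage_number}"


def parse_ranking_url(url: str) -> RankingUrl:
    """Level 4: split a discovered subtab URL into stage, type code, and hash."""
    match = _RANKING_RE.search(url)
    if match is None:
        raise ValueError(f"not a ranking subtab URL: {url}")
    stage, type_code, tab_hash = match.groups()
    return RankingUrl(int(stage), type_code, tab_hash)


# "PLACEHOLDERHASH" stands in for a real hash discovered at level 3.
parsed = parse_ranking_url(f"{BASE_URL}/en/ajax/ranking/21/itg/PLACEHOLDERHASH/subtab")
```

Parsing the level 4 URL back into its parts is useful for tagging each record with its stage number and ranking type without re-reading the parent page.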