OrbTop

Hong Kong Transit Scraper - MTR, KMB, Citybus, Light Rail

Hong Kong Transit Scraper

Scrapes Hong Kong's public transit network from the official HKSAR open data programme. Returns the full route and station catalogue plus real-time arrival ETAs across MTR (heavy rail), MTR Light Rail, KMB (Kowloon Motor Bus), and Citybus, all normalized into a single bilingual schema.


Hong Kong Transit Scraper Features

  • Returns MTR heavy rail across all 10 lines — Island, Tsuen Wan, Kwun Tong, Tseung Kwan O, Tung Chung, Airport Express, East Rail, Tuen Ma, South Island, and Disneyland Resort.
  • Returns MTR Light Rail (Tuen Mun / Yuen Long) routes 505 through 761P with bilingual stop names.
  • Returns KMB: ~700 unique bus routes and ~6,000 stops, with 16-character hex stop IDs and lat/long coordinates.
  • Returns Citybus — ~400 routes covering Hong Kong Island and cross-harbour services.
  • Real-time ETAs for every operator: MTR Next Train, MTR Light Rail Next Train, KMB stop ETA, Citybus stop ETA.
  • Bilingual English / Traditional Chinese station names, route termini, and ETA destinations.
  • Pure JSON / CSV API scraping — no headless browser, no captcha plumbing, no proxy required.

Who Uses Hong Kong Transit Data?

  • Travel-tech apps — power MTR / bus journey planners and arrival-board widgets for Hong Kong's 20M+ annual inbound visitors.
  • MaaS startups — feed multimodal trip planners with normalized rail + bus data across all four operators.
  • Tourism analytics — measure transit accessibility for hotels, attractions, and conference venues across Hong Kong Island, Kowloon, and the New Territories.
  • Logistics & delivery — map last-mile coverage against bus stops and MTR exits.
  • Internal dashboards — populate dropdowns and validate user-typed origin / destination stations against the real HKSAR transit catalogue.

How Hong Kong Transit Scraper Works

  1. Validates the input mode and operator. Rejects unsupported combinations (e.g. route_stops on MTR — heavy rail uses station_list instead).
  2. Pulls the right HKSAR open-data endpoint for the chosen (operator, mode):
    • MTR catalogue → opendata.mtr.com.hk/data/mtr_lines_and_stations.csv
    • MTR Light Rail catalogue → opendata.mtr.com.hk/data/light_rail_routes_and_stops.csv
    • MTR Next Train → rt.data.gov.hk/v1/transport/mtr/getSchedule.php
    • MTR Light Rail Next Train → rt.data.gov.hk/v1/transport/mtr/lrnt/getSchedule
    • KMB → data.etabus.gov.hk/v1/transport/kmb/...
    • Citybus → rt.data.gov.hk/v2/transport/citybus/...
  3. Normalizes each operator's response into a single bilingual schema with consistent field names.
  4. Resolves bus stop IDs to lat/long + bilingual names with bounded fan-out concurrency.
  5. Emits one flat record per route, station, route-stop, or live ETA, capped by maxItems.
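Step 4's bounded fan-out can be pictured with a small asyncio sketch. This is an illustration under stated assumptions, not the scraper's actual code: resolve_stops, fake_fetch_stop, and the limit of 8 are hypothetical names and values, and the fake fetcher stands in for a real HTTP lookup.

```python
import asyncio


async def resolve_stops(stop_ids, fetch_stop, limit=8):
    """Resolve stop IDs concurrently, with at most `limit` lookups in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def resolve_one(stop_id):
        async with semaphore:  # waits while `limit` lookups are already running
            return await fetch_stop(stop_id)

    # gather() preserves input order, so results line up with stop_ids
    return await asyncio.gather(*(resolve_one(s) for s in stop_ids))


async def fake_fetch_stop(stop_id):
    """Hypothetical stand-in for an HTTP stop lookup."""
    await asyncio.sleep(0.01)
    return {"station_code": stop_id, "latitude": None, "longitude": None}


def demo():
    ids = ["STOP_A", "STOP_B", "STOP_C"]
    return asyncio.run(resolve_stops(ids, fake_fetch_stop, limit=2))
```

A semaphore keeps the number of simultaneous requests fixed regardless of how many stops a route has, which is one common way to stay polite toward a public endpoint.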

All endpoints are unauthenticated and operated by the Hong Kong government or a participating operator. No API key, no proxy, no auth headers required.


Input

{
  "mode": "route_list",
  "operator": "kmb",
  "maxItems": 15
}
Field | Type | Default | Description
mode | string | route_list | One of route_list, station_list, route_stops, stop_eta. See modes below.
operator | string | mtr | One of mtr, mtr_lr, kmb, citybus.
route | string | 1A | Bus route number (KMB / Citybus) or MTR line code (TKL, ISL, etc.). Required for route_stops, optional for stop_eta to filter.
direction | string | outbound | Direction. outbound / inbound for buses, UP / DOWN for MTR. Used by route_stops.
stop_id | string | (none) | Operator-specific stop identifier. Required for stop_eta. KMB uses 16-character hex (A3ADFCDF8487ADB9); Citybus uses 6-digit numeric (001027); MTR uses 3-letter station codes (CEN, TST); Light Rail uses 3-letter stop codes.
mtr_line | string | (none) | Required for stop_eta on MTR. The MTR Next Train API needs both station and line code.
maxItems | integer | 15 | Maximum records to emit. Default is 15 to keep test runs fast. Set higher for full inventory dumps.
proxyConfiguration | object | no proxy | HK Open Data APIs are public, so a proxy is not required. Honoured if you opt in.
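
The rules stated above (route_stops is bus-only, stop_eta needs a stop_id, and MTR stop_eta also needs mtr_line) can be checked before a run. The following is a minimal sketch; validate_input and its error messages are hypothetical, and it only applies the mode and operator defaults from the table.

```python
BUS_OPERATORS = {"kmb", "citybus"}
OPERATORS = {"mtr", "mtr_lr"} | BUS_OPERATORS
MODES = {"route_list", "station_list", "route_stops", "stop_eta"}


def validate_input(cfg):
    """Return a list of problems with the input; an empty list means it looks valid."""
    mode = cfg.get("mode", "route_list")
    operator = cfg.get("operator", "mtr")
    errors = []

    if mode not in MODES:
        errors.append(f"unknown mode: {mode}")
    if operator not in OPERATORS:
        errors.append(f"unknown operator: {operator}")

    # route_stops is documented as bus-only
    if mode == "route_stops" and operator in OPERATORS - BUS_OPERATORS:
        errors.append(f"route_stops is bus-only; use station_list for {operator}")

    if mode == "stop_eta":
        if not cfg.get("stop_id"):
            errors.append("stop_id is required for stop_eta")
        # The MTR Next Train API needs both station and line code
        if operator == "mtr" and not cfg.get("mtr_line"):
            errors.append("mtr_line is required for stop_eta on mtr")

    return errors
```

For example, validate_input({"mode": "route_stops", "operator": "mtr"}) returns a single message pointing to station_list.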

Modes

Mode | What it returns
route_list | Every route operated by the chosen operator.
station_list | Full station / stop inventory with bilingual names and (for buses) lat/long coordinates.
route_stops | Ordered stop sequence for a single bus route. Bus-only; for MTR use station_list.
stop_eta | Real-time arrival ETAs at a single stop.

Run examples

KMB route catalogue:

{
  "mode": "route_list",
  "operator": "kmb",
  "maxItems": 1000
}

MTR Tseung Kwan O Line — live arrivals at Tiu Keng Leng:

{
  "mode": "stop_eta",
  "operator": "mtr",
  "stop_id": "TIK",
  "mtr_line": "TKL"
}
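
To show how this input relates to the upstream request, here is a small sketch that builds the MTR Next Train URL. The line and sta query parameters are taken from the source_url in the ETA record example; the function name mtr_eta_url is hypothetical.

```python
from urllib.parse import urlencode

MTR_NEXT_TRAIN = "https://rt.data.gov.hk/v1/transport/mtr/getSchedule.php"


def mtr_eta_url(cfg):
    """Build the Next Train URL from a stop_eta input for operator 'mtr'."""
    query = urlencode({"line": cfg["mtr_line"], "sta": cfg["stop_id"]})
    return f"{MTR_NEXT_TRAIN}?{query}"


url = mtr_eta_url({"mode": "stop_eta", "operator": "mtr",
                   "stop_id": "TIK", "mtr_line": "TKL"})
# url == "https://rt.data.gov.hk/v1/transport/mtr/getSchedule.php?line=TKL&sta=TIK"
```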

KMB bus 1A — full outbound stop list with bilingual names + lat/long:

{
  "mode": "route_stops",
  "operator": "kmb",
  "route": "1A",
  "direction": "outbound",
  "maxItems": 100
}

Citybus stop ETAs at Central (Macao Ferry):

{
  "mode": "stop_eta",
  "operator": "citybus",
  "stop_id": "001027"
}

Hong Kong Transit Scraper Output Fields

Every record carries a record_type. The unused fields for that record type are empty strings or null.

record_type | Mode | Operator
route | route_list | all
station | station_list | all
route_stop | route_stops | KMB, Citybus
eta | stop_eta | all
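
One way to picture the "unused fields are empty strings or null" rule is a helper that pads a partial record out to the full schema. This is a sketch only: pad_record is a hypothetical name, and the choice of "" for string fields and None (null) for numeric fields is an assumption, since the text does not say which fields get which.

```python
# Field names come from the field reference; the split by type follows its Type column.
STRING_FIELDS = (
    "record_type", "operator", "service_type", "route_number",
    "route_origin_en", "route_origin_zh", "route_dest_en", "route_dest_zh",
    "direction", "service_class", "line_code", "line_name",
    "station_code", "station_name_en", "station_name_zh",
    "eta_time", "eta_destination_en", "eta_destination_zh", "platform",
    "remarks_en", "remarks_zh", "data_timestamp", "source_url", "scraped_at",
)
NUMBER_FIELDS = ("sequence", "latitude", "longitude", "eta_seq", "eta_minutes")


def pad_record(partial):
    """Return a full-width record: assumed "" for unused strings, None for unused numbers."""
    record = {name: "" for name in STRING_FIELDS}
    record.update({name: None for name in NUMBER_FIELDS})
    record.update(partial)  # values supplied by the caller win
    return record
```

Keeping every record the same width makes CSV export and column-based tooling simpler, at the cost of some empty cells.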

Route record example (KMB)

{
  "record_type": "route",
  "operator": "KMB",
  "service_type": "bus",
  "route_number": "1A",
  "route_origin_en": "STAR FERRY",
  "route_origin_zh": "尖沙咀碼頭",
  "route_dest_en": "SAU MAU PING (CENTRAL)",
  "route_dest_zh": "中秀茂坪",
  "direction": "inbound",
  "service_class": "1",
  "source_url": "https://data.etabus.gov.hk/v1/transport/kmb/route/",
  "scraped_at": "2026-05-02T16:56:14.723Z"
}
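
A record like this could be derived from a raw KMB route entry roughly as follows. The raw field names (route, bound, service_type, orig_en, orig_tc, dest_en, dest_tc) and the O / I bound codes are assumptions about the upstream payload, and normalize_kmb_route is a hypothetical helper, not the scraper's real code.

```python
from datetime import datetime, timezone

# Assumed upstream bound codes mapped to the normalized direction values
BOUND = {"O": "outbound", "I": "inbound"}


def normalize_kmb_route(raw, source_url, now=None):
    """Map one raw KMB route entry (assumed shape) onto the normalized route record."""
    now = now or datetime.now(timezone.utc)
    return {
        "record_type": "route",
        "operator": "KMB",
        "service_type": "bus",
        "route_number": raw["route"],
        "route_origin_en": raw["orig_en"],
        "route_origin_zh": raw["orig_tc"],
        "route_dest_en": raw["dest_en"],
        "route_dest_zh": raw["dest_tc"],
        "direction": BOUND.get(raw["bound"], ""),
        "service_class": raw["service_type"],
        "source_url": source_url,
        # UTC timestamp with millisecond precision and a trailing Z
        "scraped_at": now.isoformat(timespec="milliseconds").replace("+00:00", "Z"),
    }


raw_example = {
    "route": "1A", "bound": "I", "service_type": "1",
    "orig_en": "STAR FERRY", "orig_tc": "尖沙咀碼頭",
    "dest_en": "SAU MAU PING (CENTRAL)", "dest_tc": "中秀茂坪",
}
```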

Station record example (MTR)

{
  "record_type": "station",
  "operator": "MTR",
  "service_type": "metro",
  "line_code": "TKL",
  "line_name": "Tseung Kwan O Line",
  "direction": "DT",
  "station_code": "NOP",
  "station_name_en": "North Point",
  "station_name_zh": "北角",
  "sequence": 1,
  "source_url": "https://opendata.mtr.com.hk/data/mtr_lines_and_stations.csv",
  "scraped_at": "2026-05-02T16:56:14.723Z"
}
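
Station records such as this one come from a CSV catalogue. The sketch below parses an inline sample rather than the live file. The column headers, the "1.00" sequence format, and the sample rows are assumptions for illustration, and parse_mtr_stations and LINE_NAMES are hypothetical names.

```python
import csv
import io

# Illustrative sample only; the real file's headers and values may differ.
SAMPLE_CSV = (
    '"Line Code","Direction","Station Code","Chinese Name","English Name","Sequence"\n'
    '"TKL","DT","NOP","北角","North Point","1.00"\n'
    '"TKL","DT","QUB","鰂魚涌","Quarry Bay","2.00"\n'
)

LINE_NAMES = {"TKL": "Tseung Kwan O Line"}  # partial lookup, for the sample only


def parse_mtr_stations(text, source_url):
    """Turn catalogue CSV text (assumed layout) into normalized station records."""
    reader = csv.DictReader(io.StringIO(text.lstrip("\ufeff")))  # drop a BOM if present
    records = []
    for row in reader:
        records.append({
            "record_type": "station",
            "operator": "MTR",
            "service_type": "metro",
            "line_code": row["Line Code"],
            "line_name": LINE_NAMES.get(row["Line Code"], ""),
            "direction": row["Direction"],
            "station_code": row["Station Code"],
            "station_name_en": row["English Name"],
            "station_name_zh": row["Chinese Name"],
            "sequence": int(float(row["Sequence"])),  # tolerate "1" and "1.00"
            "source_url": source_url,
        })
    return records
```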

ETA record example (MTR live)

{
  "record_type": "eta",
  "operator": "MTR",
  "service_type": "metro",
  "line_code": "TKL",
  "line_name": "Tseung Kwan O Line",
  "station_code": "TIK",
  "direction": "UP",
  "eta_seq": 1,
  "eta_time": "2026-05-03 00:36:28",
  "eta_minutes": 3,
  "eta_destination_en": "POA",
  "platform": "3",
  "source_url": "https://rt.data.gov.hk/v1/transport/mtr/getSchedule.php?line=TKL&sta=TIK",
  "scraped_at": "2026-05-02T16:56:14.723Z"
}
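
The eta_minutes value can be thought of as the gap between eta_time (Hong Kong Time) and the current moment. A minimal sketch, assuming the MTR-style "YYYY-MM-DD HH:MM:SS" format shown above, rounding down to whole minutes, and clamping past arrivals to 0; the rounding and clamping are assumptions, not documented behaviour.

```python
from datetime import datetime, timedelta, timezone

# Hong Kong Time is UTC+8 year-round (no daylight saving), so a fixed offset suffices
HKT = timezone(timedelta(hours=8))


def eta_minutes(eta_time, now):
    """Whole minutes from `now` (any timezone-aware datetime) until `eta_time` in HKT."""
    eta = datetime.strptime(eta_time, "%Y-%m-%d %H:%M:%S").replace(tzinfo=HKT)
    seconds = (eta - now).total_seconds()
    return max(0, int(seconds // 60))


now = datetime(2026, 5, 3, 0, 33, 0, tzinfo=HKT)
minutes = eta_minutes("2026-05-03 00:36:28", now)  # 3 minutes 28 seconds away, so 3
```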

Field reference

Field | Type | Description
record_type | string | route, station, route_stop, or eta.
operator | string | MTR, MTR_LR, KMB, or CTB.
service_type | string | metro / light_rail / bus / airport_express.
route_number | string | Bus route number or MTR line code.
route_origin_en / route_origin_zh | string | Origin terminus (English / Traditional Chinese).
route_dest_en / route_dest_zh | string | Destination terminus (English / Traditional Chinese).
direction | string | outbound / inbound for buses; UP / DOWN for MTR ETAs. MTR station records carry the catalogue's own direction code (e.g. DT in the station example).
service_class | string | KMB service variant. 1 is the primary route; higher numbers are alternates.
line_code / line_name | string | MTR line identifier and full name.
station_code | string | Operator-specific stop identifier.
station_name_en / station_name_zh | string | Stop / station name (English / Traditional Chinese).
sequence | number | Stop sequence on a route (1-based).
latitude / longitude | number | WGS84 coordinates. Buses only; the MTR catalogue does not publish coordinates.
eta_seq | number | ETA index (1, 2, 3 for the next three arrivals).
eta_time | string | ETA timestamp in Hong Kong Time (HKT); ISO 8601, or YYYY-MM-DD HH:MM:SS as in the MTR example.
eta_minutes | number | Minutes to next arrival.
eta_destination_en / eta_destination_zh | string | Arriving service's destination.
platform | string | MTR platform number.
remarks_en / remarks_zh | string | Operator-published remark (e.g. "Bus is full", "Service ended").
data_timestamp | string | Operator-published data timestamp.
source_url | string | Source endpoint the record was derived from.
scraped_at | string | ISO 8601 timestamp (UTC) when the record was scraped.

FAQ

How do I scrape Hong Kong transit data?

Pick a mode (route_list, station_list, route_stops, stop_eta) and an operator (mtr, mtr_lr, kmb, citybus), then run. The scraper hits the right HKSAR open-data endpoint, normalizes the response, and emits flat records.

Does Hong Kong Transit Scraper need an API key?

No. The HKSAR open-data programme publishes these endpoints for free public use. No registration, no token, no rate-limit headers.

Does the scraper return fares?

Not in v1. Neither KMB nor Citybus exposes fares via the open APIs, and MTR fares are a separate closed download. Fare modes are out of scope for this version.

What about Star Ferry, First Ferry, TurboJET, or HK Tramways?

Out of scope. Star Ferry, First Ferry, and TurboJET publish PDF schedules (no structured API). HK Tramways has no public data feed at all. Adding them would require HTML scraping that is materially more work than the API-only surface.

How do I find a stop ID for stop_eta?

Run station_list mode for the operator first. The station_code field on each record is the ID you pass to stop_eta as stop_id. For MTR, also note the line_code: stop_eta on MTR needs both the station code and the line code (passed as mtr_line).
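
As a sketch of that two-step workflow, the helper below searches station_list records by English name and builds the matching stop_eta input. The function name stop_eta_input and the case-insensitive exact match are assumptions for illustration.

```python
def stop_eta_input(stations, name_en, operator="mtr"):
    """Build a stop_eta input from the first station record whose English name matches."""
    wanted = name_en.strip().lower()
    for rec in stations:
        if rec.get("station_name_en", "").lower() == wanted:
            cfg = {"mode": "stop_eta", "operator": operator,
                   "stop_id": rec["station_code"]}
            if operator == "mtr":
                # MTR stop_eta needs the line code as well as the station code
                cfg["mtr_line"] = rec["line_code"]
            return cfg
    return None  # no station with that name


stations = [{"station_name_en": "North Point", "station_code": "NOP", "line_code": "TKL"}]
cfg = stop_eta_input(stations, "north point")
# cfg == {"mode": "stop_eta", "operator": "mtr", "stop_id": "NOP", "mtr_line": "TKL"}
```

Because interchange stations appear once per line, the first match decides which line's arrivals are requested; filter by line_code first if a specific line matters.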

How fresh is the data?

ETAs are live (~30 second refresh). Route and stop catalogues are refreshed daily by KMB and Citybus, quarterly by MTR.

How many records does a full run produce?

Per operator at full inventory: KMB ~6,000 stops or ~1,600 route-direction variants; Citybus ~3,000 stops or ~400 routes; MTR ~270 station rows (99 stations, listed once per line and direction, so interchange stations repeat); Light Rail ~70 stops. Set maxItems accordingly; the default of 15 keeps test runs fast.
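
The effect of maxItems can be sketched as a cap on a lazy stream of records, so that a small cap stops work early instead of building the full inventory first. cap_records and fake_records are hypothetical names; whether the scraper streams records this way is an assumption.

```python
from itertools import islice


def cap_records(records, max_items=15):
    """Take at most `max_items` records from any iterable, consuming no more than needed."""
    return list(islice(records, max_items))


def fake_records(total):
    """Hypothetical record source standing in for a full catalogue."""
    for i in range(total):
        yield {"record_type": "station", "sequence": i + 1}


sample = cap_records(fake_records(6000))            # default cap of 15
full = cap_records(fake_records(70), max_items=100)  # cap above the inventory size
```

Here sample holds 15 records and full holds all 70, since the cap only limits and never pads.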


Need More Features?

Need ferry coverage, fare data, journey-planning A→B routing, or a GTFS-RT export? File an issue or get in touch.

Why Use Hong Kong Transit Scraper?

  • Free APIs, low cost — pay-per-event pricing, ~$0.0008 per record at the default coefficient. A full Hong Kong network catalogue dump costs less than a single Octopus tap.
  • Bilingual & normalized — every record carries both English and Traditional Chinese names. Field names are consistent across MTR, KMB, Citybus, and Light Rail.
  • Stable — no headless browser, no captcha solver, no scraping-the-DOM heuristics. Just public HKSAR open-data endpoints over HTTPS.