OrbTop

UNESCO World Heritage Sites List Scraper

TRAVEL, EDUCATION

Scrape the complete UNESCO World Heritage List — all ~1,250 inscribed sites with geo-coordinates, cultural/natural/mixed category, inscription criteria, danger status, and states parties. Data sourced directly from the official UNESCO World Heritage Centre XML feed.


UNESCO World Heritage Sites List Scraper Features

  • Returns all ~1,250 inscribed sites from the official UNESCO XML feed in a single run
  • Optional: scrape the tentative list (~1,700 candidate sites) or both lists combined
  • Extracts 19 fields per site: name, category, region, lat/lon, inscription year, criteria codes, danger status, and more (the area field is present but always null; see Notes)
  • Criteria codes returned in standard UNESCO format: (i),(iii),(vi) — ready for filtering or display
  • Danger status and year included — distinguishes currently-at-risk heritage from safe inscriptions
  • Direct URLs to each site's UNESCO detail page and primary image
  • States parties returned as a comma-separated list — easy to filter by country
  • Pay-per-record pricing: $0.10 per run + $0.002 per record
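
As a rough illustration of the pricing bullet above, the cost of a run can be estimated from the two published rates. This is a minimal sketch; the function name `estimate_cost` is hypothetical, and actual billing is determined by the platform, not by this formula.

```python
from decimal import Decimal

RUN_FEE = Decimal("0.10")      # flat fee per run, from the pricing bullet
RECORD_FEE = Decimal("0.002")  # fee per returned record, from the pricing bullet

def estimate_cost(records: int) -> Decimal:
    """Estimated charge in USD for one run that returns `records` records."""
    if records < 0:
        raise ValueError("records must be non-negative")
    return RUN_FEE + RECORD_FEE * records

print(estimate_cost(100))    # a 100-record sample run
print(estimate_cost(1250))   # roughly the full inscribed list
```

Under these rates, a 100-record run comes to about $0.30 and a full inscribed-list run of roughly 1,250 records to about $2.60.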

Who Uses UNESCO World Heritage Data?

  • Travel content creators — Build destination guides, itineraries, and "top heritage sites by region" lists with verified UNESCO data
  • Tourism boards and DMOs — Access authoritative site metadata, categories, and danger status for official travel marketing
  • Education platforms — Populate lesson plans, quizzes, and encyclopedias with up-to-date heritage site information
  • LLM and RAG knowledge bases — Ingest the full World Heritage List as a structured dataset for AI assistants and retrieval systems
  • Researchers and NGOs — Analyze distribution of heritage sites by country, region, category, or danger status
  • App developers — Build interactive heritage maps, country-by-country explorers, or cultural travel apps

How UNESCO World Heritage Sites List Scraper Works

  1. Select a list type: inscribed (default), tentative, or all.
  2. The actor obtains a valid session for the UNESCO World Heritage Centre website, which is protected by Cloudflare.
  3. The official XML feed is fetched — a single 2.4 MB document containing all site records.
  4. Each <row> element is parsed into a structured record with all 19 fields.
  5. Records stream into the Apify dataset. A full inscribed-list run (~1,250 sites) finishes in under two minutes.
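
Step 4 above can be sketched with Python's standard XML parser. This is not the actor's actual code: the child element names (`id_number`, `site`, `states`, `criteria_txt`, and so on) and the wrapping `<query>` element are assumptions made for illustration, as is the generalization of the detail URL pattern from the single sample record shown in the Output section.

```python
import xml.etree.ElementTree as ET

# Tiny stand-in for the feed. Element names are assumed, not confirmed.
SAMPLE_XML = """<query>
  <row>
    <id_number>90</id_number>
    <site>Abu Mena</site>
    <category>Cultural</category>
    <states>Egypt</states>
    <latitude>30.8358333333</latitude>
    <longitude>29.66666667</longitude>
    <date_inscribed>1979</date_inscribed>
    <criteria_txt>(iv)</criteria_txt>
  </row>
</query>"""

def text(row, tag):
    """Return the stripped text of a child element, or None if missing or empty."""
    value = row.findtext(tag)
    if value is None:
        return None
    value = value.strip()
    return value or None

def to_number(value, cast):
    """Apply `cast` (int or float) unless the value is None."""
    return cast(value) if value is not None else None

def parse_rows(xml_text):
    """Yield one dict per <row> element in the document."""
    root = ET.fromstring(xml_text)
    for row in root.iter("row"):
        site_id = text(row, "id_number")
        yield {
            "site_id": to_number(site_id, int),
            "name": text(row, "site"),
            "category": text(row, "category"),
            "states_parties": text(row, "states"),
            "latitude": to_number(text(row, "latitude"), float),
            "longitude": to_number(text(row, "longitude"), float),
            "date_inscribed": to_number(text(row, "date_inscribed"), int),
            "criteria": text(row, "criteria_txt"),
            # URL pattern inferred from the sample record; an assumption.
            "detail_url": f"https://whc.unesco.org/en/list/{site_id}" if site_id else None,
            "source_id": site_id,
        }

records = list(parse_rows(SAMPLE_XML))
print(records[0]["name"], records[0]["latitude"], records[0]["longitude"])
```

Missing or empty elements map to null values here, which matches how fields such as name_local and area_hectares are described in the Output section.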

Input

{
    "maxItems": 100,
    "listType": "inscribed"
}

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| maxItems | integer | 0 (all) | Maximum number of records to return. Set to 0 or omit to return all sites. |
| listType | string | inscribed | Which list to scrape: inscribed (~1,250 sites), tentative (~1,700 candidate sites), or all (both combined). |
| proxyConfiguration | object | Apify residential | Proxy settings. The actor requires residential proxies to pass the site's Cloudflare protection. Leave as default unless you have a specific proxy requirement. |
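
The input fields above can be assembled and checked before a run. The sketch below is a hypothetical helper (`build_input` is not part of the actor); it only enforces the constraints stated in the field descriptions and omits proxyConfiguration so that the default applies.

```python
import json

VALID_LIST_TYPES = {"inscribed", "tentative", "all"}

def build_input(list_type="inscribed", max_items=0):
    """Return an input object for the actor; max_items=0 means all records."""
    if list_type not in VALID_LIST_TYPES:
        raise ValueError(f"listType must be one of {sorted(VALID_LIST_TYPES)}")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(max_items, bool) or not isinstance(max_items, int) or max_items < 0:
        raise ValueError("maxItems must be a non-negative integer")
    return {"listType": list_type, "maxItems": max_items}

print(json.dumps(build_input("tentative", 50), indent=4))
```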

Output

Each record in the dataset represents one UNESCO World Heritage Site or, when the tentative list is scraped, one candidate site.

{
    "site_id": 90,
    "name": "Abu Mena",
    "name_local": null,
    "category": "Cultural",
    "short_description": "<p>The church, baptistry, basilicas, public buildings, streets...</p>",
    "states_parties": "Egypt",
    "region": "Arab States",
    "latitude": 30.8358333333,
    "longitude": 29.66666667,
    "date_inscribed": 1979,
    "criteria": "(iv)",
    "in_danger": true,
    "danger_listed_year": 2001,
    "area_hectares": null,
    "extension": false,
    "transboundary": false,
    "detail_url": "https://whc.unesco.org/en/list/90",
    "image_url": "https://whc.unesco.org/uploads/sites/site_90.jpg",
    "source_id": "90"
}

| Field | Type | Description |
| --- | --- | --- |
| site_id | integer | Unique numeric site identifier assigned by UNESCO |
| name | string | Site name in English |
| name_local | string or null | Site name in the local/official language (null if not published in the XML) |
| category | string | Site category: Cultural, Natural, or Mixed |
| short_description | string | Official UNESCO short description (may contain HTML) |
| states_parties | string | Comma-separated list of countries (full names, e.g. France,Spain) |
| region | string | UNESCO world region (e.g. Europe and North America, Asia and the Pacific) |
| latitude | number | Site latitude in decimal degrees (first component for sites with multiple geographic components) |
| longitude | number | Site longitude in decimal degrees (first component for sites with multiple geographic components) |
| date_inscribed | integer | Year the site was inscribed on the World Heritage List (submission year for tentative-list sites) |
| criteria | string | Inscription criteria as a comma-separated string, e.g. (i),(iii),(vi) |
| in_danger | boolean | true if the site is on the List of World Heritage in Danger |
| danger_listed_year | integer or null | Year the site was added to the danger list (null if not in danger) |
| area_hectares | number or null | Total inscribed area in hectares (always null; not published in the XML feed) |
| extension | boolean | true if this inscription was an extension of an earlier site |
| transboundary | boolean | true if the site spans multiple countries |
| detail_url | string | Full URL to the site's page on the UNESCO World Heritage Centre website |
| image_url | string | URL of the primary site image on the UNESCO website |
| source_id | string | Raw source identifier from the XML feed |
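
Several fields arrive as formatted strings, so a consumer will often want to split or clean them. The helpers below are an illustrative sketch (all function names are hypothetical) that uses an abridged copy of the sample record above. Note that splitting states_parties on commas assumes no country name itself contains a comma, and the tag-stripping regex is a simplification, not a full HTML parser.

```python
import html
import re

# Abridged copy of the sample record from the Output section.
records = [
    {
        "site_id": 90,
        "name": "Abu Mena",
        "short_description": "<p>The church, baptistry, basilicas, public buildings, streets...</p>",
        "states_parties": "Egypt",
        "criteria": "(iv)",
        "in_danger": True,
        "danger_listed_year": 2001,
    },
]

def split_criteria(criteria):
    """Turn '(i),(iii),(vi)' into ['i', 'iii', 'vi']."""
    return re.findall(r"\(([ivx]+)\)", criteria or "")

def split_states(states_parties):
    """Turn 'France,Spain' into ['France', 'Spain']."""
    return [s.strip() for s in (states_parties or "").split(",") if s.strip()]

def plain_text(description):
    """Drop HTML tags and decode entities in short_description."""
    no_tags = re.sub(r"<[^>]+>", "", description or "")
    return html.unescape(no_tags).strip()

# Example filters: sites currently in danger, and sites in a given country.
in_danger_names = [r["name"] for r in records if r["in_danger"]]
egypt_sites = [r for r in records if "Egypt" in split_states(r["states_parties"])]

print(in_danger_names)
print(split_criteria(records[0]["criteria"]))
print(plain_text(records[0]["short_description"]))
```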

Notes

  • area_hectares is null for all records — the official XML feed does not include area data. Area figures are available on per-site detail pages.
  • name_local is null for all inscribed-list records — the inscribed XML does not include local-language names.
  • For tentative-list sites, date_inscribed holds the submission year, not an inscription year (these sites are not yet inscribed).
  • Latitude/longitude values for transnational sites with multiple geographic components reflect the first component's coordinates.
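
For mapping use cases, records can be converted to GeoJSON points. The sketch below is one possible approach, not part of the actor; the function names are hypothetical. It skips any record without coordinates as a defensive measure, and, per the note above, a site with multiple components is represented only by its first component's position.

```python
import json

def to_feature(record):
    """Convert one record to a GeoJSON Feature, or None without coordinates."""
    lat, lon = record.get("latitude"), record.get("longitude")
    if lat is None or lon is None:
        return None
    return {
        "type": "Feature",
        # GeoJSON positions are ordered longitude first, then latitude.
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": {
            "site_id": record.get("site_id"),
            "name": record.get("name"),
            "category": record.get("category"),
            "in_danger": record.get("in_danger"),
        },
    }

def to_feature_collection(records):
    """Build a FeatureCollection, dropping records that have no coordinates."""
    features = [f for f in map(to_feature, records) if f is not None]
    return {"type": "FeatureCollection", "features": features}

sample = [
    {"site_id": 90, "name": "Abu Mena", "category": "Cultural",
     "latitude": 30.8358333333, "longitude": 29.66666667, "in_danger": True},
]
print(json.dumps(to_feature_collection(sample), indent=2))
```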