UNESCO World Heritage Sites List Scraper
Scrape the complete UNESCO World Heritage List — all ~1,250 inscribed sites with geo-coordinates, cultural/natural/mixed category, inscription criteria, danger status, and states parties. Data sourced directly from the official UNESCO World Heritage Centre XML feed.
UNESCO World Heritage Sites List Scraper Features
- Returns all ~1,250 inscribed sites from the official UNESCO XML feed in a single run
- Optional: scrape the tentative list (~1,700 candidate sites) or both lists combined
- Extracts 19 fields per site: name, category, region, lat/lon, inscription year, criteria codes, danger status, area, and more
- Criteria codes returned in standard UNESCO format, `(i),(iii),(vi)`, ready for filtering or display
- Danger status and year included, distinguishing currently at-risk heritage from safe inscriptions
- Direct URLs to each site's UNESCO detail page and primary image
- States parties returned as a comma-separated list — easy to filter by country
- Pay-per-record pricing: $0.10 per run + $0.002 per record
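
The pricing bullet above reduces to simple arithmetic. The following is a minimal sketch of a cost estimate; the function name `estimate_cost` is hypothetical, and it assumes the two listed fees are the only charges (any platform or proxy usage billed separately is not modelled).

```python
from decimal import Decimal

# Fees taken from the pricing bullet above.
RUN_FEE = Decimal("0.10")      # flat fee per run, in USD
RECORD_FEE = Decimal("0.002")  # fee per returned record, in USD


def estimate_cost(records: int) -> Decimal:
    """Estimated USD cost of one run that returns `records` records."""
    if records < 0:
        raise ValueError("records must be non-negative")
    return RUN_FEE + RECORD_FEE * records


# A full inscribed-list run of roughly 1,250 sites:
print(estimate_cost(1250))  # 2.600
```

`Decimal` is used instead of floats so that fractions of a cent add up exactly.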
Who Uses UNESCO World Heritage Data?
- Travel content creators — Build destination guides, itineraries, and "top heritage sites by region" lists with verified UNESCO data
- Tourism boards and DMOs — Access authoritative site metadata, categories, and danger status for official travel marketing
- Education platforms — Populate lesson plans, quizzes, and encyclopedias with up-to-date heritage site information
- LLM and RAG knowledge bases — Ingest the full World Heritage List as a structured dataset for AI assistants and retrieval systems
- Researchers and NGOs — Analyze distribution of heritage sites by country, region, category, or danger status
- App developers — Build interactive heritage maps, country-by-country explorers, or cultural travel apps
How UNESCO World Heritage Sites List Scraper Works
- Select a list type: `inscribed` (default), `tentative`, or `all`.
- The actor obtains a valid session for the UNESCO World Heritage Centre website, which is protected by Cloudflare.
- The official XML feed is fetched: a single 2.4 MB document containing all site records.
- Each `<row>` element is parsed into a structured record with all 19 fields.
- Records stream into the Apify dataset. A full inscribed-list run (~1,250 sites) finishes in under two minutes.
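
The parsing step can be illustrated with a short sketch. This is not the actor's actual code: the child element names (`id_number`, `site`, `states`, `criteria_txt`, and so on) and the `<query>` wrapper are assumptions made for illustration, and only a subset of the 19 fields is mapped. The sample values come from the Abu Mena record shown under Output.

```python
import xml.etree.ElementTree as ET

# Tiny stand-in for the feed. Element names here are assumptions;
# check them against the real XML before relying on them.
SAMPLE_XML = """<query>
  <row>
    <id_number>90</id_number>
    <site>Abu Mena</site>
    <category>Cultural</category>
    <states>Egypt</states>
    <latitude>30.8358333333</latitude>
    <longitude>29.66666667</longitude>
    <date_inscribed>1979</date_inscribed>
    <criteria_txt>(iv)</criteria_txt>
  </row>
</query>"""


def _text(row, tag):
    """Return the stripped text of a child element, or None if missing/empty."""
    value = row.findtext(tag)
    return value.strip() if value and value.strip() else None


def parse_row(row):
    """Map one <row> element to a flat record (subset of the output fields)."""
    site_id = _text(row, "id_number")
    lat = _text(row, "latitude")
    lon = _text(row, "longitude")
    year = _text(row, "date_inscribed")
    return {
        "site_id": int(site_id) if site_id else None,
        "name": _text(row, "site"),
        "category": _text(row, "category"),
        "states_parties": _text(row, "states"),
        "latitude": float(lat) if lat else None,
        "longitude": float(lon) if lon else None,
        "date_inscribed": int(year) if year else None,
        "criteria": _text(row, "criteria_txt"),
        "detail_url": f"https://whc.unesco.org/en/list/{site_id}" if site_id else None,
        "source_id": site_id,
    }


def parse_feed(xml_text):
    """Parse the whole document and return one record per <row>."""
    root = ET.fromstring(xml_text)
    return [parse_row(row) for row in root.iter("row")]


records = parse_feed(SAMPLE_XML)
print(records[0]["name"], records[0]["detail_url"])
```

Missing or empty elements become `None`, which matches how the output documents `null` for unpublished values.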
Input
{
"maxItems": 100,
"listType": "inscribed"
}
| Field | Type | Default | Description |
|---|---|---|---|
| `maxItems` | integer | `0` (all) | Maximum number of records to return. Set to `0` or omit for all sites. |
| `listType` | string | `inscribed` | Which list to scrape: `inscribed`, `tentative`, or `all` (both combined). |
| `proxyConfiguration` | object | Apify residential | Proxy settings. The actor requires residential proxies to pass the site's Cloudflare protection. Leave as default unless you have a specific proxy requirement. |
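
As a rough illustration of the input table, the sketch below validates the two main fields before a run and shows one way to start the actor from Python. The helper names `build_run_input` and `run_actor` are hypothetical, the actor ID must be supplied by the caller, and `run_actor` assumes the `apify-client` package with its `actor(...).call(run_input=...)` and `dataset(...).iterate_items()` methods. `proxyConfiguration` is left out so the actor's default applies.

```python
ALLOWED_LIST_TYPES = ("inscribed", "tentative", "all")


def build_run_input(list_type="inscribed", max_items=0):
    """Return an input dict matching the table above, or raise ValueError."""
    if list_type not in ALLOWED_LIST_TYPES:
        raise ValueError(f"listType must be one of {ALLOWED_LIST_TYPES}")
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(max_items, bool) or not isinstance(max_items, int) or max_items < 0:
        raise ValueError("maxItems must be a non-negative integer (0 means all)")
    return {"listType": list_type, "maxItems": max_items}


def run_actor(token, actor_id, run_input):
    """Start a run and return its dataset items (needs `pip install apify-client`)."""
    from apify_client import ApifyClient  # imported lazily; not needed for validation

    client = ApifyClient(token)
    run = client.actor(actor_id).call(run_input=run_input)
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


print(build_run_input("inscribed", 100))
```

Validating locally catches a misspelled `listType` before a run is started and billed.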
Output
Each record in the dataset represents one UNESCO World Heritage Site.
{
"site_id": 90,
"name": "Abu Mena",
"name_local": null,
"category": "Cultural",
"short_description": "<p>The church, baptistry, basilicas, public buildings, streets...</p>",
"states_parties": "Egypt",
"region": "Arab States",
"latitude": 30.8358333333,
"longitude": 29.66666667,
"date_inscribed": 1979,
"criteria": "(iv)",
"in_danger": true,
"danger_listed_year": 2001,
"area_hectares": null,
"extension": false,
"transboundary": false,
"detail_url": "https://whc.unesco.org/en/list/90",
"image_url": "https://whc.unesco.org/uploads/sites/site_90.jpg",
"source_id": "90"
}
| Field | Type | Description |
|---|---|---|
| `site_id` | integer | Unique numeric site identifier assigned by UNESCO |
| `name` | string | Site name in English |
| `name_local` | string | Site name in the local/official language (`null` if not published in XML) |
| `category` | string | Site category: `Cultural`, `Natural`, or `Mixed` |
| `short_description` | string | Official UNESCO short description (may contain HTML) |
| `states_parties` | string | Comma-separated list of countries (full names, e.g. `France,Spain`) |
| `region` | string | UNESCO world region (e.g. `Europe and North America`, `Asia and the Pacific`) |
| `latitude` | number | Site latitude in decimal degrees (primary component for transnational sites) |
| `longitude` | number | Site longitude in decimal degrees |
| `date_inscribed` | integer | Year the site was inscribed on the World Heritage List |
| `criteria` | string | Inscription criteria as a comma-separated string, e.g. `(i),(iii),(vi)` |
| `in_danger` | boolean | `true` if the site is on the List of World Heritage in Danger |
| `danger_listed_year` | integer | Year the site was added to the danger list (`null` if not in danger) |
| `area_hectares` | number | Total inscribed area in hectares (`null`; not published in the XML feed) |
| `extension` | boolean | `true` if this inscription was an extension of an earlier site |
| `transboundary` | boolean | `true` if the site spans multiple countries |
| `detail_url` | string | Full URL to the site's page on the UNESCO World Heritage Centre website |
| `image_url` | string | URL of the primary site image on the UNESCO website |
| `source_id` | string | Raw source identifier from the XML feed |
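
Because `criteria` and `states_parties` arrive as comma-separated strings, a little post-processing helps with filtering. The sketch below uses hypothetical helper names, and the second sample record is a made-up placeholder for illustration, not real UNESCO data.

```python
import re


def split_criteria(criteria):
    """'(i),(iii),(vi)' -> ['i', 'iii', 'vi']; None or '' -> []."""
    return re.findall(r"\(([ivx]+)\)", criteria or "")


def split_states(states_parties):
    """'France,Spain' -> ['France', 'Spain']; None or '' -> []."""
    return [s.strip() for s in (states_parties or "").split(",") if s.strip()]


def sites_in_danger(records, country=None):
    """Return records flagged in_danger, optionally limited to one country."""
    result = []
    for record in records:
        if not record.get("in_danger"):
            continue
        if country and country not in split_states(record.get("states_parties")):
            continue
        result.append(record)
    return result


sample = [
    # From the example record in this README.
    {"name": "Abu Mena", "states_parties": "Egypt", "criteria": "(iv)", "in_danger": True},
    # Placeholder record, invented for this example only.
    {"name": "Example Site", "states_parties": "France,Spain",
     "criteria": "(i),(iii),(vi)", "in_danger": False},
]

print([r["name"] for r in sites_in_danger(sample, country="Egypt")])
print(split_criteria(sample[1]["criteria"]))
```

One caveat: splitting on commas assumes no country name itself contains a comma. If the feed ever includes such a name, it would be split incorrectly.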
Notes
- `area_hectares` is `null` for all records because the official XML feed does not include area data. Area figures are available on per-site detail pages.
- `name_local` is `null` for all inscribed-list records because the inscribed XML does not include local-language names.
- For tentative-list sites, `date_inscribed` holds the submission year, not an inscription year (these sites are not yet inscribed).
- Latitude/longitude values for transnational sites with multiple geographic components reflect the first component's coordinates.
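
Since each record carries a single coordinate pair, one straightforward use is to plot sites as points. The following is a minimal sketch that converts records to a GeoJSON `FeatureCollection`; the function names are hypothetical, and skipping records without coordinates is a defensive assumption rather than documented behavior. Transnational sites appear as one point (their first component), as noted above.

```python
import json


def to_feature(record):
    """Convert one record to a GeoJSON Point feature, or None without coordinates."""
    lat, lon = record.get("latitude"), record.get("longitude")
    if lat is None or lon is None:
        return None
    return {
        "type": "Feature",
        # GeoJSON orders coordinates as [longitude, latitude].
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": {
            "name": record.get("name"),
            "category": record.get("category"),
            "in_danger": record.get("in_danger"),
        },
    }


def to_feature_collection(records):
    """Build a FeatureCollection, dropping records that have no coordinates."""
    features = [f for f in map(to_feature, records) if f is not None]
    return {"type": "FeatureCollection", "features": features}


sample = [
    # Values from the example record in this README.
    {"name": "Abu Mena", "category": "Cultural", "in_danger": True,
     "latitude": 30.8358333333, "longitude": 29.66666667},
    # Record without coordinates, included only to show the skip path.
    {"name": "No coordinates", "latitude": None, "longitude": None},
]

collection = to_feature_collection(sample)
print(json.dumps(collection, indent=2))
```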