Cultural Heritage Online Archive Scraper
Scrape heritage object records from Cultural Heritage Online (文化遺産オンライン, online.bunka.go.jp) — the Agency for Cultural Affairs' digital museum of Japan's national heritage.
The actor extracts object-level records from the site's full 136,000+ item archive, searchable by keyword, classification, era, genre, and region. Each record includes title, kana reading, era, genre, region, holding institution, description, and a list of high-resolution image URLs — the imagery is the unique value of this source.
What you get
Each record includes:
| Field | Description |
|---|---|
| `heritage_id` | Unique item ID from `/heritages/detail/<id>` |
| `title` | Object title (名称) |
| `title_kana` | Phonetic reading (ふりがな) |
| `genre` | Category in Japanese (e.g. 絵画 painting, 彫刻 sculpture, 工芸品 craft, 書跡 calligraphy) |
| `era` | Historical period in Japanese (e.g. 江戸時代 Edo period, 平安時代 Heian period) |
| `era_normalized` | Normalized lowercase Latin slug (edo / heian / kamakura / etc.) |
| `region` | Prefecture or region (所在地域) |
| `holder` | Holding institution (所蔵館) |
| `material` | Material and technique (材質・技法), where listed |
| `dimensions` | Dimensions (法量), where listed |
| `description` | Object description (解説) |
| `image_urls` | Array of high-resolution image URLs |
| `detail_url` | Full URL of the detail page |
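For consumers of the dataset, the record shape in the field table can be sketched as a Python dataclass. This is a minimal illustration, not the actor's own code: `HeritageRecord` and `heritage_id_from_url` are hypothetical names, and the sketch assumes the ID is simply the path segment after `/heritages/detail/`.

```python
import re
from dataclasses import dataclass, field
from typing import Optional

# Captures the segment after /heritages/detail/ (assumption: the ID contains no "/", "?" or "#").
_DETAIL_ID = re.compile(r"/heritages/detail/([^/?#]+)")


@dataclass
class HeritageRecord:
    """One output record; field names mirror the output table."""
    heritage_id: str
    title: str
    detail_url: str
    title_kana: Optional[str] = None
    genre: Optional[str] = None
    era: Optional[str] = None
    era_normalized: Optional[str] = None
    region: Optional[str] = None
    holder: Optional[str] = None
    material: Optional[str] = None     # only where the page lists 材質・技法
    dimensions: Optional[str] = None   # only where the page lists 法量
    description: Optional[str] = None
    image_urls: list[str] = field(default_factory=list)


def heritage_id_from_url(detail_url: str) -> str:
    """Extract the item ID from a /heritages/detail/<id> URL."""
    match = _DETAIL_ID.search(detail_url)
    if match is None:
        raise ValueError(f"not a detail URL: {detail_url!r}")
    return match.group(1)
```

For example, `heritage_id_from_url("https://online.bunka.go.jp/heritages/detail/12345")` returns `"12345"` (the ID here is made up). Optional fields default to `None` so that records missing 材質・技法 or 法量 still validate.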
Usage
Basic keyword search
```json
{
  "keywords": "仏像",
  "maxItems": 100
}
```
The value is sent as the keyword parameter of `/heritages/search/result`. Any Japanese text works: artist names, object names, classifications, institution names.
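A minimal sketch of how such a search URL could be built. It assumes the query parameter is literally named `keyword` (the text only says "the keyword parameter"), and `search_url` is a hypothetical helper, not the actor's implementation. `urlencode` percent-encodes the UTF-8 bytes of Japanese text.

```python
from urllib.parse import urlencode

BASE_URL = "https://online.bunka.go.jp"


def search_url(keywords: str = "") -> str:
    """Build a listing URL; blank keywords give the unfiltered listing.

    Assumption: the query parameter name is "keyword".
    """
    path = f"{BASE_URL}/heritages/search/result"
    term = keywords.strip()
    if not term:
        return path  # no filter: the whole archive listing
    # urlencode handles the UTF-8 percent-encoding of Japanese keywords.
    return f"{path}?{urlencode({'keyword': term})}"
```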
Scrape all items
Leave `keywords` empty to iterate the full archive listing (`/heritages/search/result` with no filter). The archive contains 136,000+ records, so use `maxItems` to control the scope of a run.
```json
{
  "maxItems": 500
}
```
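Walking the listing page by page and stopping at `maxItems` could look like the sketch below. `fetch_page` is a hypothetical callable standing in for the real HTTP and parsing step; the 1-based page index and "empty list means past the last page" convention are assumptions, since the listing's pagination parameter is not documented here.

```python
from typing import Callable, Iterator


def iter_detail_urls(
    fetch_page: Callable[[int], list[str]],
    max_items: int = 20,
) -> Iterator[str]:
    """Yield detail-page URLs from successive result pages until max_items is reached.

    fetch_page(page) is hypothetical: it returns the detail URLs on result
    page `page` (1-based), or an empty list once the listing is exhausted.
    """
    yielded = 0
    page = 1
    while yielded < max_items:
        urls = fetch_page(page)
        if not urls:  # ran off the end of the listing
            return
        for url in urls:
            if yielded >= max_items:
                return
            yield url
            yielded += 1
        page += 1
```

Because the loop checks the cap before requesting the next page, a run with `maxItems: 500` never fetches more result pages than it needs.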
Input schema
| Parameter | Type | Default | Description |
|---|---|---|---|
| `keywords` | string | (none) | Search keyword (e.g. 仏像 Buddhist statues, 絵画 paintings, 平安 Heian). Leave blank for all items. |
| `maxItems` | integer | 20 | Maximum number of records to scrape. |
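Applying the schema defaults to a raw input object could be sketched as follows. `ActorInput` and `parse_input` are hypothetical names; rejecting `maxItems` values below 1 is this sketch's own choice, since the schema states no minimum.

```python
from dataclasses import dataclass
from typing import Any, Mapping

DEFAULT_MAX_ITEMS = 20  # documented default for maxItems


@dataclass(frozen=True)
class ActorInput:
    keywords: str = ""  # blank means "all items"
    max_items: int = DEFAULT_MAX_ITEMS


def parse_input(raw: Mapping[str, Any]) -> ActorInput:
    """Apply the documented defaults to a raw input object like the JSON examples."""
    keywords = raw.get("keywords") or ""
    max_items = raw.get("maxItems")
    if max_items is None:
        max_items = DEFAULT_MAX_ITEMS
    # Assumption: the schema gives no minimum; requiring >= 1 is a choice made here.
    if isinstance(max_items, bool) or not isinstance(max_items, int) or max_items < 1:
        raise ValueError(f"maxItems must be a positive integer, got {max_items!r}")
    return ActorInput(keywords=str(keywords).strip(), max_items=max_items)
```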
Notes
- Respects the site's `crawl-delay: 3` by capping concurrency at 3.
- No authentication and no Cloudflare: the government endpoint (bunka.go.jp) is fully open.
- Era normalization maps Japanese period names to lowercase Latin slugs for use in downstream pipelines.
- Images use the pattern `https://online.bunka.go.jp/heritage/<id>/_<N>/...` and need no authentication.
- This source is distinct from the kunishitei designation database (`kunishitei.bunka.go.jp`) and the NDL's Japan Search (`jpsearch.go.jp`). It surfaces object-level museum records with images, not the legal designation register or the bibliographic aggregator.
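The concurrency cap from the first note can be sketched with `asyncio.Semaphore`. `gather_capped` and the injected `fetch` coroutine are hypothetical; the actor's real request code is not shown here.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, TypeVar

T = TypeVar("T")
MAX_CONCURRENCY = 3  # the cap described in the notes


async def gather_capped(
    urls: Iterable[str],
    fetch: Callable[[str], Awaitable[T]],
    limit: int = MAX_CONCURRENCY,
) -> list[T]:
    """Run fetch(url) for every URL with at most `limit` calls in flight; results keep input order."""
    semaphore = asyncio.Semaphore(limit)

    async def bounded(url: str) -> T:
        async with semaphore:  # waits while `limit` fetches are already running
            return await fetch(url)

    return await asyncio.gather(*(bounded(u) for u in urls))
```

A semaphore bounds how many requests run at once; it does not by itself space requests apart. If a fixed pause between requests is wanted, an `await asyncio.sleep(...)` inside `bounded` is the natural place for it.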
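Era normalization, as described in the notes, can be illustrated with a lookup table and substring matching. This is a hypothetical sketch: only `edo`, `heian` and `kamakura` appear in the field table, and the other slugs and the "earliest match wins" rule are assumptions rather than the actor's mapping.

```python
from typing import Optional

# Illustrative mapping only: edo / heian / kamakura come from the field table,
# the other slugs are assumptions, not the actor's actual table.
ERA_SLUGS = {
    "縄文": "jomon",
    "弥生": "yayoi",
    "古墳": "kofun",
    "飛鳥": "asuka",
    "奈良": "nara",
    "平安": "heian",
    "鎌倉": "kamakura",
    "南北朝": "nanbokucho",
    "室町": "muromachi",
    "桃山": "momoyama",
    "江戸": "edo",
    "明治": "meiji",
    "大正": "taisho",
    "昭和": "showa",
}


def normalize_era(era: str) -> Optional[str]:
    """Return the slug for the first known period named in `era`, or None."""
    # Record where each known period name occurs in the label.
    hits = [(era.find(name), name) for name in ERA_SLUGS if name in era]
    if not hits:
        return None
    # The earliest occurrence wins, so a range like 鎌倉時代～南北朝時代 maps to its start.
    _, first = min(hits)
    return ERA_SLUGS[first]
```

Substring matching lets labels with qualifiers, such as 平安時代後期 (late Heian), still map to `heian`.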