OrbTop

Cultural Heritage Online Archive Scraper

Scrape heritage object records from Cultural Heritage Online (文化遺産オンライン, online.bunka.go.jp) — the Agency for Cultural Affairs' digital museum of Japan's national heritage.

The actor extracts object-level records from the site's archive of more than 136,000 items, which is searchable by keyword, classification, era, genre, and region. Each record includes title, kana reading, era, genre, region, holding institution, description, and a list of high-resolution image URLs. The imagery is what sets this source apart.

What you get

Each record includes:

| Field | Description |
| --- | --- |
| heritage_id | Unique item ID from /heritages/detail/<id> |
| title | Object title (名称) |
| title_kana | Phonetic reading (ふりがな) |
| genre | Category (絵画, 彫刻, 工芸品, 書跡, etc.) |
| era | Historical period in Japanese (江戸時代, 平安時代, etc.) |
| era_normalized | Normalised Latin slug (edo, heian, kamakura, etc.) |
| region | Prefecture or region (所在地域) |
| holder | Holding institution (所蔵館) |
| material | Material and technique (材質・技法) where listed |
| dimensions | Dimensions (法量) where listed |
| description | Object description (解説) |
| image_urls | Array of high-resolution image URLs |
| detail_url | Full URL of the detail page |
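
For downstream code, the fields above could be modelled roughly as follows. This is a minimal sketch, not the actor's own code: the class name `HeritageRecord`, the helper `heritage_id_from_url`, and the choice of which fields are optional are assumptions made for illustration. Only the field names and the /heritages/detail/<id> path come from the table.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class HeritageRecord:
    """Hypothetical container mirroring the documented output fields."""
    heritage_id: str
    title: str
    detail_url: str
    # Treated as optional here; the README only says material and
    # dimensions appear "where listed". The rest is an assumption.
    title_kana: Optional[str] = None
    genre: Optional[str] = None
    era: Optional[str] = None
    era_normalized: Optional[str] = None
    region: Optional[str] = None
    holder: Optional[str] = None
    material: Optional[str] = None
    dimensions: Optional[str] = None
    description: Optional[str] = None
    image_urls: List[str] = field(default_factory=list)


# Matches the path segment after /heritages/detail/ without assuming
# the ID is numeric.
_DETAIL_RE = re.compile(r"/heritages/detail/([^/?#]+)")


def heritage_id_from_url(detail_url: str) -> Optional[str]:
    """Return the ID segment of a detail URL, or None if absent."""
    match = _DETAIL_RE.search(detail_url)
    return match.group(1) if match else None
```

A record can then be rebuilt from a dataset item with `HeritageRecord(**item)`, provided the item carries exactly these keys.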

Usage

Basic keyword search

{
    "keywords": "仏像",
    "maxItems": 100
}

The value is sent as the keyword parameter of /heritages/search/result. Any Japanese text works: artist names, object names, classifications, institution names.
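
As an illustration of what such a request could look like, the sketch below builds a search URL with the standard library. The function name `build_search_url` is hypothetical, the query parameter name `keyword` is taken from the sentence above, and the https scheme is assumed from the image URL pattern in the notes. The site may expect additional parameters that are not documented here.

```python
from typing import Optional
from urllib.parse import urlencode

BASE_URL = "https://online.bunka.go.jp"  # assumed scheme and host
SEARCH_PATH = "/heritages/search/result"


def build_search_url(keywords: Optional[str] = None) -> str:
    """Build a search-result URL; blank keywords give the unfiltered listing."""
    url = BASE_URL + SEARCH_PATH
    text = (keywords or "").strip()
    if not text:
        return url
    # urlencode percent-encodes the Japanese text as UTF-8.
    return url + "?" + urlencode({"keyword": text})
```

Calling `build_search_url("仏像")` yields the search path with a percent-encoded `keyword` query string, while `build_search_url()` returns the bare listing URL.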

Scrape all items

Leave keywords empty to iterate the full archive listing (/heritages/search/result with no filter). The archive contains 136,000+ records; use maxItems to control run scope.

{
    "maxItems": 500
}
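
The effect of maxItems on a full-archive run can be pictured with a small sketch. It assumes the listing is consumed page by page from a lazy iterable; the name `take_records` and the shape of `pages` are hypothetical, and this is not the actor's actual crawl loop.

```python
from itertools import chain, islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def take_records(pages: Iterable[Iterable[T]], max_items: int) -> Iterator[T]:
    """Yield at most max_items records from a lazy sequence of result pages.

    Both chain.from_iterable and islice are lazy, so pages beyond the
    cap are never requested from the underlying iterable.
    """
    return islice(chain.from_iterable(pages), max_items)
```

If `pages` is a generator that fetches one listing page per step, a cap of 500 stops the fetching as soon as the 500th record has been yielded.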

Input schema

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| keywords | string | (none) | Search keyword (e.g. 仏像, 絵画, 平安). Leave blank for all items. |
| maxItems | integer | 20 | Maximum number of records to scrape. |
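
When preparing input programmatically, the defaults above could be applied with a helper along these lines. This is a sketch under assumptions: `normalize_input` and `ActorInput` are hypothetical names, and the lower bound of 1 for maxItems is a guess, since the schema states only the type and the default.

```python
from typing import Any, Dict, Optional, TypedDict

DEFAULT_MAX_ITEMS = 20  # default stated in the input schema


class ActorInput(TypedDict):
    keywords: Optional[str]
    maxItems: int


def normalize_input(raw: Dict[str, Any]) -> ActorInput:
    """Apply documented defaults and reject obviously invalid values."""
    # Blank or missing keywords mean "all items".
    keywords = str(raw.get("keywords") or "").strip() or None
    max_items = raw.get("maxItems", DEFAULT_MAX_ITEMS)
    # bool is a subclass of int, so it is excluded explicitly.
    if isinstance(max_items, bool) or not isinstance(max_items, int) or max_items < 1:
        raise ValueError("maxItems must be a positive integer")
    return {"keywords": keywords, "maxItems": max_items}
```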

Notes

  • Caps concurrency at 3 to limit load, in keeping with the site's crawl-delay: 3 directive.
  • No authentication or Cloudflare challenge is involved; the government endpoint (bunka.go.jp) is fully open.
  • Era normalization maps Japanese period names to lowercase Latin slugs for use in downstream pipelines.
  • Images use the pattern https://online.bunka.go.jp/heritage/<id>/_<N>/... — no auth needed.
  • This source is distinct from the kunishitei designation database (kunishitei.bunka.go.jp) and the NDL jpsearch (jpsearch.go.jp). It surfaces the object-level museum records with images, not the legal designation register or the bibliographic aggregator.
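
The era normalization mentioned in the notes could work along the following lines. The mapping table below is an illustrative subset, not the actor's actual table, and `normalize_era` is a hypothetical name. The substring match is also an assumption, chosen so that values with a trailing qualifier (for example a century) still resolve.

```python
from typing import Optional

# Illustrative subset only; the actor's real mapping is not documented here.
ERA_SLUGS = {
    "奈良時代": "nara",
    "平安時代": "heian",
    "鎌倉時代": "kamakura",
    "室町時代": "muromachi",
    "江戸時代": "edo",
}


def normalize_era(era: Optional[str]) -> Optional[str]:
    """Map a Japanese period name to a lowercase Latin slug, or None."""
    if not era:
        return None
    for name, slug in ERA_SLUGS.items():
        # Substring match tolerates extra text around the period name.
        if name in era:
            return slug
    return None
```

Returning None for unrecognised periods keeps the raw `era` value usable downstream without inventing a slug.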