OrbTop

Japan Kokkai Diet Proceedings Scraper - NDL Speech Records

Japan Diet (Kokkai) NDL Proceedings Scraper

Scrapes speech records from Japan's National Diet Library (NDL) Kokkai API. Returns per-speech records across both chambers and all committees from 1947 to the present — over 1 million speeches in the corpus, available at no cost from the NDL's official public API.

No auth. No proxy. Pure structured JSON from a government API that actually works.

Kokkai NDL Scraper Features

  • Extracts per-speech records with full Japanese text, speaker name, party affiliation, official position, and speaker name reading (よみ)
  • Covers 75+ years of Diet proceedings: House of Representatives (衆議院), House of Councillors (参議院), and joint sessions (両院) from 1947 through the current session
  • Full-text keyword search across the entire corpus — Japanese terms, romaji, or policy keywords
  • Filters by speaker name, committee/meeting name, chamber, session number, and date range
  • Returns Gregorian and wareki dates — because your downstream system wants 2026-04-23 and your Japanese colleagues want 令和8年4月23日
  • Stable NDL citation URLs for every speech and meeting record — suitable for academic references, regulatory citations, and RAG pipelines
  • Returns PDF URLs where available (NDL publishes PDFs with a lag; nulls are normal for recent sessions)
  • No proxy required — the NDL API is a public government service with no IP restrictions

What Can You Do With Kokkai Proceedings Data?

  • Quantitative finance researchers — track BOJ governor and MOF minister commentary on monetary policy, JGB supply, fiscal consolidation. The Diet record is the unfiltered version.
  • Policy researchers — build comparative parliamentary analysis datasets across sessions, parties, and committees
  • LLM training corpora — formal Japanese diarised speech is rare. This corpus is public domain under Article 13 of Japan's Copyright Law, multi-speaker, and consistently formatted
  • Western think tanks — Brookings, CSIS, and RAND Japan desks spend considerable time translating Diet proceedings. This delivers the raw record programmatically
  • Civic tech — political monitoring, party voting analysis, MP speech frequency dashboards

How It Works

  1. Configure your search. Set a keyword, speaker name, committee, chamber, session number, or date range. At least one filter is required — the NDL API doesn't do open-ended dumps.
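  2. The scraper calls the NDL speech API with your filters and paginates through results using the offset-based startRecord / nextRecordPosition mechanism: each response reports the position of the next record, which becomes the startRecord of the following request. Page size is 100, the API maximum.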
  3. Each speech record is normalized to the output schema: raw API field names are mapped to snake_case, dates are augmented with wareki equivalents, and speaker position and role are merged into a single speaker_position field.
  4. Results are returned as structured JSON to the Apify dataset.
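The pagination in step 2 can be sketched roughly as follows. This is a minimal illustration, not the scraper's actual code: the endpoint URL, `startRecord`, and `nextRecordPosition` come from this page, while the `maximumRecords` and `recordPacking` parameters, the `speechRecord` response key, and the function names `fetch_page` and `iter_speeches` are assumptions to verify against the NDL API documentation.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Iterator

API_URL = "https://kokkai.ndl.go.jp/api/speech"
PAGE_SIZE = 100  # API maximum, per step 2 above


def fetch_page(params: dict) -> dict:
    """Fetch one page from the NDL speech API (performs a network request)."""
    # Assumed parameter: recordPacking=json asks for a JSON response.
    query = urllib.parse.urlencode({**params, "recordPacking": "json"})
    with urllib.request.urlopen(f"{API_URL}?{query}", timeout=30) as resp:
        return json.load(resp)


def iter_speeches(filters: dict, max_items: int = 0,
                  fetch: Callable[[dict], dict] = fetch_page) -> Iterator[dict]:
    """Yield raw speech records, following nextRecordPosition until exhausted."""
    start, yielded = 1, 0
    while start:
        page = fetch({**filters, "startRecord": start,
                      "maximumRecords": PAGE_SIZE})
        for record in page.get("speechRecord", []):
            yield record
            yielded += 1
            # max_items == 0 means unlimited, mirroring the maxItems input.
            if max_items and yielded >= max_items:
                return
        # Assumed: the last page omits nextRecordPosition or sets it to null.
        start = page.get("nextRecordPosition")
```

Passing the page fetcher as an argument keeps the loop testable without network access. A real run would also want a short pause between requests to keep request spacing reasonable.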

Kokkai NDL Scraper Input

{
    "searchQuery": "金融政策",
    "speakerName": "",
    "nameOfMeeting": "財務金融委員会",
    "chamber": "衆議院",
    "sessionNumber": 0,
    "dateFrom": "2023-01-01",
    "dateTo": "2024-12-31",
    "maxItems": 100
}

At least one filter must be set. A search that matches zero results returns zero records rather than an error.

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| searchQuery | string | "予算" | Full-text keyword search. Supports Japanese and romaji. |
| speakerName | string | "" | Filter by speaker name (partial match). E.g., "安倍" matches 安倍晋三. |
| nameOfMeeting | string | "" | Committee or meeting name filter. E.g., "予算委員会", "財務金融委員会", "本会議". |
| chamber | string | "" | Chamber filter: "衆議院", "参議院", "両院", or blank for all. |
| sessionNumber | integer | 0 | Diet session number (e.g., 213). 0 = all sessions. |
| dateFrom | string | "" | Start date filter (YYYY-MM-DD). Leave blank for earliest (1947-05-20). |
| dateTo | string | "" | End date filter (YYYY-MM-DD). Leave blank for most recent. |
| maxItems | integer | 10 | Maximum records to return. 0 = unlimited. |
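
As a rough illustration of how these input fields might translate into an NDL API query, the sketch below checks that at least one filter is set and builds a parameter dictionary. The NDL parameter names (`any`, `speaker`, `nameOfHouse`, `sessionFrom`, `sessionTo`, `from`, `until`) and the function name `build_params` are assumptions for illustration, not taken from the scraper's source.

```python
FILTER_KEYS = ("searchQuery", "speakerName", "nameOfMeeting",
               "chamber", "sessionNumber", "dateFrom", "dateTo")

# Actor input field -> assumed NDL query parameter name.
PARAM_MAP = {
    "searchQuery": "any",
    "speakerName": "speaker",
    "nameOfMeeting": "nameOfMeeting",
    "chamber": "nameOfHouse",
    "dateFrom": "from",
    "dateTo": "until",
}


def build_params(actor_input: dict) -> dict:
    """Validate the input and map it to (assumed) NDL query parameters."""
    # Empty strings and 0 both mean "filter not set" in the input schema.
    if not any(actor_input.get(key) for key in FILTER_KEYS):
        raise ValueError("At least one filter must be set")

    params = {ndl: actor_input[key]
              for key, ndl in PARAM_MAP.items() if actor_input.get(key)}

    # A single session number is expressed as a one-session range.
    session = actor_input.get("sessionNumber", 0)
    if session:
        params["sessionFrom"] = session
        params["sessionTo"] = session
    return params
```

Note that maxItems is deliberately not treated as a filter: it caps the output but does not narrow the search.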

Kokkai NDL Scraper Output

{
    "speech_id": "121104376X01620230425_070",
    "issue_id": "121104376X01620230425",
    "session": 211,
    "chamber": "衆議院",
    "committee": "財務金融委員会",
    "issue_number": "第16号",
    "meeting_date": "2023-04-25",
    "meeting_date_wareki": "令和5年4月25日",
    "speech_order": 70,
    "speaker": "植田和男",
    "speaker_yomi": "うえだかずお",
    "speaker_group": "内閣提出",
    "speaker_position": "日本銀行総裁",
    "speech_text": "○植田日銀総裁 まず、現在の金融政策運営の考え方についてご説明します...",
    "speech_url": "https://kokkai.ndl.go.jp/txt/121104376X01620230425/70",
    "meeting_url": "https://kokkai.ndl.go.jp/txt/121104376X01620230425",
    "pdf_url": "https://kokkai.ndl.go.jp/pdfb/cm211046_20230425_00.pdf",
    "search_query": "金融政策",
    "source_api_endpoint": "https://kokkai.ndl.go.jp/api/speech"
}

| Field | Type | Description |
| --- | --- | --- |
| speech_id | string | Unique NDL speech identifier |
| issue_id | string | Meeting record identifier |
| session | integer | Diet session number (国会回次) |
| chamber | string | 衆議院, 参議院, or 両院 |
| committee | string | Committee or meeting name |
| issue_number | string | Issue label within the session (e.g., 第16号) |
| meeting_date | string | Meeting date in Gregorian YYYY-MM-DD |
| meeting_date_wareki | string | Meeting date in Japanese wareki (e.g., 令和5年4月25日) |
| speech_order | integer | Speaker turn number within the meeting |
| speaker | string | Speaker full name |
| speaker_yomi | string | Speaker name reading in hiragana |
| speaker_group | string | Speaker's party or parliamentary group |
| speaker_position | string | Official position or role (PM, minister, committee chair, etc.) |
| speech_text | string | Full speech text in Japanese |
| speech_url | string | Canonical NDL URL for this speech |
| meeting_url | string | Canonical NDL URL for the full meeting record |
| pdf_url | string | PDF URL for the meeting record (null for recent sessions pending publication) |
| search_query | string | The keyword that returned this record |
| source_api_endpoint | string | NDL API endpoint that produced this record |
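
To make the mapping from raw API records to this schema concrete, here is a hedged sketch of a normalizer. The raw camelCase key names (`speechID`, `nameOfHouse`, `speakerRole`, and so on) are assumptions about the NDL response, and joining position and role with a space is only one plausible way to merge them; the function name `normalize` is hypothetical. The wareki date field is left out to keep the example short.

```python
API_ENDPOINT = "https://kokkai.ndl.go.jp/api/speech"


def normalize(raw: dict, search_query: str = "") -> dict:
    """Map one raw speech record (assumed key names) to the output schema."""
    # Merge position and role into one field, skipping empty parts.
    parts = (raw.get("speakerPosition") or "", raw.get("speakerRole") or "")
    speaker_position = " ".join(part for part in parts if part)

    return {
        "speech_id": raw.get("speechID"),
        "issue_id": raw.get("issueID"),
        "session": raw.get("session"),
        "chamber": raw.get("nameOfHouse"),
        "committee": raw.get("nameOfMeeting"),
        "issue_number": raw.get("issue"),
        "meeting_date": raw.get("date"),
        "speech_order": raw.get("speechOrder"),
        "speaker": raw.get("speaker"),
        "speaker_yomi": raw.get("speakerYomi"),
        "speaker_group": raw.get("speakerGroup"),
        "speaker_position": speaker_position,
        "speech_text": raw.get("speech"),
        "speech_url": raw.get("speechURL"),
        "meeting_url": raw.get("meetingURL"),
        # Missing or empty PDF links become null, as described above.
        "pdf_url": raw.get("pdfURL") or None,
        "search_query": search_query,
        "source_api_endpoint": API_ENDPOINT,
    }
```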

🔍 FAQ

How do I scrape Japan Diet proceedings?

The Japan Kokkai NDL Proceedings Scraper calls the National Diet Library's official public API at kokkai.ndl.go.jp/api/speech. Set at least one filter — a keyword, speaker name, committee, or date range — and the scraper paginates through all matching records. No credentials or proxy are needed.

What does the Japan Kokkai Diet Scraper cost to run?

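The scraper charges $0.10 per run start plus $0.001 per record. A keyword search returning 500 speeches costs roughly $0.60. Setting maxItems to 0 removes the record cap, but at these rates a run returning the full corpus of over 1 million speeches would cost upwards of $1,000, so narrow filters are what keep runs cheap.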
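
The pricing arithmetic can be checked with a small helper. This is just the formula stated above written out; the function name `estimate_cost` is hypothetical.

```python
RUN_START_FEE = 0.10    # USD per run start
PER_RECORD_FEE = 0.001  # USD per record returned


def estimate_cost(records: int, runs: int = 1) -> float:
    """Estimate the charge in USD for a number of records over some runs."""
    return round(runs * RUN_START_FEE + records * PER_RECORD_FEE, 3)


print(estimate_cost(500))        # 0.6
print(estimate_cost(1_000_000))  # 1000.1
```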

Does the Japan Kokkai Diet Scraper need proxies?

The Japan Kokkai NDL Proceedings Scraper doesn't need proxies. The NDL API is a public government service with no IP restrictions or rate limiting beyond reasonable request spacing.

Can I filter by committee or speaker?

The scraper supports filtering by speaker name, committee name, chamber, session number, and date range — independently or in combination. You can pull every BOJ governor speech in the 財務金融委員会 since 2013 in a single run.

What language is the speech text in?

Speech text is in Japanese. The NDL API returns the verbatim Diet record text, including standard parliamentary transcript conventions such as the ○ mark that prefixes each speaker label (as in ○植田日銀総裁) and procedural text. Dates are returned in both Gregorian (YYYY-MM-DD) and wareki notation.
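
For readers who want to reproduce the wareki notation from a Gregorian date, a minimal sketch follows. The era start dates are the standard ones; writing the first year of an era as 元年 is an assumption about formatting (the scraper may emit 1年 instead), and the function name `to_wareki` is hypothetical.

```python
from datetime import date

# (era name, first day of the era), newest first. The corpus starts in 1947,
# so Showa is the oldest era that is needed.
ERAS = [
    ("令和", date(2019, 5, 1)),
    ("平成", date(1989, 1, 8)),
    ("昭和", date(1926, 12, 25)),
]


def to_wareki(iso_date: str) -> str:
    """Convert a YYYY-MM-DD string to wareki, e.g. 令和5年4月25日."""
    d = date.fromisoformat(iso_date)
    for name, start in ERAS:
        if d >= start:
            year = d.year - start.year + 1
            # Assumed convention: the first year of an era is written 元年.
            year_label = "元" if year == 1 else str(year)
            return f"{name}{year_label}年{d.month}月{d.day}日"
    raise ValueError(f"Date is before the supported eras: {iso_date}")


print(to_wareki("2023-04-25"))  # 令和5年4月25日
```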

How current is the data?

The NDL updates the proceedings database as records are officially published. Major committee hearings from the current session are typically available within days to weeks. PDFs lag longer.


Need More Features?

Need a meeting-level endpoint, additional filters, or bulk export by session? File an issue or get in touch.

Why Use Japan Kokkai NDL Proceedings Scraper?

  • No competition — Zero other Apify actors cover the Japanese Diet. You won't find this data pre-packaged anywhere else for $0.001/record.
  • Public domain corpus — Article 13 of Japan's Copyright Law places government records in the public domain. No licensing headaches, no terms-of-service gray zones.
  • RAG-ready output — Each speech is a self-contained chunk with a stable citation URL, speaker attribution, and precise date. Feed it directly into a vector store or policy monitoring pipeline.
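
As one possible way to prepare records for a vector store, the sketch below reshapes an output record into a text chunk with citation metadata. The chunk layout (`id`, `text`, `metadata`) and the function name `to_chunk` are hypothetical; adapt them to whatever your store expects.

```python
def to_chunk(record: dict) -> dict:
    """Reshape one scraper output record into a generic RAG chunk."""
    return {
        "id": record["speech_id"],
        "text": record["speech_text"],
        "metadata": {
            "speaker": record.get("speaker"),
            "speaker_position": record.get("speaker_position"),
            "chamber": record.get("chamber"),
            "committee": record.get("committee"),
            "date": record.get("meeting_date"),
            # Stable NDL URL, usable as the citation for this chunk.
            "citation_url": record.get("speech_url"),
        },
    }
```

Very long speeches may still need to be split further before embedding, depending on the model's context limit.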