OrbTop

FOMC Meeting Transcripts & Minutes Scraper

Extracts FOMC meeting artifacts from the Federal Reserve's historical materials archive, the official public record of Federal Open Market Committee meetings since 1936. Collects transcript PDFs, minutes, Tealbooks, Beige Books, policy statements, and press conference links for every FOMC meeting in the embargo-cleared corpus (currently 1936–2020). Optionally extracts the full plain text of each PDF, along with participant lists and topic tags.

What This Scraper Collects

  • Transcripts — verbatim PDFs of FOMC meeting proceedings (released under the 5-year embargo rule)
  • Minutes — official summary of each meeting, released approximately 3 weeks after the meeting
  • Tealbooks A & B — staff economic forecasts and analysis prepared before each meeting
  • Beige Book — regional economic conditions summary from all 12 Federal Reserve Districts
  • Agendas — formal meeting agenda PDFs
  • Policy Statements — press release HTML links for post-meeting rate decisions
  • Press Conferences — chair press conference page links (post-2011)

Each record includes: meeting date, meeting type (regular or conference call), artifact type, artifact URL, Fed Chair name at time of the meeting, minutes release date, statement URL, press conference URL, embargo status, and scraped timestamp. With extractPdfText: true, also includes plain text, semicolon-separated participant names, and heuristic topic tags.
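The heuristic topic tags can be pictured as simple keyword matching over the extracted text. The sketch below is illustrative only: the topic names come from this page, but the keyword lists, the `TOPIC_KEYWORDS` table, and the `tag_topics` function are assumptions, not the scraper's actual rules.

```python
import re

# Hypothetical keyword lists; the scraper's real heuristics are not documented here.
TOPIC_KEYWORDS = {
    "inflation": ["inflation", "price stability"],
    "employment": ["employment", "unemployment", "labor market"],
    "interest rates": ["federal funds rate", "interest rate", "interest rates"],
    "balance sheet": ["balance sheet", "asset purchases"],
    "GDP": ["gdp", "gross domestic product"],
    "credit": ["credit"],
    "financial stability": ["financial stability"],
    "international": ["foreign", "international"],
}


def tag_topics(text: str) -> str:
    """Return semicolon-separated topic tags whose keywords occur in the text."""
    lowered = text.lower()
    tags = []
    for topic, keywords in TOPIC_KEYWORDS.items():
        # Whole-word match so that, e.g., "credit" does not fire on "accredited".
        if any(re.search(r"\b" + re.escape(kw) + r"\b", lowered) for kw in keywords):
            tags.append(topic)
    return ";".join(tags)


print(tag_topics("Participants discussed inflation and the labor market."))
```

Empty text (for example, from an image-only scan) yields an empty tag string under this approach.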

Features

  • Covers the full historical archive from 1936 to 2020 (85 years, 800+ artifacts)
  • Filter by year range with startYear / endYear — run only the years you care about
  • Filter by artifact type — transcripts only, minutes only, or any combination
  • Identifies Fed Chair by meeting date using a built-in tenure map (Volcker, Greenspan, Bernanke, Yellen, Powell)
  • Optional PDF text extraction — extracts participant list from the PRESENT section and heuristic topic tags (inflation, employment, interest rates, balance sheet, GDP, credit, financial stability, international)
  • Detects conference call meetings separately from regular scheduled meetings
  • Runs on 512 MB memory, no proxy required — federalreserve.gov is fully public
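A tenure map of the kind mentioned above can be implemented as a sorted list of start dates plus a binary search. This is a minimal sketch, not the scraper's code: the `CHAIR_TENURES` table and `chair_for` function are hypothetical, the start dates are approximate, and the real `chair_name` field may use a different name format.

```python
from bisect import bisect_right
from datetime import date

# (tenure start, chair), sorted by start date. Dates are approximate assumptions.
CHAIR_TENURES = [
    (date(1979, 8, 6), "Paul Volcker"),
    (date(1987, 8, 11), "Alan Greenspan"),
    (date(2006, 2, 1), "Ben Bernanke"),
    (date(2014, 2, 3), "Janet Yellen"),
    (date(2018, 2, 5), "Jerome Powell"),
]
_STARTS = [start for start, _ in CHAIR_TENURES]


def chair_for(meeting_date: date):
    """Return the chair whose tenure began most recently on or before the date."""
    idx = bisect_right(_STARTS, meeting_date) - 1
    # Dates before the first entry fall outside this map.
    return CHAIR_TENURES[idx][1] if idx >= 0 else None


print(chair_for(date(2008, 10, 29)))
```

Meetings that predate the first entry in the map return `None` in this sketch; how the scraper labels those meetings is not stated on this page.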

Who Uses an FOMC Transcript Dataset?

  • Macroeconomic research desks — build time-series analysis of Fed language, voting patterns, and policy signals across chair eras
  • AI training shops — primary-source central-bank verbatim is high-value training data for finance-aware LLMs and monetary policy models
  • Academic researchers — automates what was previously a hand-download task for papers citing FOMC transcripts
  • Quantitative analysts — run NLP models over FOMC text to extract sentiment, policy stance, and forward guidance signals
  • Journalists and financial writers — search the full historical record for specific topics or speeches

How the Scraper Works

  1. Fetches the Historical Materials by Year index page to enumerate all available year pages.
  2. Filters to years within the startYear–endYear range and crawls each per-year page.
  3. Parses every meeting panel, classifying each link by artifact type.
  4. Emits one record per artifact link, enriched with meeting metadata and chair name.
  5. If extractPdfText: true, downloads each PDF and extracts plain text, participants, and topic tags before saving.
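Step 3, classifying each link by artifact type, might look like an ordered list of patterns applied to the link label. The sketch below assumes classification by visible link text; the patterns, `ARTIFACT_PATTERNS`, and `classify_link` are hypothetical, and the scraper may instead inspect filenames or page structure.

```python
import re

# Hypothetical label patterns, checked in order so that more specific
# labels (e.g. "Press Conference") win over generic ones (e.g. "Transcript").
ARTIFACT_PATTERNS = [
    ("tealbook_a", re.compile(r"tealbook\s*a\b", re.IGNORECASE)),
    ("tealbook_b", re.compile(r"tealbook\s*b\b", re.IGNORECASE)),
    ("beige_book", re.compile(r"beige\s*book", re.IGNORECASE)),
    ("press_conference", re.compile(r"press\s*conference", re.IGNORECASE)),
    ("transcript", re.compile(r"transcript", re.IGNORECASE)),
    ("minutes", re.compile(r"minutes", re.IGNORECASE)),
    ("agenda", re.compile(r"agenda", re.IGNORECASE)),
    ("statement", re.compile(r"statement", re.IGNORECASE)),
]


def classify_link(label: str):
    """Map a link label to an artifact_type value, or None if unrecognized."""
    for artifact_type, pattern in ARTIFACT_PATTERNS:
        if pattern.search(label):
            return artifact_type
    return None


for label in ["Transcript (PDF)", "Tealbook A", "Beige Book", "Memo"]:
    print(label, "->", classify_link(label))
```

Unrecognized links return `None` here, so a caller can skip them rather than emit a record with a guessed type.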

Input

{
  "startYear": 2015,
  "endYear": 2020,
  "artifactTypes": ["transcript", "minutes"],
  "maxItems": 0,
  "extractPdfText": false
}

| Field | Type | Default | Description |
|---|---|---|---|
| startYear | Integer | 2015 | Earliest FOMC year to include (1936–2020). |
| endYear | Integer | 2020 | Latest FOMC year to include (1936–2020). |
| artifactTypes | Array | ["transcript", "minutes"] | Types to collect: transcript, minutes, tealbook_a, tealbook_b, beige_book, agenda, statement, press_conference. |
| maxItems | Integer | 0 | Maximum artifact records to return. 0 = unlimited. |
| extractPdfText | Boolean | false | Download each PDF and extract plain text. Significantly increases runtime. |
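
Before starting a run, it can help to check an input object against the documented fields and defaults. The following is a sketch of such a check; `validate_input` and the constants are hypothetical helpers, and the scraper's own validation behavior (for example, how it treats startYear greater than endYear) is not documented here.

```python
ALLOWED_ARTIFACT_TYPES = {
    "transcript", "minutes", "tealbook_a", "tealbook_b",
    "beige_book", "agenda", "statement", "press_conference",
}
# Defaults and year bounds as listed in the input field descriptions.
DEFAULTS = {
    "startYear": 2015,
    "endYear": 2020,
    "artifactTypes": ["transcript", "minutes"],
    "maxItems": 0,
    "extractPdfText": False,
}
MIN_YEAR, MAX_YEAR = 1936, 2020


def validate_input(raw: dict) -> dict:
    """Merge defaults into the input and raise ValueError on invalid values."""
    cfg = {**DEFAULTS, **raw}
    for key in ("startYear", "endYear"):
        if not MIN_YEAR <= cfg[key] <= MAX_YEAR:
            raise ValueError(f"{key} must be between {MIN_YEAR} and {MAX_YEAR}")
    if cfg["startYear"] > cfg["endYear"]:
        raise ValueError("startYear must not be later than endYear")
    unknown = set(cfg["artifactTypes"]) - ALLOWED_ARTIFACT_TYPES
    if unknown:
        raise ValueError(f"unknown artifact types: {sorted(unknown)}")
    if cfg["maxItems"] < 0:
        raise ValueError("maxItems must be 0 (unlimited) or a positive integer")
    return cfg


print(validate_input({"startYear": 2010, "artifactTypes": ["transcript"]}))
```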

Collect Only Transcripts, 2010–2020

{
  "startYear": 2010,
  "endYear": 2020,
  "artifactTypes": ["transcript"]
}

Extract PDF Text for NLP Analysis

{
  "startYear": 2015,
  "endYear": 2020,
  "artifactTypes": ["transcript"],
  "extractPdfText": true,
  "maxItems": 20
}

Output Schema

| Field | Description |
|---|---|
| meeting_date | Meeting date in YYYY-MM-DD (last day for multi-day meetings) |
| meeting_type | regular or conference_call |
| year | Meeting year as integer |
| artifact_type | transcript, minutes, tealbook_a, tealbook_b, beige_book, agenda, statement, or press_conference |
| artifact_url | Full URL to the PDF or HTML artifact |
| artifact_filename | Filename from the URL |
| artifact_text | Plain text from PDF (only when extractPdfText: true) |
| participants | Semicolon-separated participant names from the transcript PRESENT section |
| chair_name | Fed Chair at the time of the meeting |
| minutes_release_date | Date the minutes were publicly released |
| statement_url | Policy statement URL (post-2008 meetings) |
| press_conference_url | Chair press conference URL (post-2011) |
| canonical_url | Year-index source page URL |
| embargo_status | public for all artifacts in the archive |
| extracted_topics | Semicolon-separated topic tags from PDF text (when extractPdfText: true) |
| scraped_at | ISO 8601 timestamp |
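
Downstream code will usually want the semicolon-separated fields as lists and a quick way to spot records with no extracted text. The sketch below shows one way to do that; `split_field` and `normalize_record` are hypothetical helpers, and the sample record uses made-up placeholder values rather than real scraper output.

```python
def split_field(value) -> list:
    """Split a semicolon-separated field into a clean list; None or '' gives []."""
    return [part.strip() for part in (value or "").split(";") if part.strip()]


def normalize_record(record: dict) -> dict:
    """Return a copy with list-valued participants/topics and a has_text flag."""
    out = dict(record)
    out["participants"] = split_field(record.get("participants"))
    out["extracted_topics"] = split_field(record.get("extracted_topics"))
    # Image-only scans come back with an empty artifact_text.
    out["has_text"] = bool((record.get("artifact_text") or "").strip())
    return out


# Placeholder record for illustration only.
sample = {
    "artifact_type": "transcript",
    "participants": "Participant One; Participant Two",
    "extracted_topics": "inflation;employment",
    "artifact_text": "",
}
print(normalize_record(sample))
```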

Notes

  • The 5-year embargo means transcripts are only available for meetings that occurred at least 5 years ago. As of 2026, the archive covers through 2020.
  • Conference call meetings (emergency sessions) are labeled meeting_type: conference_call. They were common during the 2008 financial crisis.
  • PDF text extraction works well for transcripts from 1990 onwards (searchable PDFs). Pre-1990 transcripts may be image-only scans; the extractor returns an empty artifact_text for those rather than failing.
  • All data is public domain (U.S. federal government publication).