OrbTop

US Congressional Record Scraper

Extracts daily floor speeches and statements from the US Congressional Record via the official congress.gov API. Returns per-article records with metadata and optional full text across all CR sections (Daily Digest, Senate, House, Extensions of Remarks). Coverage spans from 1995 to present.

What You Get

Each record represents a single article (speech, statement, or section entry) from a daily Congressional Record issue:

| Field | Description |
| --- | --- |
| congress | Congress number (e.g. 119) |
| session | Session number within the Congress |
| volume | CR volume number |
| issue_number | Issue number within the volume |
| issue_date | Publication date (YYYY-MM-DD) |
| section | Section name: Daily Digest, Senate Section, House Section, or Extensions of Remarks Section |
| article_title | Title of the article or speech |
| start_page | First page in the printed Record |
| end_page | Last page in the printed Record |
| article_text | Full plain-text body (populated when includeFullText=true) |
| pdf_url | URL to the PDF version of this article |
| source_url | Canonical URL on congress.gov |
| scraped_at | ISO-8601 scrape timestamp |
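
For downstream processing, the fields above can be modeled as a typed record. This is a minimal sketch and not part of the scraper: the class name `CRArticle` and the helper `parse_record` are hypothetical, and the field types (integers for the numeric identifiers, strings for page numbers) are assumptions, since the field list does not state them.

```python
from dataclasses import dataclass, fields
from typing import Any, Optional


@dataclass
class CRArticle:
    """One Congressional Record article, mirroring the output fields above."""
    congress: int
    session: int
    volume: int
    issue_number: int
    issue_date: str              # YYYY-MM-DD
    section: str
    article_title: str
    start_page: Optional[str]    # assumed to be a string, not an integer
    end_page: Optional[str]
    article_text: Optional[str]  # None when full text is unavailable
    pdf_url: Optional[str]
    source_url: Optional[str]
    scraped_at: str              # ISO-8601 timestamp


def parse_record(raw: dict[str, Any]) -> CRArticle:
    """Build a CRArticle from one output row, ignoring unknown keys."""
    known = {f.name for f in fields(CRArticle)}
    values = {name: raw.get(name) for name in known}
    # Assumption: numeric identifiers may arrive as strings, so coerce them.
    for name in ("congress", "session", "volume", "issue_number"):
        if values[name] is not None:
            values[name] = int(values[name])
    return CRArticle(**values)
```

Missing keys become `None` rather than raising, so rows without `article_text` still parse.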

Input Configuration

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| maxItems | integer | 15 | Maximum number of article records to return |
| dateFrom | string | (none) | Start date for issues to fetch (YYYY-MM-DD) |
| dateTo | string | (none) | End date for issues to fetch (YYYY-MM-DD) |
| congress | integer | (none) | Filter by Congress number (e.g. 119 for the 119th Congress) |
| includeFullText | boolean | true | Fetch the full plain-text body of each article |
| apiKey | string | (none) | Your api.congress.gov API key (free at api.congress.gov/sign-up). If blank, a shared key is used. |
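
When input objects are generated programmatically, it can help to validate them before starting a run. The sketch below builds an input object from the fields and defaults listed above. The function name `build_input` and its validation rules (date ordering, positive `maxItems`) are assumptions for illustration, and the scraper itself may validate differently.

```python
from datetime import date
from typing import Any, Optional


def build_input(
    date_from: Optional[str] = None,
    date_to: Optional[str] = None,
    congress: Optional[int] = None,
    include_full_text: bool = True,
    max_items: int = 15,
    api_key: Optional[str] = None,
) -> dict[str, Any]:
    """Return an input object using the documented field names and defaults."""
    # date.fromisoformat raises ValueError on malformed dates.
    start = date.fromisoformat(date_from) if date_from else None
    end = date.fromisoformat(date_to) if date_to else None
    if start and end and start > end:
        raise ValueError("dateFrom must not be later than dateTo")
    if max_items < 1:
        raise ValueError("maxItems must be a positive integer")

    config: dict[str, Any] = {
        "maxItems": max_items,
        "includeFullText": include_full_text,
    }
    # Optional fields are omitted when unset rather than sent as null.
    if start:
        config["dateFrom"] = start.isoformat()
    if end:
        config["dateTo"] = end.isoformat()
    if congress is not None:
        config["congress"] = congress
    if api_key:
        config["apiKey"] = api_key
    return config
```

Calling `build_input()` with no arguments yields only the two documented defaults, `maxItems` and `includeFullText`.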

Getting an API Key

The congress.gov API is free and open. Keys are issued instantly at api.congress.gov/sign-up, with no review required. When no key is supplied, a shared key is used, subject to shared rate limits (5,000 requests/hour per key).
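
To judge whether a run fits within the hourly quota, a rough budget can be computed from the 5,000 requests/hour limit and the 2–3 requests per article mentioned under Notes. This is a back-of-the-envelope sketch with hypothetical function names. It assumes every request counts against the key (a worst case) and ignores the requests needed to list issues.

```python
HOURLY_LIMIT = 5000  # requests/hour per key, as stated above


def articles_per_hour(requests_per_article: int,
                      hourly_limit: int = HOURLY_LIMIT) -> int:
    """Largest number of articles that fits in one hour of quota."""
    return hourly_limit // requests_per_article


def hours_needed(n_articles: int, requests_per_article: int,
                 hourly_limit: int = HOURLY_LIMIT) -> int:
    """Whole hours of quota needed to fetch n_articles (rounded up)."""
    total_requests = n_articles * requests_per_article
    # Integer ceiling division avoids floating-point rounding.
    return -(-total_requests // hourly_limit)


print(articles_per_hour(3))   # 1666
print(hours_needed(5000, 3))  # 3
```

Under these assumptions, a 5,000-article backfill with full text spans at least three hours of quota on a single key.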

Usage Examples

Fetch a single day's Congressional Record (metadata only):

{
  "dateFrom": "2026-05-15",
  "dateTo": "2026-05-15",
  "includeFullText": false,
  "maxItems": 100
}

Download full text from a specific Congress:

{
  "congress": 119,
  "dateFrom": "2026-01-01",
  "dateTo": "2026-03-31",
  "includeFullText": true,
  "maxItems": 1000
}

Backfill historical speeches:

{
  "dateFrom": "1995-01-04",
  "dateTo": "1995-12-31",
  "includeFullText": true,
  "maxItems": 5000
}
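
Because maxItems caps each run, a long backfill may be easier to manage as several smaller runs. The sketch below splits a date range into calendar-month windows and emits one input object per window. The helper `monthly_inputs` is hypothetical, and whether splitting is needed depends on how many articles each period contains.

```python
from datetime import date, timedelta
from typing import Any


def monthly_inputs(start: date, end: date,
                   max_items: int = 5000) -> list[dict[str, Any]]:
    """Split [start, end] into calendar-month windows, one input object each."""
    inputs = []
    cursor = start
    while cursor <= end:
        # First day of the month after the cursor's month.
        if cursor.month == 12:
            next_month = date(cursor.year + 1, 1, 1)
        else:
            next_month = date(cursor.year, cursor.month + 1, 1)
        # The window ends on the last day of the month or at `end`.
        window_end = min(next_month - timedelta(days=1), end)
        inputs.append({
            "dateFrom": cursor.isoformat(),
            "dateTo": window_end.isoformat(),
            "includeFullText": True,
            "maxItems": max_items,
        })
        cursor = next_month
    return inputs


chunks = monthly_inputs(date(1995, 1, 4), date(1995, 12, 31))
print(len(chunks))  # 12
```

Each resulting object has the same shape as the backfill example above and can be submitted as a separate run.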

Notes

  • Text coverage: Only articles with an associated Formatted Text URL include article text. PDF-only records (common pre-2000) will have article_text: null even with includeFullText: true.
  • Text size cap: article_text is capped at 50,000 characters per article.
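  • Rate limit: With includeFullText: true, the scraper makes 2–3 API/HTTP requests per article, spaced by a default delay of 400 ms. The delay alone permits up to 9,000 requests/hour (3,600 s ÷ 0.4 s), so staying under the 5,000 requests/hour limit depends on response latency and on how many of those requests count against the API key.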
  • Pre-1995 records: Prior to 1995, digitized text is limited. The API will return issue metadata but article text may be unavailable.
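
The text-coverage and size-cap notes above imply three cases a consumer may want to tell apart: no text, text that reached the cap, and text below the cap. A minimal sketch follows. The function name `text_status` and its labels are hypothetical, and treating text at exactly 50,000 characters as possibly truncated is an assumption, since an article could be exactly that long.

```python
from typing import Any

TEXT_CAP = 50_000  # character cap on article_text, as stated above


def text_status(record: dict[str, Any]) -> str:
    """Classify a record as 'missing', 'possibly_truncated', or 'complete'."""
    text = record.get("article_text")
    if not text:
        # Covers null text (e.g. PDF-only records) and empty strings.
        return "missing"
    if len(text) >= TEXT_CAP:
        # Text at the cap length may have been cut off.
        return "possibly_truncated"
    return "complete"
```

For records flagged as missing or possibly truncated, the pdf_url field is a natural place to look for the complete article.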