US Congressional Record Scraper - Floor Speeches, Statements & Legislative Text

Extract daily floor speeches and statements from the US Congressional Record via the official Congress.gov API. Returns one record per article with section, volume, and issue metadata, page numbers, and optional full plain text. Coverage spans 1995 to the present across all four CR sections: Daily Digest, Senate, House, and Extensions of Remarks.

What data does it extract?

Each record represents a single article from a daily Congressional Record issue:

Field	Description
`congress`	Congress number (e.g. 119)
`session`	Session number within the Congress
`volume`	CR volume number
`issue_number`	Issue number within the volume
`issue_date`	Publication date (YYYY-MM-DD)
`section`	Section name: Daily Digest, Senate Section, House Section, Extensions of Remarks Section
`article_title`	Title of the article or speech
`start_page`	First page in the printed Record
`end_page`	Last page in the printed Record
`article_text`	Full plain-text body (populated when `includeFullText: true`). Capped at 50,000 characters.
`pdf_url`	URL to the PDF version of this article
`source_url`	Canonical URL on congress.gov
`scraped_at`	ISO-8601 scrape timestamp

Text availability note: `article_text` is populated only for articles that have an associated Formatted Text URL in the API. For most pre-2000 records only PDF versions exist, so `article_text` will be null even with `includeFullText: true`. Full-text coverage is reliable from approximately 2000 onward.
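When consuming the output, a record with a null `article_text` can fall back to its `pdf_url`. A minimal sketch, assuming records are plain Python dicts keyed by the field names in the table above; `CRArticle` and `text_or_pdf` are hypothetical names, not part of the scraper.

```python
from typing import Optional, TypedDict


class CRArticle(TypedDict, total=False):
    # Hypothetical type covering a subset of the output fields.
    article_title: str
    article_text: Optional[str]
    pdf_url: Optional[str]


def text_or_pdf(record: CRArticle) -> tuple[str, Optional[str]]:
    """Return ("text", body) when plain text exists, else ("pdf", pdf_url)."""
    body = record.get("article_text")
    if body:  # None and "" both mean no plain text was retrieved
        return ("text", body)
    return ("pdf", record.get("pdf_url"))
```

For example, `text_or_pdf({"article_text": None, "pdf_url": "https://example.org/a.pdf"})` returns `("pdf", "https://example.org/a.pdf")`; the URL is a placeholder.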

What does the scraper do?

It calls the Congress.gov API to list CR issues by date or Congress number, then fetches each article's detail record. With `includeFullText: true`, it makes one additional HTTP request per article to retrieve the plain-text body. The default delay of 400 ms between requests allows up to about 9,000 req/hr before response time is counted, which is above the 5,000 req/hr limit on a free API key. Actual throughput is lower because each response also takes time, but long full-text runs may still approach the hourly limit.
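A fixed delay between sequential requests caps throughput at 3600 / (delay + response time) requests per hour. The sketch below checks the 400 ms default against the 5,000 req/hr limit; the response-time values are illustrative assumptions, not measurements, and `max_requests_per_hour` is a hypothetical helper.

```python
def max_requests_per_hour(delay_s: float, latency_s: float = 0.0) -> float:
    """Upper bound on sequential requests per hour for a fixed inter-request delay.

    latency_s is an assumed average response time; real values vary.
    """
    return 3600 / (delay_s + latency_s)


HOURLY_LIMIT = 5000  # free api.congress.gov key

for latency in (0.0, 0.2, 0.5):
    rate = max_requests_per_hour(0.4, latency)
    status = "over" if rate > HOURLY_LIMIT else "within"
    print(f"latency {latency:.1f}s -> {rate:,.0f} req/hr ({status} the limit)")
```

With no response time the delay alone allows about 9,000 req/hr; the rate only drops below 5,000 req/hr once each response takes more than about 0.32 s.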

How to use it

Field	Type	Default	Description
`maxItems`	integer	15	Maximum number of article records to return
`dateFrom`	string	—	Start date (YYYY-MM-DD)
`dateTo`	string	—	End date (YYYY-MM-DD)
`congress`	integer	—	Filter by Congress number (e.g. 119 for the 119th Congress, 2025-2027)
`includeFullText`	boolean	`true`	Fetch the full plain-text body of each article
`apiKey`	string	—	Your free api.congress.gov API key. If blank, a shared key is used (shared rate limit).
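Before starting a run, an input object can be sanity-checked against the table above. A minimal sketch; the checks (positive `maxItems`, ISO dates, `dateFrom` not after `dateTo`, `congress` of at least 104 because coverage starts in 1995) are assumptions inferred from this README, not the scraper's own validation, and `validate_input` is a hypothetical name.

```python
from datetime import date
from typing import Any


def validate_input(run_input: dict[str, Any]) -> list[str]:
    """Return a list of problems with an input dict (an empty list means OK)."""
    problems: list[str] = []
    max_items = run_input.get("maxItems", 15)  # 15 is the documented default
    if not isinstance(max_items, int) or max_items < 1:
        problems.append("maxItems must be a positive integer")

    parsed: dict[str, date] = {}
    for key in ("dateFrom", "dateTo"):
        value = run_input.get(key)
        if value is None:
            continue
        try:
            parsed[key] = date.fromisoformat(value)  # expects YYYY-MM-DD
        except (TypeError, ValueError):
            problems.append(f"{key} must be a YYYY-MM-DD string")
    if "dateFrom" in parsed and "dateTo" in parsed and parsed["dateFrom"] > parsed["dateTo"]:
        problems.append("dateFrom is after dateTo")

    congress = run_input.get("congress")
    # Assumed rule: the 104th Congress (1995-1997) is the earliest covered.
    if congress is not None and (not isinstance(congress, int) or congress < 104):
        problems.append("congress should be 104 or later (coverage starts in 1995)")
    return problems
```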

Get a free API key at api.congress.gov/sign-up/. Keys are issued instantly with no review, and a free key allows 5,000 req/hr.
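For reference, the public API accepts the key as the `api_key` query parameter. A sketch that builds a request URL for listing daily Congressional Record issues; the `/v3/daily-congressional-record` path and the `format`, `limit`, and `offset` parameters are assumed from the Congress.gov API documentation and should be checked there, and `daily_cr_issues_url` is hypothetical.

```python
from urllib.parse import urlencode

API_BASE = "https://api.congress.gov/v3"  # assumed v3 base URL


def daily_cr_issues_url(api_key: str, limit: int = 20, offset: int = 0) -> str:
    """Build a URL listing daily Congressional Record issues (paged by limit/offset)."""
    query = urlencode({"api_key": api_key, "format": "json", "limit": limit, "offset": offset})
    return f"{API_BASE}/daily-congressional-record?{query}"
```

Passing the URL to `urllib.request.urlopen` would perform the request; that step needs network access and a valid key, so it is left out here.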

Fetch recent articles (metadata only)

{
  "dateFrom": "2026-05-15",
  "dateTo": "2026-05-15",
  "includeFullText": false,
  "maxItems": 100
}

Download full text from a specific Congress

{
  "congress": 119,
  "dateFrom": "2026-01-01",
  "dateTo": "2026-03-31",
  "includeFullText": true,
  "maxItems": 1000
}

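Historical backfill

Most 1995 articles exist only as PDFs, so `article_text` will usually be null in this run even with `includeFullText: true`; `pdf_url` still points to each article.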

{
  "dateFrom": "1995-01-04",
  "dateTo": "1995-12-31",
  "includeFullText": true,
  "maxItems": 5000
}
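A year-long backfill can also be split into smaller runs, for example one per month, so each run stays near the monthly size mentioned in the FAQ and a failure only costs one window. A sketch that produces month-long date windows in the input format shown above; splitting by month is a suggestion, not something the scraper requires, and `monthly_windows` is hypothetical.

```python
from calendar import monthrange
from datetime import date


def monthly_windows(year: int) -> list[dict[str, str]]:
    """Split one calendar year into month-long dateFrom/dateTo pairs."""
    windows = []
    for month in range(1, 13):
        last_day = monthrange(year, month)[1]  # number of days in this month
        windows.append({
            "dateFrom": date(year, month, 1).isoformat(),
            "dateTo": date(year, month, last_day).isoformat(),
        })
    return windows
```

Each window can be merged with the other input fields (`includeFullText`, `maxItems`) to form one run's input.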

Use cases

Legislative NLP training corpus: collect floor speeches with section labels and date metadata to train or fine-tune models on legislative language
Lobbying analytics: track which bills are discussed on the House or Senate floor, and correlate them with sponsor party and subject from separately collected bill data
Extensions of Remarks archive: build a searchable archive of Extensions of Remarks statements for specific policy areas
Legislative monitoring: run daily with dateFrom set to the previous day to ingest each new CR issue as it is published
Journalism and data reporting: export historical floor speech text for topic modeling or legislator activity analysis
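For the legislative monitoring use case, the input for each scheduled run can be computed from the current date. A sketch, assuming the scheduler runs once per day; `daily_monitoring_input` is hypothetical, and the `maxItems` value of 500 is an assumed margin above the roughly 100 articles per issue mentioned in the FAQ.

```python
from datetime import date, timedelta
from typing import Optional


def daily_monitoring_input(today: Optional[date] = None, max_items: int = 500) -> dict:
    """Build an input that targets the previous calendar day's CR issue."""
    today = today or date.today()
    yesterday = (today - timedelta(days=1)).isoformat()
    return {
        "dateFrom": yesterday,
        "dateTo": yesterday,
        "includeFullText": True,
        "maxItems": max_items,  # assumed headroom above ~100 articles per issue
    }
```

Days without a CR issue, such as most weekends, should simply return no records.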

FAQ

Is the Congress.gov API free? Yes. A free key allows 5,000 req/hr. Without a key, a shared demo key is used, which is limited to 30 req/hr and suitable for testing only.

Why is article_text null for some records? Articles that exist only as PDFs in the Congress.gov system do not have a Formatted Text URL. This is most common for records before 2000. The PDF link is always included in pdf_url.

What is a typical run size? Each daily CR issue contains roughly 100 articles across all four sections, so a month with about 20 issues yields approximately 2,000 records.
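The FAQ figures can be turned into a rough request budget. A sketch, assuming one detail request per article plus one text request when full text is enabled (as described under "What does the scraper do?") and ignoring the issue-listing calls; `estimate_run` is hypothetical.

```python
def estimate_run(issues: int, articles_per_issue: int = 100,
                 include_full_text: bool = True,
                 hourly_limit: int = 5000) -> tuple[int, int, float]:
    """Return (records, article-level requests, minimum hours at the hourly limit)."""
    records = issues * articles_per_issue
    # One detail request per article, plus one text request if full text is on.
    requests = records * (2 if include_full_text else 1)
    return records, requests, requests / hourly_limit
```

`estimate_run(20)` returns `(2000, 4000, 0.8)`: a month of about 20 issues with full text needs roughly 4,000 article requests, or at least 0.8 hours at the 5,000 req/hr limit.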

Output is available in JSON, CSV, and Excel via the Apify dataset export panel.
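Once exported, the JSON file can be summarized with the standard library. A sketch, assuming the JSON export is a top-level array of records with the fields listed earlier; `load_json_export` and `section_counts` are hypothetical helpers.

```python
import json
from collections import Counter
from pathlib import Path
from typing import Any, Iterable


def load_json_export(path: str) -> list[dict[str, Any]]:
    """Read a dataset exported in JSON format (assumed to be a top-level array)."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def section_counts(records: Iterable[dict[str, Any]]) -> Counter:
    """Count articles per Congressional Record section."""
    return Counter(record.get("section") for record in records)
```

`section_counts(load_json_export("dataset.json"))` gives the number of articles per section; `dataset.json` is a placeholder file name.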
