Lobbying Disclosure Scraper - Senate LDA Filings
Senate LDA Lobbying Disclosure Scraper
Extract US federal lobbying disclosure filings from the Senate Lobbying Disclosure Act database. Covers roughly 1.94 million filings from 1999 to present — registrants, clients, lobbyists, issue areas, government entities contacted, foreign entities, and reported income and expenses.
Lobbying Disclosure Crawler Features
- Extracts every filing field the Senate LDA publishes — registrant contact info, client details, lobbying activities, issue codes, and financials
- Flattens nested structures into scalar fields and clean string arrays, so you can pipe the output straight into a spreadsheet or database
- Filters by year, period, filing type, registrant name, client name, client state, issue area code, specific-issue text search, or date-posted range
- Returns deduplicated lobbyist rosters and government-entity contact lists across all activities in a filing
- Pure JSON API — no HTML scraping, no browser, no proxies, no authentication
- Pay-per-event pricing at about $0.001 per filing
- Honors the Senate's rate conventions with a 250ms courtesy delay between pages
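Here is a minimal Python sketch of the flattening and deduplication described above. The raw field names it reads are assumptions about the Senate API payload and are not taken from this README: general_issue_code, general_issue_code_display, description, lobbyist.first_name/last_name, covered_position, new, and government_entities[].name. The helper functions are hypothetical and are not the actor's own code.

```python
from typing import Any


def dedupe(items: list[str]) -> list[str]:
    """Drop empty strings and duplicates, keeping first-seen order."""
    return list(dict.fromkeys(item for item in items if item))


def format_lobbyist(entry: dict[str, Any]) -> str:
    """Render one (assumed) raw lobbyist entry as 'First Last (covered_position) [NEW]'."""
    person = entry.get("lobbyist") or {}
    parts = (person.get("first_name"), person.get("last_name"))
    text = " ".join(p.strip() for p in parts if p)
    if entry.get("covered_position"):
        text += f" ({entry['covered_position']})"
    if entry.get("new"):
        text += " [NEW]"
    return text


def flatten_activities(filing: dict[str, Any]) -> dict[str, list[str]]:
    """Collapse nested lobbying_activities into flat string arrays (assumed raw shape)."""
    activities: list[str] = []
    lobbyists: list[str] = []
    entities: list[str] = []
    codes: list[str] = []
    for act in filing.get("lobbying_activities") or []:
        code = act.get("general_issue_code") or ""
        display = act.get("general_issue_code_display") or ""
        description = act.get("description") or ""
        activities.append(f"{code} - {display}: {description}")
        codes.append(code)
        lobbyists.extend(format_lobbyist(e) for e in act.get("lobbyists") or [])
        entities.extend(g.get("name") or "" for g in act.get("government_entities") or [])
    return {
        "lobbying_activities": activities,  # one entry per activity, not deduplicated
        "lobbyists": dedupe(lobbyists),
        "government_entities": dedupe(entities),
        "issue_area_codes": dedupe(codes),
    }
```

Deduplication keeps first-seen order, so the arrays follow the order in which the filing lists its activities.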
Who Uses Senate LDA Lobbying Data?
- Political intelligence firms — track new registrations and issue-area trends for clients monitoring specific legislation
- Compliance teams — screen vendors and counterparties against the canonical federal lobbying registry before contracts close
- Investigative journalists — follow the money from foreign entities, affiliated orgs, and quarterly expense reports across years
- Policy researchers — build issue-area datasets spanning decades of filings without paying for commercial aggregators
- Advocacy and public-interest orgs — surface which firms represent which clients on which issues, with source links back to the original filings
How the Senate LDA Crawler Works
- You pick at least one narrowing filter — a filing year, a registrant name, a client name, a state, an issue code, or a date window.
- The crawler queries the Senate LDA API at lda.senate.gov/api/v1/filings/. It pages through the standard Django REST Framework envelope (count, next, previous, results) until it reaches your maxItems cap or the last page.
- Each raw filing is flattened. Registrant and client objects collapse into scalar fields, lobbying activities become a formatted string array, and nested lobbyists are deduplicated across activities.
- Results save to your Apify dataset with a stable schema, one row per filing.
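The paging loop above can be sketched with the Python standard library. This illustrates the approach under stated assumptions and is not the actor's source. iter_filings and fetch_json are hypothetical names, and any query-parameter names you pass in (for example filing_year) are assumptions about the API's filter names. The loop relies only on the results and next keys of the DRF envelope.

```python
import json
import time
import urllib.parse
import urllib.request
from typing import Callable, Iterator, Optional

# Host named in this README; moving to a successor host means changing this constant.
BASE_URL = "https://lda.senate.gov/api/v1/filings/"
PAGE_DELAY_SECONDS = 0.25  # courtesy pause between page requests


def fetch_json(url: str) -> dict:
    """Default fetcher: GET the URL and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def iter_filings(
    params: dict,
    max_items: int,
    fetch: Callable[[str], dict] = fetch_json,
    delay: float = PAGE_DELAY_SECONDS,
) -> Iterator[dict]:
    """Yield raw filings page by page until max_items is reached or the last page ends."""
    # Skip empty filters so they are not sent as blank query parameters.
    active = {k: v for k, v in params.items() if v not in ("", None)}
    url: Optional[str] = BASE_URL + ("?" + urllib.parse.urlencode(active) if active else "")
    yielded = 0
    while url and yielded < max_items:
        page = fetch(url)
        for filing in page.get("results", []):
            yield filing
            yielded += 1
            if yielded >= max_items:
                return
        url = page.get("next")  # DRF sets next to null on the last page
        if url:
            time.sleep(delay)  # pause only when another page will be requested
```

The fetch function is a parameter, so you can pass a stub to test the cap and stop logic without network access. The loop only pauses when it is about to request another page.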
Input
Basic: pull 2024 Q1 filings
{
"filingYear": 2024,
"filingPeriod": "first_quarter",
"maxItems": 100
}
Search by client name
{
"clientName": "GOOGLE",
"maxItems": 50
}
Registrations only, filtered by year
{
"filingYear": 2024,
"filingType": "RR",
"maxItems": 200
}
Issue-area slice (all Health filings from a given year)
{
"filingYear": 2024,
"issueAreaCode": "HCR",
"maxItems": 500
}
Date window on when filings were posted
{
"datePostedFrom": "2024-01-01",
"datePostedTo": "2024-03-31",
"maxItems": 1000
}
Input Parameters
| Field | Type | Default | Description |
|---|---|---|---|
| filingYear | integer | 2024 | Calendar year the filing reports on. Leave empty if using a different narrowing filter. Valid range: 1999-present. |
| filingPeriod | string | "" | Reporting period. One of first_quarter, second_quarter, third_quarter, fourth_quarter, mid_year, year_end, undetermined, or empty for all periods. |
| filingType | string | "" | Filing type code. RR = new registration, Q1–Q4 = quarterly reports, 1A–4A = amendments, 1T–4T = terminations. Empty returns all types. |
| registrantName | string | "" | Case-insensitive contains match on the lobbying firm name. |
| clientName | string | "" | Case-insensitive contains match on the client organization name. |
| clientState | string | "" | Two-letter US state code for the client's state. |
| issueAreaCode | string | "" | Three-letter general issue area code (e.g., HCR, BUD, TEC, DEF, TAX). |
| specificIssueSearch | string | "" | Full-text search on the "specific lobbying issues" description. |
| datePostedFrom | string | "" | Return filings posted on or after this date (YYYY-MM-DD). |
| datePostedTo | string | "" | Return filings posted on or before this date (YYYY-MM-DD). |
| maxItems | integer | 100 | Maximum number of filings to return. The API serves 25 per page, so small values finish in one or two requests. |
| proxyConfiguration | object | { useApifyProxy: false } | Proxy settings. The Senate LDA API is public and does not require proxies. |
At least one narrowing filter is required. Running the full corpus unfiltered is blocked — that would be about 78,000 pages and nobody's day goes well after that.
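A client-side pre-check like the one below can catch input that would be rejected before you start a run. It is a hypothetical sketch built only from the rules in the table and the paragraph above. Two things are assumptions: that the actor validates dates, periods, and state codes this strictly, and that every listed filter (including filingPeriod) counts as narrowing.

```python
import datetime as dt
import re
from typing import Any

# Filters named in this README; treating each one as "narrowing" is an assumption.
NARROWING_FILTERS = (
    "filingYear", "filingPeriod", "filingType", "registrantName", "clientName",
    "clientState", "issueAreaCode", "specificIssueSearch",
    "datePostedFrom", "datePostedTo",
)
VALID_PERIODS = {
    "", "first_quarter", "second_quarter", "third_quarter", "fourth_quarter",
    "mid_year", "year_end", "undetermined",
}


def _parse_date(value: str) -> dt.date:
    # Enforce the documented YYYY-MM-DD shape; fromisoformat then rejects impossible dates.
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", value):
        raise ValueError(f"expected YYYY-MM-DD, got {value!r}")
    return dt.date.fromisoformat(value)


def validate_input(run_input: dict[str, Any]) -> list[str]:
    """Return human-readable problems; an empty list means the input passes."""
    problems: list[str] = []
    if not any(run_input.get(k) not in (None, "") for k in NARROWING_FILTERS):
        problems.append("at least one narrowing filter is required")

    year = run_input.get("filingYear")
    if year not in (None, "") and not 1999 <= int(year) <= dt.date.today().year:
        problems.append("filingYear must be between 1999 and the current year")

    if run_input.get("filingPeriod", "") not in VALID_PERIODS:
        problems.append("unknown filingPeriod")

    state = run_input.get("clientState", "")
    if state and not re.fullmatch(r"[A-Za-z]{2}", state):
        problems.append("clientState must be a two-letter code")

    dates: dict[str, dt.date] = {}
    for key in ("datePostedFrom", "datePostedTo"):
        if run_input.get(key):
            try:
                dates[key] = _parse_date(run_input[key])
            except ValueError as exc:
                problems.append(f"{key}: {exc}")
    if len(dates) == 2 and dates["datePostedFrom"] > dates["datePostedTo"]:
        problems.append("datePostedFrom is after datePostedTo")

    if int(run_input.get("maxItems", 100)) < 1:
        problems.append("maxItems must be at least 1")
    return problems
```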
Lobbying Disclosure Crawler Output Fields
Example Output
{
"filing_uuid": "467e4a97-6351-4902-8ffa-dd51632e156b",
"filing_type": "Q1",
"filing_type_display": "1st Quarter - Report",
"filing_year": 2024,
"filing_period": "first_quarter",
"filing_period_display": "1st Quarter (Jan 1 - Mar 31)",
"filing_document_url": "https://lda.senate.gov/filings/public/filing/467e4a97-6351-4902-8ffa-dd51632e156b/print/",
"dt_posted": "2024-01-02T13:14:26-05:00",
"effective_date": "2023-08-01",
"termination_date": "",
"posted_by_name": "Sean Farrell",
"income": 30000,
"income_amount": "30000.00",
"expenses": null,
"expense_amount": "",
"expenses_method": "",
"registrant_id": 401107792,
"registrant_name": "EAST CAPITOL ADVISORS LLC",
"registrant_description": "",
"registrant_address": "921 H Street, NE, #252",
"registrant_city": "Washington",
"registrant_state": "District of Columbia",
"registrant_zip": "20002",
"registrant_country": "United States of America",
"registrant_contact_name": "SEAN FARRELL",
"registrant_contact_phone": "+1 202-944-0520",
"registrant_house_id": 56170,
"client_id": 56764,
"client_name": "CTIA - THE WIRELESS ASSOCIATION",
"client_description": "CTIA is the trade association of the cellular/wireless industry.",
"client_state": "District of Columbia",
"client_country": "United States of America",
"client_ppb_state": "District of Columbia",
"client_ppb_country": "United States of America",
"client_self_select": false,
"client_is_government_entity": false,
"lobbyists": [
"SEAN FARRELL"
],
"lobbying_activities": [
"TEC - Telecommunications: H.R.3949, End Cells in Cells Act, a bill to increase criminal penalties for contraband cell phones in prisons and jails."
],
"government_entities": [
"HOUSE OF REPRESENTATIVES"
],
"issue_area_codes": [
"TEC"
],
"foreign_entities": [],
"affiliated_organizations": [],
"conviction_disclosures": [],
"api_url": "https://lda.senate.gov/api/v1/filings/467e4a97-6351-4902-8ffa-dd51632e156b/",
"document_url": "https://lda.senate.gov/filings/public/filing/467e4a97-6351-4902-8ffa-dd51632e156b/print/"
}
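If you need the parts of a lobbying_activities entry back, you can split the formatted string with a regular expression. This is a hypothetical consumer-side helper. It assumes the three-letter issue code documented for issueAreaCode, and it assumes the display name never contains ": ". Descriptions may contain colons, so everything after the first ": " is kept as the description.

```python
import re
from typing import NamedTuple


class Activity(NamedTuple):
    code: str
    display: str
    description: str


# "<code> - <display>: <description>"; the non-greedy display stops at the first ": ".
ACTIVITY_RE = re.compile(r"^([A-Z]{3}) - (.*?): (.*)$", re.DOTALL)


def parse_activity(text: str) -> Activity:
    """Split one formatted activity string into code, display name, and description."""
    match = ACTIVITY_RE.match(text)
    if match is None:
        raise ValueError(f"unrecognised activity string: {text[:40]!r}")
    return Activity(*match.groups())
```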
Output Field Reference
| Field | Type | Description |
|---|---|---|
| filing_uuid | string | Unique UUID of the filing |
| filing_type | string | Filing type code (RR, Q1, Q2, Q3, Q4, MM, YE, 1A, 2T, etc.) |
| filing_type_display | string | Human-readable filing type (e.g., "Registration", "1st Quarter - Report") |
| filing_year | integer | Calendar year the filing reports on |
| filing_period | string | Reporting period code |
| filing_period_display | string | Human-readable reporting period |
| filing_document_url | string | Filer-submitted document URL |
| dt_posted | string | When the filing was posted to the LDA database (ISO 8601) |
| effective_date | string | Effective date of the client-registrant relationship (ISO 8601) |
| termination_date | string | When the relationship was terminated, if applicable |
| posted_by_name | string | Name of the person who submitted the filing |
| income | number | Reported lobbying income in USD, or null |
| income_amount | string | Raw decimal string for precision-sensitive downstream systems |
| expenses | number | Reported lobbying expenses in USD, or null |
| expense_amount | string | Raw decimal string for expenses |
| expenses_method | string | Method used to calculate expenses (a, b, or c) |
| registrant_id | integer | Internal ID of the registrant |
| registrant_name | string | Registrant (lobbying firm) name |
| registrant_description | string | Registrant's self-description of its business |
| registrant_address | string | Registrant street address (line 1 + line 2, comma-joined) |
| registrant_city | string | Registrant city |
| registrant_state | string | Registrant state/region name |
| registrant_zip | string | Registrant postal code |
| registrant_country | string | Registrant country (display name) |
| registrant_contact_name | string | Registrant primary contact name |
| registrant_contact_phone | string | Registrant primary contact phone |
| registrant_house_id | integer | Registrant's ID in the companion House system, if available |
| client_id | integer | Internal ID of the client organization |
| client_name | string | Client organization name |
| client_description | string | Client's self-description / general business |
| client_state | string | Client state/region name |
| client_country | string | Client country (display name) |
| client_ppb_state | string | Client's principal place of business state |
| client_ppb_country | string | Client's principal place of business country |
| client_self_select | boolean | True if the client self-registered (vs. represented by a firm) |
| client_is_government_entity | boolean | True if the client is itself a government entity |
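| lobbyists | string[] | Deduplicated list of lobbyists, each formatted as First Last, with (covered_position) appended when a covered position is disclosed and [NEW] when the filing flags the lobbyist as new |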
| lobbying_activities | string[] | Formatted activity records: <code> - <display>: <description> |
| government_entities | string[] | Deduplicated list of government entities contacted across all activities |
| issue_area_codes | string[] | Unique general issue area codes covered by this filing |
| foreign_entities | string[] | Foreign entities with a financial interest, with country, ownership, and contribution where available |
| affiliated_organizations | string[] | Affiliated organizations that contribute to the lobbying, with city/state/country |
| conviction_disclosures | string[] | Conviction disclosures: Name - offense - date |
| api_url | string | Absolute URL to this filing's own Senate LDA API record |
| document_url | string | Public Senate LDA viewer URL for the filing document |
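The string fields income_amount and expense_amount are meant for precision-sensitive work. Parsing them with Python's decimal module avoids float rounding. Below is a hypothetical aggregation over dataset rows that skips filings with an empty amount. Note that amendments (1A–4A) restate earlier reports, so a real total would probably need to deduplicate them first. This sketch does not.

```python
from collections import defaultdict
from decimal import Decimal
from typing import Iterable


def total_income_by_registrant(rows: Iterable[dict]) -> dict[str, Decimal]:
    """Sum income_amount per registrant_name, skipping rows with no reported income."""
    totals: dict[str, Decimal] = defaultdict(Decimal)
    for row in rows:
        amount = row.get("income_amount") or ""
        if amount:  # empty string when the filing reports no income
            totals[row["registrant_name"]] += Decimal(amount)
    return dict(totals)
```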
FAQ
How do I scrape Senate lobbying disclosures?
Pick a narrowing filter (filing year, client name, registrant name, issue area code, or a date range) and run the actor. The actor exposes the Senate LDA API's filing filters as input fields and returns flat JSON ready for a dataset, CSV, or database.
How many lobbying filings does the Senate LDA Crawler cover?
About 1.94 million filings from 1999 through today, with roughly 97,000 new filings per year. The actor streams straight from the canonical Senate source, so new filings show up as they're posted.
What filters work on the Senate LDA API?
The actor supports filingYear, filingPeriod, filingType, registrantName, clientName, clientState, issueAreaCode, specificIssueSearch, datePostedFrom, and datePostedTo. Combine them as needed: the API applies AND semantics, so a filing must match every filter you set. At least one filter is required to prevent accidental full-corpus runs.
Do I need proxies or an API key?
No. The Senate LDA API is public, unauthenticated, and free. Proxies are disabled by default.
How much does it cost to run?
About $0.10 to start plus roughly $0.001 per filing returned. Pulling a thousand filings lands near $1.10. A full year slice (about 97,000 filings) lands near $97.
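The pricing arithmetic above fits in a quick estimator. START_FEE and PER_FILING are the approximate figures quoted in this answer, not exact billing rates, so treat the result as a ballpark.

```python
from decimal import Decimal

START_FEE = Decimal("0.10")    # approximate per-run start charge quoted above
PER_FILING = Decimal("0.001")  # approximate charge per filing returned


def estimate_cost(filings: int) -> Decimal:
    """Rough run cost in USD under the approximate prices quoted in this FAQ."""
    return START_FEE + PER_FILING * filings
```

estimate_cost(1000) gives 1.10 and estimate_cost(97000) gives 97.10, which matches the figures above.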
What's the deal with the lda.gov deprecation notice?
The Senate-hosted API at lda.senate.gov carries a deprecation header naming lda.gov/api/v1/ as the successor, with a sunset of 2026-06-30. The successor host isn't open to the public yet. The crawler uses the working Senate host today, and the migration is a one-line change once the new host goes live.
How fast does the crawler run?
The Senate API returns 25 records per page. Small queries (10–100 filings) complete in under 10 seconds. A thousand filings takes a few minutes with the 250ms polite delay between pages.
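The page count and courtesy delay follow directly from the 25-record page size and the 250 ms pause. The helpers below are hypothetical and cover only that arithmetic, not network time.

```python
import math

PAGE_SIZE = 25             # records per page, per this FAQ
PAGE_DELAY_SECONDS = 0.25  # courtesy pause between pages


def pages_needed(filings: int) -> int:
    """Number of API pages required to return the given number of filings."""
    return math.ceil(filings / PAGE_SIZE)


def courtesy_delay_seconds(filings: int) -> float:
    # One pause between consecutive pages, none after the last page.
    return max(pages_needed(filings) - 1, 0) * PAGE_DELAY_SECONDS
```

A thousand filings take 40 pages and under ten seconds of deliberate waiting, so most of the remaining run time goes to waiting on API responses. The full corpus of about 1.94 million filings works out to 77,600 pages, which is where the "about 78,000 pages" figure comes from.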
Need More Features?
Need custom fields, additional filters, or a different data source? File an issue or get in touch.
Why Use the Senate LDA Lobbying Disclosure Crawler?
- Canonical source — Reads the Senate's own JSON API, so output tracks whatever the filer reported, not an aggregator's interpretation.
- Priced per record — About $0.001 per filing. A thousand filings costs a little over a dollar, which is what you might call "reasonable."
- Clean output schema — Nested activities, lobbyists, and government entities get flattened and deduplicated into scalar fields and string arrays. No post-processing required before you load it into a warehouse.
- No proxy overhead — Public US government API with no anti-bot measures, so your run cost is just compute and records.