OrbTop

Lobbying Disclosure Scraper - Senate LDA Filings


Senate LDA Lobbying Disclosure Scraper

Extract US federal lobbying disclosure filings from the Senate Lobbying Disclosure Act database. Covers roughly 1.94 million filings from 1999 to present — registrants, clients, lobbyists, issue areas, government entities contacted, foreign entities, and reported income and expenses.

Lobbying Disclosure Crawler Features

  • Extracts every filing field the Senate LDA publishes — registrant contact info, client details, lobbying activities, issue codes, and financials
  • Flattens nested structures into scalar fields and clean string arrays, so you can pipe the output straight into a spreadsheet or database
  • Filters by year, period, filing type, registrant name, client name, client state, issue area code, specific-issue text search, or date-posted range
  • Returns deduplicated lobbyist rosters and government-entity contact lists across all activities in a filing
  • Pure JSON API — no HTML scraping, no browser, no proxies, no authentication
  • Pay-per-event pricing at about $0.001 per filing
  • Honors the Senate's rate conventions with a 250ms courtesy delay between pages

Who Uses Senate LDA Lobbying Data?

  • Political intelligence firms — track new registrations and issue-area trends for clients monitoring specific legislation
  • Compliance teams — screen vendors and counterparties against the canonical federal lobbying registry before contracts close
  • Investigative journalists — follow the money from foreign entities, affiliated orgs, and quarterly expense reports across years
  • Policy researchers — build issue-area datasets spanning decades of filings without paying for commercial aggregators
  • Advocacy and public-interest orgs — surface which firms represent which clients on which issues, with source links back to the original filings

How the Senate LDA Crawler Works

  1. You pick at least one narrowing filter — a filing year, a registrant name, a client name, a state, an issue code, or a date window.
  2. The crawler queries the Senate LDA API at lda.senate.gov/api/v1/filings/, paginating through the standard DRF envelope until it reaches your maxItems cap or the last page.
  3. Each raw filing is flattened: registrant and client objects collapse into scalar fields, lobbying activities turn into a formatted string array, and nested lobbyists get deduplicated across activities.
  4. Results save to your Apify dataset with a stable schema, one row per filing.
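The pagination in step 2 can be sketched as below. This is a minimal illustration, not the actor's actual code: `paginate` and `fetch_json` are hypothetical names, and the envelope keys (`count`, `next`, `previous`, `results`) are the standard Django REST Framework ones the text refers to.

```python
import json
import time
import urllib.request
from typing import Callable, Iterator, Optional

PAGE_DELAY_SECONDS = 0.25  # courtesy delay between pages, per the feature list


def fetch_json(url: str) -> dict:
    """Fetch one page of the API and decode it as JSON (hypothetical helper)."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def paginate(
    fetch_page: Callable[[str], dict],
    first_url: str,
    max_items: int,
    delay: float = PAGE_DELAY_SECONDS,
) -> Iterator[dict]:
    """Yield raw filings until max_items is reached or there is no next page."""
    url: Optional[str] = first_url
    yielded = 0
    while url and yielded < max_items:
        page = fetch_page(url)  # expected keys: count, next, previous, results
        for filing in page.get("results", []):
            if yielded >= max_items:
                return
            yield filing
            yielded += 1
        url = page.get("next")  # None on the last page
        if url and yielded < max_items:
            time.sleep(delay)  # pause only when another request will follow
```

Passing the fetcher in as an argument keeps the loop testable without network access; a live run would call `paginate(fetch_json, first_url, max_items=100)`.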

Input

Basic: pull 2024 Q1 filings

{
  "filingYear": 2024,
  "filingPeriod": "first_quarter",
  "maxItems": 100
}

Search by client name

{
  "clientName": "GOOGLE",
  "maxItems": 50
}

Registrations only, filtered by year

{
  "filingYear": 2024,
  "filingType": "RR",
  "maxItems": 200
}

Issue-area slice (all Health filings from a given year)

{
  "filingYear": 2024,
  "issueAreaCode": "HCR",
  "maxItems": 500
}

Date window on when filings were posted

{
  "datePostedFrom": "2024-01-01",
  "datePostedTo": "2024-03-31",
  "maxItems": 1000
}

Input Parameters

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| filingYear | integer | 2024 | Calendar year the filing reports on. Leave empty if using a different narrowing filter. Valid range: 1999-present. |
| filingPeriod | string | "" | Reporting period. One of first_quarter, second_quarter, third_quarter, fourth_quarter, mid_year, year_end, undetermined, or empty for all periods. |
| filingType | string | "" | Filing type code. RR = new registration, Q1 to Q4 = quarterly reports, 1A to 4A = amendments, 1T to 4T = terminations. Empty returns all types. |
| registrantName | string | "" | Case-insensitive contains match on the lobbying firm name. |
| clientName | string | "" | Case-insensitive contains match on the client organization name. |
| clientState | string | "" | Two-letter US state code filtering the client's state. |
| issueAreaCode | string | "" | Three-letter general issue area code (e.g., HCR, BUD, TEC, DEF, TAX). |
| specificIssueSearch | string | "" | Full-text search on the "specific lobbying issues" description. |
| datePostedFrom | string | "" | Return filings posted on or after this date (YYYY-MM-DD). |
| datePostedTo | string | "" | Return filings posted on or before this date (YYYY-MM-DD). |
| maxItems | integer | 100 | Maximum number of filings to return. The API serves 25 per page, so small values finish in one or two requests. |
| proxyConfiguration | object | { useApifyProxy: false } | Proxy settings. The Senate LDA API is public and does not require proxies. |

At least one narrowing filter is required. Running the full corpus unfiltered is blocked — that would be about 78,000 pages and nobody's day goes well after that.
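A rough sketch of that guard and of the page arithmetic behind it follows. The function names are hypothetical, and the set of fields counted as "narrowing" is an assumption taken from the list in "How the Senate LDA Crawler Works"; the actor's real check may differ.

```python
import math

# Assumption: these input fields count as narrowing filters.
NARROWING_FILTERS = (
    "filingYear",
    "registrantName",
    "clientName",
    "clientState",
    "issueAreaCode",
    "datePostedFrom",
    "datePostedTo",
)


def has_narrowing_filter(actor_input: dict) -> bool:
    """True if at least one narrowing filter carries a non-empty value."""
    return any(actor_input.get(name) not in (None, "") for name in NARROWING_FILTERS)


def pages_needed(filings: int, page_size: int = 25) -> int:
    """Number of API pages required to read the given number of filings."""
    return math.ceil(filings / page_size)


# The full corpus of roughly 1.94 million filings at 25 per page:
full_corpus_pages = pages_needed(1_940_000)  # 77,600, i.e. about 78,000
```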

Lobbying Disclosure Crawler Output Fields

Example Output

{
  "filing_uuid": "467e4a97-6351-4902-8ffa-dd51632e156b",
  "filing_type": "Q1",
  "filing_type_display": "1st Quarter - Report",
  "filing_year": 2024,
  "filing_period": "first_quarter",
  "filing_period_display": "1st Quarter (Jan 1 - Mar 31)",
  "filing_document_url": "https://lda.senate.gov/filings/public/filing/467e4a97-6351-4902-8ffa-dd51632e156b/print/",
  "dt_posted": "2024-01-02T13:14:26-05:00",
  "effective_date": "2023-08-01",
  "termination_date": "",
  "posted_by_name": "Sean Farrell",
  "income": 30000,
  "income_amount": "30000.00",
  "expenses": null,
  "expense_amount": "",
  "expenses_method": "",
  "registrant_id": 401107792,
  "registrant_name": "EAST CAPITOL ADVISORS LLC",
  "registrant_description": "",
  "registrant_address": "921 H Street, NE, #252",
  "registrant_city": "Washington",
  "registrant_state": "District of Columbia",
  "registrant_zip": "20002",
  "registrant_country": "United States of America",
  "registrant_contact_name": "SEAN FARRELL",
  "registrant_contact_phone": "+1 202-944-0520",
  "registrant_house_id": 56170,
  "client_id": 56764,
  "client_name": "CTIA - THE WIRELESS ASSOCIATION",
  "client_description": "CTIA is the trade association of the cellular/wireless industry.",
  "client_state": "District of Columbia",
  "client_country": "United States of America",
  "client_ppb_state": "District of Columbia",
  "client_ppb_country": "United States of America",
  "client_self_select": false,
  "client_is_government_entity": false,
  "lobbyists": [
    "SEAN FARRELL"
  ],
  "lobbying_activities": [
    "TEC - Telecommunications: H.R.3949, End Cells in Cells Act, a bill to increase criminal penalties for contraband cell phones in prisons and jails."
  ],
  "government_entities": [
    "HOUSE OF REPRESENTATIVES"
  ],
  "issue_area_codes": [
    "TEC"
  ],
  "foreign_entities": [],
  "affiliated_organizations": [],
  "conviction_disclosures": [],
  "api_url": "https://lda.senate.gov/api/v1/filings/467e4a97-6351-4902-8ffa-dd51632e156b/",
  "document_url": "https://lda.senate.gov/filings/public/filing/467e4a97-6351-4902-8ffa-dd51632e156b/print/"
}
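A record like the one above could be produced from a raw filing roughly as follows. This is a sketch under stated assumptions: the raw key names (`registrant`, `client`, `lobbying_activities`, `general_issue_code`, `general_issue_code_display`, `lobbyists`, `government_entities`) are assumed for illustration and should be checked against a real API response, the helper names are hypothetical, and only a handful of output fields are shown. The covered-position and [NEW] suffixes on lobbyist names are left out.

```python
from typing import Iterable, List


def dedupe(items: Iterable[str]) -> List[str]:
    """Drop repeats and empty strings while keeping first-seen order."""
    return list(dict.fromkeys(item for item in items if item))


def flatten_filing(raw: dict) -> dict:
    """Collapse one raw filing into scalar fields and string arrays."""
    registrant = raw.get("registrant") or {}
    client = raw.get("client") or {}
    activities, codes, lobbyists, entities = [], [], [], []
    for activity in raw.get("lobbying_activities") or []:
        code = activity.get("general_issue_code") or ""
        display = activity.get("general_issue_code_display") or ""
        description = activity.get("description") or ""
        activities.append(f"{code} - {display}: {description}")
        codes.append(code)
        for entry in activity.get("lobbyists") or []:
            person = entry.get("lobbyist") or {}
            name = f"{person.get('first_name', '')} {person.get('last_name', '')}"
            lobbyists.append(name.strip())
        for entity in activity.get("government_entities") or []:
            entities.append(entity.get("name") or "")
    return {
        "filing_uuid": raw.get("filing_uuid", ""),
        "registrant_name": registrant.get("name", ""),
        "client_name": client.get("name", ""),
        "lobbyists": dedupe(lobbyists),
        "lobbying_activities": activities,
        "government_entities": dedupe(entities),
        "issue_area_codes": dedupe(codes),
    }


# Illustrative input only: two activities that repeat the same lobbyist and entity.
activity = {
    "general_issue_code": "TEC",
    "general_issue_code_display": "Telecommunications",
    "description": "H.R.3949",
    "lobbyists": [{"lobbyist": {"first_name": "SEAN", "last_name": "FARRELL"}}],
    "government_entities": [{"name": "HOUSE OF REPRESENTATIVES"}],
}
raw = {
    "filing_uuid": "467e4a97-6351-4902-8ffa-dd51632e156b",
    "registrant": {"name": "EAST CAPITOL ADVISORS LLC"},
    "client": {"name": "CTIA - THE WIRELESS ASSOCIATION"},
    "lobbying_activities": [activity, dict(activity, description="illustrative second activity")],
}
flat = flatten_filing(raw)
```

Deduplication runs across all activities in the filing, so `flat["lobbyists"]` holds one name even though the lobbyist appears under both activities.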

Output Field Reference

| Field | Type | Description |
| --- | --- | --- |
| filing_uuid | string | Unique UUID of the filing |
| filing_type | string | Filing type code (RR, Q1, Q2, Q3, Q4, MM, YE, 1A, 2T, etc.) |
| filing_type_display | string | Human-readable filing type (e.g., "Registration", "1st Quarter - Report") |
| filing_year | integer | Calendar year the filing reports on |
| filing_period | string | Reporting period code |
| filing_period_display | string | Human-readable reporting period |
| filing_document_url | string | Filer-submitted document URL |
| dt_posted | string | When the filing was posted to the LDA database (ISO 8601) |
| effective_date | string | Effective date of the client-registrant relationship (ISO 8601) |
| termination_date | string | When the relationship was terminated, if applicable |
| posted_by_name | string | Name of the person who submitted the filing |
| income | number | Reported lobbying income in USD, or null |
| income_amount | string | Raw decimal string for precision-sensitive downstream systems |
| expenses | number | Reported lobbying expenses in USD, or null |
| expense_amount | string | Raw decimal string for expenses |
| expenses_method | string | Method used to calculate expenses (a, b, or c) |
| registrant_id | integer | Internal ID of the registrant |
| registrant_name | string | Registrant (lobbying firm) name |
| registrant_description | string | Registrant's self-description of its business |
| registrant_address | string | Registrant street address (line 1 + line 2, comma-joined) |
| registrant_city | string | Registrant city |
| registrant_state | string | Registrant state/region name |
| registrant_zip | string | Registrant postal code |
| registrant_country | string | Registrant country (display name) |
| registrant_contact_name | string | Registrant primary contact name |
| registrant_contact_phone | string | Registrant primary contact phone |
| registrant_house_id | integer | Registrant's ID in the companion House system, if available |
| client_id | integer | Internal ID of the client organization |
| client_name | string | Client organization name |
| client_description | string | Client's self-description / general business |
| client_state | string | Client state/region name |
| client_country | string | Client country (display name) |
| client_ppb_state | string | Client's principal place of business state |
| client_ppb_country | string | Client's principal place of business country |
| client_self_select | boolean | True if the client self-registered (vs. represented by a firm) |
| client_is_government_entity | boolean | True if the client is itself a government entity |
| lobbyists | string[] | Deduplicated list of lobbyists, each as First Last (covered_position) [NEW] |
| lobbying_activities | string[] | Formatted activity records: <code> - <display>: <description> |
| government_entities | string[] | Deduplicated list of government entities contacted across all activities |
| issue_area_codes | string[] | Unique general issue area codes covered by this filing |
| foreign_entities | string[] | Foreign entities with a financial interest, with country, ownership, and contribution where available |
| affiliated_organizations | string[] | Affiliated organizations that contribute to the lobbying, with city/state/country |
| conviction_disclosures | string[] | Conviction disclosures: Name - offense - date |
| api_url | string | Absolute URL to this filing's own Senate LDA API record |
| document_url | string | Public Senate LDA viewer URL for the filing document |
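For downstream totals, the raw decimal strings (income_amount, expense_amount) can be parsed with Python's `decimal` module instead of relying on the float-typed income and expenses fields. A small sketch, with hypothetical helper names and the assumption that a missing amount arrives as an empty string, as in the example output:

```python
from decimal import Decimal
from typing import Iterable, Optional


def parse_amount(raw_amount: str) -> Optional[Decimal]:
    """Convert a raw decimal string such as "30000.00"; empty string becomes None."""
    return Decimal(raw_amount) if raw_amount else None


def total_income(records: Iterable[dict]) -> Decimal:
    """Sum income_amount across records, treating missing amounts as zero."""
    total = Decimal("0")
    for record in records:
        amount = parse_amount(record.get("income_amount", ""))
        if amount is not None:
            total += amount
    return total


records = [
    {"income_amount": "30000.00"},  # value from the example output
    {"income_amount": ""},          # a filing that reports no income
]
income_total = total_income(records)
```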

FAQ

How do I scrape Senate lobbying disclosures?

Pick a narrowing filter (filing year, client name, registrant name, issue area code, or a date range) and run the actor. The actor exposes the Senate LDA API's filing filters as ordinary input fields, and the output comes back as flat JSON ready for a dataset, CSV, or database.

How many lobbying filings does the Senate LDA Crawler cover?

About 1.94 million filings from 1999 through today, with roughly 97,000 new filings per year. The actor streams straight from the canonical Senate source, so new filings show up as they're posted.

What filters work on the Senate LDA API?

The actor supports filingYear, filingPeriod, filingType, registrantName, clientName, clientState, issueAreaCode, specificIssueSearch, datePostedFrom, and datePostedTo. Combine them as needed — the API applies AND-semantics. At least one filter is required to prevent accidental full-corpus runs.
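One way to picture the AND-semantics: every non-empty input becomes one query parameter on the same request. The sketch below is illustrative only. The query parameter names on the right-hand side of `PARAM_NAMES` are assumptions, not confirmed against the Senate LDA API documentation, and `build_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

BASE_URL = "https://lda.senate.gov/api/v1/filings/"

# Actor input field -> API query parameter.
# ASSUMPTION: the parameter names are placeholders; verify them against the API docs.
PARAM_NAMES = {
    "filingYear": "filing_year",
    "filingPeriod": "filing_period",
    "filingType": "filing_type",
    "registrantName": "registrant_name",
    "clientName": "client_name",
    "clientState": "client_state",
    "issueAreaCode": "issue_area_code",
    "specificIssueSearch": "specific_issue_search",
    "datePostedFrom": "date_posted_from",
    "datePostedTo": "date_posted_to",
}


def build_url(actor_input: dict) -> str:
    """Build one request URL; all supplied filters apply together (AND)."""
    params = {
        PARAM_NAMES[field]: value
        for field, value in actor_input.items()
        if field in PARAM_NAMES and value not in (None, "")
    }
    return f"{BASE_URL}?{urlencode(params)}" if params else BASE_URL


url = build_url({"filingYear": 2024, "filingType": "RR", "maxItems": 200})
```

Fields such as maxItems that are not filters never reach the query string; they only control how far the crawler paginates.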

Do I need proxies or an API key?

No. The Senate LDA API is public, unauthenticated, and free. Proxies are disabled by default.

How much does it cost to run?

About $0.10 to start plus roughly $0.001 per filing returned. Pulling a thousand filings lands near $1.10. A full year slice (about 97,000 filings) lands near $97.
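The arithmetic behind those figures, as a quick estimator. The two constants are the approximate prices quoted above, so treat the results as ballpark numbers; `estimate_cost` is a hypothetical helper.

```python
START_FEE_USD = 0.10     # approximate one-time charge per run
PER_FILING_USD = 0.001   # approximate charge per filing returned


def estimate_cost(filings: int) -> float:
    """Approximate run cost in USD for a given number of filings."""
    return round(START_FEE_USD + PER_FILING_USD * filings, 2)


thousand = estimate_cost(1_000)     # about 1.10
full_year = estimate_cost(97_000)   # about 97.10
```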

What's the deal with the lda.gov deprecation notice?

The Senate-hosted API at lda.senate.gov carries a deprecation header naming lda.gov/api/v1/ as the successor, with a sunset of 2026-06-30. The successor host isn't open to the public yet. The crawler uses the working Senate host today, and the migration is a one-line change once the new host goes live.

How fast does the crawler run?

The Senate API returns 25 records per page. Small queries (10–100 filings) complete in under 10 seconds. A thousand filings takes a few minutes with the 250ms polite delay between pages.

Need More Features?

Need custom fields, additional filters, or a different data source? File an issue or get in touch.

Why Use the Senate LDA Lobbying Disclosure Crawler?

  • Canonical source — Reads the Senate's own JSON API, so output tracks whatever the filer reported, not an aggregator's interpretation.
  • Priced per record — About $0.001 per filing. A thousand filings costs a little over a dollar, which is what you might call "reasonable."
  • Clean output schema — Nested activities, lobbyists, and government entities get flattened and deduplicated into scalar fields and string arrays. No post-processing required before you load it into a warehouse.
  • No proxy overhead — Public US government API with no anti-bot measures, so your run cost is just compute and records.