OrbTop

April 17, 2026

Government Data You Didn't Know Was Scrapable: 13 Federal Databases in One Toolkit

The U.S. federal government publishes more structured data than any organization on earth. EPA toxic release inventories, OSHA violation records, FEC campaign donations, OFAC sanctions lists, CPSC product recalls — all publicly available. Most of it is free. Almost none of it is easy to use at scale.

The problem isn't access. It's extraction. Agency APIs have rate limits and query restrictions. Bulk download files mean cleaning gigabytes of inconsistent CSVs. Every database has its own pagination logic, field naming conventions, and output format. Building a reliable pipeline for even one agency is weeks of engineering work.

We built 13 crawlers that handle the extraction, pagination, and normalization for the most valuable federal databases. You get clean JSON records, filtered to what you need, without writing a single API integration.

Compliance & Risk Teams

Four databases cover the compliance stack: sanctions screening, workplace safety, product liability, and environmental risk.

OFAC Sanctions List

The OFAC Sanctions List Crawler extracts sanctioned entities from both the SDN (Specially Designated Nationals) and Consolidated lists maintained by the U.S. Treasury. Every record includes names, aliases, ID documents, nationalities, and designation programs.

Key fields: Entity name, alias list, ID document numbers, address, nationality, designation program, list type (SDN / Consolidated).

Best for: KYC/AML screening. Run before onboarding clients, vendors, or counterparties. Cross-reference against your customer database to flag matches on names, aliases, or identification numbers.
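Cross-referencing the crawler's output against a customer list can be sketched in a few lines of Python. The field names here (`name`, `aliases`) are illustrative stand-ins, not the crawler's exact schema, and a production KYC pipeline would add fuzzy matching and transliteration handling on top of this exact-match pass:

```python
def normalize(name: str) -> str:
    """Case-fold and collapse whitespace so 'ACME  Corp' matches 'acme corp'."""
    return " ".join(name.lower().split())

def screen(customers, sanctions):
    """Flag customers whose name matches a sanctioned name or alias exactly.

    `sanctions` is a list of dicts with hypothetical keys 'name' and 'aliases'.
    """
    # Build one lookup set of every normalized primary name and alias.
    blocked = {
        normalize(n)
        for rec in sanctions
        for n in [rec["name"], *rec.get("aliases", [])]
    }
    return [c for c in customers if normalize(c) in blocked]

hits = screen(
    customers=["Acme  Trading LLC", "Globex Inc"],
    sanctions=[{"name": "ACME TRADING LLC", "aliases": ["Acme Trade Co"]}],
)
print(hits)  # → ['Acme  Trading LLC']
```

Normalizing both sides before comparison is what catches casing and spacing differences; anything smarter (edit distance, phonetic matching) slots in where `normalize` is called.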

OSHA Inspection Records

The OSHA Inspection Crawler extracts 4M+ workplace safety inspections from the Occupational Safety and Health Administration. Each record includes the inspection type, violation classifications, cited standards, and penalty amounts.

Key fields: Establishment name, inspection date, inspection type, violation type (Serious / Willful / Repeat), cited standard, initial penalty, current penalty, abatement date.

Best for: Workplace safety due diligence. Check any company's inspection history before an acquisition, partnership, or lease. Insurance underwriters use this data to assess risk profiles by industry and geography.

CPSC Product Recalls

The CPSC Product Recall Crawler extracts 6,000+ consumer product recalls from the Consumer Product Safety Commission. Records include hazard descriptions, remedy types, injury and death counts, and UPC codes.

Key fields: Product name, manufacturer, hazard description, remedy type, units affected, injuries reported, deaths reported, UPC codes, recall date.

Best for: Product liability risk assessment. Track recalls by manufacturer, hazard type, or product category. E-commerce platforms use this data to screen supplier catalogs against known recalled products.
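Screening a supplier catalog against recalled products reduces to a set-membership check on UPC codes. A minimal sketch, assuming each recall record carries a `upc_codes` list (an illustrative key, not a guaranteed schema):

```python
def recalled_upcs(recalls):
    """Flatten recall records into one set of UPC codes for O(1) lookups."""
    return {upc for rec in recalls for upc in rec.get("upc_codes", [])}

def screen_catalog(catalog, recalls):
    """Return catalog items whose UPC appears in any recall record."""
    bad = recalled_upcs(recalls)
    return [item for item in catalog if item["upc"] in bad]

flagged = screen_catalog(
    catalog=[
        {"sku": "A1", "upc": "012345678905"},
        {"sku": "B2", "upc": "098765432109"},
    ],
    recalls=[{"product": "Toy Car", "upc_codes": ["012345678905"]}],
)
print([i["sku"] for i in flagged])  # → ['A1']
```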

EPA Toxic Release Inventory

The EPA Toxic Release Inventory Crawler extracts chemical release data reported to the EPA under the Emergency Planning and Community Right-to-Know Act. Facilities report quantities of toxic chemicals released into air, water, and land — including carcinogen classifications.

Key fields: Facility name, address, chemical name, release medium (air / water / land / underground injection), total release quantity, carcinogen flag, NAICS code, reporting year.

Best for: Environmental compliance and site risk assessment. Real estate due diligence teams use TRI data to evaluate contamination risk before acquisitions. ESG analysts track release trends by company and facility.

Procurement & Government Contractors

Three databases cover the federal contracting lifecycle: finding opportunities, tracking awards, and identifying teaming partners.

SAM.gov Federal Procurement

The SAM.gov Scraper extracts 5.5M+ federal procurement records from the System for Award Management — the central hub for government contracting opportunities. Records span solicitations, pre-solicitations, award notices, sources sought, and special notices.

Key fields: Notice type, title, agency, NAICS code, set-aside type, posted date, response deadline, award amount, awardee, place of performance.

Best for: Finding active contract opportunities. Filter by NAICS code, set-aside type (8(a), HUBZone, SDVOSB, WOSB), or agency to build a targeted pipeline of bids.
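Once the records are local, building that pipeline is a straightforward filter. A sketch, with illustrative field names (`naics`, `set_aside`, `response_deadline`) rather than the crawler's exact output keys:

```python
from datetime import date

def match_opportunities(notices, naics_prefix, set_aside, today):
    """Keep notices in a NAICS family with a given set-aside and an open deadline."""
    return [
        n for n in notices
        if n["naics"].startswith(naics_prefix)
        and n["set_aside"] == set_aside
        and date.fromisoformat(n["response_deadline"]) >= today
    ]

open_bids = match_opportunities(
    notices=[
        {"title": "IT support", "naics": "541512", "set_aside": "SDVOSB",
         "response_deadline": "2026-05-01"},
        {"title": "Landscaping", "naics": "561730", "set_aside": "8(a)",
         "response_deadline": "2026-04-20"},
    ],
    naics_prefix="5415", set_aside="SDVOSB", today=date(2026, 4, 17),
)
print([b["title"] for b in open_bids])  # → ['IT support']
```

Matching on a NAICS prefix (here `5415`, computer services) rather than an exact code is a common trick: it catches every six-digit code in the family with one filter.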

USAspending Federal Awards

The USAspending Crawler extracts $6T+ in federal award data — contracts, grants, loans, direct payments, and other financial assistance. This is the most comprehensive view of where federal money goes.

Key fields: Award type, funding agency, awarding agency, recipient name, award amount, period of performance, place of performance, NAICS code, CFDA program, transaction descriptions.

Best for: Competitive intelligence and market sizing. Analyze award patterns by agency, recipient, geography, and fiscal year. Identify which companies win contracts in your NAICS code and how much they're worth.
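Ranking recipients by total award dollars is a simple group-and-sum over the output records. A sketch, assuming illustrative `recipient` and `amount` fields:

```python
from collections import defaultdict

def awards_by_recipient(awards):
    """Sum award amounts per recipient, largest winners first."""
    totals = defaultdict(float)
    for a in awards:
        totals[a["recipient"]] += a["amount"]
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

ranked = awards_by_recipient([
    {"recipient": "Acme Federal", "amount": 2_500_000.0},
    {"recipient": "Beta Systems", "amount": 900_000.0},
    {"recipient": "Acme Federal", "amount": 1_100_000.0},
])
print(ranked)  # → [('Acme Federal', 3600000.0), ('Beta Systems', 900000.0)]
```

The same shape works for any grouping key: swap `recipient` for the funding agency, NAICS code, or state to slice the market a different way.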

SBA Small Business Listings

The SBA Crawler extracts 450K+ verified business listings from the Dynamic Small Business Search database. These are businesses registered in SAM.gov with small business certifications.

Key fields: Business name, UEI (the Unique Entity ID that replaced DUNS in SAM.gov), address, NAICS codes, certification types (8(a), HUBZone, SDVOSB, WOSB), SBA region, point of contact.

Best for: Finding teaming partners and subcontractors. Large primes use this data to identify certified small businesses for set-aside requirements and subcontracting plans.

Journalists & Political Researchers

Three databases cover the intersection of money, regulation, and corporate disclosure.

FEC Campaign Finance

The FEC Campaign Finance Crawler extracts campaign contribution data from the Federal Election Commission. Search by candidate, PAC, individual donor, election year, state, and party.

Key fields: Contributor name, recipient candidate/committee, contribution amount, contribution date, employer, occupation, state, election cycle, party affiliation.

Best for: Following political money. Track individual contributions by employer to map corporate political spending. Compare fundraising patterns across election cycles, states, and party lines.
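Mapping corporate political spending means totaling contributions by the employer field each donor reports. A sketch with illustrative keys (`employer`, `amount`), not the crawler's guaranteed schema:

```python
from collections import Counter

def spending_by_employer(contributions):
    """Total contribution dollars per reported employer, biggest first."""
    totals = Counter()
    for c in contributions:
        # Employer is self-reported and sometimes blank; bucket those together.
        totals[c.get("employer") or "UNKNOWN"] += c["amount"]
    return totals.most_common()

top = spending_by_employer([
    {"contributor": "J. Doe", "employer": "Initech", "amount": 2800},
    {"contributor": "A. Smith", "employer": "Initech", "amount": 500},
    {"contributor": "B. Lee", "employer": "Hooli", "amount": 1000},
])
print(top)  # → [('Initech', 3300), ('Hooli', 1000)]
```

One caveat worth noting: employer is a free-text field in FEC data, so real analyses normalize spelling variants ("Initech", "Initech Inc.") before aggregating.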

Federal Register

The Federal Register Crawler extracts 800K+ regulatory documents published in the daily journal of the U.S. government. Document types include final rules, proposed rules, notices, presidential documents, and executive orders.

Key fields: Document type, title, agency, publication date, effective date, comment deadline, abstract, CFR references, docket numbers, RIN.

Best for: Tracking regulatory changes. Set up recurring crawls filtered by agency to monitor new rules that affect your industry. Researchers use comment deadlines to coordinate public input on proposed regulations.

SEC EDGAR Filings

The SEC EDGAR Crawler extracts data from the SEC's EDGAR database — 800K+ companies and 12M+ filings. Covers 10-K annual reports, 10-Q quarterly reports, 8-K current reports, Form 4 insider transactions, and dozens of other filing types.

Key fields: Company name, CIK number, filing type, filing date, reporting period, filer details, document URLs, XBRL data links.

Best for: Financial research and insider trading analysis. Track Form 4 filings to see when executives buy or sell shares. Pull 10-K/10-Q filings in bulk to build financial datasets for analysis.

Security & Vulnerability Teams

NVD CVE Vulnerability Database

The NVD CVE Vulnerability Crawler extracts vulnerability records from NIST's National Vulnerability Database. Each CVE includes structured severity data, affected product identifiers, and remediation status.

Key fields: CVE ID, description, CVSS v3.1 score, severity rating, attack vector, affected products (CPE), exploit availability flag, patch availability flag, published date, last modified date.

Best for: Vulnerability management and security research. Feed CVE data into your asset inventory to match vulnerabilities against deployed software versions. Security teams use CVSS scores and exploit flags to prioritize patching.
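Matching CVEs against an inventory and sorting the hits into a patch queue can be sketched like this. The field names (`cpe`, `cvss`, `exploit_available`) mirror the key-field list above but are illustrative; real matching would parse full CPE strings and compare version ranges instead of doing exact lookups:

```python
def prioritize(cves, installed):
    """Rank CVEs affecting installed software: exploited first, then by CVSS."""
    relevant = [c for c in cves if any(p in installed for p in c["cpe"])]
    # Sorting on (exploit flag, score) in reverse puts known-exploited
    # vulnerabilities ahead of higher-scored but unexploited ones.
    return sorted(
        relevant,
        key=lambda c: (c["exploit_available"], c["cvss"]),
        reverse=True,
    )

queue = prioritize(
    cves=[
        {"id": "CVE-2026-0001", "cpe": ["openssl:3.0"], "cvss": 7.5, "exploit_available": False},
        {"id": "CVE-2026-0002", "cpe": ["openssl:3.0"], "cvss": 6.1, "exploit_available": True},
        {"id": "CVE-2026-0003", "cpe": ["nginx:1.25"], "cvss": 9.8, "exploit_available": False},
    ],
    installed={"openssl:3.0"},
)
print([c["id"] for c in queue])  # → ['CVE-2026-0002', 'CVE-2026-0001']
```

Note the 9.8-scored CVE drops out entirely because nginx isn't in the inventory: relevance filtering before scoring is what keeps the queue actionable.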

Data Engineers & Researchers

Data.gov Dataset Catalog

The Data.gov Dataset Catalog Crawler extracts metadata for 400K+ federal open datasets published across hundreds of agencies. Each record includes the dataset description, download URLs, API endpoints, update frequency, and publisher.

Key fields: Dataset title, description, publisher, tags, format (CSV / JSON / XML / API), download URL, API endpoint, temporal coverage, update frequency, license.

Best for: Dataset discovery. Instead of searching Data.gov one query at a time, pull the entire catalog and filter locally. Data engineers use this to build automated pipelines that monitor for new datasets in specific agencies or topics.
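Filtering the pulled catalog locally looks something like this. The keys (`publisher`, `format`, `title`) follow the key-field list above but are illustrative, not a guaranteed schema:

```python
def find_datasets(catalog, publisher=None, fmt=None, keyword=None):
    """Filter catalog metadata locally; each argument narrows the results."""
    results = catalog
    if publisher:
        results = [d for d in results if d["publisher"] == publisher]
    if fmt:
        results = [d for d in results if d["format"] == fmt]
    if keyword:
        results = [d for d in results if keyword.lower() in d["title"].lower()]
    return results

hits = find_datasets(
    catalog=[
        {"title": "Air Quality Index Daily", "publisher": "EPA", "format": "CSV"},
        {"title": "Toxic Release Inventory", "publisher": "EPA", "format": "API"},
        {"title": "Crash Statistics", "publisher": "DOT", "format": "CSV"},
    ],
    publisher="EPA", fmt="CSV",
)
print([h["title"] for h in hits])  # → ['Air Quality Index Daily']
```

Because the whole catalog is local, combining filters, diffing against yesterday's pull to spot new datasets, or joining on tags costs nothing extra.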

FMCSA Carrier Safety Records

The FMCSA DOT Crawler extracts carrier safety data from the Federal Motor Carrier Safety Administration's SAFER database. Records cover registered carriers, inspection histories, crash data, and compliance reviews.

Key fields: DOT number, legal name, DBA name, physical address, carrier operation type, fleet size, out-of-service rate, inspection count, safety rating, insurance status.

Best for: Logistics and transportation data. Freight brokers use carrier safety records to vet motor carriers before booking loads. Insurance companies use inspection histories and OOS rates to price commercial auto policies.
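A vetting pass over a carrier record can be sketched as a simple rule check. The field names (`safety_rating`, `out_of_service_rate`, `insurance_active`) are illustrative, and the 20% OOS threshold is an example policy, not an FMCSA standard:

```python
def vet_carrier(carrier, max_oos_rate=0.2):
    """Return (approved, reasons) for one carrier record under example rules."""
    reasons = []
    if carrier["safety_rating"] == "Unsatisfactory":
        reasons.append("unsatisfactory safety rating")
    if carrier["out_of_service_rate"] > max_oos_rate:
        reasons.append("out-of-service rate above threshold")
    if not carrier["insurance_active"]:
        reasons.append("no active insurance on file")
    return (len(reasons) == 0, reasons)

ok, why = vet_carrier({
    "dot_number": "1234567",
    "safety_rating": "Satisfactory",
    "out_of_service_rate": 0.31,
    "insurance_active": True,
})
print(ok, why)  # → False ['out-of-service rate above threshold']
```

Returning the reasons alongside the verdict matters in practice: brokers and underwriters need to know which rule a carrier failed, not just that it failed.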

Comparison Table

Database | Records | Key Fields | Best For | Crawler
OFAC Sanctions | SDN + Consolidated lists | Names, aliases, ID documents, programs | KYC/AML screening | OFAC Sanctions Crawler
OSHA Inspections | 4M+ inspections | Violations, penalties, cited standards | Workplace safety due diligence | OSHA Inspection Crawler
CPSC Recalls | 6,000+ recalls | Hazards, remedies, UPC codes, injuries | Product liability screening | CPSC Recall Crawler
EPA TRI | Facility-level releases | Chemicals, quantities, carcinogen flags | Environmental risk assessment | EPA TRI Crawler
SAM.gov | 5.5M+ records | Solicitations, awards, NAICS, set-asides | Finding contract opportunities | SAM.gov Scraper
USAspending | $6T+ in awards | Contracts, grants, recipients, agencies | Competitive intelligence | USAspending Crawler
SBA | 450K+ businesses | Certifications, NAICS, contacts | Teaming partner discovery | SBA Crawler
FEC | Contributions by cycle | Donors, amounts, employers, parties | Political money tracking | FEC Campaign Finance Crawler
Federal Register | 800K+ documents | Rules, proposed rules, comment deadlines | Regulatory monitoring | Federal Register Crawler
SEC EDGAR | 12M+ filings | 10-K, 10-Q, 8-K, Form 4, XBRL | Financial research | SEC EDGAR Crawler
NVD CVE | Active CVE records | CVSS scores, CPE, exploit/patch flags | Vulnerability management | NVD CVE Crawler
Data.gov | 400K+ datasets | Metadata, download URLs, API endpoints | Dataset discovery | Data.gov Crawler
FMCSA | Carrier records | DOT numbers, safety ratings, inspections | Carrier vetting | FMCSA DOT Crawler

How to Get Started

  1. Pick a database from the table above.
  2. Open the crawler on Apify — click "Try it on Apify" on any crawler page.
  3. Set your filters — agency, date range, geographic area, or keyword depending on the database.
  4. Run a test — start with 50–100 records to validate the output format and fields.
  5. Scale up — increase limits and schedule recurring runs to keep your data current.

All 13 crawlers run on Apify's pay-per-event pricing — no subscriptions, no annual contracts. Data is returned as clean JSON, ready to feed into your database, CRM, or analysis pipeline.

Browse the full catalog to see every scraper and crawler we publish.