April 17, 2026
Government Data You Didn't Know Was Scrapable: 13 Federal Databases in One Toolkit
The U.S. federal government publishes more structured data than any organization on earth. EPA toxic release inventories, OSHA violation records, FEC campaign donations, OFAC sanctions lists, CPSC product recalls — all publicly available. Most of it is free. Almost none of it is easy to use at scale.
The problem isn't access. It's extraction. Agency APIs have rate limits and query restrictions. Bulk download files mean cleaning gigabytes of inconsistent CSVs. Every database has its own pagination logic, field naming conventions, and output format. Building a reliable pipeline for even one agency is weeks of engineering work.
We built 13 crawlers that handle the extraction, pagination, and normalization for the most valuable federal databases. You get clean JSON records, filtered to what you need, without writing a single API integration.
Compliance & Risk Teams
Four databases cover the compliance stack: sanctions screening, workplace safety, product liability, and environmental risk.
OFAC Sanctions List
The OFAC Sanctions List Crawler extracts sanctioned entities from both the SDN (Specially Designated Nationals) and Consolidated lists maintained by the U.S. Treasury. Every record includes names, aliases, ID documents, nationalities, and designation programs.
Key fields: Entity name, alias list, ID document numbers, address, nationality, designation program, list type (SDN / Consolidated).
Best for: KYC/AML screening. Run before onboarding clients, vendors, or counterparties. Cross-reference against your customer database to flag matches on names, aliases, or identification numbers.
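The screening step above reduces to a set lookup once names are normalized. A minimal sketch, assuming the crawler's records expose `name` and `aliases` fields (adjust keys to the actual output schema):

```python
import re


def normalize(name: str) -> str:
    """Lowercase, strip punctuation, and collapse whitespace so minor
    formatting differences don't hide a match."""
    name = re.sub(r"[^a-z0-9 ]", "", name.lower())
    return re.sub(r"\s+", " ", name).strip()


def screen_customers(customers, sanctions_records):
    """Flag customer names that match a sanctioned entity's name or alias.

    `sanctions_records` is assumed to be the crawler's JSON output with
    `name` (string) and `aliases` (list) fields.
    """
    watchlist = set()
    for rec in sanctions_records:
        watchlist.add(normalize(rec["name"]))
        for alias in rec.get("aliases", []):
            watchlist.add(normalize(alias))
    return [c for c in customers if normalize(c) in watchlist]
```

Exact-match screening like this catches formatting variants only; production KYC pipelines typically layer fuzzy matching on top.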
OSHA Inspection Records
The OSHA Inspection Crawler extracts 4M+ workplace safety inspections from the Occupational Safety and Health Administration. Each record includes the inspection type, violation classifications, cited standards, and penalty amounts.
Key fields: Establishment name, inspection date, inspection type, violation type (Serious / Willful / Repeat), cited standard, initial penalty, current penalty, abatement date.
Best for: Workplace safety due diligence. Check any company's inspection history before an acquisition, partnership, or lease. Insurance underwriters use this data to assess risk profiles by industry and geography.
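For due diligence, the useful view is usually an aggregate of a target company's inspection history. A sketch, assuming `violation_type` and `current_penalty` fields in the crawler output:

```python
from collections import defaultdict


def penalty_summary(inspections):
    """Summarize inspection records for one establishment: violation
    counts and total current penalties by violation type.

    Field names (`violation_type`, `current_penalty`) are assumptions
    about the crawler's schema; map them to the real keys as needed.
    """
    counts = defaultdict(int)
    totals = defaultdict(float)
    for rec in inspections:
        vtype = rec.get("violation_type", "Other")
        counts[vtype] += 1
        totals[vtype] += float(rec.get("current_penalty", 0))
    return dict(counts), dict(totals)
```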
CPSC Product Recalls
The CPSC Product Recall Crawler extracts 6,000+ consumer product recalls from the Consumer Product Safety Commission. Records include hazard descriptions, remedy types, injury and death counts, and UPC codes.
Key fields: Product name, manufacturer, hazard description, remedy type, units affected, injuries reported, deaths reported, UPC codes, recall date.
Best for: Product liability risk assessment. Track recalls by manufacturer, hazard type, or product category. E-commerce platforms use this data to screen supplier catalogs against known recalled products.
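Catalog screening against recalls is a straightforward UPC intersection. A sketch, assuming each recall record carries a `upc_codes` list and each catalog item a `upc` string (both names are placeholders for the real schema):

```python
def recalled_upcs(recalls):
    """Collect every UPC mentioned in any recall record into one set."""
    upcs = set()
    for rec in recalls:
        upcs.update(rec.get("upc_codes", []))
    return upcs


def screen_catalog(catalog, recalls):
    """Return catalog items whose UPC appears in a recall record."""
    flagged = recalled_upcs(recalls)
    return [item for item in catalog if item.get("upc") in flagged]
```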
EPA Toxic Release Inventory
The EPA Toxic Release Inventory Crawler extracts chemical release data reported to the EPA under the Emergency Planning and Community Right-to-Know Act. Facilities report the quantities of toxic chemicals they release into air, water, and land, and each record carries the chemical's carcinogen classification.
Key fields: Facility name, address, chemical name, release medium (air / water / land / underground injection), total release quantity, carcinogen flag, NAICS code, reporting year.
Best for: Environmental compliance and site risk assessment. Real estate due diligence teams use TRI data to evaluate contamination risk before acquisitions. ESG analysts track release trends by company and facility.
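Facility-level trend analysis starts with an aggregation like the one below, a sketch that assumes `facility_name`, `total_release_quantity`, and a boolean `carcinogen_flag` in the crawler output:

```python
from collections import defaultdict


def release_totals(records):
    """Aggregate total and carcinogen-only release quantities by facility.

    Field names are assumptions about the crawler's JSON schema; adjust
    the keys to match the actual output.
    """
    totals = defaultdict(lambda: {"total": 0.0, "carcinogen": 0.0})
    for rec in records:
        row = totals[rec["facility_name"]]
        qty = float(rec.get("total_release_quantity", 0))
        row["total"] += qty
        if rec.get("carcinogen_flag"):
            row["carcinogen"] += qty
    return dict(totals)
```

Grouping by `reporting_year` as well turns the same loop into a year-over-year trend table.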
Procurement & Government Contractors
Three databases cover the federal contracting lifecycle: finding opportunities, tracking awards, and identifying teaming partners.
SAM.gov Federal Procurement
The SAM.gov Scraper extracts 5.5M+ federal procurement records from the System for Award Management — the central hub for government contracting opportunities. Records span solicitations, pre-solicitations, award notices, sources sought, and special notices.
Key fields: Notice type, title, agency, NAICS code, set-aside type, posted date, response deadline, award amount, awardee, place of performance.
Best for: Finding active contract opportunities. Filter by NAICS code, set-aside type (8(a), HUBZone, SDVOSB, WOSB), or agency to build a targeted pipeline of bids.
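Building that targeted pipeline locally is a filter over the crawler's output. A sketch, assuming `response_deadline` (ISO date), `naics_code`, and `set_aside_type` fields:

```python
from datetime import date


def open_opportunities(notices, naics_prefixes, set_asides, today=None):
    """Keep notices that are still open and match a NAICS prefix and
    set-aside type. Field names are assumptions about the output schema.
    """
    today = today or date.today()
    hits = []
    for n in notices:
        # Skip notices whose response window has already closed.
        if date.fromisoformat(n["response_deadline"]) < today:
            continue
        if set_asides and n.get("set_aside_type") not in set_asides:
            continue
        if naics_prefixes and not any(
            n.get("naics_code", "").startswith(p) for p in naics_prefixes
        ):
            continue
        hits.append(n)
    return hits
```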
USAspending Federal Awards
The USAspending Crawler extracts $6T+ in federal award data — contracts, grants, loans, direct payments, and other financial assistance. This is the most comprehensive view of where federal money goes.
Key fields: Award type, funding agency, awarding agency, recipient name, award amount, period of performance, place of performance, NAICS code, CFDA/Assistance Listing program, transaction descriptions.
Best for: Competitive intelligence and market sizing. Analyze award patterns by agency, recipient, geography, and fiscal year. Identify which companies win contracts in your NAICS code and how much they're worth.
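Market sizing within a NAICS code reduces to summing award dollars by recipient. A sketch, with `naics_code`, `recipient_name`, and `award_amount` as assumed field names:

```python
from collections import Counter


def market_share(awards, naics_code):
    """Total federal award dollars by recipient within one NAICS code,
    sorted largest first. Field names are illustrative; map them to the
    crawler's actual schema.
    """
    totals = Counter()
    for a in awards:
        if a.get("naics_code") == naics_code:
            totals[a["recipient_name"]] += float(a.get("award_amount", 0))
    return totals.most_common()
```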
SBA Small Business Listings
The SBA Crawler extracts 450K+ verified business listings from the Dynamic Small Business Search database. These are businesses registered in SAM.gov with small business certifications.
Key fields: Business name, Unique Entity ID (UEI, which replaced DUNS in 2022), address, NAICS codes, certification types (8(a), HUBZone, SDVOSB, WOSB), SBA region, point of contact.
Best for: Finding teaming partners and subcontractors. Large primes use this data to identify certified small businesses for set-aside requirements and subcontracting plans.
Journalists & Political Researchers
Three databases cover the intersection of money, regulation, and corporate disclosure.
FEC Campaign Finance
The FEC Campaign Finance Crawler extracts campaign contribution data from the Federal Election Commission. Search by candidate, PAC, individual donor, election year, state, and party.
Key fields: Contributor name, recipient candidate/committee, contribution amount, contribution date, employer, occupation, state, election cycle, party affiliation.
Best for: Following political money. Track individual contributions by employer to map corporate political spending. Compare fundraising patterns across election cycles, states, and party lines.
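Mapping corporate political spending from contribution records is a group-by over the employer field. A sketch, assuming `employer`, `contribution_amount`, and `election_cycle` keys:

```python
from collections import Counter


def contributions_by_employer(contributions, cycle=None):
    """Sum contribution amounts by reported employer, optionally limited
    to one election cycle. Field names are assumptions about the
    crawler's schema.
    """
    totals = Counter()
    for c in contributions:
        if cycle and c.get("election_cycle") != cycle:
            continue
        totals[c.get("employer", "Unknown")] += float(c.get("contribution_amount", 0))
    return totals.most_common()
```

Note that employer is self-reported on FEC filings, so the same company often appears under several spellings; a normalization pass like the one used for sanctions screening helps here too.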
Federal Register
The Federal Register Crawler extracts 800K+ regulatory documents published in the daily journal of the U.S. government. Document types include final rules, proposed rules, notices, presidential documents, and executive orders.
Key fields: Document type, title, agency, publication date, effective date, comment deadline, abstract, CFR references, docket numbers, RIN.
Best for: Tracking regulatory changes. Set up recurring crawls filtered by agency to monitor new rules that affect your industry. Researchers use comment deadlines to coordinate public input on proposed regulations.
SEC EDGAR Filings
The SEC EDGAR Crawler extracts data from the SEC's EDGAR database — 800K+ companies and 12M+ filings. Covers 10-K annual reports, 10-Q quarterly reports, 8-K current reports, Form 4 insider transactions, and dozens of other filing types.
Key fields: Company name, CIK number, filing type, filing date, reporting period, filer details, document URLs, XBRL data links.
Best for: Financial research and insider trading analysis. Track Form 4 filings to see when executives buy or sell shares. Pull 10-K/10-Q filings in bulk to build financial datasets for analysis.
Security & Vulnerability Teams
NVD CVE Vulnerability Database
The NVD CVE Vulnerability Crawler extracts vulnerability records from NIST's National Vulnerability Database. Each CVE includes structured severity data, affected product identifiers, and remediation status.
Key fields: CVE ID, description, CVSS v3.1 score, severity rating, attack vector, affected products (CPE), exploit availability flag, patch availability flag, published date, last modified date.
Best for: Vulnerability management and security research. Feed CVE data into your asset inventory to match vulnerabilities against deployed software versions. Security teams use CVSS scores and exploit flags to prioritize patching.
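The matching-and-prioritizing step described above can be sketched as follows, assuming each CVE record carries a `cpe` list, a `cvss_score`, and an `exploit_available` flag (all placeholder names for the real schema):

```python
def patch_priority(cves, deployed_cpes):
    """Rank CVEs that affect deployed software: known-exploited ones
    first, then by CVSS score descending.

    `deployed_cpes` is the set of CPE identifiers from your asset
    inventory; field names here are assumptions about the crawler output.
    """
    relevant = [
        c for c in cves
        if any(p in deployed_cpes for p in c.get("cpe", []))
    ]
    # Tuple sort: exploit flag outranks raw score, so an actively
    # exploited 7.5 lands above an unexploited 9.8.
    return sorted(
        relevant,
        key=lambda c: (c.get("exploit_available", False), c.get("cvss_score", 0.0)),
        reverse=True,
    )
```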
Data Engineers & Researchers
Data.gov Dataset Catalog
The Data.gov Dataset Catalog Crawler extracts metadata for 400K+ federal open datasets published across hundreds of agencies. Each record includes the dataset description, download URLs, API endpoints, update frequency, and publisher.
Key fields: Dataset title, description, publisher, tags, format (CSV / JSON / XML / API), download URL, API endpoint, temporal coverage, update frequency, license.
Best for: Dataset discovery. Instead of searching Data.gov one query at a time, pull the entire catalog and filter locally. Data engineers use this to build automated pipelines that monitor for new datasets in specific agencies or topics.
FMCSA Carrier Safety Records
The FMCSA DOT Crawler extracts carrier safety data from the Federal Motor Carrier Safety Administration's SAFER database. Records cover registered carriers, inspection histories, crash data, and compliance reviews.
Key fields: DOT number, legal name, DBA name, physical address, carrier operation type, fleet size, out-of-service rate, inspection count, safety rating, insurance status.
Best for: Logistics and transportation data. Freight brokers use carrier safety records to vet motor carriers before booking loads. Insurance companies use inspection histories and OOS rates to price commercial auto policies.
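A broker's vetting gate over these records might look like the sketch below. The thresholds and field names are illustrative only, not FMCSA guidance:

```python
def vet_carrier(carrier, max_oos_rate=0.22, min_inspections=5):
    """Pass/fail gate for booking a motor carrier.

    Rejects carriers with a Conditional or Unsatisfactory safety rating,
    inactive insurance, or an out-of-service rate above threshold (only
    enforced once the carrier has enough inspections to be meaningful).
    All cutoffs and field names are assumptions for illustration.
    """
    if carrier.get("safety_rating") in {"Conditional", "Unsatisfactory"}:
        return False
    if carrier.get("insurance_status") != "Active":
        return False
    if (carrier.get("inspection_count", 0) >= min_inspections
            and carrier.get("out_of_service_rate", 0.0) > max_oos_rate):
        return False
    return True
```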
Comparison Table
| Database | Records | Key Fields | Best For | Crawler |
|---|---|---|---|---|
| OFAC Sanctions | SDN + Consolidated lists | Names, aliases, ID documents, programs | KYC/AML screening | OFAC Sanctions Crawler |
| OSHA Inspections | 4M+ inspections | Violations, penalties, cited standards | Workplace safety due diligence | OSHA Inspection Crawler |
| CPSC Recalls | 6,000+ recalls | Hazards, remedies, UPC codes, injuries | Product liability screening | CPSC Recall Crawler |
| EPA TRI | Facility-level releases | Chemicals, quantities, carcinogen flags | Environmental risk assessment | EPA TRI Crawler |
| SAM.gov | 5.5M+ records | Solicitations, awards, NAICS, set-asides | Finding contract opportunities | SAM.gov Scraper |
| USAspending | $6T+ in awards | Contracts, grants, recipients, agencies | Competitive intelligence | USAspending Crawler |
| SBA | 450K+ businesses | Certifications, NAICS, contacts | Teaming partner discovery | SBA Crawler |
| FEC | Contributions by cycle | Donors, amounts, employers, parties | Political money tracking | FEC Campaign Finance Crawler |
| Federal Register | 800K+ documents | Rules, proposed rules, comment deadlines | Regulatory monitoring | Federal Register Crawler |
| SEC EDGAR | 12M+ filings | 10-K, 10-Q, 8-K, Form 4, XBRL | Financial research | SEC EDGAR Crawler |
| NVD CVE | Active CVE records | CVSS scores, CPE, exploit/patch flags | Vulnerability management | NVD CVE Crawler |
| Data.gov | 400K+ datasets | Metadata, download URLs, API endpoints | Dataset discovery | Data.gov Crawler |
| FMCSA | Carrier records | DOT numbers, safety ratings, inspections | Carrier vetting | FMCSA DOT Crawler |
How to Get Started
- Pick a database from the table above.
- Open the crawler on Apify — click "Try it on Apify" on any crawler page.
- Set your filters — agency, date range, geographic area, or keyword depending on the database.
- Run a test — start with 50–100 records to validate the output format and fields.
- Scale up — increase limits and schedule recurring runs to keep your data current.
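The validation step (4) can be automated with a quick schema check on the test run's JSON before you scale up. A sketch, with a required-field set shown for the OFAC output as an example (field names are illustrative):

```python
# Example required fields for an OFAC test run; swap in the fields you
# expect from whichever crawler you're validating.
REQUIRED_FIELDS = {"entity_name", "designation_program", "list_type"}


def validate_records(records, required=REQUIRED_FIELDS):
    """Report (index, missing_fields) for every record in a test run
    that lacks a required field. An empty list means the output is
    shaped as expected and it's safe to increase limits."""
    problems = []
    for i, rec in enumerate(records):
        missing = required - rec.keys()
        if missing:
            problems.append((i, sorted(missing)))
    return problems
```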
All 13 crawlers run on Apify's pay-per-event pricing — no subscriptions, no annual contracts. Data is returned as clean JSON, ready to feed into your database, CRM, or analysis pipeline.
Browse the full catalog to see every scraper and crawler we publish.