OrbTop

ProPublica Nonprofit Crawler - IRS 990 & Tax-Exempt Org Data

Business, Lead Generation, Other


Crawl IRS Form 990 filings and tax-exempt organization data from the ProPublica Nonprofit Explorer API. Returns organization identity, 501(c) classification, NTEE codes, multi-year financial history, officer compensation, and PDF links for ~1.8M US nonprofits — the same data source investigative journalists use to follow the money.


ProPublica Nonprofit Crawler Features

  • Searches ProPublica's free Nonprofit Explorer API by name, state, NTEE category, or 501(c) subsection.
  • Extracts one row per (organization, tax year) when full financial data is requested — so year-over-year comparisons work without reshaping.
  • Returns 45+ fields per row including total revenue, functional expenses, net assets, total contributions, program service revenue, officer compensation, and wage breakdowns.
  • Fetches specific organizations by EIN list, with or without dashes.
  • Filters by 501(c) subsection (3, 4, 5, 6, 7, 8, 9, 10, 19) and NTEE top-level category (1-10).
  • Emits the direct ProPublica PDF URL for every 990 filing, structured or scanned.
  • No API key, no proxy, no browser. It calls a plain JSON API and spaces requests about 300 ms apart.

Who Uses IRS 990 Data?

  • Grant writers: Screen foundations by NTEE category and asset size before writing another application nobody reads.
  • Donor due diligence: Pull a nonprofit's last five years of financials to check whether officer compensation is reasonable relative to total functional expenses.
  • Investigative journalists: Build leads by filtering 501(c)(4) social-welfare orgs in a specific state, or track an organization's revenue, assets, and contributions across years.
  • Compliance & KYC teams: Screen nonprofit counterparties against the IRS Business Master File and flag organizations with unusual asset or contribution patterns.
  • Academic researchers: Export a bounded slice of the nonprofit sector (hospitals, foundations, advocacy orgs) for econometric work without wrestling with IRS bulk extracts.
  • Market researchers: Size up a vertical by counting 501(c)(3) orgs with revenue above a threshold in a given geography.
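The donor due-diligence check above reduces to a per-year ratio. A minimal sketch, assuming the rows for a single organization are already loaded as Python dicts with the output field names documented further down. Because the output has no separate program-expense field, the sketch divides by total_functional_expenses. The function name officer_comp_share is hypothetical, and what counts as "reasonable" is a judgment call the code leaves to you.

```python
def officer_comp_share(rows: list[dict]) -> dict[int, float]:
    """Map filing_year -> officer compensation as a fraction of total functional expenses.

    Assumes all rows belong to one organization. Rows without parsed
    financials (filing_has_data false) or with zero/missing expenses are skipped.
    """
    shares: dict[int, float] = {}
    for row in rows:
        expenses = row.get("total_functional_expenses")
        comp = row.get("compensation_current_officers")
        if not row.get("filing_has_data") or not expenses or comp is None:
            continue
        shares[row["filing_year"]] = comp / expenses
    return shares
```

For the Kaiser sample row in the output section, the 2023 share works out to roughly 0.12%.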

How the ProPublica Nonprofit Crawler Works

  1. Pick a mode: search to paginate filtered results, or organizations to fetch specific EINs you already have.
  2. Set filters — search term, state, NTEE category, 501(c) subsection. Or leave them empty and browse the whole universe one page at a time.
  3. Decide whether you want one row per org (fast) or one row per filing year (richer). The includeFilings toggle controls this.
  4. The crawler paginates ProPublica's API at ~3 requests per second, hydrates each match with the organization detail endpoint when needed, and writes a flat JSON record per row. The dataset is then ready to export as CSV, Excel, or JSON.
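The pagination step can be sketched roughly as follows. This is not the actor's code. It assumes the public Nonprofit Explorer API v2 search endpoint (/api/v2/search.json) with the query parameters q, state[id], ntee[id], and c_code[id], a zero-indexed page parameter, and a JSON response carrying organizations and num_pages. The HTTP call is injected as a fetch function so the loop can be exercised without network access; search_url and paginate are hypothetical names.

```python
import time
from typing import Callable, Iterator
from urllib.parse import urlencode

# Assumed ProPublica Nonprofit Explorer API v2 search endpoint.
SEARCH_ENDPOINT = "https://projects.propublica.org/nonprofits/api/v2/search.json"


def search_url(term: str = "", state: str = "", ntee: str = "",
               subsection: str = "", page: int = 0) -> str:
    """Build a search URL, including only the filters that are set."""
    params: dict = {"page": page}
    if term:
        params["q"] = term
    if state:
        params["state[id]"] = state
    if ntee:
        params["ntee[id]"] = ntee
    if subsection:
        params["c_code[id]"] = subsection
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


def paginate(fetch: Callable[[str], dict], max_items: int,
             pace_seconds: float = 0.3, **filters: str) -> Iterator[dict]:
    """Yield organizations page by page until max_items or the last page."""
    page, emitted = 0, 0
    while True:
        data = fetch(search_url(page=page, **filters))
        orgs = data.get("organizations", [])
        for org in orgs:
            if emitted >= max_items:
                return
            yield org
            emitted += 1
        page += 1
        # Stop on the cap, an empty page, or the last page the API reports.
        if emitted >= max_items or not orgs or page >= data.get("num_pages", 0):
            return
        time.sleep(pace_seconds)  # ~300 ms between requests, about 3 per second
```

For a real run, fetch could be something like `lambda url: json.load(urllib.request.urlopen(url))`; the hydration step through the organization detail endpoint is left out of this sketch.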

Input

{
    "mode": "search",
    "searchTerm": "hospital",
    "state": "VT",
    "nteeCategory": "",
    "subsectionCode": "3",
    "einList": [],
    "includeFilings": true,
    "maxItems": 100
}
Field | Type | Default | Description
mode | string | "search" | Either "search" (paginate filtered results) or "organizations" (fetch specific EINs).
searchTerm | string | "hospital" | Free-text query matched against organization names. Leave empty to browse all organizations matching the other filters.
state | string | "" | Two-letter US state code (e.g., "CA", "NY"). Search mode only.
nteeCategory | string | "" | NTEE top-level category, 1-10: 1 = Arts, 2 = Education, 3 = Environment, 4 = Health, 5 = Human Services, 6 = International, 7 = Public Benefit, 8 = Religion, 9 = Mutual Benefit, 10 = Unclassified. Search mode only.
subsectionCode | string | "" | IRS 501(c) subsection: "3" = charitable, "4" = social welfare, "6" = business leagues, etc. Search mode only.
einList | array | [] | EINs to fetch directly. Accepts "13-1623888" or "131623888". Required when mode is "organizations".
includeFilings | boolean | true | When true, emits one row per (organization, tax year) with full 990 financials. When false, emits one summary row per organization.
maxItems | integer | 100 | Hard cap on records returned. Each filing year counts as one record when includeFilings is on.
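Because several fields apply to only one mode, a pre-flight check can catch a misconfigured input before a run is started and billed. A minimal sketch based on the rules in the table above; validate_input and its messages are hypothetical, and the actor itself may simply ignore a search-only filter in organizations mode rather than reject it.

```python
SEARCH_ONLY_FIELDS = ("state", "nteeCategory", "subsectionCode")


def validate_input(cfg: dict) -> list[str]:
    """Return a list of problems with an actor input; an empty list means it looks valid."""
    problems: list[str] = []
    mode = cfg.get("mode", "search")
    if mode not in ("search", "organizations"):
        problems.append(f"unknown mode {mode!r}")
    if mode == "organizations":
        if not cfg.get("einList"):
            problems.append("einList is required in organizations mode")
        # These filters are documented as search-mode only.
        for field in SEARCH_ONLY_FIELDS:
            if cfg.get(field):
                problems.append(f"{field} has no effect in organizations mode")
    ntee = str(cfg.get("nteeCategory", ""))
    if ntee and not (ntee.isdigit() and 1 <= int(ntee) <= 10):
        problems.append("nteeCategory must be a string from '1' to '10'")
    state = cfg.get("state", "")
    if state and not (len(state) == 2 and state.isalpha()):
        problems.append("state must be a two-letter code such as 'CA'")
    max_items = cfg.get("maxItems", 100)
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(max_items, bool) or not isinstance(max_items, int) or max_items < 1:
        problems.append("maxItems must be a positive integer")
    return problems
```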

Organizations mode example — fetch three specific EINs with their full filing history:

{
    "mode": "organizations",
    "einList": ["13-1623888", "53-0196605", "941340523"],
    "includeFilings": true,
    "maxItems": 50
}
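Since einList accepts EINs with or without a dash, the normalization presumably resembles the sketch below, which produces the ein and strein forms seen in the output. normalize_ein is a hypothetical name, and zero-padding short inputs is an assumption aimed at EINs that lost their leading zeros in a spreadsheet.

```python
import re


def normalize_ein(raw: object) -> tuple[str, str]:
    """Return (ein, strein): 9 digits with leading zeros, and the XX-XXXXXXX form."""
    digits = re.sub(r"[\s-]", "", str(raw))
    if not re.fullmatch(r"\d{1,9}", digits):
        raise ValueError(f"not an EIN: {raw!r}")
    ein = digits.zfill(9)  # restore leading zeros dropped by spreadsheets
    return ein, f"{ein[:2]}-{ein[2:]}"


print([normalize_ein(e)[0] for e in ["13-1623888", "53-0196605", "941340523"]])
# ['131623888', '530196605', '941340523']
```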

ProPublica Nonprofit Crawler Output Fields

Each record is one (organization, tax year) row when includeFilings is true, or one summary row per organization when it is false. Organization-level fields are repeated on every filing row, so downstream analysis needs no join.
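The row shape described above amounts to merging the organization-level fields into each filing. A minimal sketch, assuming the organization and its filings have already been mapped to the output field names; to_rows is a hypothetical helper, and falling back to a single summary row for an organization with no filings is an assumption rather than documented actor behavior.

```python
def to_rows(org: dict, filings: list[dict], include_filings: bool = True) -> list[dict]:
    """Flatten one organization into output rows.

    include_filings=False -> one summary row per organization.
    include_filings=True  -> one row per filing, with org fields repeated on each.
    """
    if not include_filings or not filings:
        return [dict(org)]
    # Filing fields are merged last, so they win if a key appears in both dicts.
    return [{**org, **filing} for filing in filings]
```

Repeating the organization fields costs some storage, but every row stands alone in a spreadsheet or CSV export.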

{
    "ein": "941340523",
    "strein": "94-1340523",
    "name": "Kaiser Foundation Health Plan Inc",
    "careofname": "% KP TAX",
    "address": "ONE KAISER PLAZA",
    "city": "Oakland",
    "state": "CA",
    "zipcode": "94612-3610",
    "ntee_code": "E310",
    "subsection_code": 3,
    "subsection_label": "501(c)(3) - Charitable / religious / educational",
    "classification_codes": "1200",
    "activity_codes": "164000000",
    "foundation_code": 16,
    "deductibility_code": 1,
    "exempt_organization_status_code": 1,
    "organization_code": 1,
    "ruling_date": "1981-12-01",
    "latest_tax_period": "2024-12-01",
    "latest_asset_amount": 33547368863,
    "latest_income_amount": 93006408021,
    "latest_revenue_amount": 82490440881,
    "filing_year": 2023,
    "filing_tax_period": 202312,
    "filing_type": "990",
    "filing_has_data": true,
    "filing_pdf_url": "https://projects.propublica.org/nonprofits/download-filing?path=...",
    "filing_updated": "2025-08-05T16:11:09.202Z",
    "total_revenue": 75101306911,
    "total_functional_expenses": 74356004001,
    "total_assets_end": 31400724759,
    "total_liabilities_end": 22078604890,
    "net_assets_end": 9322119869,
    "total_contributions": 11542682,
    "program_service_revenue": 75068903991,
    "investment_income": 298237561,
    "net_rental_income": 1970907,
    "net_gains_losses": -280418643,
    "compensation_current_officers": 90793859,
    "other_salaries_wages": 2680303412,
    "payroll_taxes": 233744277,
    "professional_fundraising_fees": 0,
    "unrelated_business_income": "Y",
    "data_source": "current_2026_03_10",
    "updated_at": "2026-03-10T23:37:21.272Z",
    "propublica_url": "https://projects.propublica.org/nonprofits/organizations/941340523",
    "scraped_at": "2026-04-19T10:16:31.950Z"
}
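The balance-sheet fields in a row should satisfy net_assets_end = total_assets_end - total_liabilities_end, which makes a cheap sanity check on exported data. In the sample above, 31,400,724,759 - 22,078,604,890 = 9,322,119,869, matching net_assets_end exactly. A sketch of the check; balance_ok and its one-dollar tolerance are assumptions, and rows whose filing_has_data is false may lack these fields, so the check returns None for them.

```python
from typing import Optional

BALANCE_FIELDS = ("total_assets_end", "total_liabilities_end", "net_assets_end")


def balance_ok(row: dict, tolerance: int = 1) -> Optional[bool]:
    """Check net_assets_end == total_assets_end - total_liabilities_end (USD).

    Returns None when any of the three fields is missing, e.g. PDF-only filings.
    """
    if any(row.get(field) is None for field in BALANCE_FIELDS):
        return None
    expected = row["total_assets_end"] - row["total_liabilities_end"]
    return abs(expected - row["net_assets_end"]) <= tolerance
```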
Field | Type | Description
ein | string | 9-digit Employer Identification Number with leading zeros preserved.
strein | string | EIN formatted with a dash (XX-XXXXXXX).
name | string | Organization legal name.
sub_name | string | Alternative name or DBA (from search results).
careofname | string | Care-of name on the IRS record.
address, city, state, zipcode | string | Registered address.
ntee_code | string | NTEE classification (e.g., E200 for hospitals).
subsection_code | number | IRS 501(c) subsection as an integer.
subsection_label | string | Human-readable subsection label.
classification_codes | string | IRS classification codes.
activity_codes | string | IRS activity codes.
foundation_code | number | IRS foundation status code.
deductibility_code | number | IRS deductibility code.
exempt_organization_status_code | number | IRS exempt status code (1 = unconditional).
organization_code | number | IRS organization type (1 = corporation, 2 = trust, etc.).
ruling_date | string | Date the IRS granted exempt status (YYYY-MM-DD).
latest_tax_period | string | Most recent tax period on the IRS master file (YYYY-MM-DD).
latest_asset_amount | number | Most recent reported total assets (USD).
latest_income_amount | number | Most recent reported total income (USD).
latest_revenue_amount | number | Most recent reported total revenue (USD).
filing_year | number | Tax year of this filing, not the year it was submitted.
filing_tax_period | number | Tax period end in YYYYMM format.
filing_type | string | IRS form (990, 990-EZ, 990-PF).
filing_has_data | boolean | True when ProPublica parsed structured financial fields; false when only a PDF is available.
filing_pdf_url | string | Direct link to the 990 PDF.
filing_updated | string | Last-updated timestamp for the filing record (ISO 8601).
total_revenue | number | Total revenue on the filing (USD).
total_functional_expenses | number | Total functional expenses (USD).
total_assets_end | number | Total assets at year end (USD).
total_liabilities_end | number | Total liabilities at year end (USD).
net_assets_end | number | Net assets at year end (USD).
total_contributions | number | Total contributions, gifts, and grants received (USD).
program_service_revenue | number | Total program service revenue (USD).
investment_income | number | Investment income (USD).
net_rental_income | number | Net rental income (USD).
net_gains_losses | number | Net gains/losses from asset sales (USD).
compensation_current_officers | number | Compensation of current officers, directors, trustees, and key employees (USD).
other_salaries_wages | number | All other salaries and wages (USD).
payroll_taxes | number | Payroll taxes (USD).
professional_fundraising_fees | number | Professional fundraising fees (USD).
unrelated_business_income | string | "Y" or "N": whether the filing reports unrelated business income.
data_source | string | ProPublica data snapshot label (e.g., current_2026_03_10).
updated_at | string | Last-updated timestamp for the organization record (ISO 8601).
propublica_url | string | Link to this organization's ProPublica Nonprofit Explorer page.
scraped_at | string | Timestamp when this record was produced (ISO 8601).
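filing_tax_period packs the end of the tax period into a single YYYYMM integer. A small sketch for splitting it and flagging non-calendar fiscal years, i.e. periods that do not end in December; parse_tax_period and is_fiscal_year are hypothetical helpers.

```python
def parse_tax_period(period: int) -> tuple[int, int]:
    """Split a YYYYMM tax period (e.g. 202312) into (year, month)."""
    year, month = divmod(int(period), 100)
    if not 1 <= month <= 12:
        raise ValueError(f"bad tax period: {period!r}")
    return year, month


def is_fiscal_year(period: int) -> bool:
    """True when the tax period ends in a month other than December."""
    return parse_tax_period(period)[1] != 12
```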

FAQ

How do I scrape IRS 990 data from ProPublica?

ProPublica Nonprofit Crawler wraps the Nonprofit Explorer API v2 and returns structured JSON. Set the mode to search, optionally filter by state, NTEE category, and 501(c) subsection, and run the actor. Output can be exported as CSV, Excel, or JSON from the run dataset.

How much does ProPublica Nonprofit Crawler cost to run?

ProPublica Nonprofit Crawler uses pay-per-event pricing: $0.10 per actor start plus $0.001 per record. A thousand-record pull costs about $1.10. A 100-record preview is around $0.20.
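The pricing arithmetic is easy to script, which helps when sizing a run with includeFilings on, since every filing year is billed as a record. A sketch using the rates quoted above; estimate_cost is a hypothetical name, and Decimal keeps the dollar amounts free of floating-point rounding.

```python
from decimal import Decimal

START_FEE = Decimal("0.10")    # per actor start
RECORD_FEE = Decimal("0.001")  # per record emitted


def estimate_cost(records: int, starts: int = 1) -> Decimal:
    """Estimated pay-per-event cost in USD."""
    return START_FEE * starts + RECORD_FEE * records
```

For example, 50 organizations with about 12 filing years each come to 600 records, or an estimated $0.70 for one run.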

Can I fetch specific nonprofits by EIN?

ProPublica Nonprofit Crawler supports direct EIN lookup through the organizations mode. Paste a list of EINs (with or without dashes) into einList and the actor will fetch each organization and every filing ProPublica has on record.

Does ProPublica Nonprofit Crawler need a proxy or API key?

ProPublica Nonprofit Crawler doesn't need either. The Nonprofit Explorer API is free and unauthenticated, and the actor spaces its requests about 300 ms apart, so no proxy configuration is required.

How many years of financial history does it return?

ProPublica Nonprofit Crawler returns every filing ProPublica has on record for each organization — typically 10-15 years of 990, 990-EZ, or 990-PF filings, back to the early 2000s for long-lived orgs. When includeFilings: true, each year is a separate row with full financial fields.

What's the difference between filing_has_data: true and false?

ProPublica Nonprofit Crawler marks a filing filing_has_data: true when ProPublica has parsed structured financial fields from the IRS extract, so all the revenue and expense columns are populated. When false, only the PDF and basic metadata are available, typically because the IRS has not yet released structured data for that tax year or because the return exists only as a scanned PDF. Recent filings (last 1-2 years) are commonly in this state.
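Before computing ratios or trends, it usually makes sense to separate rows with parsed financials from PDF-only rows, so that missing values are not mistaken for zeros. A minimal sketch; split_by_data is a hypothetical name.

```python
def split_by_data(rows: list[dict]) -> tuple[list[dict], list[dict]]:
    """Partition rows into (parsed financials, PDF-only) using filing_has_data."""
    parsed: list[dict] = []
    pdf_only: list[dict] = []
    for row in rows:
        (parsed if row.get("filing_has_data") else pdf_only).append(row)
    return parsed, pdf_only
```

PDF-only rows still carry filing_pdf_url, so they can be queued for manual review instead of being dropped.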


Need More Features?

Need custom fields, a different nonprofit data source, or batch export to your warehouse? File an issue on the actor page or get in touch.

Why Use ProPublica Nonprofit Crawler?

  • Affordable: $0.10 per start plus $0.001 per record. A thousand records runs about $1.10.
  • Fresh data: Pulls from ProPublica's copy of the IRS Business Master File and Annual Financial Extract, the same data the Nonprofit Explorer website itself is built on. The data_source field records which ProPublica data snapshot each row came from.
  • Clean multi-year shape: One row per (organization, tax year) means year-over-year comparisons and trend analysis work without reshaping the output. Most scrapers hand you a nested blob and wish you luck.
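As an illustration of that multi-year shape, year-over-year revenue change per organization needs only a group-by and a sort. A sketch assuming rows exported with includeFilings: true; revenue_yoy is a hypothetical name. Gaps in filing years are not filled, so a missing year makes the next change span two years.

```python
from collections import defaultdict


def revenue_yoy(rows: list[dict]) -> dict[str, list[tuple[int, float]]]:
    """Map ein -> [(filing_year, fractional change in total_revenue vs. prior filing)]."""
    series: dict[str, list[tuple[int, int]]] = defaultdict(list)
    for row in rows:
        # Skip PDF-only filings: their financial fields are empty, not zero.
        if row.get("filing_has_data") and row.get("total_revenue") is not None:
            series[row["ein"]].append((row["filing_year"], row["total_revenue"]))
    changes: dict[str, list[tuple[int, float]]] = {}
    for ein, points in series.items():
        points.sort()  # chronological order by filing_year
        changes[ein] = [
            (year, (rev - prev_rev) / prev_rev)
            for (_, prev_rev), (year, rev) in zip(points, points[1:])
            if prev_rev  # avoid dividing by zero revenue
        ]
    return changes
```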