NIH RePORTER Scraper - Grants, PIs & Linked Publications
Extract NIH-funded research project records from the official RePORTER v2 API — no account or proxy required. Retrieve PI names, award amounts, activity codes, study sections, dates, active/terminated status, and optionally linked PubMed publication IDs.
What you get
Each output record corresponds to one NIH project award (one fiscal-year slice). Fields include:
| Field | Description |
|---|---|
| `project_num` | Full NIH project number (e.g. 5R01CA123456-05) |
| `core_project_num` | Core project number; groups subprojects and multi-year awards |
| `appl_id` | Application ID |
| `fiscal_year` | NIH fiscal year |
| `project_title` | Project title |
| `abstract_text` | Full project abstract |
| `phr_text` | Public health relevance statement |
| `activity_code` | NIH activity code (R01, R21, K99, F31, P30, U54, …) |
| `agency_ic_admin` | Administering institute/center (NCI, NIAID, NHLBI, …) |
| `award_amount` | Total award amount (USD) |
| `direct_cost_amt` | Direct costs (USD) |
| `indirect_cost_amt` | Indirect costs (USD) |
| `contact_pi_name` | Contact PI name |
| `principal_investigators` | Full PI roster; each entry is a JSON string with full_name, profile_id, is_contact_pi |
| `organization_name` | Funded institution |
| `org_state` | US state of funded institution |
| `is_active` | Whether the project is currently active |
| `arra_funded` | Whether funded via ARRA (stimulus) |
| `budget_start` / `budget_end` | Budget period dates |
| `project_start_date` / `project_end_date` | Project period dates |
| `full_study_section` | NIH study section that reviewed the application |
| `agency_ic_fundings` | IC-level funding breakdown (FY:IC:amount strings) |
| `spending_categories` | NIH spending categories |
| `linked_publication_pmids` | PubMed IDs of linked publications (when Include Linked Publications is enabled) |
| `project_detail_url` | Direct link to the RePORTER project-details page |
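Two of the fields above are encoded strings rather than nested objects: `principal_investigators` holds JSON strings, and `agency_ic_fundings` holds `FY:IC:amount` strings. The following is a minimal sketch of how a consumer might decode them in Python. The helper names (`parse_pis`, `parse_ic_fundings`, `IcFunding`) are hypothetical and not part of the scraper, and the sample record is made up for illustration.

```python
import json
from typing import List, NamedTuple


class IcFunding(NamedTuple):
    """One decoded 'FY:IC:amount' entry (hypothetical helper type)."""
    fiscal_year: int
    ic: str
    amount: float


def parse_pis(record: dict) -> List[dict]:
    """Decode each JSON-string entry of principal_investigators into a dict."""
    return [json.loads(entry) for entry in record.get("principal_investigators") or []]


def parse_ic_fundings(record: dict) -> List[IcFunding]:
    """Split each 'FY:IC:amount' string into typed parts."""
    fundings = []
    for entry in record.get("agency_ic_fundings") or []:
        fy, ic, amount = entry.split(":", 2)
        fundings.append(IcFunding(int(fy), ic, float(amount)))
    return fundings


# Illustrative record shaped like the scraper's output (values are invented).
record = {
    "principal_investigators": [
        '{"full_name":"Jane Doe","profile_id":12345,"is_contact_pi":true,"title":"Prof."}'
    ],
    "agency_ic_fundings": ["2024:NCI:512000"],
}

pis = parse_pis(record)
contact = next(pi["full_name"] for pi in pis if pi["is_contact_pi"])
fundings = parse_ic_fundings(record)
print(contact, fundings)
```

Missing or empty fields decode to empty lists, so the helpers can be applied to every record without a prior existence check.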
Filtering options
| Input | Effect |
|---|---|
| Keyword / Text Search | Search across title, abstract, and terms |
| Fiscal Years | Limit to one or more NIH fiscal years (strongly recommended for large pulls) |
| Activity Codes | E.g. R01, R21, K99, F31, P30, U54 |
| Administering Institute | E.g. NCI, NIAID, NHLBI, NIGMS |
| PI Names | Filter by PI last name |
| Organization Names | Filter by funded institution |
| Organization States | Filter by US state (e.g. CA, MA, NY) |
| Active Projects Only | Exclude terminated/closed awards |
| Newly Added Only | Only records recently added to RePORTER |
| Include Linked Publications | Fetch linked PubMed IDs for each project |
| Max Items | Cap on total records returned |
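For readers curious how filters like these could translate into a RePORTER v2 search request, here is a rough sketch that only builds the request body and sends nothing. The function name and its parameters are hypothetical. The criteria keys (`fiscal_years`, `activity_codes`, `agencies`, `pi_names`, `org_names`, `org_states`, `include_active_projects`, `newly_added_projects_only`, `advanced_text_search`) reflect my reading of the public API documentation and should be verified there. Whether the scraper maps its inputs in exactly this way is an assumption.

```python
from typing import Any, Dict, List, Optional


def build_search_payload(
    text: Optional[str] = None,
    fiscal_years: Optional[List[int]] = None,
    activity_codes: Optional[List[str]] = None,
    agencies: Optional[List[str]] = None,
    pi_last_names: Optional[List[str]] = None,
    org_names: Optional[List[str]] = None,
    org_states: Optional[List[str]] = None,
    active_only: bool = False,
    newly_added_only: bool = False,
    offset: int = 0,
    limit: int = 500,
) -> Dict[str, Any]:
    """Build a search body; filters that are not set are left out entirely."""
    criteria: Dict[str, Any] = {}
    if text:
        # Assumed shape for searching title, abstract, and terms together.
        criteria["advanced_text_search"] = {
            "operator": "and",
            "search_field": "projecttitle,abstracttext,terms",
            "search_text": text,
        }
    if fiscal_years:
        criteria["fiscal_years"] = fiscal_years
    if activity_codes:
        criteria["activity_codes"] = activity_codes
    if agencies:
        criteria["agencies"] = agencies
    if pi_last_names:
        criteria["pi_names"] = [{"last_name": name} for name in pi_last_names]
    if org_names:
        criteria["org_names"] = org_names
    if org_states:
        criteria["org_states"] = org_states
    if active_only:
        criteria["include_active_projects"] = True
    if newly_added_only:
        criteria["newly_added_projects_only"] = True
    return {"criteria": criteria, "offset": offset, "limit": limit}


payload = build_search_payload(
    text="glioma",
    fiscal_years=[2024],
    activity_codes=["R01"],
    pi_last_names=["Doe"],
    active_only=True,
)
print(payload)
```

Omitting unset filters, rather than sending empty lists, keeps the request body small and avoids relying on how the API interprets empty criteria.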
API limits & pagination
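The NIH RePORTER v2 API enforces a hard cap of 15,000 rows per search query (offset + page size cannot exceed 15,000). For large pulls, specify one or more Fiscal Years: the scraper runs a separate query per year, so the cap applies to each year rather than to the whole pull. A single fiscal year typically contains 60,000–100,000 awards, so a year-only query can still exceed the cap; in that case the scraper fetches the first 15,000 records for that year and logs a warning. Adding further filters (for example Activity Codes or Administering Institute) narrows each yearly query and makes it more likely to fit under the cap.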
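The cap arithmetic can be sketched as follows. This is an illustration of the constraint described above, not the scraper's actual code: the page size of 500 and the function name `page_windows` are assumptions, and the hit counts are invented.

```python
from typing import Iterator, Tuple

MAX_ROWS = 15_000  # hard cap stated above: offset + page size <= 15,000
PAGE_SIZE = 500    # assumed page size, for illustration only


def page_windows(
    total_hits: int, max_rows: int = MAX_ROWS, page_size: int = PAGE_SIZE
) -> Iterator[Tuple[int, int]]:
    """Yield (offset, limit) pairs that never reach past the row cap."""
    reachable = min(total_hits, max_rows)
    offset = 0
    while offset < reachable:
        limit = min(page_size, reachable - offset)
        yield offset, limit
        offset += limit


# A fiscal year with 80,000 matches: only the first 15,000 rows are reachable.
big_year = list(page_windows(80_000))
truncated = 80_000 > MAX_ROWS

# A narrower query with 1,234 matches fits under the cap in three pages.
small_query = list(page_windows(1_234))

print(len(big_year), big_year[-1], truncated)
print(small_query)
```

Running one such plan per fiscal year mirrors the per-year splitting described above. A year whose total exceeds `MAX_ROWS` is the case in which a warning would be expected.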
Use cases
- Grant landscape analysis — map NIH funding across institutes, activity codes, and institutions
- PI profiling — identify investigators and their award history
- Policy research — track ARRA, COVID-response, or newly-terminated awards
- Publication pipeline — link grants to downstream PubMed output
- Competitive intelligence — benchmark funding in a specific disease area or geography
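As a small illustration of the grant landscape use case, the sketch below totals `award_amount` by any field from the output table. The helper name `total_by` is hypothetical and the three sample records are invented. It assumes the records have already been loaded as a list of dicts.

```python
from collections import defaultdict
from typing import Dict, Iterable


def total_by(records: Iterable[dict], key: str) -> Dict[str, int]:
    """Sum award_amount per value of `key`, skipping records without an amount."""
    totals: Dict[str, int] = defaultdict(int)
    for rec in records:
        amount = rec.get("award_amount")
        if amount is None:
            continue
        totals[rec.get(key) or "UNKNOWN"] += amount
    return dict(totals)


# Invented sample records using field names from the output table.
records = [
    {"agency_ic_admin": "NCI", "activity_code": "R01", "award_amount": 512000},
    {"agency_ic_admin": "NCI", "activity_code": "R21", "award_amount": 200000},
    {"agency_ic_admin": "NIAID", "activity_code": "R01", "award_amount": None},
]

by_institute = total_by(records, "agency_ic_admin")
by_activity = total_by(records, "activity_code")
print(by_institute, by_activity)
```

Because each record is one fiscal-year slice of a project, filter on `fiscal_year` first when a per-year total is wanted; otherwise the sums mix several years of the same award.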
Example output
    {
      "project_num": "5R01CA123456-05",
      "core_project_num": "R01CA123456",
      "appl_id": 10987654,
      "fiscal_year": 2024,
      "project_title": "Novel Approaches to Targeted Cancer Therapy",
      "activity_code": "R01",
      "agency_ic_admin": "NCI",
      "award_amount": 512000,
      "direct_cost_amt": 350000,
      "indirect_cost_amt": 162000,
      "contact_pi_name": "DOE, JANE",
      "principal_investigators": [
        "{\"full_name\":\"Jane Doe\",\"profile_id\":12345,\"is_contact_pi\":true,\"title\":\"Prof.\"}"
      ],
      "organization_name": "STANFORD UNIVERSITY",
      "org_state": "CA",
      "is_active": true,
      "arra_funded": false,
      "budget_start": "2024-04-01",
      "budget_end": "2025-03-31",
      "project_start_date": "2020-04-01",
      "project_end_date": "2025-03-31",
      "full_study_section": "Tumor Microenvironment Study Section",
      "agency_ic_fundings": ["2024:NCI:512000"],
      "spending_categories": ["Cancer"],
      "linked_publication_pmids": [],
      "project_detail_url": "https://reporter.nih.gov/project-details/R01CA123456",
      "status": "success"
    }
Data source
All data is drawn from the NIH Research Portfolio Online Reporting Tools (RePORTER) — a public database maintained by the National Institutes of Health. No authentication is required. The scraper calls the official v2 REST API and does not require a proxy.