OrbTop

ClinicalTrials.gov Study Crawler

BUSINESS, OTHER, LEAD GENERATION

ClinicalTrials.gov Clinical Study Data Crawler

Extract structured clinical trial records from ClinicalTrials.gov via the official v2 REST API. The database covers 500K+ studies — conditions, interventions, sponsors, phases, enrollment figures, eligibility criteria, outcomes, and study locations with contact details.

ClinicalTrials.gov Crawler Features

  • Filters by condition or disease, intervention name, lead sponsor, trial phase, study status, study type, and general keyword — combine any of them
  • Fetches up to 1,000 records per API call using cursor-based pagination, so large result sets do not require hundreds of round trips
  • Covers all 8 study statuses: RECRUITING, NOT_YET_RECRUITING, ENROLLING_BY_INVITATION, ACTIVE_NOT_RECRUITING, COMPLETED, SUSPENDED, TERMINATED, and WITHDRAWN
  • Covers all trial phases from Early Phase 1 through Phase 4, plus Not Applicable
  • Extracts 25+ fields per study including full eligibility criteria text and per-location contact information
  • Queries the official v2 JSON API — no HTML parsing, no fragile selectors
  • Requires no authentication and no proxy — ClinicalTrials.gov is a U.S. government service
  • Rate-limited to ~7.7 requests per second, comfortably under the documented 10/sec ceiling
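
A rate of ~7.7 requests per second works out to roughly 130 ms between calls. The crawler's actual limiter is not described here, so the following is only a minimal sketch of fixed-interval pacing; `Throttle` and its parameters are hypothetical names chosen for illustration.

```python
import time


class Throttle:
    """Space successive calls at least 1 / rate_per_sec seconds apart.

    Hypothetical helper: the clock and sleep functions are injectable
    so the pacing logic can be exercised without real waiting.
    """

    def __init__(self, rate_per_sec, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / rate_per_sec
        self._clock = clock
        self._sleep = sleep
        self._last = None  # timestamp of the previous call, None before the first

    def wait(self):
        """Block until at least min_interval has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


# Usage: call throttle.wait() immediately before each API request.
throttle = Throttle(7.7)
```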

Who Uses ClinicalTrials.gov Data and Why?

  • Pharma and biotech researchers — track competitor trials, map pipeline activity by indication, and monitor phase progression across therapeutic areas
  • Clinical research organizations — identify actively recruiting trials by condition and geography to support site selection and patient referral
  • Investment analysts — map development pipelines for biotech companies by pulling every active or completed study tied to a specific sponsor
  • Patient advocates — find open recruiting studies for a given condition, filtered by geography and eligibility parameters
  • Academic epidemiologists — analyze enrollment trends, study design patterns, and outcome measures across thousands of trials at once

How ClinicalTrials.gov Crawler Works

  1. You provide at least one filter: a condition name, an intervention, a sponsor, a phase, a status, a study type, or a free-text keyword. Combining multiple filters is supported.
  2. The crawler builds a query against the ClinicalTrials.gov v2 API and fetches the first page of up to 1,000 results.
  3. It follows the nextPageToken cursor through subsequent pages until it reaches your maxItems limit or exhausts the result set.
  4. Each study in an API response is transformed into a flat, structured record and saved to the Apify dataset.
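
The cursor loop in steps 2 and 3 can be sketched roughly as follows. This is an illustration, not the crawler's source: `fetch_page` is a hypothetical callable that takes a page token (or `None` for the first page) and returns the decoded JSON response, and the `studies` key is assumed from the public v2 API response shape.

```python
from typing import Callable, Iterator, Optional


def iter_studies(
    fetch_page: Callable[[Optional[str]], dict],
    max_items: int = 0,
) -> Iterator[dict]:
    """Yield studies page by page, following nextPageToken.

    max_items == 0 means "no limit", mirroring the maxItems input.
    """
    token = None  # None requests the first page
    yielded = 0
    while True:
        page = fetch_page(token)
        for study in page.get("studies", []):  # "studies" key is an assumption
            yield study
            yielded += 1
            if max_items and yielded >= max_items:
                return  # reached the caller's limit
        token = page.get("nextPageToken")
        if not token:
            return  # result set exhausted
```

Keeping the HTTP call behind `fetch_page` means the loop can be tested with canned pages and combined with whatever rate limiting the caller prefers.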

Input

Basic: recruiting breast cancer trials in Phase 3

{
    "condition": "breast cancer",
    "phase": "PHASE3",
    "studyStatus": ["RECRUITING"],
    "maxItems": 500
}

Sponsor pipeline lookup

{
    "sponsor": "Pfizer",
    "studyType": "INTERVENTIONAL",
    "maxItems": 200
}

Intervention-specific search

{
    "intervention": "pembrolizumab",
    "studyStatus": ["COMPLETED"],
    "maxItems": 100
}
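
How these inputs are translated into an API request is not documented here. As a rough sketch only, the mapping might look like the code below. The parameter names (`query.cond`, `query.intr`, `query.lead`, `query.term`, `filter.overallStatus`, `filter.advanced`, `pageSize`) and the `AREA[...]` expression syntax are assumptions based on the public ClinicalTrials.gov v2 API and should be checked against its documentation. `build_params` is a hypothetical helper.

```python
def build_params(actor_input: dict, page_size: int = 1000) -> dict:
    """Map crawler input fields to (assumed) v2 API query parameters."""
    params = {"pageSize": str(page_size)}

    # Free-text filters; the API parameter names are assumptions.
    text_filters = {
        "condition": "query.cond",
        "intervention": "query.intr",
        "sponsor": "query.lead",
        "keyword": "query.term",
    }
    for field, param in text_filters.items():
        value = (actor_input.get(field) or "").strip()
        if value:
            params[param] = value

    # Multiple statuses are joined into one comma-separated value.
    statuses = actor_input.get("studyStatus") or []
    if statuses:
        params["filter.overallStatus"] = ",".join(statuses)

    # Phase and study type go into an advanced filter expression (assumed syntax).
    advanced = []
    if actor_input.get("phase"):
        advanced.append(f"AREA[Phase]{actor_input['phase']}")
    if actor_input.get("studyType"):
        advanced.append(f"AREA[StudyType]{actor_input['studyType']}")
    if advanced:
        params["filter.advanced"] = " AND ".join(advanced)

    return params
```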

Input Parameters

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| condition | string | "" | Condition or disease being studied (e.g. "breast cancer", "Alzheimer", "Type 2 Diabetes"). |
| intervention | string | "" | Intervention name (e.g. "pembrolizumab", "radiation therapy"). Matches drug names, devices, and procedures. |
| sponsor | string | "" | Lead sponsor name (e.g. "Pfizer", "National Cancer Institute"). Partial match supported. |
| phase | string | "" | Trial phase. Options: EARLY_PHASE1, PHASE1, PHASE2, PHASE3, PHASE4, NA. Leave empty for all phases. |
| studyStatus | string[] | [] | One or more statuses: RECRUITING, ACTIVE_NOT_RECRUITING, COMPLETED, ENROLLING_BY_INVITATION, NOT_YET_RECRUITING, SUSPENDED, TERMINATED, WITHDRAWN. |
| studyType | string | "" | Study type. Options: INTERVENTIONAL, OBSERVATIONAL, EXPANDED_ACCESS. Leave empty for all. |
| keyword | string | "" | General keyword search across all study fields. Use for broad or exploratory queries. |
| maxItems | integer | 200 | Maximum records to return. Set to 0 for unlimited, which requires at least one filter. |
| proxyConfiguration | object | disabled | Proxy settings. Not required: ClinicalTrials.gov does not have anti-bot measures. |
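
The constraints in this table (the allowed enum values, and the rule that an unlimited run needs at least one filter) can be checked before starting a run. A minimal sketch, with `validate_input` as a hypothetical helper and no claim that the crawler validates input this way:

```python
from typing import List

PHASES = {"EARLY_PHASE1", "PHASE1", "PHASE2", "PHASE3", "PHASE4", "NA"}
STATUSES = {
    "RECRUITING", "ACTIVE_NOT_RECRUITING", "COMPLETED", "ENROLLING_BY_INVITATION",
    "NOT_YET_RECRUITING", "SUSPENDED", "TERMINATED", "WITHDRAWN",
}
STUDY_TYPES = {"INTERVENTIONAL", "OBSERVATIONAL", "EXPANDED_ACCESS"}
FILTER_FIELDS = (
    "condition", "intervention", "sponsor", "phase",
    "studyStatus", "studyType", "keyword",
)


def validate_input(actor_input: dict) -> List[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []

    phase = actor_input.get("phase", "")
    if phase and phase not in PHASES:
        errors.append(f"unknown phase: {phase}")

    for status in actor_input.get("studyStatus") or []:
        if status not in STATUSES:
            errors.append(f"unknown studyStatus: {status}")

    study_type = actor_input.get("studyType", "")
    if study_type and study_type not in STUDY_TYPES:
        errors.append(f"unknown studyType: {study_type}")

    max_items = actor_input.get("maxItems", 200)  # documented default
    if not isinstance(max_items, int) or max_items < 0:
        errors.append("maxItems must be a non-negative integer")
    elif max_items == 0 and not any(actor_input.get(f) for f in FILTER_FIELDS):
        errors.append("maxItems=0 (unlimited) requires at least one filter")

    return errors
```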

ClinicalTrials.gov Crawler Output Fields

{
    "nct_id": "NCT02625935",
    "study_title": "A Prospective Observational Study Evaluating Treatment Decision Impact of Prosigna",
    "brief_summary": "This study evaluates whether the Prosigna assay changes treatment decisions for early-stage breast cancer patients...",
    "study_status": "COMPLETED",
    "phase": "PHASE3",
    "study_type": "OBSERVATIONAL",
    "conditions": ["Breast Cancer"],
    "interventions": ["Prosigna Breast Cancer Prognostic Gene Signature Assay"],
    "intervention_types": ["DIAGNOSTIC_TEST"],
    "lead_sponsor": "NanoString Technologies, Inc.",
    "lead_sponsor_type": "INDUSTRY",
    "collaborators": ["American Society of Clinical Oncology"],
    "enrollment_count": 201,
    "enrollment_type": "ACTUAL",
    "start_date": "2015-10",
    "primary_completion_date": "2017-06",
    "completion_date": "2017-06",
    "primary_outcome": "Change in treatment recommendation (12 months)",
    "secondary_outcomes": [
        "Patient anxiety levels (6 months)",
        "Physician confidence in treatment decision (12 months)"
    ],
    "eligibility_criteria": "Inclusion Criteria:\n- Female\n- Diagnosed with early-stage, hormone receptor-positive breast cancer...\n\nExclusion Criteria:\n- Prior chemotherapy...",
    "min_age": "18 Years",
    "max_age": "",
    "sex": "FEMALE",
    "locations": [
        {
            "facility": "Memorial Sloan Kettering Cancer Center",
            "city": "New York",
            "state": "New York",
            "country": "United States",
            "zip": "10065",
            "contact_name": "Dr. Jane Smith",
            "contact_phone": "212-555-0100",
            "contact_email": "smith@mskcc.org"
        }
    ],
    "has_results": true,
    "results_first_posted": "2018-03-15",
    "last_update_posted": "2023-01-10",
    "study_url": "https://clinicaltrials.gov/study/NCT02625935"
}

| Field | Type | Description |
| --- | --- | --- |
| nct_id | string | ClinicalTrials.gov identifier (e.g. NCT00000001) |
| study_title | string | Official study title |
| brief_summary | string | Brief summary of the study purpose and design |
| study_status | string | Overall study status: RECRUITING, COMPLETED, ACTIVE_NOT_RECRUITING, etc. |
| phase | string | Trial phase: PHASE1, PHASE2, PHASE3, PHASE4, EARLY_PHASE1, NA |
| study_type | string | Study type: INTERVENTIONAL, OBSERVATIONAL, EXPANDED_ACCESS |
| conditions | string[] | Conditions or diseases being studied |
| interventions | string[] | Intervention names: drugs, devices, procedures |
| intervention_types | string[] | Intervention types: DRUG, DEVICE, BIOLOGICAL, PROCEDURE, DIAGNOSTIC_TEST |
| lead_sponsor | string | Lead sponsor organization name |
| lead_sponsor_type | string | Sponsor class: INDUSTRY, NIH, OTHER, NETWORK |
| collaborators | string[] | Collaborating organizations |
| enrollment_count | number | Participant count, actual or estimated |
| enrollment_type | string | Whether the enrollment count is ACTUAL or ESTIMATED |
| start_date | string | Study start date |
| primary_completion_date | string | Date of last participant's last visit for the primary outcome |
| completion_date | string | Full study completion date |
| primary_outcome | string | Primary outcome measure with time frame |
| secondary_outcomes | string[] | Secondary outcome measures with time frames |
| eligibility_criteria | string | Full inclusion and exclusion criteria text |
| min_age | string | Minimum eligible age |
| max_age | string | Maximum eligible age (empty string if no upper limit) |
| sex | string | Eligible sex: ALL, MALE, FEMALE |
| locations | object[] | Study sites: facility, city, state, country, zip, and contact details |
| has_results | boolean | Whether results have been posted to ClinicalTrials.gov |
| results_first_posted | string | Date results were first posted |
| last_update_posted | string | Date of the most recent record update |
| study_url | string | Direct URL to the study page on ClinicalTrials.gov |
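
Because `locations` is a nested list, outreach-style use cases usually need one row per site contact rather than one row per study. A small sketch of that reshaping, using only the output fields documented above; `contact_rows` is a hypothetical helper and the choice to skip sites without an email is an assumption about what a caller would want.

```python
from typing import Iterable, Iterator, Optional


def contact_rows(
    records: Iterable[dict],
    status: Optional[str] = "RECRUITING",
) -> Iterator[dict]:
    """Yield one flat row per study location that lists a contact email.

    Pass status=None to include studies in any status.
    """
    for record in records:
        if status and record.get("study_status") != status:
            continue
        for loc in record.get("locations") or []:
            email = (loc.get("contact_email") or "").strip()
            if not email:
                continue  # nothing to reach out to at this site
            yield {
                "nct_id": record.get("nct_id", ""),
                "study_title": record.get("study_title", ""),
                "facility": loc.get("facility", ""),
                "city": loc.get("city", ""),
                "country": loc.get("country", ""),
                "contact_name": loc.get("contact_name", ""),
                "contact_email": email,
            }
```

The resulting rows can be written directly with `csv.DictWriter` or loaded into a spreadsheet.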

FAQ

How many clinical trials does ClinicalTrials.gov Crawler cover?
ClinicalTrials.gov Crawler queries the full ClinicalTrials.gov database: over 500,000 registered studies from all countries. If a study was registered there, the crawler can reach it.

Do I need proxies or an API key to run this?
No. ClinicalTrials.gov is a public U.S. government service maintained by the National Library of Medicine. The API requires no authentication and no proxy. The crawler ships with proxies disabled by default.

Can I run a bulk export without filters?
Not with maxItems set to 0. An unlimited run with no filters would queue the entire 500K+ record database, which is rarely what anyone needs. Provide at least one filter (condition, intervention, sponsor, phase, status, study type, or keyword) when running unlimited. With filters, unlimited runs are fine.

How current is the data?
ClinicalTrials.gov Crawler reads from the live API and does not cache anything. Sponsors are required to update their registrations regularly, and the last_update_posted field on each record shows when that specific study was last modified.
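
To keep only recently modified studies, the last_update_posted field can be compared against a cutoff. The sample output shows both full dates ("2018-03-15") and month-only dates ("2015-10"), so this sketch pads a missing day or month with 1; that padding is an assumption, and `parse_ct_date` and `updated_since` are hypothetical helper names.

```python
from datetime import date
from typing import Iterable, List, Optional


def parse_ct_date(value: str) -> Optional[date]:
    """Parse 'YYYY', 'YYYY-MM' or 'YYYY-MM-DD'; return None for empty input."""
    if not value:
        return None
    parts = [int(p) for p in value.split("-")]
    parts += [1] * (3 - len(parts))  # assumption: missing month/day become 1
    return date(parts[0], parts[1], parts[2])


def updated_since(records: Iterable[dict], cutoff: date) -> List[dict]:
    """Return records whose last_update_posted is on or after the cutoff."""
    recent = []
    for record in records:
        updated = parse_ct_date(record.get("last_update_posted", ""))
        if updated is not None and updated >= cutoff:
            recent.append(record)
    return recent
```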

What is the difference between studyStatus and studyType?
ClinicalTrials.gov Crawler treats them as separate axes. Status describes where a study is in its lifecycle: RECRUITING, COMPLETED, SUSPENDED, etc. Type describes the study design: INTERVENTIONAL (a drug or device being tested), OBSERVATIONAL (no assigned intervention), or EXPANDED_ACCESS. Both filters can be applied at the same time.

Need More Features?

Need additional fields, a different data source, or a scheduled run? Get in touch.

Why Use ClinicalTrials.gov Crawler?

  • Official API, not HTML scraping — the crawler reads the ClinicalTrials.gov v2 JSON endpoints directly, so field names and data structure match what the NLM publishes, not what a selector happened to grab last Tuesday
  • 25+ fields per study, including contact data — each location record carries facility name, address, and primary contact information, which matters when the goal is outreach rather than just counting trials
  • No proxy cost, no authentication overhead — government data, open access; the crawler's per-record cost reflects actual compute, not unnecessary infrastructure