OrbTop

EMA Medicines Scraper - European Drug Authorisation Register

Extract the complete European Medicines Agency (EMA) centralised medicines authorisation register. Covers all human and veterinary medicines that have received, or applied for, a centralised EU marketing authorisation.

Data source: the EMA's XLSX bulk export at ema.europa.eu, which the EMA regenerates every night.


What data does it extract?

Each record corresponds to one medicine and includes:

Field                                Description
medicine_name                        Brand name
category                             Human or Veterinary
ema_product_number                   EMA product number (e.g. EMEA/H/C/004781)
authorisation_status                 Authorised, Withdrawn, Refused, or Suspended
inn                                  International Non-proprietary Name / common name
active_substance                     Active substance(s)
therapeutic_area                     Therapeutic area (MeSH terms)
atc_code                             ATC code (human) or ATCvet code (veterinary)
pharmacotherapeutic_group            Pharmacotherapeutic group
marketing_authorisation_holder       Marketing authorisation holder (MAH) company name
first_authorised_date                First EU marketing authorisation date (DD/MM/YYYY)
orphan_designation                   Orphan medicine designation flag
biosimilar                           Biosimilar flag
generic_or_hybrid                    Generic or hybrid application flag
conditional_marketing_authorisation  Conditional approval flag
additional_monitoring                Additional monitoring (black triangle) flag
accelerated_assessment               Accelerated assessment flag
exceptional_circumstances            Exceptional circumstances flag
product_url                          EMA product page URL
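For typed downstream code, the record layout above can be written as a Python `TypedDict`. This is a sketch, not part of the actor: the class name `EmaMedicine` and the helper `mistyped_fields` are hypothetical, and the types are inferred from the sample output record on this page (flag fields as JSON booleans, everything else as strings).

```python
from typing import TypedDict, get_type_hints


# Hypothetical type for one dataset item; field names come from the table above,
# types are an assumption based on the sample output record.
class EmaMedicine(TypedDict):
    medicine_name: str
    category: str                  # "Human" or "Veterinary"
    ema_product_number: str        # e.g. "EMEA/H/C/004781"
    authorisation_status: str      # Authorised, Withdrawn, Refused or Suspended
    inn: str
    active_substance: str
    therapeutic_area: str          # MeSH terms in a single string
    atc_code: str
    pharmacotherapeutic_group: str
    marketing_authorisation_holder: str
    first_authorised_date: str     # DD/MM/YYYY
    orphan_designation: bool
    biosimilar: bool
    generic_or_hybrid: bool
    conditional_marketing_authorisation: bool
    additional_monitoring: bool
    accelerated_assessment: bool
    exceptional_circumstances: bool
    product_url: str


FIELD_TYPES = get_type_hints(EmaMedicine)
# Names of the boolean flag fields, in table order.
FLAG_FIELDS = [name for name, tp in FIELD_TYPES.items() if tp is bool]


def mistyped_fields(item: dict) -> list[str]:
    """Return the fields that are missing or do not have the expected type."""
    return [name for name, tp in FIELD_TYPES.items() if not isinstance(item.get(name), tp)]
```

Running `mistyped_fields` over each exported item is a cheap way to catch schema drift before loading the data elsewhere.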

Input options

Parameter              Type     Default     Description
medicineCategory       String   human       Filter: human, veterinary, or leave blank for all
authorisationStatus    String   Authorised  Filter: Authorised, Withdrawn, Refused, Suspended, or leave blank for all
therapeuticArea        String   (blank)     Case-insensitive substring match on therapeutic area (e.g. Diabetes)
atcCode                String   (blank)     ATC code prefix match (e.g. L01 for antineoplastic agents)
authorisationDateFrom  String   (blank)     Include only medicines first authorised on or after this date (YYYY-MM-DD or DD/MM/YYYY)
authorisationDateTo    String   (blank)     Include only medicines first authorised on or before this date (YYYY-MM-DD or DD/MM/YYYY)
maxItems               Integer  15          Maximum number of records to return (0 = all)
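To make the filter semantics concrete, here is a hypothetical client-side re-implementation in Python, for example to re-filter a dataset exported with all filters left blank. It sketches the behaviour described in the table and is not the actor's own code. The function names are illustrative, and three details are assumptions: category and status are compared case-insensitively, the date bounds apply to `first_authorised_date`, and records without a date are dropped once a date bound is set.

```python
from datetime import date, datetime
from typing import Optional


def parse_date(text: str) -> Optional[date]:
    """Accept YYYY-MM-DD or DD/MM/YYYY; return None for blank input."""
    text = (text or "").strip()
    if not text:
        return None
    for fmt in ("%Y-%m-%d", "%d/%m/%Y"):
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognised date: {text!r}")


def _text(record: dict, key: str) -> str:
    # Treat missing or null fields as empty strings.
    return record.get(key) or ""


def apply_filters(records: list[dict], params: dict) -> list[dict]:
    """Hypothetical re-implementation of the documented input filters."""
    # Missing keys fall back to the documented defaults; blank strings disable a filter.
    category = (params.get("medicineCategory", "human") or "").strip().lower()
    status = (params.get("authorisationStatus", "Authorised") or "").strip().lower()
    area = (params.get("therapeuticArea", "") or "").strip().lower()
    atc = (params.get("atcCode", "") or "").strip().upper()
    date_from = parse_date(params.get("authorisationDateFrom", ""))
    date_to = parse_date(params.get("authorisationDateTo", ""))
    max_items = params.get("maxItems", 15) or 0

    selected = []
    for record in records:
        if category and _text(record, "category").lower() != category:
            continue
        if status and _text(record, "authorisation_status").lower() != status:
            continue
        if area and area not in _text(record, "therapeutic_area").lower():
            continue
        if atc and not _text(record, "atc_code").upper().startswith(atc):
            continue
        if date_from or date_to:
            authorised = parse_date(_text(record, "first_authorised_date"))
            if authorised is None:  # assumption: undated records are excluded
                continue
            if date_from and authorised < date_from:
                continue
            if date_to and authorised > date_to:
                continue
        selected.append(record)
        if max_items and len(selected) >= max_items:  # 0 means "no limit"
            break
    return selected
```

Accepting both date formats in one helper mirrors the table, where `authorisationDateFrom` and `authorisationDateTo` take either YYYY-MM-DD or DD/MM/YYYY while the records themselves use DD/MM/YYYY.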

How it works

The actor downloads the EMA's nightly XLSX bulk export (approximately 885 KB, about 2,700 records) in a single HTTP request; no browser automation, pagination, or proxy is required. The XLSX file is parsed in memory using Node.js built-in modules, then filtered and saved to the Apify dataset.

Performance: Typically completes in under 10 seconds.

Memory: 256 MB is sufficient; the actor is configured with 512 MB as a safety margin.
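An XLSX file is a ZIP archive of XML parts, which is why no third-party spreadsheet library is strictly needed. The actor does this in Node.js; the sketch below shows the same idea with Python's standard library (`zipfile` and `xml.etree.ElementTree`). It assumes the data sits in the first worksheet at `xl/worksheets/sheet1.xml` and returns raw cell text without type conversion. Where the header row sits in the EMA file is not documented here, so mapping rows to field names is left to the caller; `read_xlsx_rows` and `column_index` are hypothetical names.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

# SpreadsheetML main namespace used by worksheet and shared-string parts.
NS = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"


def column_index(cell_ref: str) -> int:
    """Convert the letters of a cell reference such as 'AB12' to a 0-based column index."""
    index = 0
    for ch in cell_ref:
        if not ch.isalpha():
            break
        index = index * 26 + (ord(ch.upper()) - ord("A") + 1)
    return index - 1


def read_xlsx_rows(data: bytes, sheet_path: str = "xl/worksheets/sheet1.xml") -> list[list[str]]:
    """Parse one worksheet of an in-memory XLSX file into rows of raw cell text."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        shared: list[str] = []
        if "xl/sharedStrings.xml" in zf.namelist():
            sst = ET.fromstring(zf.read("xl/sharedStrings.xml"))
            for si in sst.iter(f"{NS}si"):
                # Rich-text entries split their text across several <t> elements.
                shared.append("".join(t.text or "" for t in si.iter(f"{NS}t")))
        sheet = ET.fromstring(zf.read(sheet_path))

    rows: list[list[str]] = []
    for row in sheet.iter(f"{NS}row"):
        values: list[str] = []
        for cell in row.iter(f"{NS}c"):
            ref = cell.get("r")
            col = column_index(ref) if ref else len(values)
            kind = cell.get("t")
            v = cell.find(f"{NS}v")
            if kind == "s" and v is not None and v.text:
                text = shared[int(v.text)]  # index into the shared-string table
            elif kind == "inlineStr":
                text = "".join(t.text or "" for t in cell.iter(f"{NS}t"))
            else:
                text = v.text if v is not None and v.text else ""
            # Empty cells are usually omitted from the XML, so pad the gap.
            values.extend([""] * (col - len(values)))
            values.append(text)
        rows.append(values)
    return rows
```

Everything happens on an in-memory `bytes` object, so the downloaded response body can be passed straight to `read_xlsx_rows` without touching the disk.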


Example run

Input:

{
  "medicineCategory": "human",
  "authorisationStatus": "Authorised",
  "atcCode": "L01",
  "maxItems": 5
}

Sample output record:

{
  "medicine_name": "Keytruda",
  "category": "Human",
  "ema_product_number": "EMEA/H/C/003820",
  "authorisation_status": "Authorised",
  "inn": "pembrolizumab",
  "active_substance": "pembrolizumab",
  "therapeutic_area": "Melanoma; Carcinoma, Non-Small-Cell Lung; ...",
  "atc_code": "L01FF02",
  "pharmacotherapeutic_group": "Antineoplastic agents, monoclonal antibodies",
  "marketing_authorisation_holder": "Merck Sharp & Dohme B.V.",
  "first_authorised_date": "17/07/2015",
  "orphan_designation": false,
  "biosimilar": false,
  "generic_or_hybrid": false,
  "conditional_marketing_authorisation": false,
  "additional_monitoring": true,
  "accelerated_assessment": false,
  "exceptional_circumstances": false,
  "product_url": "https://www.ema.europa.eu/en/medicines/human/EPAR/keytruda"
}
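Two details in the sample record are worth handling explicitly when post-processing. First, `therapeutic_area` packs several MeSH terms into one string separated by semicolons, and individual terms such as "Carcinoma, Non-Small-Cell Lung" contain commas. Second, `first_authorised_date` uses DD/MM/YYYY, which does not sort chronologically as text. The hypothetical `normalise` helper below splits the terms on semicolons, converts the date to ISO 8601, and adds an `atc_levels` list (a field the actor does not produce) using the WHO ATC prefix lengths 1, 3, 4, 5 and 7. It assumes one human ATC code per record; ATCvet codes carry a leading Q and would need shifted lengths.

```python
from datetime import datetime
from typing import Optional

# Prefix lengths of the five WHO ATC levels, e.g. L, L01, L01F, L01FF, L01FF02.
ATC_LEVEL_LENGTHS = (1, 3, 4, 5, 7)


def atc_levels(code: str) -> list[str]:
    """Return the ATC hierarchy prefixes contained in a single human ATC code."""
    code = (code or "").strip().upper()
    return [code[:n] for n in ATC_LEVEL_LENGTHS if len(code) >= n]


def iso_date(ddmmyyyy: str) -> Optional[str]:
    """Convert DD/MM/YYYY to YYYY-MM-DD; return None for a blank value."""
    text = (ddmmyyyy or "").strip()
    if not text:
        return None
    return datetime.strptime(text, "%d/%m/%Y").date().isoformat()


def normalise(record: dict) -> dict:
    """Return a copy of a dataset item with split terms, ISO date and ATC levels."""
    out = dict(record)
    # MeSH terms can contain commas, so split on semicolons only.
    terms = (record.get("therapeutic_area") or "").split(";")
    out["therapeutic_area"] = [t.strip() for t in terms if t.strip()]
    out["first_authorised_date"] = iso_date(record.get("first_authorised_date", ""))
    out["atc_levels"] = atc_levels(record.get("atc_code", ""))  # hypothetical extra field
    return out
```

Storing ISO dates means plain string comparison and sorting give chronological order, and the ATC prefix list makes grouping at any level of the hierarchy a simple membership test.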

Use cases

  • Pharma intelligence: Monitor which medicines have EU authorisation and track MAH portfolios
  • Biotech business development: Identify orphan, biosimilar, or conditionally approved medicines
  • Regulatory consulting: Track EU status of medicines by active substance or therapeutic area
  • Academic research: Build datasets of authorised medicines by ATC code or indication
  • Generics manufacturers: Identify authorised generic/hybrid medicines
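Several of these use cases reduce to simple grouping and filtering over the exported dataset items. A minimal sketch, assuming the items have already been loaded as a list of dictionaries (for example from a JSON export of the dataset) and that flag fields are booleans as in the sample output; `mah_portfolio_sizes` and `any_flag` are hypothetical helper names.

```python
from collections import Counter


def mah_portfolio_sizes(items: list[dict]) -> Counter:
    """Count medicines per marketing authorisation holder (MAH)."""
    return Counter(
        item["marketing_authorisation_holder"]
        for item in items
        if item.get("marketing_authorisation_holder")
    )


def any_flag(items: list[dict], *flags: str) -> list[dict]:
    """Keep items where at least one of the named flag fields is true."""
    return [item for item in items if any(item.get(flag) is True for flag in flags)]
```

For example, `mah_portfolio_sizes(items).most_common(10)` lists the ten largest MAH portfolios, and `any_flag(items, "orphan_designation", "biosimilar", "conditional_marketing_authorisation")` selects orphan, biosimilar or conditionally approved medicines.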

Notes

  • Data is updated nightly by the EMA. Each actor run downloads the latest version.
  • The dataset covers approximately 2,700 medicines in the centralised authorisation procedure. Nationally authorised medicines are not included.
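  • Withdrawn medicines remain in the source data with status Withdrawn. The default authorisationStatus filter (Authorised) excludes them; set it to Withdrawn or leave it blank to include them.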
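Because each run reflects the EMA's latest nightly export, comparing two runs is one way to spot changes such as a medicine moving from Authorised to Withdrawn. The sketch below assumes `ema_product_number` uniquely identifies a record and that both runs used the same filters. With the default authorisationStatus of Authorised, a newly withdrawn medicine would appear as removed rather than as a status change, so leave that filter blank for this kind of tracking. `status_changes` is a hypothetical helper.

```python
def status_changes(previous: list[dict], current: list[dict]) -> dict[str, list]:
    """Compare two runs keyed by ema_product_number (assumed unique per record)."""
    before = {r["ema_product_number"]: r for r in previous}
    after = {r["ema_product_number"]: r for r in current}
    added = sorted(after.keys() - before.keys())
    removed = sorted(before.keys() - after.keys())
    # (product number, old status, new status) for records present in both runs.
    changed = sorted(
        (num, before[num]["authorisation_status"], after[num]["authorisation_status"])
        for num in after.keys() & before.keys()
        if before[num]["authorisation_status"] != after[num]["authorisation_status"]
    )
    return {"added": added, "removed": removed, "status_changed": changed}
```

Keying by product number rather than by medicine name avoids false matches when a brand name is reused or renamed.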