EMA Medicines Scraper - European Drug Authorisation Register
Extract the complete European Medicines Agency (EMA) centralised medicines authorisation register. Covers all human and veterinary medicines that have received, or applied for, a centralised EU marketing authorisation.
Data source: the EMA's XLSX bulk export at ema.europa.eu, which the EMA regenerates every night.
What data does it extract?
Each record corresponds to one medicine and includes:
| Field | Description |
|---|---|
| medicine_name | Brand name |
| category | Human or Veterinary |
| ema_product_number | EMA product number (e.g. EMEA/H/C/004781) |
| authorisation_status | Authorised, Withdrawn, Refused, or Suspended |
| inn | International Non-proprietary Name / common name |
| active_substance | Active substance(s) |
| therapeutic_area | Therapeutic area (MeSH terms) |
| atc_code | ATC code (human) or ATCvet code (veterinary) |
| pharmacotherapeutic_group | Pharmacotherapeutic group |
| marketing_authorisation_holder | MAH company name |
| first_authorised_date | First EU marketing authorisation date (DD/MM/YYYY) |
| orphan_designation | Orphan medicine designation flag |
| biosimilar | Biosimilar flag |
| generic_or_hybrid | Generic or hybrid application flag |
| conditional_marketing_authorisation | Conditional approval flag |
| additional_monitoring | Additional monitoring (black triangle) flag |
| accelerated_assessment | Accelerated assessment flag |
| exceptional_circumstances | Exceptional circumstances flag |
| product_url | EMA product page URL |
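For downstream code, the record layout in the field table can be written down as a TypeScript type. This is a sketch, not a published schema. It assumes every field is always present, although empty cells in the export may in practice surface as empty strings or nulls. The two helpers are hypothetical conveniences. `activeFlags` lists which boolean flags are set on a record. `therapeuticTerms` splits `therapeutic_area`, which the sample output shows as MeSH terms joined by semicolons. Because individual terms can themselves contain commas, the split has to be on semicolons only.

```typescript
// One dataset record, following the field table (sketch; nullability not documented).
interface EmaMedicineRecord {
  medicine_name: string;
  category: "Human" | "Veterinary";
  ema_product_number: string;      // e.g. "EMEA/H/C/004781"
  authorisation_status: string;    // "Authorised", "Withdrawn", "Refused" or "Suspended"
  inn: string;
  active_substance: string;
  therapeutic_area: string;        // MeSH terms separated by "; "
  atc_code: string;                // ATC (human) or ATCvet (veterinary) code
  pharmacotherapeutic_group: string;
  marketing_authorisation_holder: string;
  first_authorised_date: string;   // DD/MM/YYYY
  orphan_designation: boolean;
  biosimilar: boolean;
  generic_or_hybrid: boolean;
  conditional_marketing_authorisation: boolean;
  additional_monitoring: boolean;
  accelerated_assessment: boolean;
  exceptional_circumstances: boolean;
  product_url: string;
}

// The boolean flag fields, in table order.
const FLAG_FIELDS = [
  "orphan_designation",
  "biosimilar",
  "generic_or_hybrid",
  "conditional_marketing_authorisation",
  "additional_monitoring",
  "accelerated_assessment",
  "exceptional_circumstances",
] as const;

type FlagField = (typeof FLAG_FIELDS)[number];

// Hypothetical helper: names of the flags that are true on this record.
function activeFlags(rec: EmaMedicineRecord): FlagField[] {
  return FLAG_FIELDS.filter((field) => rec[field]);
}

// Hypothetical helper: split therapeutic_area into individual MeSH terms.
// Terms such as "Carcinoma, Non-Small-Cell Lung" contain commas, so only
// semicolons are treated as separators.
function therapeuticTerms(rec: EmaMedicineRecord): string[] {
  return rec.therapeutic_area
    .split(";")
    .map((term) => term.trim())
    .filter((term) => term.length > 0);
}
```

For the Keytruda sample record, `activeFlags(record)` returns `["additional_monitoring"]`.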
Input options
| Parameter | Type | Default | Description |
|---|---|---|---|
| medicineCategory | String | human | Filter: human, veterinary, or leave blank for all |
| authorisationStatus | String | Authorised | Filter: Authorised, Withdrawn, Refused, Suspended, or blank for all |
| therapeuticArea | String | (blank) | Filter by therapeutic area substring, case-insensitive (e.g. Diabetes) |
| atcCode | String | (blank) | Filter by ATC code prefix (e.g. L01 for antineoplastics) |
| authorisationDateFrom | String | (blank) | Include only medicines authorised on or after this date (YYYY-MM-DD or DD/MM/YYYY) |
| authorisationDateTo | String | (blank) | Include only medicines authorised on or before this date (YYYY-MM-DD or DD/MM/YYYY) |
| maxItems | Integer | 15 | Maximum number of records to return (0 = all) |
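The filter semantics in the table can be summarised in a short function. This is a minimal sketch of those semantics, not the actor's code. Several assumptions are labelled in the comments. The defaults shown in the table are taken to have already been applied to the input, so an omitted or blank option means "no filter" here. Category and status comparisons are case-insensitive, which `human` versus `Human` requires. Records without a parseable date fail any date bound. `maxItems` is applied after filtering. The helper names `toIsoDate`, `parseBound` and `applyFilters` are hypothetical.

```typescript
// Subset of record fields the filters read.
interface FilterableRecord {
  category: string;              // "Human" or "Veterinary"
  authorisation_status: string;  // "Authorised", "Withdrawn", ...
  therapeutic_area: string;
  atc_code: string;
  first_authorised_date: string; // DD/MM/YYYY
}

// Input options from the table (assumed already resolved against defaults).
interface FilterInput {
  medicineCategory?: string;
  authorisationStatus?: string;
  therapeuticArea?: string;
  atcCode?: string;
  authorisationDateFrom?: string;
  authorisationDateTo?: string;
  maxItems?: number;
}

// Normalise "YYYY-MM-DD" or "DD/MM/YYYY" to "YYYY-MM-DD", which sorts
// correctly as a plain string. Returns undefined for any other shape.
function toIsoDate(value: string): string | undefined {
  const s = value.trim();
  const iso = /^(\d{4})-(\d{2})-(\d{2})$/.exec(s);
  if (iso) return `${iso[1]}-${iso[2]}-${iso[3]}`;
  const dmy = /^(\d{2})\/(\d{2})\/(\d{4})$/.exec(s);
  if (dmy) return `${dmy[3]}-${dmy[2]}-${dmy[1]}`;
  return undefined;
}

// A blank bound means "unbounded"; an unrecognised format is an input error.
function parseBound(value: string | undefined): string | undefined {
  if (!value || !value.trim()) return undefined;
  const iso = toIsoDate(value);
  if (!iso) throw new Error(`Unrecognised date: ${value}`);
  return iso;
}

function applyFilters<T extends FilterableRecord>(records: T[], input: FilterInput): T[] {
  const category = input.medicineCategory?.trim().toLowerCase();
  const status = input.authorisationStatus?.trim().toLowerCase();
  const area = input.therapeuticArea?.trim().toLowerCase();
  const atc = input.atcCode?.trim().toUpperCase();
  const from = parseBound(input.authorisationDateFrom);
  const to = parseBound(input.authorisationDateTo);

  const matches = records.filter((r) => {
    if (category && r.category.toLowerCase() !== category) return false;
    if (status && r.authorisation_status.toLowerCase() !== status) return false;
    if (area && !r.therapeutic_area.toLowerCase().includes(area)) return false; // substring
    if (atc && !r.atc_code.toUpperCase().startsWith(atc)) return false;         // prefix
    if (from || to) {
      const d = toIsoDate(r.first_authorised_date);
      if (!d) return false; // assumption: undated records fail any date bound
      if (from && d < from) return false;
      if (to && d > to) return false;
    }
    return true;
  });

  // Limit after filtering; 0 (or unset) returns every match.
  const max = input.maxItems ?? 0;
  return max > 0 ? matches.slice(0, max) : matches;
}
```

With the example-run input, `applyFilters(records, { medicineCategory: "human", authorisationStatus: "Authorised", atcCode: "L01", maxItems: 5 })` keeps at most five authorised human medicines whose ATC code starts with L01. Under the same prefix rule, veterinary ATCvet codes, which begin with Q, would not match `L01`.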
How it works
The actor downloads the EMA's nightly XLSX bulk export (approximately 885 KB, about 2,700 records) in a single HTTP request. No browser automation, pagination, or proxy is required. The workbook is parsed in memory using Node.js built-in modules. The resulting records are then filtered according to the input options and saved to the Apify dataset.
Performance: Typically completes in under 10 seconds.
Memory: 256 MB is sufficient. The actor is configured with 512 MB for headroom.
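The actor's parser isn't shown here. The sketch below only illustrates how an XLSX file can be read with Node.js built-ins alone. An XLSX workbook is a ZIP archive of XML parts, so one part can be extracted with `Buffer` and `zlib.inflateRawSync` by walking the ZIP central directory. The sketch makes several assumptions. The archive is a plain (non-ZIP64, unencrypted) ZIP. Entries are stored or deflated. `readZipEntry` is a hypothetical name. The part names conventionally found in a workbook, such as `xl/sharedStrings.xml` and `xl/worksheets/sheet1.xml`, would still need XML parsing afterwards.

```typescript
import * as zlib from "node:zlib";

// Extract one named entry from an in-memory ZIP archive (e.g. an XLSX file).
// Sketch only: no ZIP64, no encryption, no CRC verification.
function readZipEntry(zip: Buffer, entryName: string): Buffer | undefined {
  // The End Of Central Directory record (signature 0x06054b50) is 22 bytes,
  // optionally followed by an archive comment of up to 65535 bytes.
  let eocd = -1;
  for (let i = zip.length - 22; i >= Math.max(0, zip.length - 22 - 0xffff); i--) {
    if (zip.readUInt32LE(i) === 0x06054b50) { eocd = i; break; }
  }
  if (eocd < 0) throw new Error("Not a ZIP archive: EOCD record not found");

  const entryCount = zip.readUInt16LE(eocd + 10); // total central directory entries
  let ptr = zip.readUInt32LE(eocd + 16);          // offset of the central directory

  for (let n = 0; n < entryCount; n++) {
    if (zip.readUInt32LE(ptr) !== 0x02014b50) throw new Error("Bad central directory header");
    const method = zip.readUInt16LE(ptr + 10);
    const compressedSize = zip.readUInt32LE(ptr + 20);
    const nameLen = zip.readUInt16LE(ptr + 28);
    const extraLen = zip.readUInt16LE(ptr + 30);
    const commentLen = zip.readUInt16LE(ptr + 32);
    const localOffset = zip.readUInt32LE(ptr + 42);
    const name = zip.toString("utf8", ptr + 46, ptr + 46 + nameLen);

    if (name === entryName) {
      // The local header has its own name/extra lengths, which can differ
      // from the central directory's, so re-read them to find the data.
      const localNameLen = zip.readUInt16LE(localOffset + 26);
      const localExtraLen = zip.readUInt16LE(localOffset + 28);
      const start = localOffset + 30 + localNameLen + localExtraLen;
      const data = zip.subarray(start, start + compressedSize);
      if (method === 0) return Buffer.from(data);        // stored
      if (method === 8) return zlib.inflateRawSync(data); // deflate
      throw new Error(`Unsupported compression method ${method}`);
    }
    ptr += 46 + nameLen + extraLen + commentLen; // next central directory header
  }
  return undefined; // no entry with that name
}
```

Reading sizes and offsets from the central directory, rather than scanning local headers front to back, matters because some ZIP writers put the compressed size in a trailing data descriptor and leave it as zero in the local header.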
Example run
Input:
```json
{
  "medicineCategory": "human",
  "authorisationStatus": "Authorised",
  "atcCode": "L01",
  "maxItems": 5
}
```
Sample output record:
```json
{
  "medicine_name": "Keytruda",
  "category": "Human",
  "ema_product_number": "EMEA/H/C/003820",
  "authorisation_status": "Authorised",
  "inn": "pembrolizumab",
  "active_substance": "pembrolizumab",
  "therapeutic_area": "Melanoma; Carcinoma, Non-Small-Cell Lung; ...",
  "atc_code": "L01FF02",
  "pharmacotherapeutic_group": "Antineoplastic agents, monoclonal antibodies",
  "marketing_authorisation_holder": "Merck Sharp & Dohme B.V.",
  "first_authorised_date": "17/07/2015",
  "orphan_designation": false,
  "biosimilar": false,
  "generic_or_hybrid": false,
  "conditional_marketing_authorisation": false,
  "additional_monitoring": true,
  "accelerated_assessment": false,
  "exceptional_circumstances": false,
  "product_url": "https://www.ema.europa.eu/en/medicines/human/EPAR/keytruda"
}
```
Use cases
- Pharma intelligence: Monitor which medicines have EU authorisation and track MAH portfolios
- Biotech business development: Identify orphan, biosimilar, or conditionally approved medicines
- Regulatory consulting: Track EU status of medicines by active substance or therapeutic area
- Academic research: Build datasets of authorised medicines by ATC code or indication
- Generics manufacturers: Identify authorised generic/hybrid medicines
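For the portfolio-tracking use case, the dataset records can be grouped by marketing authorisation holder once they have been exported, for example as JSON. The following is a minimal sketch. It assumes the records are already loaded into memory, and `portfolioByHolder` is a hypothetical helper that uses only the `medicine_name`, `marketing_authorisation_holder` and `authorisation_status` fields.

```typescript
// Minimal subset of the dataset fields this sketch reads.
interface PortfolioRow {
  medicine_name: string;
  marketing_authorisation_holder: string;
  authorisation_status: string;
}

// Group currently authorised medicines by MAH and return the holders
// sorted by portfolio size (largest first, ties broken alphabetically).
function portfolioByHolder(rows: PortfolioRow[]): Array<[string, string[]]> {
  const byHolder = new Map<string, string[]>();
  for (const r of rows) {
    if (r.authorisation_status !== "Authorised") continue; // skip withdrawn, refused, suspended
    const names = byHolder.get(r.marketing_authorisation_holder) ?? [];
    names.push(r.medicine_name);
    byHolder.set(r.marketing_authorisation_holder, names);
  }
  return Array.from(byHolder.entries()).sort(
    (a, b) => b[1].length - a[1].length || a[0].localeCompare(b[0]),
  );
}
```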
Notes
- Data is updated nightly by the EMA. Each actor run downloads the latest version.
- The dataset covers approximately 2,700 medicines handled through the centralised authorisation procedure. Nationally authorised medicines are not included.
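- Withdrawn medicines remain in the dataset with authorisation_status set to Withdrawn. Because authorisationStatus defaults to Authorised, set it to Withdrawn (or leave it blank) to include them in a run.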