IMSLP Public Domain Sheet Music Scraper

Walk the full IMSLP catalog and pull structured data on 230,000+ musical works, 24,000+ composers, and their associated score files — all public-domain by construction.

IMSLP has two public APIs. This scraper uses both. The worklist API delivers the complete work index at 1,000 records per page. The MediaWiki API fills in per-work details: key, genre, instrumentation, composition year, and the file manifest with direct PDF download links. You can run fast (worklist-only, no detail calls) or complete (full enrichment). Both modes respect the site's request etiquette.
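
The paging behavior described above can be sketched as a loop that advances in steps of 1,000 until a short page comes back. This is a minimal illustration, not the scraper's actual code: `iter_worklist`, `fetch_page`, and `fake_fetch` are hypothetical names, and the HTTP call is replaced by an in-memory stand-in so the example runs without network access.

```python
from typing import Callable, Iterator, List, Optional


def iter_worklist(
    fetch_page: Callable[[int], List[dict]],
    page_size: int = 1000,
    max_items: Optional[int] = None,
) -> Iterator[dict]:
    """Yield work records page by page until a short page or max_items is reached."""
    if max_items is not None and max_items <= 0:
        return
    start = 0
    yielded = 0
    while True:
        # Hypothetical callable: returns the records beginning at offset `start`.
        page = fetch_page(start)
        for record in page:
            yield record
            yielded += 1
            if max_items is not None and yielded >= max_items:
                return
        if len(page) < page_size:
            return  # a short (or empty) page marks the end of the index
        start += page_size


# Stand-in for the real HTTP call: 2,500 fake records served in pages of 1,000.
FAKE_INDEX = [{"work_id": str(i)} for i in range(2500)]


def fake_fetch(start: int) -> List[dict]:
    return FAKE_INDEX[start:start + 1000]


works = list(iter_worklist(fake_fetch))
print(len(works))
```

Passing the fetch function in as an argument keeps the paging logic testable on its own; a real run would supply a function that calls the worklist API and waits between requests.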

What You Get

Each record covers one musical work.

Field	Type	Description
`work_id`	string	IMSLP/MediaWiki page ID
`work_title`	string	Work title as listed on IMSLP
`composer`	string	Composer full name
`composer_slug`	string	IMSLP category identifier
`opus_catalogue`	string	Op., BWV, K., or other catalogue number
`genre`	string	Piece style and genre (e.g. "Baroque — fugues")
`instrumentation`	string	Scored for (e.g. "piano", "2 violins, viola, cello")
`key`	string	Musical key
`composition_year`	string	Year or date of composition
`first_publication`	string	Year of first publication
`score_files`	string	JSON array of score PDFs with filename, description, file URL, copyright, editor
`parts_files`	string	JSON array of parts PDFs (same structure)
`arrangements`	string	JSON array of arrangement PDFs (same structure)
`copyright_status`	string	Copyright tag from IMSLP (almost always "Public Domain")
`license`	string	Specific license
`imslp_url`	string	Canonical IMSLP work page URL
`scraped_at`	string	ISO 8601 timestamp

File arrays are JSON-encoded strings. Each entry has: filename, description, editor, copyright, file_url.
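
Because the file arrays arrive as strings, they need one decoding step before use. The sketch below assumes a record shaped like the table above; the sample values are illustrative rather than real IMSLP data, `decode_files` is a hypothetical helper, and treating an empty string as "no files" is an assumption about how absent arrays might appear.

```python
import json

# Hypothetical record; values are illustrative, not real IMSLP data.
record = {
    "work_title": "Example Work",
    "score_files": json.dumps([
        {
            "filename": "example_score.pdf",
            "description": "Complete Score",
            "editor": "Example Editor",
            "copyright": "Public Domain",
            "file_url": "https://imslp.org/wiki/Special:ReverseLookup/example_score.pdf",
        }
    ]),
    "parts_files": "[]",
    "arrangements": "",
}


def decode_files(record: dict, field: str) -> list:
    """Decode one JSON-encoded file array; a missing or empty value means no files."""
    raw = record.get(field) or "[]"
    return json.loads(raw)


scores = decode_files(record, "score_files")
print([f["file_url"] for f in scores])
```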

Input

Parameter	Type	Default	Description
`maxItems`	integer	10	Maximum works to return
`includeFileDetails`	boolean	true	Call the MediaWiki API for file lists, key, genre, and instrumentation. Disable for faster bulk exports: you get the catalog skeleton without per-work details.
`composerFilter`	string	—	Optional composer name filter (e.g. "Bach, Johann Sebastian"). Leave blank for the full catalog.
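
As an illustration, the two run styles could be expressed with inputs like the following. The parameter names come from the table above; the specific `maxItems` values are arbitrary choices for the example.

```python
import json

# Detail-enriched run scoped to one composer.
run_input = {
    "maxItems": 50,
    "includeFileDetails": True,
    "composerFilter": "Bach, Johann Sebastian",
}

# Fast index-only run: no detail calls, no composer filter.
index_only_input = {
    "maxItems": 1000,
    "includeFileDetails": False,
}

print(json.dumps(run_input, indent=2))
```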

File Detail Mode

When `includeFileDetails` is enabled, the scraper makes one additional MediaWiki API call per work to parse the work's wikitext. This populates `score_files`, `parts_files`, `arrangements`, `instrumentation`, `key`, `genre`, `composition_year`, and `first_publication`, and adds ~200 ms per record to the run time. For full-catalog exports where you only need the work index, disable it.
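
For orientation, here is one way a per-work wikitext request could be formed with the standard MediaWiki `action=parse` module. The source does not say which endpoint or parameters the scraper uses, so the endpoint URL and the helper name `wikitext_request_url` are assumptions; only the query parameters themselves are standard MediaWiki API parameters.

```python
from urllib.parse import urlencode

# Assumed endpoint; the source does not state the exact URL the scraper calls.
API_ENDPOINT = "https://imslp.org/api.php"


def wikitext_request_url(work_id: str) -> str:
    """Build a MediaWiki action=parse URL that asks for a page's wikitext by page ID."""
    params = {
        "action": "parse",
        "pageid": work_id,
        "prop": "wikitext",
        "format": "json",
    }
    return f"{API_ENDPOINT}?{urlencode(params)}"


print(wikitext_request_url("12345"))
```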

Coverage

IMSLP's public-domain focus is by design. The library was built specifically to host scores whose copyright has expired or that have been dedicated to the public domain. The `copyright_status` field reflects IMSLP's own tagging, and that tagging rests on the copyright review IMSLP applies at submission time.

Score file URLs point to imslp.org/wiki/Special:ReverseLookup/<filename>, which resolves to the PDF download. These are the same URLs end users click in the IMSLP UI.
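
When saving PDFs locally, the trailing path segment of a `file_url` can serve as the filename. The sketch below assumes the URL follows the pattern above and that the filename may be percent-encoded; `local_name` is a hypothetical helper and the sample URL is illustrative.

```python
from urllib.parse import unquote, urlparse


def local_name(file_url: str) -> str:
    """Return the decoded last path segment of a file URL for use as a local filename."""
    path = urlparse(file_url).path
    return unquote(path.rsplit("/", 1)[-1])


print(local_name("https://imslp.org/wiki/Special:ReverseLookup/Example%20Score.pdf"))
```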

Use Cases

Build a searchable public-domain score database
Feed OMR (optical music recognition) or generative music training pipelines
Supply structured work metadata to music education platforms
Populate digital library catalogs with direct PDF links
Run composer or instrumentation research at scale

Data Volume

The full catalog is approximately 230,000 works. Without a composer filter and with `includeFileDetails` enabled, a complete run takes many hours because of polite pacing between MediaWiki API calls: at ~200 ms per record, the detail calls alone add roughly 13 hours. Use `composerFilter` to scope to a specific composer, or set `includeFileDetails: false` for a fast full-catalog index run.
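
A back-of-the-envelope estimate, using only the figures quoted in this document (230,000 works, 1,000 records per worklist page, ~200 ms added per record in detail mode), shows the scale of each mode. It covers the detail-call overhead only and ignores worklist paging time and any retries.

```python
WORKS = 230_000        # approximate catalog size
PAGE_SIZE = 1_000      # worklist records per page
DETAIL_SECONDS = 0.2   # ~200 ms added per record in detail mode

worklist_pages = -(-WORKS // PAGE_SIZE)       # ceiling division
detail_hours = WORKS * DETAIL_SECONDS / 3600  # detail-call overhead only

print(worklist_pages)          # 230
print(round(detail_hours, 1))  # 12.8
```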

Built by OrbTop. Data sourced from IMSLP via its public APIs.
