IMSLP Public Domain Sheet Music Scraper

Walk the full IMSLP catalog and pull structured data on 230,000+ musical works, 24,000+ composers, and their associated score files — all public-domain by construction.

IMSLP has two public APIs. This scraper uses both. The worklist API delivers the complete work index at 1,000 records per page. The MediaWiki API fills in per-work details: key, genre, instrumentation, composition year, and the file manifest with direct PDF download links. You can run fast (worklist-only, no detail calls) or complete (full enrichment). Both modes respect the site's request etiquette.
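To illustrate the page-by-page walk described above, here is a minimal sketch in Python. It assumes a hypothetical `fetch_page(start)` callable that returns the records beginning at a given offset; the real worklist endpoint, its parameters, and its response shape are not shown here, and the scraper's own implementation may differ.

```python
from typing import Callable, Iterator

PAGE_SIZE = 1000  # records per worklist page, per the description above


def iter_worklist(fetch_page: Callable[[int], list[dict]]) -> Iterator[dict]:
    """Yield every work record by walking the index page by page.

    fetch_page(start) is a hypothetical callable returning the records that
    begin at offset `start` (at most PAGE_SIZE of them).
    """
    start = 0
    while True:
        page = fetch_page(start)
        yield from page
        if len(page) < PAGE_SIZE:
            break  # a short or empty page means the index is exhausted
        start += PAGE_SIZE


# Usage with an in-memory stand-in for the network call:
fake_index = [{"id": i} for i in range(2500)]
works = list(iter_worklist(lambda start: fake_index[start:start + PAGE_SIZE]))
print(len(works))  # 2500
```

Injecting the fetch function keeps the paging logic testable without touching the network.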

What You Get

Each record covers one musical work.

| Field | Type | Description |
| --- | --- | --- |
| work_id | string | IMSLP/MediaWiki page ID |
| work_title | string | Work title as listed on IMSLP |
| composer | string | Composer full name |
| composer_slug | string | IMSLP category identifier |
| opus_catalogue | string | Op., BWV, K., or other catalogue number |
| genre | string | Piece style and genre (e.g. "Baroque — fugues") |
| instrumentation | string | Scored for (e.g. "piano", "2 violins, viola, cello") |
| key | string | Musical key |
| composition_year | string | Year or date of composition |
| first_publication | string | Year of first publication |
| score_files | string | JSON array of score PDFs with filename, description, file URL, copyright, editor |
| parts_files | string | JSON array of parts PDFs (same structure) |
| arrangements | string | JSON array of arrangement PDFs (same structure) |
| copyright_status | string | Copyright tag from IMSLP (almost always "Public Domain") |
| license | string | Specific license |
| imslp_url | string | Canonical IMSLP work page URL |
| scraped_at | string | ISO 8601 timestamp |

File arrays are JSON-encoded strings. Each entry has: filename, description, editor, copyright, file_url.
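Because the file arrays arrive as JSON-encoded strings, they need one decoding step before use. The sketch below shows one way to do that in Python. The record is hypothetical and its values are illustrative; treating a missing or empty field as an empty list is an assumption, not documented behavior.

```python
import json

# Hypothetical record, trimmed to the fields used here; values are illustrative.
record = {
    "work_title": "Example Work",
    "score_files": json.dumps([
        {
            "filename": "example_score.pdf",
            "description": "Complete Score",
            "editor": "",
            "copyright": "Public Domain",
            "file_url": "https://imslp.org/wiki/Special:ReverseLookup/example_score.pdf",
        }
    ]),
    "parts_files": "[]",
    "arrangements": "",
}

FILE_FIELDS = ("score_files", "parts_files", "arrangements")


def decode_files(rec: dict) -> dict:
    """Return a copy of rec with the file-array fields decoded into lists."""
    out = dict(rec)
    for field in FILE_FIELDS:
        raw = rec.get(field) or "[]"  # assumption: missing/empty means no files
        out[field] = json.loads(raw)
    return out


decoded = decode_files(record)
urls = [f["file_url"] for f in decoded["score_files"]]
print(urls)
```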

Input

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| maxItems | integer | 10 | Maximum number of works to return |
| includeFileDetails | boolean | true | Query the MediaWiki API for file lists, key, genre, and instrumentation. Disable for faster bulk exports: you get the catalog skeleton without per-work details. |
| composerFilter | string | (none) | Optional composer name filter (e.g. "Bach, Johann Sebastian"). Leave blank for the full catalog. |
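As an illustration, the two typical configurations might look like this when built as Python dictionaries and serialized to JSON. The parameter names come from the table above; the specific `maxItems` values are arbitrary examples, and how the input is submitted to the scraper is not covered here.

```python
import json

# Fast index run: skip the per-work MediaWiki calls.
fast_index_input = {
    "maxItems": 1000,
    "includeFileDetails": False,
}

# Enriched run scoped to a single composer.
bach_input = {
    "maxItems": 50,
    "includeFileDetails": True,
    "composerFilter": "Bach, Johann Sebastian",
}

print(json.dumps(bach_input, indent=2))
```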

File Detail Mode

When includeFileDetails is enabled, the scraper makes one additional MediaWiki API call per work to parse the work's wikitext. This populates score_files, parts_files, arrangements, instrumentation, key, genre, composition_year, and first_publication. Each call adds roughly 200 ms to the run time per record. For full-catalog exports where you only need the work index, disable it.
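For orientation, a per-work wikitext request against a MediaWiki installation can be sketched as below. The `action=parse`, `pageid`, `prop=wikitext`, and `format=json` parameters are standard MediaWiki Action API parameters; the endpoint location and the exact request the scraper sends are assumptions.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

API_ENDPOINT = "https://imslp.org/api.php"  # assumed endpoint location


def wikitext_request_url(page_id: str) -> str:
    """Build a MediaWiki 'parse' request URL that returns a page's wikitext."""
    params = {
        "action": "parse",
        "pageid": page_id,   # corresponds to the work_id field
        "prop": "wikitext",
        "format": "json",
    }
    return f"{API_ENDPOINT}?{urlencode(params)}"


url = wikitext_request_url("12345")  # "12345" is a made-up page ID
print(url)
```

Only the URL is built here; sending the request and pacing successive calls are left out of the sketch.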

Coverage

IMSLP's public-domain focus is by design. The library was built specifically to host scores whose copyright has expired or that have been dedicated to the public domain. The copyright_status field reflects IMSLP's own tagging, which in turn rests on the copyright review IMSLP applies at submission time.

Score file URLs point to imslp.org/wiki/Special:ReverseLookup/<filename>, which resolves to the PDF download. These are the same URLs end users click in the IMSLP UI.
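The URL pattern above can be reproduced from a filename with a small helper. This is a sketch only: the `https://` scheme and the use of percent-encoding for special characters are assumptions, and the filename is made up. In practice, prefer the `file_url` value already present in each file entry.

```python
from urllib.parse import quote

BASE = "https://imslp.org/wiki/Special:ReverseLookup/"  # scheme is assumed


def reverse_lookup_url(filename: str) -> str:
    """Build a ReverseLookup URL for a filename, percent-encoding it."""
    return BASE + quote(filename, safe="")


example = reverse_lookup_url("My Score (ed).pdf")  # hypothetical filename
print(example)
```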

Use Cases

  • Build a searchable public-domain score database
  • Feed OMR (optical music recognition) or generative music training pipelines
  • Supply structured work metadata to music education platforms
  • Populate digital library catalogs with direct PDF access
  • Research composers or instrumentation at scale

Data Volume

The full catalog is approximately 230,000 works. Without a composer filter and with includeFileDetails enabled, a complete run takes several hours due to polite pacing between MediaWiki API calls. Use composerFilter to scope to a specific composer, or set includeFileDetails: false for a fast full-catalog index run.
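A back-of-the-envelope estimate makes the trade-off concrete. The sketch below uses only figures stated in this document (about 230,000 works, 1,000 records per worklist page, roughly 200 ms added per detail call). It ignores network latency and the time spent on the worklist pages themselves, so treat the result as an order-of-magnitude figure rather than a measured run time.

```python
import math


def estimate_run(total_works, include_file_details,
                 page_size=1000, detail_seconds=0.2):
    """Return (worklist pages, detail calls, hours added by detail calls)."""
    index_pages = math.ceil(total_works / page_size)
    detail_calls = total_works if include_file_details else 0
    detail_hours = detail_calls * detail_seconds / 3600
    return index_pages, detail_calls, detail_hours


pages, calls, hours = estimate_run(230_000, include_file_details=True)
print(pages, calls, round(hours, 1))  # 230 230000 12.8
```

With includeFileDetails disabled, the same catalog needs only the 230 worklist pages and no per-work calls.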


Built by OrbTop. Data sourced from IMSLP via its public APIs.