DLMF NIST Math Functions Scraper
Scrapes the NIST Digital Library of Mathematical Functions (DLMF) — the authoritative reference for special functions in mathematics and physics — to produce a structured, machine-readable corpus of numbered equations with MathML, LaTeX source, and associated metadata.
What it does
The actor performs a three-level hierarchical crawl:
- Index: discovers all 36 DLMF chapters from the homepage
- Chapter pages: discovers all sections within each chapter (typically 10–20 sections)
- Section pages: extracts every numbered equation, including its MathML, LaTeX source, plain-text rendering, referenced symbols, and canonical permalink
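Across all 36 chapters the DLMF contains approximately 5,000–10,000 numbered equations. With maxItems set to 0 (unlimited), a full crawl completes in minutes at the default concurrency; the default maxItems of 10 stops the run after the first 10 equations.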
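The three-level crawl can be pictured as nested generators. This is a minimal, hypothetical sketch, not the actor's source: the hooks `list_chapters`, `list_sections`, and `extract_equations` are invented stand-ins for "fetch a page and parse it", and the way `maxItems`, `startChapter`, and `endChapter` are applied here is an assumption based on their descriptions under Input parameters.

```python
from typing import Callable, Dict, Iterable, Iterator, Tuple

# Hypothetical hooks (not the actor's API): each stands in for fetching and parsing one page type.
ListChapters = Callable[[str], Iterable[Tuple[int, str]]]  # index URL -> (chapter number, chapter URL)
ListSections = Callable[[str], Iterable[str]]              # chapter URL -> section URLs
ExtractEquations = Callable[[str], Iterable[Dict]]         # section URL -> equation records


def crawl(
    index_url: str,
    list_chapters: ListChapters,
    list_sections: ListSections,
    extract_equations: ExtractEquations,
    max_items: int = 10,
    start_chapter: int = 1,
    end_chapter: int = 36,
) -> Iterator[Dict]:
    """Walk index -> chapters -> sections, yielding equations until max_items (0 = unlimited)."""
    emitted = 0
    for number, chapter_url in list_chapters(index_url):     # level 1: index page
        if not start_chapter <= number <= end_chapter:
            continue                                           # chapter outside the requested range
        for section_url in list_sections(chapter_url):         # level 2: chapter page
            for record in extract_equations(section_url):      # level 3: section page
                yield record
                emitted += 1
                if max_items and emitted >= max_items:
                    return                                     # stop before fetching further pages
```

Because every level is lazy, reaching the item cap ends the crawl without requesting any further pages. In-memory dictionaries can stand in for the hooks (for example `chapters.__getitem__`) to exercise the traversal without network access.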
Output fields
| Field | Description |
|---|---|
| chapter | Chapter number (integer, 1–36) |
| section | Section identifier, e.g. 1.2 |
| title | Section title, e.g. Elementary Algebra |
| equation_number | DLMF equation number, e.g. 1.2.1 |
| equation_mathml | Full MathML XML for the equation |
| equation_tex | LaTeX source recovered from MathML alttext attribute |
| equation_text | Unicode plain-text rendering of the equation |
| constraints | Constraint text associated with the equation (if any) |
| referenced_functions | Pipe-separated list of symbol/function names referenced |
| url | Canonical DLMF permalink, e.g. http://dlmf.nist.gov/1.2.E1 |
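Two of these fields are derived mechanically from others, which the following sketch illustrates. It is an assumption-laden illustration, not the actor's code: `tex_from_mathml` reads the standard MathML `alttext` attribute of a `<math>` element, and `permalink` assumes every equation number follows the `chapter.section.n` to `chapter.section.En` pattern seen in the single example above (both function names are hypothetical).

```python
import re
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"


def tex_from_mathml(mathml: str) -> str:
    """Return the alttext attribute of the root <math> element, or '' if it is absent."""
    root = ET.fromstring(mathml)
    # Accept both namespaced and bare <math> roots.
    if root.tag not in ("math", f"{{{MATHML_NS}}}math"):
        raise ValueError(f"expected a <math> root element, got {root.tag!r}")
    return root.get("alttext", "")


def permalink(equation_number: str, base: str = "http://dlmf.nist.gov") -> str:
    """Map '1.2.1' to 'http://dlmf.nist.gov/1.2.E1' (pattern assumed from one example)."""
    match = re.fullmatch(r"(\d+\.\d+)\.(\d+)", equation_number)
    if match is None:
        raise ValueError(f"unrecognised equation number: {equation_number!r}")
    section, index = match.groups()
    return f"{base}/{section}.E{index}"
```

Raising on unexpected equation numbers, rather than guessing, keeps malformed identifiers from silently producing broken links.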
Use cases
- Symbolic math / CAS training data — verified special-function formulas with LaTeX and MathML
- RAG / vector search corpora — ground-truth equation database for scientific-computing AI agents
- Formula search engines — structured index of equations by chapter/section with canonical IDs
- Verification datasets — NIST-authoritative identities for function evaluations
Input parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| maxItems | integer | 10 | Maximum number of equations to scrape (0 = unlimited) |
| startChapter | integer | 1 | First chapter to crawl (1–36) |
| endChapter | integer | 36 | Last chapter to crawl (1–36, omit for all) |
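A minimal sketch of how these inputs could be normalized before a crawl, assuming the defaults in the table and assuming out-of-range values are rejected rather than clamped (the actor's actual behaviour is not documented here). `CrawlInput` and `parse_input` are hypothetical names.

```python
from dataclasses import dataclass

FIRST_CHAPTER, LAST_CHAPTER = 1, 36


@dataclass(frozen=True)
class CrawlInput:
    max_items: int = 10      # 0 means unlimited
    start_chapter: int = FIRST_CHAPTER
    end_chapter: int = LAST_CHAPTER


def parse_input(raw: dict) -> CrawlInput:
    """Apply the documented defaults and reject values outside the documented ranges."""
    max_items = int(raw.get("maxItems", 10))
    start = int(raw.get("startChapter", FIRST_CHAPTER))
    end = raw.get("endChapter")
    end = LAST_CHAPTER if end is None else int(end)  # omitted -> crawl through the last chapter
    if max_items < 0:
        raise ValueError("maxItems must be >= 0 (0 = unlimited)")
    for name, value in (("startChapter", start), ("endChapter", end)):
        if not FIRST_CHAPTER <= value <= LAST_CHAPTER:
            raise ValueError(f"{name} must be between 1 and 36, got {value}")
    if start > end:
        raise ValueError("startChapter must not exceed endChapter")
    return CrawlInput(max_items, start, end)
```

Under these assumptions, an input of `{"maxItems": 0}` requests a full crawl of all 36 chapters, while an empty input returns only the first 10 equations of chapter 1.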
Example output record
```json
{
"chapter": 1,
"section": "1.2",
"title": "Elementary Algebra",
"equation_number": "1.2.1",
"equation_tex": "\\genfrac{(}{)}{0.0pt}{}{n}{k}=\\frac{n!}{(n-k)!k!}",
"equation_text": "(nk)=n!/(n−k)!k!",
"constraints": "",
"referenced_functions": "(mn): binomial coefficient | !: factorial (as in n!) | n: nonnegative integer",
"url": "http://dlmf.nist.gov/1.2.E1"
}
```
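Because referenced_functions is a single pipe-separated string, downstream consumers such as a RAG or search pipeline will usually want it split back into symbol/description pairs. A minimal sketch, assuming each entry has the form `symbol: description` as in the record above and that descriptions never contain a literal `|` (`parse_referenced_functions` is a hypothetical helper, not part of the actor):

```python
def parse_referenced_functions(field: str) -> dict:
    """Split 'symbol: description | symbol: description' into {symbol: description}."""
    symbols = {}
    for entry in field.split("|"):
        entry = entry.strip()
        if not entry:
            continue  # tolerate empty fields and trailing separators
        # Split on the first ': ' only, so descriptions may contain colons.
        symbol, sep, description = entry.partition(": ")
        symbols[symbol] = description if sep else ""
    return symbols
```

Splitting on the first `": "` rather than on any colon keeps symbols like `!` intact while leaving the description text untouched.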
Notes
- The DLMF is a US government publication (NIST). Content is in the public domain.
- No proxy is required: dlmf.nist.gov is a US government host with no anti-bot measures.
- Chapter 1 alone contains about 180 equations across 18 sections. A full 36-chapter run yields 5,000+ records.
- Sections containing only notation tables (no numbered equations) return 0 results — this is expected.