OrbTop

DLMF NIST Math Functions Scraper

Scrapes the NIST Digital Library of Mathematical Functions (DLMF) — the authoritative reference for special functions in mathematics and physics — to produce a structured, machine-readable corpus of numbered equations with MathML, LaTeX source, and associated metadata.

What it does

The actor performs a three-level hierarchical crawl:

  1. Index — discovers all 36 DLMF chapters from the homepage
  2. Chapter pages — discovers all sections within each chapter (typically 10–20 sections)
  3. Section pages — extracts every numbered equation, including MathML, LaTeX source, plain-text rendering, referenced symbols, and the canonical permalink

Across all 36 chapters the DLMF contains roughly 5,000–10,000 numbered equations. A full crawl (maxItems set to 0) completes in minutes at the default concurrency.
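The three-level crawl described above can be pictured as three nested loops. The following is a minimal sketch, not the actor's actual code: the function names (`crawl`, `list_chapters`, `list_sections`, `list_equations`) and the `FAKE_SITE` stand-in data are hypothetical, and the page-fetching steps are passed in as callables so the example runs offline.

```python
from typing import Callable, Iterable, Iterator


def crawl(
    list_chapters: Callable[[], Iterable[int]],
    list_sections: Callable[[int], Iterable[str]],
    list_equations: Callable[[str], Iterable[dict]],
) -> Iterator[dict]:
    """Walk index -> chapter pages -> section pages, yielding one record per equation."""
    for chapter in list_chapters():                    # level 1: index
        for section in list_sections(chapter):         # level 2: chapter page
            for equation in list_equations(section):   # level 3: section page
                yield {"chapter": chapter, "section": section, **equation}


# Hypothetical stand-in for the site; a real run would fetch and parse HTML instead.
FAKE_SITE = {
    1: {
        "1.1": [],  # a section with no numbered equations yields nothing
        "1.2": [{"equation_number": "1.2.1"}, {"equation_number": "1.2.2"}],
    }
}

records = list(
    crawl(
        lambda: FAKE_SITE.keys(),
        lambda chapter: FAKE_SITE[chapter].keys(),
        lambda section: FAKE_SITE[int(section.split(".")[0])][section],
    )
)
print([r["equation_number"] for r in records])  # ['1.2.1', '1.2.2']
```

Writing the crawl as a generator means a caller can stop consuming records early, which is one plausible way a limit such as maxItems could be enforced.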

Output fields

Field                 Description
chapter               Chapter number (integer, 1–36)
section               Section identifier, e.g. 1.2
title                 Section title, e.g. Elementary Algebra
equation_number       DLMF equation number, e.g. 1.2.1
equation_mathml       Full MathML XML for the equation
equation_tex          LaTeX source recovered from MathML alttext attribute
equation_text         Unicode plain-text rendering of the equation
constraints           Constraint text associated with the equation (if any)
referenced_functions  Pipe-separated list of symbol/function names referenced
url                   Canonical DLMF permalink, e.g. http://dlmf.nist.gov/1.2.E1
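Two of these fields are derived rather than copied from the page. The sketch below shows how they might be computed. It rests on two assumptions: that the alttext attribute sits on the root math element, and that the permalink pattern seen in the example (1.2.1 becomes 1.2.E1) holds for every equation. The helper names and the hand-written `sample` MathML are hypothetical.

```python
import xml.etree.ElementTree as ET


def tex_from_mathml(mathml: str) -> str:
    """Return the alttext attribute of the root <math> element, or '' if absent."""
    root = ET.fromstring(mathml)
    return root.get("alttext", "")


def permalink(equation_number: str) -> str:
    """Build a permalink from an equation number, e.g. '1.2.1' -> '.../1.2.E1'."""
    section, _, index = equation_number.rpartition(".")
    return f"http://dlmf.nist.gov/{section}.E{index}"


# Hand-written MathML fragment for illustration only (not copied from the DLMF).
sample = (
    '<math xmlns="http://www.w3.org/1998/Math/MathML" '
    'alttext="a+b=c" display="block"><mi>a</mi></math>'
)

print(tex_from_mathml(sample))  # a+b=c
print(permalink("1.2.1"))       # http://dlmf.nist.gov/1.2.E1
```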

Use cases

  • Symbolic math / CAS training data — verified special-function formulas with LaTeX and MathML
  • RAG / vector search corpora — ground-truth equation database for scientific-computing AI agents
  • Formula search engines — structured index of equations by chapter/section with canonical IDs
  • Verification datasets — NIST-authoritative identities for function evaluations

Input parameters

Parameter     Type     Default  Description
maxItems      integer  10       Maximum number of equations to scrape (0 = unlimited)
startChapter  integer  1        First chapter to crawl (1–36)
endChapter    integer  36       Last chapter to crawl (1–36; omit to run through the last chapter)
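The parameter semantics above can be made concrete with a small sketch. The helper names `chapter_range` and `cap` are hypothetical. Raising an error on an invalid range is also an assumption, since the listing does not say how the actor treats out-of-range input.

```python
from itertools import islice
from typing import Iterable, Iterator, Optional


def chapter_range(start_chapter: int = 1, end_chapter: Optional[int] = None) -> range:
    """Chapters to crawl, inclusive; an omitted end_chapter means chapter 36."""
    end = 36 if end_chapter is None else end_chapter
    if not (1 <= start_chapter <= end <= 36):
        # Assumption: invalid input is rejected rather than silently clamped.
        raise ValueError("expected 1 <= startChapter <= endChapter <= 36")
    return range(start_chapter, end + 1)


def cap(records: Iterable[dict], max_items: int = 10) -> Iterator[dict]:
    """Yield at most max_items records; 0 means unlimited."""
    return iter(records) if max_items == 0 else islice(records, max_items)


print(list(chapter_range(5, 7)))                                # [5, 6, 7]
print(len(list(cap({"i": i} for i in range(50)))))              # 10
print(len(list(cap(({"i": i} for i in range(50)), max_items=0))))  # 50
```

With the default maxItems of 10, a run stops after ten equations no matter how many chapters are selected, so a full crawl needs maxItems set to 0.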

Example output record

{
  "chapter": 1,
  "section": "1.2",
  "title": "Elementary Algebra",
  "equation_number": "1.2.1",
  "equation_tex": "\\genfrac{(}{)}{0.0pt}{}{n}{k}=\\frac{n!}{(n-k)!k!}",
  "equation_text": "(nk)=n!/(n−k)!⁢k!",
  "constraints": "",
  "referenced_functions": "(mn): binomial coefficient | !: factorial (as in n!) | n:  nonnegative integer",
  "url": "http://dlmf.nist.gov/1.2.E1"
}
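The referenced_functions value packs several symbol descriptions into one string. The sketch below splits it into a mapping, assuming entries are separated by " | " and each one has the form "symbol: description". The helper name `parse_referenced` is hypothetical, and a description that itself contains " | " would need more careful handling.

```python
def parse_referenced(field: str) -> dict:
    """Split 'sym: description | sym: description' into {sym: description}."""
    symbols = {}
    for entry in field.split(" | "):
        symbol, sep, description = entry.partition(":")
        if sep:  # skip empty or malformed entries that have no colon
            symbols[symbol.strip()] = description.strip()
    return symbols


# Value taken from the example record above.
field = "(mn): binomial coefficient | !: factorial (as in n!) | n:  nonnegative integer"

for symbol, description in parse_referenced(field).items():
    print(f"{symbol!r} -> {description}")
```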

Notes

  • The DLMF is a US government publication (NIST). Content is in the public domain.
  • No proxy required: dlmf.nist.gov is a US government host with no anti-bot measures.
  • Chapter 1 alone contains ~180 equations across 18 sections. A full 36-chapter run yields 5,000 or more records.
  • Sections containing only notation tables (no numbered equations) return 0 results — this is expected.