OrbTop

MSHA Mine Data Retrieval Scraper - US Mines Production + Safety

Extracts US mine records from the MSHA Open Government Data portal. Returns mine identity, operator and controller info, geocoordinates, commodity classification, status, and employee counts for all 91,000+ mines in the MSHA registry, with optional joins for quarterly production tons, ownership history, violations, and accident counts.

MSHA Mine Scraper Features

  • Extracts 26+ fields per mine record: mine ID, name, operator, controller (ultimate parent), state, county, FIPS code, lat/lon, mine type, mine class, commodity, SIC code, status, employees, operating days per week, nearest town, MSHA district and office
  • Filters by mine class: coal, metal/nonmetal, or all
  • Filters by commodity: case-insensitive substring match against the SIC description (lithium, copper, iron, bituminous, anthracite, stone, sand, gravel, and any other MSHA commodity label)
  • Filters by status: Active, Temporarily Idled, NonProducing, Abandoned, Abandoned and Sealed, New Mine, Intermittent, or all
  • Joins quarterly production history: production tons, hours worked, and average employee counts by quarter and subunit, back to the first recorded year
  • Joins operator/controller history: the full M&A ownership chain with start/end dates for each controller and operator
  • Joins violations: violation count for the past 12 months plus a structured history of recent violations with section-of-act and violation type
  • Joins accident records: injury and accident count for the past 12 months
  • No proxy required: MSHA open data is public and does not block bulk downloads

What Can You Do With MSHA Mine Data?

  • Utility and coal analysts — pull quarterly production tons by mine, subunit, and commodity to track output trends without manual MSHA portal navigation
  • Critical-minerals researchers — filter active metal/nonmetal mines by commodity (lithium, copper, cobalt, rare earths) and get operator, location, and production context in one run
  • Mining M&A advisory — join the controller history dataset to reconstruct the full ownership chain for a target mine or portfolio
  • Environmental NGOs — identify active surface mines by state and county, then enrich with violation counts to prioritize investigation targets
  • Compliance teams — pull violation and accident histories to benchmark a mine's safety record against peers in the same district
  • Data journalists — map every active mine in a given state with geocoordinates and production figures, without downloading and parsing multiple MSHA ZIP files by hand

How It Works

  1. Downloads the MSHA Mines registry — the master registry ZIP (7.3 MB compressed, ~91,000 rows) is pulled from the MSHA open data endpoint and parsed in memory
  2. Applies your filters — coal/metal class, status, and commodity filters run against the registry before any joins, so enrichment datasets only load for mines that match
  3. Joins optional datasets in parallel — if you request quarterly production, controller history, violations, or accidents, those ZIPs are downloaded concurrently and indexed by MINE_ID
  4. Returns structured records — each mine record is written to the dataset with flat fields for registry data and JSON strings for the array joins (production, history, violations)

The quarterly production dataset is 56 MB compressed and covers 35,000 unique mines; the controller history dataset is a ~119 MB download. Factor both into your run time when enabling those options.
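
The filter-then-join order described above can be sketched in Python. This is only an illustration under stated assumptions, not the actor's actual code: the column names CURRENT_MINE_STATUS and COAL_METAL_IND, the pipe delimiter, and the latin-1 encoding are guesses about the bulk files, and read_zip_table and filter_then_join are hypothetical helper names. Only MINE_ID as the join key comes from the description above.

```python
import csv
import io
import zipfile
from collections import defaultdict


def read_zip_table(zip_bytes: bytes, delimiter: str = "|") -> list[dict]:
    """Parse the first file inside a ZIP archive as a delimited table, in memory."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        first_member = archive.namelist()[0]
        with archive.open(first_member) as raw:
            # Assumed encoding; adjust if the real export differs.
            text = io.TextIOWrapper(raw, encoding="latin-1", newline="")
            return list(csv.DictReader(text, delimiter=delimiter))


def filter_then_join(mines, production, status="Active", coal_or_metal="all"):
    """Filter the registry first, then attach production rows only to matches."""
    matched = [
        m for m in mines
        if (status == "all" or m["CURRENT_MINE_STATUS"] == status)
        and (coal_or_metal == "all" or m["COAL_METAL_IND"] == coal_or_metal)
    ]
    wanted_ids = {m["MINE_ID"] for m in matched}

    # Index the enrichment rows by MINE_ID, skipping mines that were filtered out.
    by_mine = defaultdict(list)
    for row in production:
        if row["MINE_ID"] in wanted_ids:
            by_mine[row["MINE_ID"]].append(row)

    return [
        {**m, "production_quarterly": by_mine.get(m["MINE_ID"], [])}
        for m in matched
    ]
```

Building the set of matching IDs before touching the larger dataset keeps the index small, which is the point of running filters ahead of the joins.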

MSHA Mine Scraper Input

{
  "coalOrMetal": "M",
  "commodityFilter": "copper",
  "mineStatus": "Active",
  "includeProductionHistory": true,
  "includeControllerHistory": false,
  "includeViolations": false,
  "includeAccidents": false,
  "maxItems": 100,
  "sp_intended_usage": "critical minerals research",
  "sp_improvement_suggestions": "none"
}
Field | Type | Default | Description
--- | --- | --- | ---
coalOrMetal | string | "all" | Mine class filter: "all", "C" (coal only), or "M" (metal/nonmetal only)
commodityFilter | string | "" | Substring match against SIC description (case-insensitive). Leave blank for all commodities.
mineStatus | string | "Active" | Status filter: Active, Temporarily Idled, NonProducing, Abandoned, Abandoned and Sealed, New Mine, Intermittent, or all
includeProductionHistory | boolean | true | Join quarterly production CSV to add production_quarterly
includeControllerHistory | boolean | false | Join controller/operator history CSV to add operator_history
includeViolations | boolean | false | Join violations CSV to add violations_count_12mo and violations_history
includeAccidents | boolean | false | Join accidents CSV to add accidents_12mo
maxItems | integer | 10 | Maximum records to return. Set to 0 for unlimited.
sp_intended_usage | string | | Required. Describe your intended use of this data.
sp_improvement_suggestions | string | | Required. Share any suggestions for improving the actor.
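
If you assemble the input programmatically, a small helper can apply the documented defaults and catch bad values before a run. The sketch below is a client-side convenience only: build_input is a hypothetical name, and the actor may validate its input differently. The defaults and allowed values are copied from the table above.

```python
DEFAULTS = {
    "coalOrMetal": "all",
    "commodityFilter": "",
    "mineStatus": "Active",
    "includeProductionHistory": True,
    "includeControllerHistory": False,
    "includeViolations": False,
    "includeAccidents": False,
    "maxItems": 10,
}
REQUIRED = ("sp_intended_usage", "sp_improvement_suggestions")
MINE_CLASSES = {"all", "C", "M"}
STATUSES = {
    "Active", "Temporarily Idled", "NonProducing", "Abandoned",
    "Abandoned and Sealed", "New Mine", "Intermittent", "all",
}


def build_input(**overrides) -> dict:
    """Merge overrides onto the documented defaults and validate the result."""
    unknown = set(overrides) - set(DEFAULTS) - set(REQUIRED)
    if unknown:
        raise ValueError(f"Unknown input fields: {sorted(unknown)}")

    merged = {**DEFAULTS, **overrides}

    missing = [key for key in REQUIRED if not merged.get(key)]
    if missing:
        raise ValueError(f"Missing required fields: {missing}")
    if merged["coalOrMetal"] not in MINE_CLASSES:
        raise ValueError("coalOrMetal must be 'all', 'C', or 'M'")
    if merged["mineStatus"] not in STATUSES:
        raise ValueError(f"Unsupported mineStatus: {merged['mineStatus']!r}")

    max_items = merged["maxItems"]
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(max_items, bool) or not isinstance(max_items, int) or max_items < 0:
        raise ValueError("maxItems must be a non-negative integer (0 = unlimited)")
    return merged
```

For example, build_input(coalOrMetal="M", commodityFilter="copper", sp_intended_usage="critical minerals research", sp_improvement_suggestions="none") returns a dict that can be serialized with json.dumps and used as the run input.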

MSHA Mine Scraper Output Fields

{
  "mine_id": "4200017",
  "mine_name": "EMERALD MINE NO 1",
  "coal_metal_ind": "C",
  "mine_type": "Underground",
  "mine_status": "Active",
  "status_date": "1978-12-15",
  "controller_id": "0000055",
  "controller_name": "CONSOL ENERGY INC",
  "controller_start_date": "2020-01-01",
  "operator_id": "0218869",
  "operator_name": "CONSOL PENNSYLVANIA COAL COMPANY LLC",
  "state": "PA",
  "county": "GREENE",
  "fips_county_code": "059",
  "latitude": 39.8219,
  "longitude": -80.1781,
  "primary_sic_code": "1220",
  "primary_commodity": "Bituminous Coal",
  "primary_canvass": "Coal(Bituminous)",
  "secondary_commodity": "",
  "num_employees": 350,
  "days_per_week": 5,
  "nearest_town": "Wind Ridge",
  "district": "3",
  "office_name": "Waynesburg District",
  "portable_operation": "N",
  "production_quarterly": "[{\"cal_yr\":2024,\"cal_qtr\":3,\"subunit\":\"UNDERGROUND\",\"avg_employees\":341,\"hours_worked\":148560,\"coal_production\":1247000},{\"cal_yr\":2024,\"cal_qtr\":2,...}]",
  "operator_history": null,
  "violations_count_12mo": null,
  "violations_history": null,
  "accidents_12mo": null,
  "source_url": "https://arlweb.msha.gov/OpenGovernmentData/OGIMSHA.asp"
}
Field | Type | Description
--- | --- | ---
mine_id | string | MSHA Mine ID (7-digit, primary key)
mine_name | string | Current mine name from Legal ID Form
coal_metal_ind | string | Mine class: C=Coal, M=Metal/NonMetal
mine_type | string | Mine type: Surface, Underground, Facility, or Other
mine_status | string | Current status: Active, Temporarily Idled, NonProducing, Abandoned, etc.
status_date | string | Date mine entered current status (YYYY-MM-DD)
controller_id | string | MSHA controller ID for the ultimate parent entity
controller_name | string | Name of the controller (ultimate parent of operator)
controller_start_date | string | Date current controller took control (YYYY-MM-DD)
operator_id | string | MSHA operator ID
operator_name | string | Current operator name
state | string | 2-letter state abbreviation
county | string | County name (FIPS county name)
fips_county_code | string | 3-digit FIPS county code
latitude | number | Mine latitude (decimal degrees)
longitude | number | Mine longitude (decimal degrees)
primary_sic_code | string | Primary SIC code
primary_commodity | string | Primary commodity description (SIC description)
primary_canvass | string | Primary industry group (e.g., Coal(Bituminous), M/NM (Stone), Metal)
secondary_commodity | string | Secondary commodity description
num_employees | number | Number of workers at mine
days_per_week | number | Operating days per week
nearest_town | string | Nearest town or city
district | string | MSHA district code
office_name | string | MSHA office responsible for inspections
portable_operation | string | Y/N portable mine indicator
production_quarterly | string | JSON array of quarterly production records: cal_yr, cal_qtr, subunit, avg_employees, hours_worked, coal_production (requires includeProductionHistory: true)
operator_history | string | JSON array of controller/operator history records: controller_name, operator_name, operator_start_dt, operator_end_dt, controller_start_dt, controller_end_dt, mine_status (requires includeControllerHistory: true)
violations_count_12mo | number | Violations issued in the past 12 months (requires includeViolations: true)
violations_history | string | JSON array of recent violations: violation_no, inspection_begin_dt, violation_issue_dt, cal_yr, violator_name, section_of_act, violation_type (requires includeViolations: true)
accidents_12mo | number | Accident/injury records in the past 12 months (requires includeAccidents: true)
source_url | string | URL of the source MSHA open data page
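
Because the array joins arrive as JSON strings rather than nested objects, downstream code has to decode them before use. A minimal sketch, assuming the field names listed above; annual_coal_tons is a hypothetical helper, and the quarterly values in the usage note other than the 2024 Q3 figure are invented for illustration.

```python
import json
from collections import defaultdict


def annual_coal_tons(record: dict) -> dict[int, float]:
    """Sum coal_production per calendar year from the production_quarterly string."""
    raw = record.get("production_quarterly")
    if not raw:
        # The field is null when includeProductionHistory was not enabled.
        return {}

    totals = defaultdict(float)
    for quarter in json.loads(raw):
        totals[quarter["cal_yr"]] += quarter.get("coal_production") or 0
    return dict(totals)
```

Given a record whose production_quarterly holds 1,247,000 tons for 2024 Q3, 1,000,000 for 2024 Q2, and 500,000 for 2023 Q4, the helper returns a total of 2,247,000 for 2024 and 500,000 for 2023. The same decode-then-iterate pattern applies to operator_history and violations_history.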

🔍 FAQ

How do I extract MSHA mine data?

MSHA Mine Data Retrieval Scraper pulls directly from the MSHA Open Government Data bulk CSV exports. Configure your filters in the input, run the actor, and download the dataset — no MSHA portal account or manual CSV downloads required.

What does MSHA Mine Data Retrieval Scraper cost to run?

The actor charges $0.10 per run plus $0.001 per record. Pulling 1,000 active coal mines with quarterly production history costs $1.10 total ($0.10 + 1,000 × $0.001). Enabling the controller history dataset (119 MB download) adds compute time but no additional per-record cost.
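
The pricing arithmetic is simple enough to check before a large run. A small sketch using the two rates quoted above; estimate_cost is a hypothetical helper and does not account for any platform compute charges.

```python
from decimal import Decimal

PER_RUN = Decimal("0.10")      # flat charge per run, from the pricing above
PER_RECORD = Decimal("0.001")  # charge per returned record


def estimate_cost(records: int) -> Decimal:
    """Estimate the actor charge in USD for a run returning `records` records."""
    if records < 0:
        raise ValueError("records must be non-negative")
    return PER_RUN + PER_RECORD * records
```

Decimal is used instead of float so that fractions of a cent add up exactly; estimate_cost(1000) equals Decimal("1.10"), matching the worked figure above.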

Can I filter by specific commodities like lithium or copper?

Yes. Set commodityFilter to any commodity keyword and the actor does a case-insensitive substring match against the MSHA SIC description. "copper" returns copper mines, "lithium" returns lithium mines, "stone" returns crushed stone operations. Leave it blank to get all commodities.
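
To preview which labels a keyword would catch, the matching rule can be reproduced locally. This is a sketch of a case-insensitive substring test as described above, not the actor's own code; matches_commodity is a hypothetical name, trimming whitespace from the filter is an assumption, and the sample labels are illustrative rather than an official list of SIC descriptions.

```python
def matches_commodity(sic_description: str, commodity_filter: str) -> bool:
    """Case-insensitive substring test; a blank filter matches every commodity."""
    needle = commodity_filter.strip().casefold()
    if not needle:
        return True
    return needle in (sic_description or "").casefold()


# Illustrative labels only, not taken from the MSHA registry.
labels = ["Copper Ore NEC", "Crushed, Broken Limestone NEC", "Construction Sand and Gravel"]
stone_hits = [label for label in labels if matches_commodity(label, "STONE")]
```

Since this is a substring test, a broad keyword such as "stone" would also match any label that merely contains it, for example one containing "Limestone". Use a more specific keyword if you need a narrower result set.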

Does MSHA Mine Data Retrieval Scraper need proxies?

No. MSHA open data is publicly available without authentication or rate limits. The actor downloads ZIP files directly from the MSHA server — no proxy configuration needed.

How current is the mine data?

MSHA updates the open data CSVs regularly. The actor always pulls the latest published version at run time. The status_date field tells you when each mine's current status was last changed by MSHA.


Need More Features?

Need custom filters, additional MSHA datasets, or scheduled runs? File an issue or get in touch.

Why Use MSHA Mine Data Retrieval Scraper?

  • Covers the full registry — all 91,000+ mines across coal, metal, and nonmetal classes, with optional joins for production history, ownership chain, violations, and accidents in a single run
  • No proxies, no auth, no scraping fragility — pulls from official government bulk exports, so it doesn't break when MSHA updates their web UI
  • Clean structured output — flat JSON records with consistent field names, ready for a spreadsheet, database, or downstream pipeline without reformatting