OrbTop

USDA Soil Data Access (SDA) Survey Scraper

Extract typed, denormalised soil data from the USDA NRCS SSURGO database via the Soil Data Access REST API. Point it at one or more county area symbols and get back structured map unit, component, and soil horizon records — pH, texture percentages, organic matter, available water capacity, cation exchange capacity, drainage class, and taxonomy — without writing a line of T-SQL.

What it does

The Soil Data Access (SDA) endpoint is the official REST interface to SSURGO, the most complete US soil survey database. It accepts raw SQL queries over POST and returns untyped arrays — useful for experts, hostile for everyone else.
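For orientation, a raw SDA request might be assembled roughly as follows. This is only a sketch: the `/Tabular/post.rest` path and the `JSON+COLUMNNAME` format value are assumptions about the public SDA service, not something this README states, and `build_sda_request` is a hypothetical helper. The request is built but never sent.

```python
import json
import urllib.request

# Assumed endpoint path; verify against the SDA documentation before use.
SDA_URL = "https://sdmdataaccess.sc.egov.usda.gov/Tabular/post.rest"


def build_sda_request(sql: str) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying a raw SQL query."""
    # Assumed format value: asks for a column-name header row before the data rows.
    payload = {"query": sql, "format": "JSON+COLUMNNAME"}
    return urllib.request.Request(
        SDA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_sda_request("SELECT TOP 1 areasymbol, areaname FROM legend")
```

Passing `req` to `urllib.request.urlopen` would perform the actual call; the actor handles that step for you.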

This actor wraps that endpoint into a clean workflow:

  1. Accepts a list of county area symbols (e.g. TX001 for Anderson County TX) or a two-letter state code; when both are given, the area symbols take precedence.
  2. Joins the legend → mapunit → component → chorizon tables automatically.
  3. Parses the column-name header row into a labelled, typed record.
  4. Outputs one row per soil horizon per component (or one row per component when horizon data is disabled).
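The join in step 2 can be pictured with a small query builder. This is a sketch and not the actor's actual query: the key columns (`lkey`, `mukey`, `cokey`) follow the SSURGO schema, while the column selection, the `LEFT JOIN` on `chorizon` (so components without horizon data are kept), and the `build_query` helper itself are assumptions made for illustration.

```python
import re

# Assumed symbol shape: two capital letters followed by three digits.
AREA_SYMBOL = re.compile(r"[A-Z]{2}\d{3}")


def build_query(areasymbols, include_horizons=True):
    """Return a T-SQL string joining legend, mapunit, component and chorizon."""
    if not areasymbols:
        raise ValueError("at least one area symbol is required")
    # Validate before interpolating, so no arbitrary text reaches the SQL.
    bad = [s for s in areasymbols if not AREA_SYMBOL.fullmatch(s)]
    if bad:
        raise ValueError(f"invalid area symbols: {bad}")

    columns = [
        "l.areasymbol", "l.areaname",
        "mu.mukey", "mu.musym", "mu.muname",
        "c.cokey", "c.compname", "c.comppct_r",
    ]
    joins = [
        "FROM legend AS l",
        "INNER JOIN mapunit AS mu ON mu.lkey = l.lkey",
        "INNER JOIN component AS c ON c.mukey = mu.mukey",
    ]
    if include_horizons:
        columns += ["ch.chkey", "ch.hzname", "ch.hzdept_r", "ch.hzdepb_r"]
        joins.append("LEFT JOIN chorizon AS ch ON ch.cokey = c.cokey")

    in_list = ", ".join(f"'{s}'" for s in areasymbols)
    lines = ["SELECT " + ", ".join(columns)] + joins
    lines.append(f"WHERE l.areasymbol IN ({in_list})")
    return "\n".join(lines)


query = build_query(["TX001", "IA001"])
```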

All numeric fields (pH, slope, clay, sand, silt, OM, AWC, CEC, elevation, precipitation, temperature) are returned as JavaScript numbers, not strings.
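The header-row parsing and numeric typing could look something like the sketch below. It assumes the raw response is a list whose first row holds column names and whose values arrive as strings or nulls; `to_records` is a hypothetical helper, and the sample values are made up for illustration.

```python
# Columns named in this README as numeric.
NUMERIC_FIELDS = {
    "slope_r", "elev_r", "airtempa_r", "map_r",
    "sandtotal_r", "silttotal_r", "claytotal_r",
    "om_r", "ph1to1h2o_r", "awc_r", "cec7_r",
}


def to_records(table):
    """Turn [header, row, row, ...] into a list of labelled, typed dicts."""
    if not table:
        return []
    header, *rows = table
    records = []
    for row in rows:
        record = {}
        for name, value in zip(header, row):
            if value is None or value == "":
                record[name] = None          # missing survey data stays null
            elif name in NUMERIC_FIELDS:
                record[name] = float(value)  # numeric column: string -> number
            else:
                record[name] = value         # keys, names and classes stay strings
        records.append(record)
    return records


# Illustrative data only, not real survey values.
table = [
    ["areasymbol", "compname", "ph1to1h2o_r", "claytotal_r"],
    ["TX001", "Gallime", "5.8", None],
]
records = to_records(table)
```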

Input

Field | Type | Description
areasymbols | array of strings | County survey area symbols to query (e.g. ["TX001", "CA067"]). Required unless stateCode is set.
stateCode | string | Optional 2-letter state code (e.g. TX). Expands to all survey areas in that state. Ignored when areasymbols is set.
includeHorizons | boolean | When true (default), joins chorizon data; one row per horizon per component. Set false for component-level rows only.
maxItems | integer | Maximum records to return. 0 = unlimited. Default 10.
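The defaults and precedence rules in the table can be summarised in a short sketch. `normalise_input` is a hypothetical helper, and trimming and upper-casing the values is an assumption; the actor may validate its input differently.

```python
def normalise_input(raw):
    """Apply the documented defaults and the areasymbols-over-stateCode rule."""
    areasymbols = [s.strip().upper() for s in (raw.get("areasymbols") or [])]
    state_code = (raw.get("stateCode") or "").strip().upper() or None
    if not areasymbols and not state_code:
        raise ValueError("provide areasymbols or stateCode")

    max_items = int(raw.get("maxItems", 10))  # documented default: 10
    if max_items < 0:
        raise ValueError("maxItems must be 0 (unlimited) or positive")

    return {
        "areasymbols": areasymbols,
        # stateCode is ignored whenever areasymbols is set.
        "stateCode": None if areasymbols else state_code,
        "includeHorizons": bool(raw.get("includeHorizons", True)),
        "maxItems": max_items,
    }


config = normalise_input({"areasymbols": ["TX001", "CA067"], "stateCode": "TX"})
```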

Finding area symbols

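Area symbols follow the pattern {STATE}{NUMBER}, where the number is a zero-padded, three-digit code. It often matches the county code, but not always, because some survey areas cover only part of a county or span several. Check symbols against the survey area list at https://sdmdataaccess.sc.egov.usda.gov. Common examples: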

Symbol | Area
TX001 | Anderson County, Texas
CA067 | Sacramento County, California
IA001 | Adair County, Iowa
OR001 | Baker County, Oregon
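A symbol can be assembled from or split into its two parts with a few lines of code. The three-digit width is inferred from the examples above and is an assumption, and both helper names are hypothetical.

```python
import re

# Two capital letters plus three digits, as in the examples above.
SYMBOL = re.compile(r"([A-Z]{2})(\d{3})")


def make_area_symbol(state: str, number: int) -> str:
    """Zero-pad the number to three digits and prefix the state code."""
    return f"{state.upper()}{number:03d}"


def split_area_symbol(symbol: str):
    """Return (state, number), or raise ValueError for a malformed symbol."""
    match = SYMBOL.fullmatch(symbol)
    if match is None:
        raise ValueError(f"not an area symbol: {symbol!r}")
    return match.group(1), int(match.group(2))


symbol = make_area_symbol("tx", 1)
parts = split_area_symbol("IA001")
```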

Output

Each record contains:

Location / map unit

  • areasymbol — county survey code (e.g. TX001)
  • areaname — county name (e.g. Anderson County, Texas)
  • mukey — map unit key
  • musym — map unit symbol
  • muname — map unit name

Soil component

  • cokey — component key
  • compname — soil series name (e.g. Gallime)
  • comppct_r — representative component percent
  • taxorder / taxsuborder / taxgrtgroup — soil taxonomy
  • drainagecl — drainage class
  • hydricrating — hydric soil classification
  • slope_r, elev_r — slope (%) and elevation (m)
  • airtempa_r, map_r — mean annual temp (°C) and precipitation (mm)
  • frostact — frost action class

Soil horizon (when includeHorizons: true)

  • chkey — horizon key
  • hzname — horizon name (A, B, Bt, C, etc.)
  • hzdept_r, hzdepb_r — top and bottom depth (cm)
  • sandtotal_r, silttotal_r, claytotal_r — texture fractions (weight %)
  • om_r — organic matter (weight %)
  • ph1to1h2o_r — soil pH (1:1 water)
  • awc_r — available water capacity (cm/cm)
  • cec7_r — cation exchange capacity at pH 7 (meq/100g)
  • source — always USDA NRCS SSURGO via Soil Data Access
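Because the output is denormalised (one row per horizon per component), downstream code often needs to regroup rows into per-component soil profiles. A minimal sketch, using the field names listed above with made-up sample values; `group_horizons` is a hypothetical helper:

```python
def group_horizons(rows):
    """Group flat rows by cokey and order each profile by top depth."""
    profiles = {}
    for row in rows:
        horizons = profiles.setdefault(row["cokey"], [])
        if row.get("chkey") is not None:  # skip components with no horizon data
            horizons.append(row)
    for horizons in profiles.values():
        # Horizons with an unknown top depth sort last.
        horizons.sort(
            key=lambda h: h["hzdept_r"] if h["hzdept_r"] is not None else float("inf")
        )
    return profiles


# Illustrative rows only, not real survey values.
rows = [
    {"cokey": "c1", "chkey": "h2", "hzname": "Bt", "hzdept_r": 18.0},
    {"cokey": "c1", "chkey": "h1", "hzname": "A", "hzdept_r": 0.0},
    {"cokey": "c2", "chkey": None, "hzname": None, "hzdept_r": None},
]
profiles = group_horizons(rows)
```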

Example use cases

  • Agronomy / precision agriculture — query soil texture and pH for specific fields before planting decisions.
  • Land development / site feasibility — check drainage class, slope, and AWC for construction suitability.
  • Environmental / wetland consulting — identify hydric soils by hydricrating across a project area.
  • Landscaping / native-plant siting — match soil taxonomy and pH to species requirements.
  • AI training data — build structured soil property datasets for machine learning models.
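As an illustration of the wetland use case, hydric components can be picked out of the output with a simple filter. This sketch assumes that hydricrating carries the value "Yes" for hydric components, which should be checked against real output; rows are de-duplicated by cokey because horizon-level output repeats each component once per horizon. The sample rows and the component name "Example" are invented.

```python
def hydric_components(rows, rating="Yes"):
    """Return the first row of each component whose hydricrating matches."""
    seen = set()
    matches = []
    for row in rows:
        if row.get("hydricrating") == rating and row["cokey"] not in seen:
            seen.add(row["cokey"])
            matches.append(row)
    return matches


# Illustrative rows only; ratings are invented for the example.
rows = [
    {"cokey": "c1", "compname": "Gallime", "hydricrating": "No", "hzname": "A"},
    {"cokey": "c2", "compname": "Example", "hydricrating": "Yes", "hzname": "A"},
    {"cokey": "c2", "compname": "Example", "hydricrating": "Yes", "hzname": "Bg"},
]
hydric = hydric_components(rows)
```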

Notes

  • Data is sourced from USDA NRCS SSURGO — the official national soil survey. It is public-domain federal data.
  • The actor queries the SDA endpoint conservatively (one request at a time, with a 500 ms courtesy delay between requests) to avoid overloading the federal server.
  • Large area queries (entire states, 100+ counties) can produce millions of rows. Use maxItems to cap output during testing, and set it to 0 only when you need the full dataset.
  • Not all counties have complete horizon data. Fields may be null where the survey is incomplete.
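The throttling and maxItems behaviour described in these notes can be sketched as a sequential loop. This is an illustration under assumptions, not the actor's code: it assumes one request per area symbol, `collect` is a hypothetical helper, and `fetch` stands in for whatever function performs the actual SDA call.

```python
import time


def collect(areasymbols, fetch, max_items=10, delay_s=0.5, sleep=time.sleep):
    """Fetch areas one at a time, pausing between requests, up to max_items."""
    records = []
    for index, symbol in enumerate(areasymbols):
        if index > 0:
            sleep(delay_s)  # courtesy delay between consecutive requests
        for record in fetch(symbol):
            records.append(record)
            # max_items == 0 means unlimited, so the cap applies only when non-zero.
            if max_items and len(records) >= max_items:
                return records
    return records


def fake_fetch(symbol):
    """Stand-in for a real SDA call: three made-up records per area."""
    return [{"areasymbol": symbol, "row": i} for i in range(3)]


pauses = []
capped = collect(["TX001", "IA001"], fake_fetch, max_items=4, sleep=pauses.append)
```

Injecting `sleep` keeps the sketch testable without real waiting; in normal use the default `time.sleep` applies the delay.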