OrbTop

BGCI GlobalTreeSearch Tree Species Scraper

Scrapes GlobalTreeSearch, the authoritative list of the world's ~58,000 tree species maintained by Botanic Gardens Conservation International (BGCI). Each record includes the full scientific name, plant family, taxonomic author, data source, native-country distribution, and known synonyms.

GlobalTreeSearch is the primary reference behind the Global Tree Assessment and is used by arboreta, reforestation programmes, carbon and forestry projects, and conservation NGOs worldwide. This actor extracts the complete database into a flat, queryable dataset.

Features

  • Scrapes all ~58,000 tree species from the BGCI GlobalTreeSearch JSON API
  • Supports targeted scraping by genus — specify one or more genera to limit the dataset
  • Extracts native-country distribution for each species (from the API's TSGeolinks records)
  • Extracts taxonomic synonyms (from the API's TSTaxas records)
  • No authentication required: uses the BGCI public data API
  • Polite rate limiting (200 ms delay between genus requests)
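The genus-by-genus collection described above can be pictured as a plain loop. This is an illustration, not the actor's source code: `fetch_genus` is a hypothetical callable standing in for one request to the genus endpoint (whose URL is not documented here), and `delay_s=0.2` mirrors the 200 ms pause listed above.

```python
import time
from typing import Callable, Iterable, Iterator


def scrape_genera(
    genera: Iterable[str],
    fetch_genus: Callable[[str], list[dict]],
    max_items: int,
    delay_s: float = 0.2,
) -> Iterator[dict]:
    """Yield species records genus by genus until max_items is reached.

    fetch_genus is a hypothetical callable that returns every record for
    one genus (one API request); it is injected so no real endpoint is assumed.
    """
    collected = 0
    for i, genus in enumerate(genera):
        if collected >= max_items:
            return  # stop before issuing another request
        if i > 0:
            time.sleep(delay_s)  # polite pause between genus requests
        for record in fetch_genus(genus):
            if collected >= max_items:
                return
            yield record
            collected += 1
```

Because the limit is checked before each request, a run that reaches maxItems partway through the genera list never contacts the remaining genera.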

Use cases

  • Carbon and reforestation projects — verify species eligibility and native range for project countries
  • Conservation NGOs — track the full tree species inventory with distribution metadata
  • ESG and biodiversity risk analytics — build species datasets for portfolio screening
  • Arboreta and botanical gardens — collection planning and completeness audits
  • AI training data — authoritative tree taxonomy + geography for forestry models
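For the arboreta use case, a completeness audit amounts to comparing a collection's holdings with the species listed per genus. A minimal sketch, assuming records are dicts with the genus and taxon fields described under Output and that the collection uses identical name spelling; `genus_coverage` is a hypothetical helper, not part of the actor.

```python
from collections import defaultdict
from typing import Iterable


def genus_coverage(records: Iterable[dict], held: set[str]) -> dict[str, tuple[int, int]]:
    """Map each genus to (species held in the collection, species listed)."""
    listed: dict[str, set[str]] = defaultdict(set)
    for rec in records:
        listed[rec["genus"]].add(rec["taxon"])  # set removes duplicate rows
    # Intersect each genus's listed taxa with the collection's holdings.
    return {g: (len(taxa & held), len(taxa)) for g, taxa in sorted(listed.items())}
```

The gap for a genus is simply the second number minus the first; names that differ only in spelling or authorship formatting will count as missing, so normalize names first.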

Input

  • genera (array of strings, default ["Quercus", "Abies"]): genus names to scrape, e.g. ["Quercus", "Abies"]. Leave empty to scrape all ~4,162 genera.
  • maxItems (integer, default 10): maximum number of records to collect. The default of 10 returns only a small sample, so raise it for real runs.

Example input (targeted):

{
  "genera": ["Quercus", "Abies", "Pinus"],
  "maxItems": 1000
}

Example input (full database):

{
  "genera": [],
  "maxItems": 100000
}
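If you generate input programmatically, a pre-flight check can catch the combination that silently truncates a full run: an empty genera list with a low maxItems. A minimal sketch; `validate_input` is a hypothetical helper, and the 58,000 threshold is the approximate species count quoted above, not a limit the actor itself enforces.

```python
APPROX_TOTAL_SPECIES = 58_000  # approximate species count quoted in this README
DEFAULT_INPUT = {"genera": ["Quercus", "Abies"], "maxItems": 10}  # documented defaults


def validate_input(run_input: dict) -> list[str]:
    """Return problems found in an actor input; an empty list means it looks OK."""
    problems: list[str] = []
    genera = run_input.get("genera", DEFAULT_INPUT["genera"])
    max_items = run_input.get("maxItems", DEFAULT_INPUT["maxItems"])
    if not isinstance(max_items, int) or max_items < 1:
        problems.append("maxItems must be a positive integer")
    elif not genera and max_items < APPROX_TOTAL_SPECIES:
        # An empty genera list requests the full database, so a small cap truncates it.
        problems.append(
            f"full-database run would stop after {max_items} records; "
            f"set maxItems to at least {APPROX_TOTAL_SPECIES}"
        )
    if any(not isinstance(g, str) or not g.strip() for g in genera):
        problems.append("genera must contain only non-empty genus names")
    return problems
```

Both example inputs above return an empty list, while {"genera": [], "maxItems": 10} returns one warning.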

Output

Each record in the dataset represents one tree species:

  • bgci_id (integer): GlobalTreeSearch internal record ID
  • taxon (string): full scientific name (genus, specific epithet, and any infraspecific epithets), without the author
  • genus (string): genus name, derived from the taxon name
  • family (string): plant family (e.g. Fagaceae, Pinaceae)
  • author (string): taxonomic author citation
  • source (string): data source (e.g. IUCN RL tree review 2020, WCSP)
  • problems (string or null): data-quality or nomenclatural notes
  • distribution_done (string): whether distribution mapping is complete ("yes" or "no")
  • note (string or null): additional remarks
  • native_countries (string): semicolon-separated list of countries where the species is native
  • synonyms (string or null): semicolon-separated list of known taxonomic synonyms; null when none are recorded
  • bgci_url (string): link to the species record on the BGCI GlobalTreeSearch website
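Because native_countries and synonyms arrive as flat semicolon-separated strings, and a field may be null (synonyms is null in the example record below), downstream code usually wants real lists. A minimal sketch, assuming a semicolon separator as in the example; `split_field` and `normalize_record` are hypothetical helper names.

```python
from typing import Optional


def split_field(value: Optional[str]) -> list[str]:
    """Split a semicolon-separated field into trimmed items; None or '' -> []."""
    if not value:
        return []
    return [part.strip() for part in value.split(";") if part.strip()]


def normalize_record(record: dict) -> dict:
    """Return a copy of a dataset record with list-valued distribution fields."""
    out = dict(record)  # shallow copy; the original record is left unchanged
    out["native_countries"] = split_field(record.get("native_countries"))
    out["synonyms"] = split_field(record.get("synonyms"))
    return out
```

After normalization, checking a project country is a membership test such as "Guatemala" in rec["native_countries"], which avoids false substring matches on the raw string.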

Example record:

{
  "bgci_id": 151050178,
  "taxon": "Quercus acatenangensis",
  "genus": "Quercus",
  "family": "Fagaceae",
  "author": "Trel.",
  "source": "IUCN RL tree review 2020",
  "problems": null,
  "distribution_done": "yes",
  "note": null,
  "native_countries": "El Salvador; Guatemala; Mexico",
  "synonyms": null,
  "bgci_url": "https://www.bgci.org/resources/bgci-databases/globaltreesearch/?species=Quercus%20acatenangensis"
}
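The genus and bgci_url fields are both derived from taxon. Judging from the example record, the URL is the percent-encoded taxon name placed in a species query parameter. The sketch below reproduces that pattern for consistency checks; the URL template is inferred from this single example, not from BGCI documentation, and `derive_fields` is a hypothetical helper.

```python
from urllib.parse import quote

# Base URL copied from the example record; the query pattern is an inference.
BASE_URL = "https://www.bgci.org/resources/bgci-databases/globaltreesearch/"


def derive_fields(taxon: str) -> dict:
    """Rebuild genus and bgci_url from a taxon name (pattern inferred from the example)."""
    # First word of the scientific name; hybrid names with a leading '×' need extra handling.
    genus = taxon.split()[0]
    return {"genus": genus, "bgci_url": f"{BASE_URL}?species={quote(taxon)}"}
```

`quote` encodes the space as %20, which matches the bgci_url in the example record.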

Performance and pricing

  • Full database (all ~58k species): one request to each of ~4,162 genus endpoints. At roughly 2 to 3 seconds per request including the polite delay, a full run takes about 2.3 to 3.5 hours. Set maxItems to at least ~58,000 so the run is not cut short, and plan for a long run.
  • Targeted genus scraping: fast, since each genus is fetched in a single request (e.g. Quercus returns ~424 species in one request), so a handful of genera completes in seconds.
  • Pricing: pay-per-event (PPE). You are charged only for records collected, not for run time.
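The full-run estimate is simple arithmetic: number of genus requests times seconds per request. A sketch using the figures quoted in this section (~4,162 genera, ~2 to 3 s per request); `estimate_hours` is a hypothetical helper, and real timings will vary with network and API load.

```python
def estimate_hours(n_genera: int, seconds_per_request: float) -> float:
    """Estimated wall-clock hours for sequential, one-request-per-genus scraping."""
    return n_genera * seconds_per_request / 3600


low = estimate_hours(4162, 2.0)   # lower bound from the quoted 2 s per request
high = estimate_hours(4162, 3.0)  # upper bound from the quoted 3 s per request
print(f"full run: {low:.1f}-{high:.1f} hours")  # prints "full run: 2.3-3.5 hours"
```

The upper bound is a sensible starting point when choosing a run timeout, with some margin added for slow responses.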

Data source

Data is sourced from the BGCI GlobalTreeSearch API — the public JSON endpoint of the Botanic Gardens Conservation International's GlobalTreeSearch database. No API key or authentication is required.

GlobalTreeSearch covers different ground from other plant databases:

  • GBIF: occurrence/observation records (where specimens have been found)
  • IPNI: nomenclature (a registry of published plant names)
  • GlobalTreeSearch (this actor): the checklist of tree species and their native-country distribution, compiled to support tree conservation

Limitations

  • Distribution data (native_countries) reflects the BGCI's geo-linking records and may not be exhaustive for all species.
  • Conservation status: the source field may reference an assessment (e.g. IUCN RL tree review 2020), but the specific IUCN Red List category is not returned by the genus endpoint. This is a limitation of the GlobalTreeSearch API.
  • A full ~58k-species run takes several hours. For large-scale use, set maxItems to at least ~58,000 and give the run a generous timeoutSecs.