OrbTop

FishBase Species Database Scraper

EDUCATION | DEVELOPER TOOLS

Scrape FishBase, the global reference database for fish species, and extract structured records for 35,000+ species covering taxonomy, ecology, size and weight, trophic level, IUCN Red List status, depth range, and common names. Returns clean JSON per species pulled from the public summary pages.


FishBase Scraper Features

  • Indexes 35,000+ fish species — every validated entry in the FishBase catalog
  • Extracts full taxonomy: scientific name, genus, species, family, order
  • Returns ecology fields: environment, climate zone, distribution, depth range
  • Captures size and life-history stats: max length, common length, max weight, max age
  • Pulls trophic level, IUCN Red List status, game-fish flag, and dangerousness classification
  • Includes primary English common name plus a source URL back to the FishBase summary page
  • No API key, no proxy, no captcha — just respectful crawling at the site's robots.txt rate
  • Pure HTTP scraping. Hits the ValidNameList index page once, then walks /summary/{ID} for each species

Who Uses FishBase Data?

  • Marine biologists and ichthyologists — Pull species-level data into research notebooks without scraping each summary page by hand
  • Fisheries management agencies — Cross-reference IUCN status and trophic level across stocks for assessment models
  • AI training datasets — Build species classification or natural-language Q&A datasets grounded in the canonical reference
  • Aquarium and reef hobbyists — Generate compatibility data driven by climate zone, depth range, and dangerousness classification
  • Game-fishing apps — Filter on is_game_fish and species distribution to power location-aware species guides
  • EdTech and museum apps — Populate species cards with verified taxonomy and conservation data

How the FishBase Scraper Works

  1. Set maxItems — Pick a sample size, or set it to 0 to crawl all 35,000+ species
  2. The scraper fetches the ValidNameList index — A single page that lists every species ID and scientific name on the site
  3. It walks one summary page per species at the polite rate FishBase requests in robots.txt
  4. Returns one normalized JSON record per species with taxonomy, ecology, size, and conservation fields

The scraper respects FishBase's 10-second crawl-delay and uses a single connection. At that rate, a full-catalog run of 35,000+ summary pages takes roughly four days, but you only need to do it once, and the result is a complete snapshot of FishBase's public data.
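
The walk described above can be sketched in a few lines of Python. This is a minimal illustration, not the scraper's actual code: the URL pattern comes from the source_url field in the sample output, while `summary_url`, `polite_walk`, and the injectable `fetch` and `sleep` parameters are hypothetical names chosen for the example.

```python
import time
from typing import Callable, Iterable, Iterator, Tuple

# URL pattern inferred from the sample record's source_url field.
SUMMARY_URL = "https://www.fishbase.se/summary/{id}"


def summary_url(species_id: int) -> str:
    """Build the summary-page URL for one species ID."""
    return SUMMARY_URL.format(id=species_id)


def polite_walk(
    species_ids: Iterable[int],
    fetch: Callable[[str], str],
    delay_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[Tuple[str, str]]:
    """Fetch one summary page per species, pausing between requests.

    `fetch` is any callable that returns the page body for a URL
    (hypothetical; plug in your own HTTP client). Requests are made
    strictly one at a time, with `delay_s` seconds between them.
    """
    for index, species_id in enumerate(species_ids):
        if index > 0:
            sleep(delay_s)  # wait before every request except the first
        url = summary_url(species_id)
        yield url, fetch(url)
```

Passing `fetch` and `sleep` in as arguments keeps the loop easy to test without touching the network or actually waiting.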


Input

{
  "maxItems": 100
}

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| maxItems | integer | 10 | Maximum number of species to scrape. Set to 0 for the full 35,000+ catalog. |
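
One way to read the maxItems semantics is shown below. It is a sketch under the assumption that 0 means "no limit" and any positive value caps the number of species taken from the index; `limit_species` is a hypothetical helper, not part of the scraper.

```python
from itertools import islice
from typing import Iterable, List


def limit_species(species_ids: Iterable[int], max_items: int = 10) -> List[int]:
    """Apply the maxItems input to a sequence of species IDs.

    Assumed semantics: 0 keeps everything, a positive number keeps
    only the first `max_items` IDs. The default of 10 mirrors the
    documented default.
    """
    if max_items < 0:
        raise ValueError("maxItems must be 0 or a positive integer")
    if max_items == 0:
        return list(species_ids)
    return list(islice(species_ids, max_items))
```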

FishBase Scraper Output Fields

{
  "scientific_name": "Carcharodon carcharias",
  "genus": "Carcharodon",
  "species": "carcharias",
  "common_name": "Great white shark",
  "family": "Lamnidae",
  "order": "Lamniformes",
  "environment": "Marine; brackish; pelagic-oceanic; oceanodromous",
  "climate_zone": "Subtropical",
  "distribution": "Cosmopolitan in all tropical and temperate seas. Most common in the southern oceans.",
  "max_length_cm": "720.0",
  "common_length_cm": "488.0",
  "max_weight_g": "3324000",
  "max_age_years": "73",
  "trophic_level": "4.5",
  "iucn_status": "Vulnerable",
  "is_game_fish": "yes",
  "is_dangerous": "Traumatogenic",
  "depth_range_m": "0 - 1280",
  "source_url": "https://www.fishbase.se/summary/751",
  "scraped_at": "2026-05-27T03:14:22.000Z"
}

| Field | Type | Description |
| --- | --- | --- |
| scientific_name | string | Full binomial scientific name (Genus species). |
| genus | string | Genus name. |
| species | string | Species epithet. |
| common_name | string | Primary English common name. |
| family | string | Taxonomic family. |
| order | string | Taxonomic order. |
| environment | string | Habitat descriptors, semicolon-separated (e.g. Marine; brackish; benthopelagic). |
| climate_zone | string | Climate zone (Tropical, Temperate, Polar, Boreal, Subtropical, Deep-water). |
| distribution | string | Geographic distribution text. |
| max_length_cm | string | Maximum recorded length in cm. |
| common_length_cm | string | Common or typical length in cm. |
| max_weight_g | string | Maximum recorded weight in grams. |
| max_age_years | string | Maximum recorded age in years. |
| trophic_level | string | Trophic level (e.g. 3.4). |
| iucn_status | string | IUCN Red List status (Least Concern, Vulnerable, Endangered, etc.). |
| is_game_fish | string | "yes" or "no", indicating whether the species is listed as a game/sport fish. |
| is_dangerous | string | Danger classification (Harmless, Traumatogenic, Venomous, etc.). |
| depth_range_m | string | Depth range in meters (e.g. 0 - 364). |
| source_url | string | Source URL on FishBase. |
| scraped_at | string | ISO 8601 timestamp of the scrape. |
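
Because every field arrives as a string, numeric analysis usually needs a conversion step first. The sketch below shows one possible approach using the field names from the table above. The helper names are hypothetical, and the handling of missing values (a missing key or empty string becomes None) is an assumption, since the source does not say how absent data is represented.

```python
from typing import Optional, Tuple

NUMERIC_FIELDS = (
    "max_length_cm",
    "common_length_cm",
    "max_weight_g",
    "max_age_years",
    "trophic_level",
)


def to_float(value: Optional[str]) -> Optional[float]:
    """Convert a numeric string to float; return None if empty or unparseable."""
    if value is None or not value.strip():
        return None
    try:
        return float(value)
    except ValueError:
        return None


def parse_depth_range(value: Optional[str]) -> Optional[Tuple[float, float]]:
    """Split a 'low - high' depth string into a (low, high) tuple in meters."""
    if not value:
        return None
    parts = [part.strip() for part in value.split(" - ")]
    if len(parts) != 2:
        return None
    low, high = to_float(parts[0]), to_float(parts[1])
    if low is None or high is None:
        return None
    return (low, high)


def typed_record(raw: dict) -> dict:
    """Return a copy of a scraped record with numeric and boolean fields converted."""
    typed = dict(raw)
    for key in NUMERIC_FIELDS:
        typed[key] = to_float(raw.get(key))
    typed["is_game_fish"] = raw.get("is_game_fish") == "yes"
    typed["depth_range_m"] = parse_depth_range(raw.get("depth_range_m"))
    return typed
```

Applied to the great white shark sample above, `typed_record` turns max_length_cm into the float 720.0, is_game_fish into True, and depth_range_m into the tuple (0.0, 1280.0).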

FAQ

How do I scrape FishBase species data?

FishBase Scraper fetches the ValidNameList.php index, parses every species ID and scientific name from the embedded JSON, then walks one detail page per species. No API key, no proxy, no anti-bot to clear — just polite scraping at the rate the site asks for.

How many species can I get?

FishBase Scraper covers 35,000+ validated species — every entry in the public FishBase catalog. Set maxItems to 0 to pull the full snapshot, or pass a number to sample.

How much does the FishBase Scraper cost to run?

FishBase Scraper is priced per record returned via the pay-per-event model. The full catalog is large enough that most users only run it once and cache the result.

Does this need proxies?

FishBase Scraper does not need proxies. FishBase serves public data without anti-bot protection — datacenter IPs work fine. The scraper observes the site's 10-second crawl-delay and runs single-threaded out of respect for a small academic resource.
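
If you want to check a crawl-delay yourself, Python's standard library can parse it from robots.txt. The sketch below runs against an invented sample file, not FishBase's actual robots.txt; `crawl_delay_from_text` and its `fallback` parameter are hypothetical names for this example.

```python
from urllib.robotparser import RobotFileParser

# Invented sample for illustration only; not copied from fishbase.se.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10
"""


def crawl_delay_from_text(
    robots_txt: str, user_agent: str = "*", fallback: float = 10.0
) -> float:
    """Return the Crawl-delay for `user_agent`, or `fallback` if none is declared."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    delay = parser.crawl_delay(user_agent)
    return float(delay) if delay is not None else fallback
```

Falling back to a conservative default when no delay is declared keeps the crawler polite even if the directive is later removed.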

Why is a full-catalog run slow?

FishBase Scraper crawls one species at a time with a 10-second delay between requests, per the site's robots.txt. That's not a limit you can tune — it's the rate the site asks for. The good news is the database is fairly stable, so a single full run gives you usable data for months.
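
To budget a run for a given sample size, the arithmetic is simply species count times delay. The sketch below assumes the 10-second delay dominates and ignores per-request latency and the single index fetch, so treat the result as a lower bound; `estimated_runtime` is a hypothetical helper.

```python
from datetime import timedelta


def estimated_runtime(species_count: int, delay_s: float = 10.0) -> timedelta:
    """Lower-bound wall-clock time for crawling `species_count` summary pages."""
    return timedelta(seconds=species_count * delay_s)


print(estimated_runtime(100))     # 0:16:40
print(estimated_runtime(35_000))  # 4 days, 1:13:20
```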


Need More Features?

Need additional FishBase fields (genetic data, food items, predator lists), region-filtered crawls, or incremental updates? File an issue or get in touch.

Why Use the FishBase Scraper?

  • Comprehensive coverage — All 35,000+ validated species, every taxonomic rank, every ecology field FishBase exposes
  • Clean schema — Normalized field names and consistent types (every value is a string) across taxonomy, size, and conservation data. Load it straight into a pandas DataFrame.
  • Polite by default — Respects the robots.txt crawl-delay so the source site stays healthy. Useful when you'd rather not have a community resource throttle you.