FishBase Species Database Scraper
Scrape FishBase, the global reference database for fish species, and extract structured records for 35,000+ species covering taxonomy, ecology, size and weight, trophic level, IUCN Red List status, depth range, and common names. Returns clean JSON per species pulled from the public summary pages.
FishBase Scraper Features
- Indexes 35,000+ fish species — every validated entry in the FishBase catalog
- Extracts full taxonomy: scientific name, genus, species, family, order
- Returns ecology fields: environment, climate zone, distribution, depth range
- Captures size and life-history stats: max length, common length, max weight, max age
- Pulls trophic level, IUCN Red List status, game-fish flag, and dangerousness classification
- Includes primary English common name plus a source URL back to the FishBase summary page
- No API key, no proxy, no captcha — just respectful crawling at the site's robots.txt rate
- Pure HTTP scraping. Hits the ValidNameList index page once, then walks /summary/{ID} for each species
Who Uses FishBase Data?
- Marine biologists and ichthyologists — Pull species-level data into research notebooks without scraping each summary page by hand
- Fisheries management agencies — Cross-reference IUCN status and trophic level across stocks for assessment models
- AI training datasets — Build species classification or natural-language Q&A datasets grounded in the canonical reference
- Aquarium and reef hobbyists — Generate compatibility data driven by climate zone, depth range, and dangerousness classification
- Game-fishing apps — Filter on is_game_fish and species distribution to power location-aware species guides
- EdTech and museum apps — Populate species cards with verified taxonomy and conservation data
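As a rough illustration of the game-fishing use case, the sketch below filters scraped records on is_game_fish and climate_zone. The function name `select_game_fish` and the second sample record are hypothetical placeholders, not part of the scraper; the field names and the "yes"/"no" string values follow the output schema documented further down.

```python
from typing import Iterable, Optional


def select_game_fish(records: Iterable[dict], climate_zone: Optional[str] = None) -> list:
    """Return records flagged as game fish, optionally restricted to one climate zone."""
    selected = []
    for record in records:
        # is_game_fish is the string "yes" or "no" in the output schema.
        if record.get("is_game_fish") != "yes":
            continue
        if climate_zone is not None and record.get("climate_zone") != climate_zone:
            continue
        selected.append(record)
    return selected


sample_records = [
    {
        "scientific_name": "Carcharodon carcharias",
        "is_game_fish": "yes",
        "climate_zone": "Subtropical",
    },
    # Made-up placeholder record, used only to exercise the filter.
    {
        "scientific_name": "Placeholder species",
        "is_game_fish": "no",
        "climate_zone": "Tropical",
    },
]

for match in select_game_fish(sample_records, climate_zone="Subtropical"):
    print(match["scientific_name"])
```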
How the FishBase Scraper Works
- Set maxItems — Pick a sample size, or set it to 0 to crawl all 35,000+ species
- The scraper fetches the ValidNameList index — A single page that lists every species ID and scientific name on the site
- It walks one summary page per species at the polite rate FishBase requests in robots.txt
- Returns one normalized JSON record per species with taxonomy, ecology, size, and conservation fields
The scraper respects the 10-second crawl-delay and uses a single connection. At that rate a full-catalog run takes roughly four days (35,000 requests × 10 seconds ≈ 97 hours), but you only need to do it once, and the result is a complete snapshot of FishBase's public data.
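The crawl loop described above can be sketched roughly as follows. This is a minimal illustration, not the scraper's actual source: the names `summary_url` and `crawl` are hypothetical, the host is taken from the sample source_url in the output section, and the HTTP call is passed in as a `fetch` function so the pacing logic can be read (and run) without touching the network.

```python
import time
from typing import Callable, Iterable, Iterator, Tuple

BASE_URL = "https://www.fishbase.se"  # assumption: host from the sample source_url
CRAWL_DELAY_SECONDS = 10              # crawl-delay stated in this README


def summary_url(species_id: int) -> str:
    """Build the summary page URL for one species ID."""
    return f"{BASE_URL}/summary/{species_id}"


def crawl(
    species_ids: Iterable[int],
    fetch: Callable[[str], str],
    max_items: int = 10,
    delay: float = CRAWL_DELAY_SECONDS,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[Tuple[str, str]]:
    """Yield (url, page_text) pairs one at a time, pausing between requests.

    max_items == 0 means no limit, mirroring the maxItems input.
    """
    for count, species_id in enumerate(species_ids):
        if max_items and count >= max_items:
            break
        if count > 0:
            sleep(delay)  # wait before every request except the first
        url = summary_url(species_id)
        yield url, fetch(url)


# Usage with a stub fetcher and a no-op sleep, so nothing is downloaded.
def stub_fetch(url: str) -> str:
    return f"<html>{url}</html>"


for url, _page in crawl([751, 2, 3], stub_fetch, max_items=2, sleep=lambda seconds: None):
    print(url)
```

Injecting `fetch` and `sleep` keeps the sequential, single-connection behavior explicit and makes the loop easy to test; a real run would pass an HTTP client call and leave `sleep` at its default.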
Input
{
"maxItems": 100
}
| Field | Type | Default | Description |
|---|---|---|---|
| maxItems | integer | 10 | Maximum number of species to scrape. Set to 0 for the full 35,000+ catalog. |
FishBase Scraper Output Fields
{
"scientific_name": "Carcharodon carcharias",
"genus": "Carcharodon",
"species": "carcharias",
"common_name": "Great white shark",
"family": "Lamnidae",
"order": "Lamniformes",
"environment": "Marine; brackish; pelagic-oceanic; oceanodromous",
"climate_zone": "Subtropical",
"distribution": "Cosmopolitan in all tropical and temperate seas. Most common in the southern oceans.",
"max_length_cm": "720.0",
"common_length_cm": "488.0",
"max_weight_g": "3324000",
"max_age_years": "73",
"trophic_level": "4.5",
"iucn_status": "Vulnerable",
"is_game_fish": "yes",
"is_dangerous": "Traumatogenic",
"depth_range_m": "0 - 1280",
"source_url": "https://www.fishbase.se/summary/751",
"scraped_at": "2026-05-27T03:14:22.000Z"
}
| Field | Type | Description |
|---|---|---|
| scientific_name | string | Full binomial scientific name (Genus species). |
| genus | string | Genus name. |
| species | string | Species epithet. |
| common_name | string | Primary English common name. |
| family | string | Taxonomic family. |
| order | string | Taxonomic order. |
| environment | string | Habitat descriptors, semicolon-separated (e.g. Marine; brackish; benthopelagic). |
| climate_zone | string | Climate zone (Tropical, Temperate, Polar, Boreal, Subtropical, Deep-water). |
| distribution | string | Geographic distribution text. |
| max_length_cm | string | Maximum recorded length in cm. |
| common_length_cm | string | Common or typical length in cm. |
| max_weight_g | string | Maximum recorded weight in grams. |
| max_age_years | string | Maximum recorded age in years. |
| trophic_level | string | Trophic level (e.g. 3.4). |
| iucn_status | string | IUCN Red List status (Least Concern, Vulnerable, Endangered, etc.). |
| is_game_fish | string | yes or no — whether listed as a game/sport fish. |
| is_dangerous | string | Danger classification (Harmless, Traumatogenic, Venomous, etc.). |
| depth_range_m | string | Depth range in meters (e.g. 0 - 364). |
| source_url | string | Source URL on FishBase. |
| scraped_at | string | ISO 8601 timestamp of the scrape. |
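Because every field is returned as a string, numeric analysis needs a conversion step on the consumer side. The sketch below shows one way to do that with the standard library. The helper names (`to_float`, `parse_depth_range`, `coerce`) are hypothetical, and it assumes that missing values arrive as empty strings or absent keys, which this README does not specify.

```python
from typing import Optional, Tuple

NUMERIC_FIELDS = (
    "max_length_cm",
    "common_length_cm",
    "max_weight_g",
    "max_age_years",
    "trophic_level",
)


def to_float(value: Optional[str]) -> Optional[float]:
    """Convert a numeric string such as "720.0" to float; None if empty or invalid."""
    if value is None or not value.strip():
        return None
    try:
        return float(value)
    except ValueError:
        return None


def parse_depth_range(value: Optional[str]) -> Optional[Tuple[float, float]]:
    """Split a range such as "0 - 1280" into (min_m, max_m); None if it does not fit."""
    if not value:
        return None
    parts = value.split(" - ")
    if len(parts) != 2:
        return None
    low, high = to_float(parts[0]), to_float(parts[1])
    if low is None or high is None:
        return None
    return low, high


def coerce(record: dict) -> dict:
    """Return a copy of the record with numeric, range, and flag fields typed."""
    out = dict(record)
    for key in NUMERIC_FIELDS:
        out[key] = to_float(record.get(key))
    out["depth_range_m"] = parse_depth_range(record.get("depth_range_m"))
    out["is_game_fish"] = record.get("is_game_fish") == "yes"
    return out


typed = coerce({
    "scientific_name": "Carcharodon carcharias",
    "max_length_cm": "720.0",
    "max_weight_g": "3324000",
    "depth_range_m": "0 - 1280",
    "is_game_fish": "yes",
})
print(typed["max_length_cm"], typed["depth_range_m"], typed["is_game_fish"])
```

Fields that are absent or unparseable come back as None rather than raising, which keeps a full-catalog load from failing on a single odd record.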
FAQ
How do I scrape FishBase species data?
FishBase Scraper fetches the ValidNameList.php index, parses every species ID and scientific name from the embedded JSON, then walks one detail page per species. No API key, no proxy, no anti-bot to clear — just polite scraping at the rate the site asks for.
How many species can I get?
FishBase Scraper covers 35,000+ validated species — every entry in the public FishBase catalog. Set maxItems to 0 to pull the full snapshot, or pass a number to sample.
How much does the FishBase Scraper cost to run?
FishBase Scraper is priced per record returned via the pay-per-event model. The full catalog is large enough that most users only run it once and cache the result.
Does this need proxies?
FishBase Scraper does not need proxies. FishBase serves public data without anti-bot protection — datacenter IPs work fine. The scraper observes the site's 10-second crawl-delay and runs single-threaded out of respect for a small academic resource.
Why is a full-catalog run slow?
FishBase Scraper crawls one species at a time with a 10-second delay between requests, per the site's robots.txt. That's not a limit you can tune — it's the rate the site asks for. The good news is the database is fairly stable, so a single full run gives you usable data for months.
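To plan a run, the delay translates into a simple lower bound on wall-clock time. The sketch below uses the 10-second delay and the 35,000 species figure from this README; the function name `estimate_runtime_hours` is hypothetical, and the estimate ignores download and parsing time, so real runs will take somewhat longer.

```python
CRAWL_DELAY_SECONDS = 10  # crawl-delay stated in this README


def estimate_runtime_hours(species_count: int, delay_seconds: float = CRAWL_DELAY_SECONDS) -> float:
    """Lower-bound run time in hours: one delay per species page, nothing else counted."""
    return species_count * delay_seconds / 3600


for count in (100, 1_000, 35_000):
    hours = estimate_runtime_hours(count)
    print(f"{count:>6} species: at least {hours:.1f} h ({hours / 24:.1f} days)")
```

For the full catalog this comes to about 97 hours, or roughly four days of continuous crawling.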
Need More Features?
Need additional FishBase fields (genetic data, food items, predator lists), region-filtered crawls, or incremental updates? File an issue or get in touch.
Why Use the FishBase Scraper?
- Comprehensive coverage — All 35,000+ validated species, every taxonomic rank, every ecology field FishBase exposes
- Clean schema — Normalized field names and consistent string-typed values across taxonomy, size, and conservation data. Loads directly into a pandas DataFrame.
- Polite by default — Respects the robots.txt crawl-delay so the source site stays healthy. Useful when you'd rather not have a community resource throttle you.