OrbTop

TMDB Movie & TV Metadata Scraper

Scrape rich metadata for movies and TV shows from The Movie Database (TMDB), no API key required. The actor discovers titles from TMDB's public discover/browse pages and extracts full detail records, including cast, directors, genres, keywords, ratings, runtime, original language, and production companies.

What you get

Each record in the output dataset contains:

Field                 Description
tmdb_id               Numeric TMDB ID
title                 Movie or TV show title
media_type            movie or tv
tmdb_url              Canonical TMDB page URL
original_title        Title in the original language
release_date          First release or air date (YYYY-MM-DD)
vote_average          Aggregate rating (0–10 scale)
vote_count            Number of votes
user_score_percent    User score percentage (0–100)
overview              Plot summary or show description
genres                Comma-separated genre names
runtime_minutes       Runtime in minutes
original_language     Original language
production_companies  Comma-separated production company names
imdb_id               IMDb ID (e.g. tt0137523), when listed on the TMDB page
cast_top              Comma-separated top-billed cast names
directors             Comma-separated director names
keywords              Comma-separated TMDB keyword tags
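
Several fields above arrive as comma-separated strings, so downstream code usually wants them as lists. The following is a minimal sketch of that conversion, not part of the actor itself: the helper name `split_list_fields` and the sample record values are made up for illustration, and only the field names come from the table.

```python
from typing import Any

# Fields the table above describes as comma-separated strings.
LIST_FIELDS = ("genres", "production_companies", "cast_top", "directors", "keywords")


def split_list_fields(record: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of the record with comma-separated fields turned into lists."""
    out = dict(record)
    for field in LIST_FIELDS:
        raw = record.get(field) or ""  # a missing or empty field becomes an empty list
        out[field] = [part.strip() for part in raw.split(",") if part.strip()]
    return out


# Illustrative record; the values are invented for this example.
sample = {
    "tmdb_id": 1,
    "title": "Example Title",
    "media_type": "movie",
    "genres": "Drama, Thriller",
    "cast_top": "Actor One, Actor Two",
    "directors": "Director One",
    "keywords": "",
}

parsed = split_list_fields(sample)
print(parsed["genres"])
print(parsed["cast_top"])
```

A plain split on commas would break any name that itself contains a comma (some company names do), so treat this as a starting point rather than a robust parser.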

Why use this instead of the TMDB API?

Most TMDB scrapers on the Apify Store are thin wrappers around the TMDB REST API — they require you to register for and manage your own API key and stay within TMDB's per-account rate limits. This actor scrapes TMDB's public web pages directly, so:

  • No API key registration or management
  • No per-account rate limits to worry about
  • Both movies and TV shows in one unified output schema
  • Includes fields not always easily queryable via API (IMDb cross-ID, keyword tags, top cast)

Inputs

Input      Type     Default  Description
maxItems   integer  15       Maximum number of records to return. Set to 0 for no limit.
mediaType  string   both     Which media type to scrape: movie, tv, or both.
startPage  integer  1        Discover page to start from (each page has ~20 titles).
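
To get a feel for how maxItems and startPage interact, the sketch below estimates which discover pages a run would need for a single media type. It assumes exactly 20 titles per page and the 500-page cap mentioned under "How it works"; the function name `estimate_pages` is hypothetical, and the actor's real paging logic (for example, how it splits maxItems when mediaType is both) may differ.

```python
import math

TITLES_PER_PAGE = 20  # assumed fixed page size ("~20 titles" per page)
MAX_PAGE = 500        # per-type page cap described under "How it works"


def estimate_pages(max_items: int, start_page: int = 1) -> range:
    """Estimate the discover pages needed to collect max_items titles of one type."""
    if max_items < 0:
        raise ValueError("max_items must be 0 (no limit) or a positive integer")
    if not 1 <= start_page <= MAX_PAGE:
        raise ValueError(f"start_page must be between 1 and {MAX_PAGE}")
    if max_items == 0:
        last_page = MAX_PAGE  # 0 means "no limit": read up to the cap
    else:
        pages_needed = math.ceil(max_items / TITLES_PER_PAGE)
        last_page = min(MAX_PAGE, start_page + pages_needed - 1)
    return range(start_page, last_page + 1)


print(list(estimate_pages(15)))      # default maxItems fits on one page
print(list(estimate_pages(45, 3)))   # three pages starting at page 3
print(len(estimate_pages(0)))        # no limit: every page up to the cap
```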

Example use cases

  • Media server catalogs: Build or enrich metadata catalogs for Plex, Jellyfin, or Kodi libraries without managing API credentials.
  • Recommendation engines: Feed movie/TV metadata into ML pipelines — genres, keywords, cast, and ratings in one schema.
  • Cross-referencing: Use imdb_id to join TMDB data with IMDb datasets for enriched analytics.
  • Market research: Track ratings and popularity trends across the TMDB catalog over time.
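
As an illustration of the cross-referencing use case, here is a minimal sketch that joins scraped records to rows from an IMDb dataset on imdb_id. The helper name `join_on_imdb_id`, the sample values, and the choice of `tconst` as the IMDb-side key column are assumptions for this example; adjust the key to whatever your IMDb data actually uses.

```python
from typing import Any


def join_on_imdb_id(
    tmdb_records: list[dict[str, Any]],
    imdb_rows: list[dict[str, Any]],
    imdb_key: str = "tconst",  # assumed name of the IMDb ID column
) -> list[dict[str, Any]]:
    """Inner-join TMDB records to IMDb rows, skipping records without an imdb_id."""
    imdb_by_id = {row[imdb_key]: row for row in imdb_rows}
    joined = []
    for record in tmdb_records:
        imdb_id = record.get("imdb_id")
        match = imdb_by_id.get(imdb_id) if imdb_id else None
        if match is not None:
            # Keep the IMDb row under its own key to avoid field-name clashes.
            joined.append({**record, "imdb": match})
    return joined


# Invented sample data for the example.
tmdb_records = [
    {"tmdb_id": 1, "title": "Example Title", "imdb_id": "tt0000001"},
    {"tmdb_id": 2, "title": "Obscure Title", "imdb_id": None},
]
imdb_rows = [{"tconst": "tt0000001", "numVotes": 100}]

result = join_on_imdb_id(tmdb_records, imdb_rows)
print(result)
```

Records whose imdb_id is empty are dropped here; a left join that keeps them with a null IMDb side is an equally reasonable choice for analytics.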

How it works

  1. Discover: Crawls paginated TMDB browse pages (/movie?language=en-US&page=N, /tv?...) — 20 titles per page, up to 500 pages per type.
  2. Detail: For each title, fetches the detail page and extracts:
    • JSON-LD (schema.org Movie / TVSeries): name, description, rating, genres, runtime, release date
    • DOM: user score chart, directors, cast, keywords, original title, language, production companies
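
The two steps above can be sketched with the Python standard library alone. This is an illustrative outline under stated assumptions, not the actor's actual code: the base URL, the helper names, and the mapping from schema.org properties (`name`, `description`, `genre`, `datePublished`, `aggregateRating`) to output fields are guesses, and the sample HTML is invented so that the example runs without network access.

```python
import json
from html.parser import HTMLParser
from urllib.parse import urlencode

BASE_URL = "https://www.themoviedb.org"  # assumed site root


def discover_url(media_type: str, page: int) -> str:
    """Build a browse-page URL of the form /movie?language=en-US&page=N."""
    if media_type not in ("movie", "tv"):
        raise ValueError("media_type must be 'movie' or 'tv'")
    query = urlencode({"language": "en-US", "page": page})
    return f"{BASE_URL}/{media_type}?{query}"


class JsonLdParser(HTMLParser):
    """Collect the parsed contents of <script type="application/ld+json"> tags."""

    def __init__(self) -> None:
        super().__init__()
        self._in_jsonld = False
        self._chunks: list[str] = []
        self.blocks: list = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._chunks = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.blocks.append(json.loads("".join(self._chunks)))
            except json.JSONDecodeError:
                pass  # skip malformed blocks rather than failing the whole page


def as_csv(value) -> str:
    """schema.org allows 'genre' to be a string or a list; normalize to one string."""
    if isinstance(value, str):
        return value
    return ", ".join(value or [])


def extract_jsonld_fields(html: str) -> dict:
    """Map the first Movie/TVSeries JSON-LD block to a few output fields."""
    parser = JsonLdParser()
    parser.feed(html)
    for block in parser.blocks:
        if isinstance(block, dict) and block.get("@type") in ("Movie", "TVSeries"):
            rating = block.get("aggregateRating") or {}
            return {
                "title": block.get("name"),
                "overview": block.get("description"),
                "release_date": block.get("datePublished"),
                "genres": as_csv(block.get("genre")),
                "vote_average": rating.get("ratingValue"),
                "vote_count": rating.get("ratingCount"),
            }
    return {}


# Invented detail-page fragment, used here instead of a live request.
SAMPLE_HTML = """<html><head>
<script type="application/ld+json">
{"@type": "Movie", "name": "Example Title", "description": "A made-up plot.",
 "genre": ["Drama", "Thriller"], "datePublished": "2020-01-31",
 "aggregateRating": {"ratingValue": 7.4, "ratingCount": 1234}}
</script></head><body></body></html>"""

fields = extract_jsonld_fields(SAMPLE_HTML)
print(discover_url("movie", 2))
print(fields["title"], "|", fields["genres"])
```

The DOM-only fields (directors, cast, keywords, and so on) are left out because they depend on page markup that this sketch does not attempt to guess.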

Notes

  • TMDB's discover pages order titles by popularity (most popular first). Use startPage to offset into the catalog.
  • The imdb_id field is populated only when TMDB links to IMDb on the detail page — this is common for well-known titles but may be absent for obscure entries.
  • Runtime is in minutes for movies. For TV shows, TMDB typically reports the average episode length.
  • The language=en-US parameter is appended to all requests to ensure English metadata in the output.