OrbTop

BBC Good Food Recipe Scraper

Overview

The BBC Good Food Recipe Scraper enumerates and extracts the full BBC Good Food recipe catalogue (~15,000+ recipes) using sitemap discovery. It captures rich structured data from each recipe page including ingredients, step-by-step instructions, the UK nutrition panel, BBC-specific skill levels, dietary tags, star ratings, and schema.org/Recipe JSON-LD fields.

BBC Good Food is the largest free English-language recipe authority in the UK, with content covering everything from quick weeknight dinners to elaborate celebration cakes. Generic multi-site scrapers require you to supply URLs yourself and discard BBC-specific fields. This actor instead discovers the entire corpus automatically and extracts every structured field the site provides.

Features

  • Full sitemap enumeration: Walks the BBC Good Food sitemap index and collects every recipe URL across all quarterly recipe sitemaps (~15K+ recipes).
  • BYO URL mode: Supply specific recipe URLs via startUrls to scrape targeted recipes without a full crawl.
  • schema.org/Recipe extraction: Parses the embedded JSON-LD block on each page for all standard Recipe fields.
  • BBC-specific fields: Extracts skill level (Easy / More effort / A challenge), dietary tags (vegetarian, vegan, gluten-free, healthy, etc.), and the UK nutrition panel.
  • Respectful crawling: Honours the site's crawl-delay directive with conservative concurrency.
  • Incremental-friendly: Use maxItems to cap run size for incremental update workflows.
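The schema.org/Recipe extraction feature can be illustrated with the Python standard library alone. This is a minimal sketch, not the actor's actual code: the names JsonLdCollector and find_recipe_jsonld and the sample HTML are hypothetical, and the handling of list and @graph wrappers is an assumption about how JSON-LD blocks are commonly laid out.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collects the text of every <script type="application/ld+json"> tag."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def find_recipe_jsonld(html):
    """Return the first JSON-LD object whose @type is Recipe, or None."""
    collector = JsonLdCollector()
    collector.feed(html)
    for raw in collector.blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip malformed blocks rather than failing the page
        # Assumption: a block may hold one object, a list, or an @graph wrapper.
        if isinstance(data, dict) and "@graph" in data:
            candidates = data["@graph"]
        elif isinstance(data, list):
            candidates = data
        else:
            candidates = [data]
        for item in candidates:
            if not isinstance(item, dict):
                continue
            types = item.get("@type")
            if types == "Recipe" or (isinstance(types, list) and "Recipe" in types):
                return item
    return None


# Hypothetical, heavily trimmed page used only to exercise the function.
SAMPLE_HTML = """<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Recipe",
 "name": "Easy chocolate cake", "prepTime": "PT30M"}
</script></head><body></body></html>"""

recipe = find_recipe_jsonld(SAMPLE_HTML)
print(recipe["name"], recipe["prepTime"])
```

A page without a Recipe block yields None, which lets a caller skip non-recipe pages instead of raising.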

Use Cases

  • Building recipe datasets for LLM fine-tuning or RAG pipelines.
  • Meal planning and nutrition app data ingestion.
  • Food-trend analytics using BBC's categorisation taxonomy and editorial dietary tags.
  • Competitive benchmarking for recipe content platforms.
  • Academic research on UK food culture and cooking trends.

How It Works

  1. Sitemap discovery: Fetches https://www.bbcgoodfood.com/sitemap.xml (a 260-child index) and filters to recipe-type sitemaps (e.g. 2026-Q2-recipe.xml).
  2. URL collection: Extracts all /recipes/<slug> URLs from matching sitemaps, capped at maxItems.
  3. Page extraction: Fetches each recipe page and parses the schema.org/Recipe JSON-LD block plus supplemental BBC DOM fields.
  4. Output: Stores one record per recipe in the Apify dataset.
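Steps 1 and 2 above can be sketched offline against sample XML. This is an illustration under stated assumptions rather than the actor's implementation: the function names, the /sitemaps/ path and the non-recipe entries in the sample data are hypothetical, and the filename pattern is inferred only from the 2026-Q2-recipe.xml example.

```python
import re
import xml.etree.ElementTree as ET

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
# Assumed naming pattern, inferred from the example 2026-Q2-recipe.xml.
RECIPE_SITEMAP_RE = re.compile(r"\d{4}-Q[1-4]-recipe\.xml$")
# Matches /recipes/<slug> only, not deeper paths.
RECIPE_URL_RE = re.compile(r"^https://www\.bbcgoodfood\.com/recipes/[^/?#]+/?$")


def recipe_sitemaps(index_xml):
    """Step 1: keep only the recipe-type child sitemaps from the index."""
    root = ET.fromstring(index_xml)
    locs = [(el.text or "").strip() for el in root.findall("sm:sitemap/sm:loc", SITEMAP_NS)]
    return [loc for loc in locs if RECIPE_SITEMAP_RE.search(loc)]


def recipe_urls(sitemap_xml, max_items=0):
    """Step 2: collect /recipes/<slug> URLs; max_items=0 means no cap."""
    root = ET.fromstring(sitemap_xml)
    urls = []
    for el in root.findall("sm:url/sm:loc", SITEMAP_NS):
        loc = (el.text or "").strip()
        if RECIPE_URL_RE.match(loc):
            urls.append(loc)
            if max_items and len(urls) >= max_items:
                break
    return urls


# Hypothetical sample data; the real index has far more children.
INDEX_XML = """<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>https://www.bbcgoodfood.com/sitemaps/2026-Q2-recipe.xml</loc></sitemap>
  <sitemap><loc>https://www.bbcgoodfood.com/sitemaps/2026-Q2-article.xml</loc></sitemap>
</sitemapindex>"""

SITEMAP_XML = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.bbcgoodfood.com/recipes/easy-chocolate-cake</loc></url>
  <url><loc>https://www.bbcgoodfood.com/recipes/collection/easy-cake-recipes</loc></url>
  <url><loc>https://www.bbcgoodfood.com/recipes/iced-tea</loc></url>
</urlset>"""

print(recipe_sitemaps(INDEX_XML))
print(recipe_urls(SITEMAP_XML, max_items=0))
```

Stopping as soon as the cap is reached avoids collecting URLs that would be discarded anyway, which matters when maxItems is small relative to the corpus.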

Input

| Field | Type | Required | Description |
|---|---|---|---|
| maxItems | Integer | Yes | Maximum number of recipes to scrape. Set to 0 for the full corpus (15K+). Default: 10. |
| startUrls | Array | No | Specific BBC Good Food recipe URLs to scrape. Skips sitemap discovery when provided. |

Example — Full sitemap run (capped)

{
  "maxItems": 500
}

Example — BYO URLs

{
  "startUrls": [
    { "url": "https://www.bbcgoodfood.com/recipes/easy-chocolate-cake" },
    { "url": "https://www.bbcgoodfood.com/recipes/iced-tea" }
  ],
  "maxItems": 10
}
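If the input is assembled programmatically, a small helper can produce either shape shown above. This is a hedged sketch: build_run_input is a hypothetical name, and the checks it performs (non-negative maxItems, URLs under /recipes/) are assumptions drawn from the field descriptions, not validation rules confirmed by the actor.

```python
import json


def build_run_input(max_items=10, start_urls=None):
    """Build an input object with maxItems and, optionally, startUrls."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(max_items, bool) or not isinstance(max_items, int) or max_items < 0:
        raise ValueError("maxItems must be a non-negative integer (0 = full corpus)")
    run_input = {"maxItems": max_items}
    if start_urls:
        for url in start_urls:
            # Assumed check: only BBC Good Food recipe pages are meaningful here.
            if not url.startswith("https://www.bbcgoodfood.com/recipes/"):
                raise ValueError(f"not a BBC Good Food recipe URL: {url}")
        run_input["startUrls"] = [{"url": url} for url in start_urls]
    return run_input


capped_run = build_run_input(max_items=500)
byo_run = build_run_input(
    max_items=10,
    start_urls=[
        "https://www.bbcgoodfood.com/recipes/easy-chocolate-cake",
        "https://www.bbcgoodfood.com/recipes/iced-tea",
    ],
)
print(json.dumps(byo_run, indent=2))
```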

Output

One record per recipe. Fields come from the schema.org/Recipe JSON-LD block, except BBC-specific fields such as skill_level and dietary_tags, which are read from the page DOM.

| Field | Type | Description |
|---|---|---|
| slug | String | URL slug (e.g. easy-chocolate-cake) |
| url | String | Full recipe page URL |
| name | String | Recipe title |
| author | String | Recipe author name |
| description | String | Short editorial description |
| recipe_category | String | Category (e.g. Cake, Dinner, Drink) |
| recipe_cuisine | String | Cuisine type (e.g. British, Italian) |
| recipe_yield | String | Serving yield (e.g. "Serves 8") |
| prep_time | String | Prep time as ISO 8601 duration (e.g. PT20M) |
| cook_time | String | Cook time as ISO 8601 duration |
| total_time | String | Total time as ISO 8601 duration |
| skill_level | String | BBC skill rating: Easy / More effort / A challenge |
| recipe_ingredient | Array | List of ingredient strings |
| recipe_instructions | Array | List of step-by-step instruction strings |
| nutrition | String | JSON-encoded per-serving nutrition data (kcal, fat, saturates, carbs, sugars, fibre, protein, salt) |
| aggregate_rating | Number | Average star rating (1–5 scale) |
| rating_count | Integer | Number of ratings |
| keywords | Array | Editorial keyword tags |
| dietary_tags | Array | Dietary suitability tags (vegetarian, vegan, gluten-free, healthy, etc.) |
| image_urls | Array | Recipe image URLs |
| date_published | String | Publication date (ISO 8601) |

Example output record

{
  "slug": "easy-chocolate-cake",
  "url": "https://www.bbcgoodfood.com/recipes/easy-chocolate-cake",
  "name": "Easy chocolate cake",
  "author": "Miriam Nice",
  "description": "Master the chocolate cake with an airy, light sponge and rich buttercream filling...",
  "recipe_category": "Cake",
  "recipe_cuisine": "",
  "recipe_yield": "Serves 8-10",
  "prep_time": "PT30M",
  "cook_time": "PT25M",
  "total_time": "PT55M",
  "skill_level": "Easy",
  "recipe_ingredient": [
    "225g unsalted butter, softened",
    "225g golden caster sugar",
    "4 large eggs"
  ],
  "recipe_instructions": [
    "Heat oven to 190C/170C fan/gas 5. Butter two 20cm sandwich tins...",
    "Beat 225g softened unsalted butter and 225g golden caster sugar until fluffy..."
  ],
  "nutrition": "{\"calories\":\"546 calories\",\"fatContent\":\"31 grams fat\",\"saturatedFatContent\":\"19 grams saturated fat\",\"carbohydrateContent\":\"63 grams carbohydrates\",\"sugarContent\":\"51 grams sugar\",\"fiberContent\":\"1 grams fiber\",\"proteinContent\":\"5 grams protein\",\"sodiumContent\":\"0.5 milligram of sodium\"}",
  "aggregate_rating": 4.7,
  "rating_count": 2314,
  "keywords": ["Afternoon tea", "Celebration cake", "Chocolate cake"],
  "dietary_tags": [],
  "image_urls": ["https://images.immediate.co.uk/production/volatile/sites/30/2020/08/easy_chocolate_cake-b62f92c.jpg?resize=440,230"],
  "date_published": "2020-08-21T00:00:00+00:00"
}
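Downstream code usually wants numbers rather than ISO 8601 duration strings and a JSON-encoded nutrition string. The sketch below shows one way to normalise both. The helper names are hypothetical, the duration parser handles only time-only durations of the PT…H…M…S form seen in the example, and taking the leading number from each nutrition value is an assumption based on the "546 calories" style shown above.

```python
import json
import re

_DURATION_RE = re.compile(r"^PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?$")
_NUMBER_RE = re.compile(r"\d+(?:\.\d+)?")


def duration_minutes(value):
    """Convert a time-only ISO 8601 duration such as PT1H15M to minutes."""
    if not value:
        return None
    match = _DURATION_RE.match(value)
    if not match or not any(match.groups()):
        return None  # unsupported or empty duration
    hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return hours * 60 + minutes + seconds / 60


def nutrition_values(nutrition_json):
    """Decode the nutrition string and keep the first number found in each value."""
    if not nutrition_json:
        return {}
    parsed = json.loads(nutrition_json)
    if not isinstance(parsed, dict):
        return {}
    values = {}
    for key, text in parsed.items():
        found = _NUMBER_RE.search(str(text))
        if found:
            values[key] = float(found.group())
    return values


# Subset of the example record, trimmed for brevity.
record = {
    "prep_time": "PT30M",
    "cook_time": "PT25M",
    "total_time": "PT55M",
    "nutrition": '{"calories":"546 calories","fatContent":"31 grams fat",'
                 '"sodiumContent":"0.5 milligram of sodium"}',
}

print(duration_minutes(record["total_time"]))
print(nutrition_values(record["nutrition"]))
```

Returning None for unparseable durations, instead of raising, keeps a bulk import running when a recipe has an empty or unusual time field.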

Notes

  • Crawl-delay: BBC Good Food's robots.txt specifies a 12-second crawl delay. The actor respects this via low concurrency. Full-corpus runs (~15K recipes) will take several hours.
  • New recipes: The sitemap is indexed quarterly (e.g. 2026-Q2-recipe.xml). Run periodically to capture newly published recipes.
  • Ratings on new recipes: Freshly published recipes may have no ratings yet; in that case aggregate_rating and rating_count are null.
  • Nutrition format: The nutrition field is a JSON string. Parse it with JSON.parse(record.nutrition) to access individual nutrients.
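For planning run sizes around the crawl delay, a rough wall-clock estimate can be computed from the page count. This is a back-of-the-envelope sketch: the function name and the parallel_requests parameter are hypothetical, and it assumes each request slot waits the full delay between fetches, which may not match how the actor actually schedules requests. Sitemap fetches and retries are ignored.

```python
def estimated_crawl_hours(page_count, crawl_delay_s=12.0, parallel_requests=1):
    """Estimate hours spent waiting on the crawl delay for page_count pages."""
    if page_count < 0 or crawl_delay_s < 0 or parallel_requests < 1:
        raise ValueError("page_count and crawl_delay_s must be >= 0, parallel_requests >= 1")
    total_seconds = page_count * crawl_delay_s / parallel_requests
    return total_seconds / 3600


# maxItems = 500 with one request every 12 seconds
print(round(estimated_crawl_hours(500), 2))
# full corpus of roughly 15,000 recipes under the same assumption
print(estimated_crawl_hours(15_000))
```

Under this strictly sequential model a 500-recipe run needs under two hours and the full corpus about 50, so capping runs with maxItems and scraping incrementally is the practical way to keep individual runs short.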