BBC Good Food Recipe Scraper
Overview
The BBC Good Food Recipe Scraper enumerates and extracts the full BBC Good Food recipe catalogue (~15,000+ recipes) using sitemap discovery. It captures rich structured data from each recipe page including ingredients, step-by-step instructions, the UK nutrition panel, BBC-specific skill levels, dietary tags, star ratings, and schema.org/Recipe JSON-LD fields.
BBC Good Food is the largest free English-language recipe authority in the UK, with content covering everything from quick weeknight dinners to elaborate celebration cakes. Unlike generic multi-site scrapers that require you to supply URLs and drop BBC-specific fields, this actor discovers the entire corpus automatically and extracts every structured field the site provides.
Features
- Full sitemap enumeration: Walks the BBC Good Food sitemap index and collects every recipe URL across all quarterly recipe sitemaps (~15K+ recipes).
- BYO URL mode: Supply specific recipe URLs via `startUrls` to scrape targeted recipes without a full crawl.
- schema.org/Recipe extraction: Parses the embedded JSON-LD block on each page for all standard Recipe fields.
- BBC-specific fields: Extracts skill level (Easy / More effort / A challenge), dietary tags (vegetarian, vegan, gluten-free, healthy, etc.), and the UK nutrition panel.
- Respectful crawling: Honours the site's crawl-delay directive with conservative concurrency.
- Incremental-friendly: Use `maxItems` to cap run size for incremental update workflows.
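Honouring a crawl-delay directive comes down to two steps. First, read the `Crawl-delay` value from `robots.txt`. Second, space requests that many seconds apart. The minimal sketch below does both. It assumes a simplified reading of robots.txt grouping, where consecutive `User-agent` lines share one group. `parseCrawlDelay` and `estimateRunHours` are hypothetical helpers, not the actor's code. The sample robots.txt is illustrative and is not BBC Good Food's actual file.

```javascript
// Hypothetical helpers (not the actor's code): read the Crawl-delay directive
// from robots.txt text and estimate the minimum runtime it implies.

function parseCrawlDelay(robotsTxt, userAgent = "*") {
  const ua = userAgent.toLowerCase();
  let groupAgents = [];
  let lastWasAgent = false;
  let wildcardDelay = null;
  for (const rawLine of robotsTxt.split(/\r?\n/)) {
    const line = rawLine.split("#")[0].trim(); // strip comments
    const colon = line.indexOf(":");
    if (colon === -1) continue;
    const key = line.slice(0, colon).trim().toLowerCase();
    const value = line.slice(colon + 1).trim();
    if (key === "user-agent") {
      if (!lastWasAgent) groupAgents = []; // consecutive User-agent lines share a group
      groupAgents.push(value.toLowerCase());
      lastWasAgent = true;
      continue;
    }
    lastWasAgent = false;
    if (key !== "crawl-delay") continue;
    const delay = Number(value);
    if (!Number.isFinite(delay)) continue;
    if (ua !== "*" && groupAgents.includes(ua)) return delay; // a specific group wins
    if (groupAgents.includes("*") && wildcardDelay === null) wildcardDelay = delay;
  }
  return wildcardDelay;
}

// Sequential requests spaced `delaySeconds` apart: pages x delay, in hours.
function estimateRunHours(pages, delaySeconds) {
  return (pages * delaySeconds) / 3600;
}

// Illustrative robots.txt, not BBC Good Food's actual file.
const exampleRobots = ["User-agent: *", "Crawl-delay: 12", "Disallow: /private"].join("\n");
const crawlDelay = parseCrawlDelay(exampleRobots);
console.log(crawlDelay);                          // 12
console.log(estimateRunHours(15000, crawlDelay)); // 50
```

The second line of output applies the 12-second delay from the Notes to 15,000 pages, which gives a runtime of about 50 hours when requests are strictly sequential.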
Use Cases
- Building recipe datasets for LLM fine-tuning or RAG pipelines.
- Meal planning and nutrition app data ingestion.
- Food-trend analytics using BBC's categorisation taxonomy and editorial dietary tags.
- Competitive benchmarking for recipe content platforms.
- Academic research on UK food culture and cooking trends.
How It Works
- Sitemap discovery: Fetches `https://www.bbcgoodfood.com/sitemap.xml` (a 260-child index) and filters to recipe-type sitemaps (e.g. `2026-Q2-recipe.xml`).
- URL collection: Extracts all `/recipes/<slug>` URLs from matching sitemaps, capped at `maxItems`.
- Page extraction: Fetches each recipe page and parses the `schema.org/Recipe` JSON-LD block plus supplemental BBC DOM fields.
- Output: Stores one record per recipe in the Apify dataset.
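The first two steps can be sketched as pure functions over XML text that has already been fetched. The sketch makes two assumptions. Recipe sitemaps are recognised only by a `-recipe.xml` suffix, as in `2026-Q2-recipe.xml`. Recipe pages are assumed to be exactly one path segment below `/recipes/`. Extracting `<loc>` values with a regex is a simplification, and a real crawler would use an XML parser. The helper names and the child-sitemap paths in the sample are hypothetical.

```javascript
// Hypothetical sketch of sitemap discovery and URL collection on XML strings.

function extractLocs(xml) {
  // Collect the text of every <loc> element (no XML entity decoding).
  return [...xml.matchAll(/<loc>\s*([^<]+?)\s*<\/loc>/g)].map((m) => m[1]);
}

function recipeSitemaps(indexXml) {
  // Keep only child sitemaps named like "2026-Q2-recipe.xml".
  return extractLocs(indexXml).filter((loc) => /-recipe\.xml$/.test(loc));
}

function recipeUrls(sitemapXml) {
  // Keep /recipes/<slug> pages; paths with extra segments are skipped.
  return extractLocs(sitemapXml).filter((loc) =>
    /^https:\/\/www\.bbcgoodfood\.com\/recipes\/[^/?#]+\/?$/.test(loc)
  );
}

function collectRecipeUrls(sitemapXmls, maxItems) {
  const limit = maxItems === 0 ? Infinity : maxItems; // 0 means full corpus
  const seen = new Set(); // de-duplicates URLs listed in several sitemaps
  for (const xml of sitemapXmls) {
    for (const url of recipeUrls(xml)) {
      if (seen.size >= limit) return [...seen];
      seen.add(url);
    }
  }
  return [...seen];
}

// Illustrative inputs; real child sitemap paths may differ.
const exampleIndex = `<sitemapindex>
  <sitemap><loc>https://www.bbcgoodfood.com/sitemaps/2026-Q2-recipe.xml</loc></sitemap>
  <sitemap><loc>https://www.bbcgoodfood.com/sitemaps/2026-Q2-other.xml</loc></sitemap>
</sitemapindex>`;
const exampleSitemap = `<urlset>
  <url><loc>https://www.bbcgoodfood.com/recipes/easy-chocolate-cake</loc></url>
  <url><loc>https://www.bbcgoodfood.com/recipes/iced-tea</loc></url>
</urlset>`;
console.log(recipeSitemaps(exampleIndex));
console.log(collectRecipeUrls([exampleSitemap], 10));
```

A `Set` keeps the cap honest: a URL that appears in two quarterly sitemaps counts only once towards `maxItems`.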
Input
| Field | Type | Required | Description |
|---|---|---|---|
| `maxItems` | Integer | Yes | Maximum number of recipes to scrape. Set to 0 for the full corpus (15K+). Default: 10. |
| `startUrls` | Array | No | Specific BBC Good Food recipe URLs to scrape. Skips sitemap discovery when provided. |
Example — Full sitemap run (capped)
```json
{
  "maxItems": 500
}
```
Example — BYO URLs
```json
{
  "startUrls": [
    { "url": "https://www.bbcgoodfood.com/recipes/easy-chocolate-cake" },
    { "url": "https://www.bbcgoodfood.com/recipes/iced-tea" }
  ],
  "maxItems": 10
}
```
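The input rules can be read as a small planning function: `startUrls` bypasses sitemap discovery, and `maxItems: 0` lifts the cap. The minimal sketch below is a hypothetical `planRun` helper that rests on three assumptions. First, `maxItems` also caps `startUrls`. Second, every `startUrls` entry must be a `/recipes/<slug>` page. Third, an omitted `maxItems` falls back to the documented default of 10.

```javascript
// Hypothetical input normaliser mirroring the Input table (not the actor's code).
const RECIPE_URL = /^https:\/\/www\.bbcgoodfood\.com\/recipes\/[^/?#]+\/?$/;

function planRun(input) {
  const maxItems = input.maxItems ?? 10; // documented default
  if (!Number.isInteger(maxItems) || maxItems < 0) {
    throw new Error("maxItems must be a non-negative integer");
  }
  const limit = maxItems === 0 ? Infinity : maxItems; // 0 = full corpus
  const startUrls = (input.startUrls ?? []).map((entry) => entry.url);
  const bad = startUrls.filter((u) => !RECIPE_URL.test(u));
  if (bad.length > 0) {
    throw new Error(`Not a BBC Good Food recipe URL: ${bad.join(", ")}`);
  }
  if (startUrls.length > 0) {
    // BYO URL mode: sitemap discovery is skipped entirely.
    return { mode: "startUrls", urls: startUrls.slice(0, limit), limit };
  }
  return { mode: "sitemap", urls: [], limit };
}

const byo = planRun({
  startUrls: [
    { url: "https://www.bbcgoodfood.com/recipes/easy-chocolate-cake" },
    { url: "https://www.bbcgoodfood.com/recipes/iced-tea" },
  ],
  maxItems: 10,
});
console.log(byo.mode, byo.urls.length);  // startUrls 2
console.log(planRun({ maxItems: 0 }).limit); // Infinity
```

Rejecting malformed URLs before any request is sent avoids spending crawl-delay budget on pages that cannot yield a recipe record.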
Output
One record per recipe. All fields sourced from schema.org/Recipe JSON-LD unless noted.
| Field | Type | Description |
|---|---|---|
| `slug` | String | URL slug (e.g. `easy-chocolate-cake`), derived from the page URL |
| `url` | String | Full recipe page URL |
| `name` | String | Recipe title |
| `author` | String | Recipe author name |
| `description` | String | Short editorial description |
| `recipe_category` | String | Category (e.g. Cake, Dinner, Drink) |
| `recipe_cuisine` | String | Cuisine type (e.g. British, Italian); may be an empty string |
| `recipe_yield` | String | Serving yield (e.g. "Serves 8") |
| `prep_time` | String | Prep time as an ISO 8601 duration (e.g. `PT20M` = 20 minutes) |
| `cook_time` | String | Cook time as an ISO 8601 duration |
| `total_time` | String | Total time as an ISO 8601 duration |
| `skill_level` | String | BBC skill rating: Easy / More effort / A challenge (BBC-specific page field) |
| `recipe_ingredient` | Array | List of ingredient strings |
| `recipe_instructions` | Array | List of step-by-step instruction strings |
| `nutrition` | String | JSON-encoded per-serving nutrition data from the UK panel (kcal, fat, saturates, carbs, sugars, fibre, protein, salt), keyed by schema.org names such as `calories`, `fatContent`, and `sodiumContent`; each value is a string with its unit (e.g. "31 grams fat") |
| `aggregate_rating` | Number | Average star rating (1–5 scale); `null` if the recipe has no ratings yet |
| `rating_count` | Integer | Number of ratings; `null` if the recipe has no ratings yet |
| `keywords` | Array | Editorial keyword tags |
| `dietary_tags` | Array | Dietary suitability tags (vegetarian, vegan, gluten-free, healthy, etc.) |
| `image_urls` | Array | Recipe image URLs |
| `date_published` | String | Publication date (ISO 8601) |
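Most of these fields map directly onto schema.org/Recipe properties. The minimal sketch below finds the Recipe object among a page's `application/ld+json` blocks and renames its properties to the output field names. It handles a single object, an array, or an `@graph` list. The sketch rests on several assumptions. Regex-based `<script>` extraction stands in for a real HTML parser. `HowToSection` steps are flattened. Comma-separated `keywords` strings are split. `skill_level` and `dietary_tags` are left empty, since they come from the page DOM rather than JSON-LD. `findRecipeJsonLd`, `instructionTexts`, and `toRecord` are hypothetical names.

```javascript
// Hypothetical JSON-LD -> output record mapping (not the actor's code).
const asArray = (v) => (v == null ? [] : Array.isArray(v) ? v : [v]);

function findRecipeJsonLd(html) {
  const re = /<script[^>]*type=["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;
  for (const match of html.matchAll(re)) {
    let data;
    try {
      data = JSON.parse(match[1]);
    } catch {
      continue; // skip malformed blocks
    }
    if (!data || typeof data !== "object") continue;
    // A block may hold one object, an array, or an object with an @graph list.
    const nodes = Array.isArray(data) ? data : data["@graph"] ?? [data];
    const recipe = asArray(nodes).find((node) => {
      const type = node && node["@type"];
      return type === "Recipe" || (Array.isArray(type) && type.includes("Recipe"));
    });
    if (recipe) return recipe;
  }
  return null;
}

// Flattens plain strings, HowToStep, and HowToSection objects into step text.
function instructionTexts(steps) {
  return asArray(steps).flatMap((s) =>
    typeof s === "string" ? [s] : s.itemListElement ? instructionTexts(s.itemListElement) : [s.text]
  );
}

function toRecord(url, recipe) {
  const rating = recipe.aggregateRating;
  const nutrition = recipe.nutrition ? { ...recipe.nutrition } : null;
  if (nutrition) delete nutrition["@type"]; // keep only nutrient keys
  return {
    slug: new URL(url).pathname.split("/").filter(Boolean).pop(),
    url,
    name: recipe.name ?? "",
    author: asArray(recipe.author).map((a) => (typeof a === "string" ? a : a.name)).join(", "),
    description: recipe.description ?? "",
    recipe_category: asArray(recipe.recipeCategory).join(", "),
    recipe_cuisine: asArray(recipe.recipeCuisine).join(", "),
    recipe_yield: asArray(recipe.recipeYield).join(", "),
    prep_time: recipe.prepTime ?? null,
    cook_time: recipe.cookTime ?? null,
    total_time: recipe.totalTime ?? null,
    skill_level: null, // BBC page field, not part of schema.org/Recipe
    recipe_ingredient: asArray(recipe.recipeIngredient),
    recipe_instructions: instructionTexts(recipe.recipeInstructions),
    nutrition: nutrition ? JSON.stringify(nutrition) : null,
    aggregate_rating: rating?.ratingValue != null ? Number(rating.ratingValue) : null,
    rating_count: rating?.ratingCount != null ? Number(rating.ratingCount) : null,
    keywords: typeof recipe.keywords === "string"
      ? recipe.keywords.split(",").map((k) => k.trim()).filter(Boolean)
      : asArray(recipe.keywords),
    dietary_tags: [], // BBC editorial tags come from the page, not this sketch
    image_urls: asArray(recipe.image).map((i) => (typeof i === "string" ? i : i.url)),
    date_published: recipe.datePublished ?? null,
  };
}

// Trimmed, hand-written JSON-LD using values from the example output record.
const pageHtml = `<script type="application/ld+json">
{"@context":"https://schema.org","@type":"Recipe","name":"Easy chocolate cake",
 "prepTime":"PT30M","recipeIngredient":["225g unsalted butter, softened"],
 "recipeInstructions":[{"@type":"HowToStep","text":"Heat oven to 190C/170C fan/gas 5."}],
 "aggregateRating":{"@type":"AggregateRating","ratingValue":"4.7","ratingCount":"2314"}}
</script>`;
const record = toRecord(
  "https://www.bbcgoodfood.com/recipes/easy-chocolate-cake",
  findRecipeJsonLd(pageHtml)
);
console.log(record.slug, record.aggregate_rating, record.rating_count);
// easy-chocolate-cake 4.7 2314
```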
Example output record
```json
{
  "slug": "easy-chocolate-cake",
  "url": "https://www.bbcgoodfood.com/recipes/easy-chocolate-cake",
  "name": "Easy chocolate cake",
  "author": "Miriam Nice",
  "description": "Master the chocolate cake with an airy, light sponge and rich buttercream filling...",
  "recipe_category": "Cake",
  "recipe_cuisine": "",
  "recipe_yield": "Serves 8-10",
  "prep_time": "PT30M",
  "cook_time": "PT25M",
  "total_time": "PT55M",
  "skill_level": "Easy",
  "recipe_ingredient": [
    "225g unsalted butter, softened",
    "225g golden caster sugar",
    "4 large eggs"
  ],
  "recipe_instructions": [
    "Heat oven to 190C/170C fan/gas 5. Butter two 20cm sandwich tins...",
    "Beat 225g softened unsalted butter and 225g golden caster sugar until fluffy..."
  ],
  "nutrition": "{\"calories\":\"546 calories\",\"fatContent\":\"31 grams fat\",\"saturatedFatContent\":\"19 grams saturated fat\",\"carbohydrateContent\":\"63 grams carbohydrates\",\"sugarContent\":\"51 grams sugar\",\"fiberContent\":\"1 grams fiber\",\"proteinContent\":\"5 grams protein\",\"sodiumContent\":\"0.5 milligram of sodium\"}",
  "aggregate_rating": 4.7,
  "rating_count": 2314,
  "keywords": ["Afternoon tea", "Celebration cake", "Chocolate cake"],
  "dietary_tags": [],
  "image_urls": ["https://images.immediate.co.uk/production/volatile/sites/30/2020/08/easy_chocolate_cake-b62f92c.jpg?resize=440,230"],
  "date_published": "2020-08-21T00:00:00+00:00"
}
```
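The time fields in this record are ISO 8601 durations. For arithmetic on them, a minimal converter to minutes may help. It assumes only day, hour, minute, and second components appear. Year, month, and week designators are rejected because their length in minutes is ambiguous. `durationToMinutes` is a hypothetical helper.

```javascript
// Hypothetical helper: ISO 8601 duration (e.g. "PT30M") -> minutes.
function durationToMinutes(iso) {
  if (iso == null || iso === "") return null; // missing time field
  const m = /^P(?:(\d+)D)?(?:T(?:(\d+)H)?(?:(\d+)M)?(?:(\d+(?:\.\d+)?)S)?)?$/.exec(iso);
  if (!m) throw new Error(`Unsupported duration: ${iso}`);
  const [, d = 0, h = 0, min = 0, s = 0] = m; // unmatched parts default to 0
  return Number(d) * 1440 + Number(h) * 60 + Number(min) + Number(s) / 60;
}

console.log(durationToMinutes("PT30M") + durationToMinutes("PT25M")); // 55
console.log(durationToMinutes("PT1H15M"));                            // 75
```

In the example record, `prep_time` (30 minutes) plus `cook_time` (25 minutes) equals `total_time` (`PT55M`), which gives a quick sanity check when validating scraped data.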
Notes
- Crawl-delay: BBC Good Food's `robots.txt` specifies a 12-second crawl delay. The actor respects this via low concurrency. Full-corpus runs are long: at one request every 12 seconds, 15,000 recipe pages take about 50 hours (15,000 × 12 s = 180,000 s).
- New recipes: Recipe sitemaps are split by quarter (e.g. `2026-Q2-recipe.xml`). Run periodically to capture newly published recipes.
- Ratings on new recipes: Freshly published recipes may have no aggregate rating yet; in that case `aggregate_rating` and `rating_count` are `null`.
- Nutrition format: The `nutrition` field is a JSON string. Parse it with `JSON.parse(record.nutrition)` to access individual nutrients.
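In the example record, each parsed nutrient value is still a string such as "31 grams fat". The minimal sketch below keeps only the leading number and treats the `null` rating fields as "unrated". It assumes every value starts with a plain decimal number. Unit text is discarded rather than normalised, so check the original string when units matter. `parseNutrition` and `ratingSummary` are hypothetical helpers.

```javascript
// Hypothetical post-processing helpers for output records.

function parseNutrition(record) {
  if (!record.nutrition) return null;
  const raw = JSON.parse(record.nutrition); // the field is a JSON string
  const out = {};
  for (const [key, value] of Object.entries(raw)) {
    const match = /\d+(?:\.\d+)?/.exec(String(value)); // leading number only
    out[key] = match ? Number(match[0]) : null;
  }
  return out;
}

// Freshly published recipes carry null rating fields.
function ratingSummary(record) {
  if (record.aggregate_rating == null) return "unrated";
  return `${record.aggregate_rating.toFixed(1)} from ${record.rating_count} ratings`;
}

// Values taken from the example output record (nutrition trimmed).
const exampleRecord = {
  nutrition: '{"calories":"546 calories","fatContent":"31 grams fat","sodiumContent":"0.5 milligram of sodium"}',
  aggregate_rating: 4.7,
  rating_count: 2314,
};
console.log(parseNutrition(exampleRecord).fatContent); // 31
console.log(ratingSummary(exampleRecord));             // 4.7 from 2314 ratings
```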