Food.com Recipe Scraper
Scrape recipes from Food.com, one of the largest English-language community recipe databases with over 500,000 recipes, including ratings, reviews, and a rich tag taxonomy.
What it does
This actor enumerates Food.com's full sitemap (or accepts a direct list of recipe URLs) and extracts structured recipe data from each page. All core fields come from the embedded schema.org/Recipe JSON-LD block, supplemented with DOM extraction for Food.com-specific data (tag taxonomy, rating details, image gallery).
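The JSON-LD step can be approximated with the Python standard library alone. This is a minimal sketch, not the actor's own parser: `extract_recipe` and `JsonLdCollector` are hypothetical names, and the handling of `@graph` containers and list-valued `@type` is an assumption about what recipe pages may contain.

```python
from __future__ import annotations

import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collects the raw text of every <script type="application/ld+json"> element."""

    def __init__(self) -> None:
        super().__init__()
        self._in_jsonld = False
        self.blocks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_jsonld = False

    def handle_data(self, data):
        if self._in_jsonld:
            self.blocks[-1] += data


def _iter_nodes(doc):
    """Yield JSON-LD nodes, flattening top-level lists and @graph containers (assumed shapes)."""
    if isinstance(doc, list):
        for item in doc:
            yield from _iter_nodes(item)
    elif isinstance(doc, dict):
        if "@graph" in doc:
            yield from _iter_nodes(doc["@graph"])
        yield doc


def extract_recipe(html: str) -> dict | None:
    """Return the first schema.org Recipe node found in the page, or None."""
    collector = JsonLdCollector()
    collector.feed(html)
    collector.close()
    for raw in collector.blocks:
        try:
            doc = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip malformed blocks instead of failing the whole page
        for node in _iter_nodes(doc):
            types = node.get("@type")
            types = types if isinstance(types, list) else [types]
            if "Recipe" in types:
                return node
    return None
```

The returned dict carries schema.org property names such as `recipeIngredient`; mapping them to the snake_case output fields is a separate step.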
Use cases
- Build recommender training datasets (ratings + review counts at 500K+ scale)
- Meal-plan and recipe-app databases
- Food trend analytics and NLP corpora
- RAG pipelines for culinary applications
- Competitive ingredient and nutritional analysis
Input
| Field | Type | Description |
|---|---|---|
| maxItems | integer | Maximum number of recipes to scrape. Set to 0 for the full ~500K corpus. Default: 10 |
| recipeUrls | array | Optional list of specific Food.com recipe URLs to scrape. If provided, sitemap enumeration is skipped. |
Example: specific URLs
```json
{
  "maxItems": 5,
  "recipeUrls": [
    { "url": "https://www.food.com/recipe/jo-mamas-world-famous-spaghetti-22782" },
    { "url": "https://www.food.com/recipe/easy-homemade-chicken-soup-157877" }
  ]
}
```
Example: sitemap enumeration (first 1000 recipes)
```json
{
  "maxItems": 1000
}
```
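The documented input rules reduce to a small selection function. This sketch is illustrative only: `plan_urls` is a hypothetical helper, and it assumes that `maxItems` also caps an explicit `recipeUrls` list, which the field descriptions do not state outright.

```python
from __future__ import annotations

from itertools import islice
from typing import Iterable, Iterator


def plan_urls(run_input: dict, sitemap_urls: Iterable[str]) -> Iterator[str]:
    """Yield the recipe URLs a run would process under the documented input rules.

    - A non-empty recipeUrls list replaces sitemap enumeration.
    - maxItems caps the number of recipes; 0 means no cap (full corpus).
    - Assumption: the cap applies to recipeUrls as well.
    """
    max_items = run_input.get("maxItems", 10)  # documented default: 10
    explicit = [entry["url"] for entry in run_input.get("recipeUrls") or []]
    source = explicit if explicit else sitemap_urls
    if max_items == 0:
        return iter(source)
    return islice(source, max_items)
```

Because `sitemap_urls` can be a lazy generator, a small `maxItems` stops consuming the sitemap early instead of materializing all ~500K URLs.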
Output
Each record in the dataset corresponds to one recipe:
| Field | Type | Description |
|---|---|---|
| recipe_id | string | Unique numeric recipe ID from the URL |
| url | string | Canonical recipe URL |
| name | string | Recipe name |
| author | string | Recipe author username |
| description | string | Full recipe description |
| recipe_category | string | Primary category (e.g. Dessert, Main Dish) |
| recipe_cuisine | string | Cuisine type if specified (e.g. Italian) |
| prep_time | string | Preparation time in ISO 8601 format (e.g. PT15M) |
| cook_time | string | Cook time in ISO 8601 format |
| total_time | string | Total time in ISO 8601 format |
| recipe_yield | string | Servings (e.g. "4 serving(s)") |
| recipe_ingredient | array | Ingredients as formatted strings |
| recipe_instructions | array | Step-by-step instruction strings |
| nutrition | object | Nutritional data: calories, fat_content, saturated_fat, cholesterol, sodium, carbohydrate, fiber, sugar, protein |
| aggregate_rating | number | Average star rating (0-5 scale) |
| rating_count | integer | Total number of ratings |
| review_count | integer | Total number of written reviews |
| keywords | string | Comma-separated keywords (occasion, diet, method tags) |
| tags | array | Food.com topic taxonomy tags |
| image_urls | array | Recipe photo URLs |
| date_published | string | Publication date (ISO 8601) |
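The time fields are ISO 8601 duration strings, which most analytics code will want as numbers. A minimal converter, assuming only the day and time components (`D`, `H`, `M`, `S`) appear; years, months, weeks, and fractional values are deliberately not handled, and `duration_minutes` is a hypothetical helper name:

```python
from __future__ import annotations

import re

# Day/time subset of ISO 8601 durations, e.g. "PT15M", "PT1H20M", "P1DT2H".
# Assumption: recipe markup never uses years, months, weeks, or fractions.
_DURATION = re.compile(
    r"^P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?$"
)


def duration_minutes(value: str | None) -> int | None:
    """Convert an ISO 8601 duration to whole minutes; None if missing or unparseable."""
    if not value:
        return None
    text = value.strip()
    match = _DURATION.match(text)
    if match is None or text in ("P", "PT"):  # a bare designator carries no amount
        return None
    parts = {name: int(num) if num else 0 for name, num in match.groupdict().items()}
    seconds = (
        parts["days"] * 86400
        + parts["hours"] * 3600
        + parts["minutes"] * 60
        + parts["seconds"]
    )
    return seconds // 60
```

Applied to the sample record, `PT20M` and `PT1H` give 20 and 60 minutes, which add up to the 80 minutes of `PT1H20M`.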
Sample output record
```json
{
  "recipe_id": "22782",
  "url": "https://www.food.com/recipe/jo-mamas-world-famous-spaghetti-22782",
  "name": "Jo Mama's World Famous Spaghetti",
  "author": "Sharlene~W",
  "description": "My kids will give up a steak dinner for this spaghetti...",
  "recipe_category": "Spaghetti",
  "recipe_cuisine": null,
  "prep_time": "PT20M",
  "cook_time": "PT1H",
  "total_time": "PT1H20M",
  "recipe_yield": "4 quarts, 10-14 serving(s)",
  "recipe_ingredient": ["2 lbs Italian sausage, casings removed", "..."],
  "recipe_instructions": ["In large, heavy stockpot, brown Italian sausage...", "..."],
  "nutrition": {
    "calories": "555.9",
    "fat_content": "26.3",
    "protein": "29.8"
  },
  "aggregate_rating": 5.0,
  "rating_count": 1376,
  "review_count": 1376,
  "keywords": "Pork,Meat,European,Kid Friendly,Weeknight,Stove Top,< 4 Hours,Easy",
  "tags": ["Spaghetti"],
  "image_urls": ["https://img.sndimg.com/food/image/upload/..."],
  "date_published": "2002-03-17T10:26Z"
}
```
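In the sample record, nutrition values arrive as strings and `keywords` is a single comma-separated string. A post-processing sketch for downstream use; `normalize` and `recipe_id_from_url` are hypothetical helpers, and the comma split assumes no individual keyword contains a comma:

```python
from __future__ import annotations

import re

# The numeric ID is the last hyphen-separated segment of the recipe slug.
_RECIPE_ID = re.compile(r"-(\d+)/?$")


def recipe_id_from_url(url: str) -> str | None:
    """Return the trailing numeric ID from a Food.com recipe URL, or None."""
    match = _RECIPE_ID.search(url)
    return match.group(1) if match else None


def normalize(record: dict) -> dict:
    """Return a copy with keywords as a list and nutrition values as floats."""
    out = dict(record)  # shallow copy; the input record is left unchanged
    raw_keywords = record.get("keywords") or ""
    out["keywords"] = [k.strip() for k in raw_keywords.split(",") if k.strip()]
    nutrition = {}
    for key, value in (record.get("nutrition") or {}).items():
        try:
            nutrition[key] = float(value)
        except (TypeError, ValueError):
            nutrition[key] = None  # keep the key but flag the unparseable value
    out["nutrition"] = nutrition
    return out
```

For the sample record, `recipe_id_from_url` recovers `"22782"` from the URL, and `normalize` yields eight keywords, from `"Pork"` through `"Easy"`.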
Crawl approach
- Sitemap enumeration: Fetches https://www.food.com/sitemap.xml (a sitemap index with 24 gzip-compressed child sitemaps) and collects all /recipe/ URLs.
- Page scraping: Each recipe page is fetched and parsed via the embedded schema.org/Recipe JSON-LD block for structured data, plus DOM extraction for Food.com-specific taxonomy and image gallery.
- Rate limiting: Automatic rate-limit detection and backoff; no manual configuration is needed.
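The sitemap step can be sketched as a walk over a sitemap index whose children may be gzip-compressed. This assumes Food.com's files use the standard sitemaps.org namespace; `recipe_urls` is a hypothetical name, and the injected `fetch` callable (URL in, response bytes out) stands in for whatever HTTP client is used.

```python
from __future__ import annotations

import gzip
import xml.etree.ElementTree as ET
from typing import Callable, Iterator

# Standard sitemaps.org namespace (assumed to be what Food.com uses).
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def _locs(xml_bytes: bytes) -> list[str]:
    """Return every <loc> value in a sitemap or sitemap index document."""
    root = ET.fromstring(xml_bytes)
    return [el.text.strip() for el in root.iter(f"{SITEMAP_NS}loc") if el.text]


def recipe_urls(
    fetch: Callable[[str], bytes],
    index_url: str = "https://www.food.com/sitemap.xml",
) -> Iterator[str]:
    """Walk the sitemap index and yield /recipe/ URLs from each child sitemap."""
    for child_url in _locs(fetch(index_url)):
        body = fetch(child_url)
        # Detect gzip by magic bytes: an HTTP client may already have decompressed it.
        if body[:2] == b"\x1f\x8b":
            body = gzip.decompress(body)
        for url in _locs(body):
            if "/recipe/" in url:
                yield url
```

Checking the gzip magic number rather than the `.gz` extension keeps the function correct whether or not the HTTP layer decompresses responses itself. A plain `fetch` could be `lambda u: urllib.request.urlopen(u).read()`.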
Performance
- Memory: 512 MB
- No proxy required: Food.com does not block datacenter access
- Concurrency: 10 parallel requests
- Full corpus (~500K recipes): takes longer than the default 4-hour run timeout