NYT Cooking Recipe Scraper
Enumerate the complete NYT Cooking recipe catalog (~25K recipes) from the official sitemap and extract structured recipe data from the public schema.org Recipe JSON-LD embedded in each page.
What it collects
Every record contains the following fields:
| Field | Type | Description |
|---|---|---|
| `recipe_id` | string | Unique NYT Cooking recipe identifier |
| `url` | string | Canonical recipe URL |
| `name` | string | Recipe title |
| `author` | string | NYT Cooking contributor byline |
| `description` | string | Recipe description / headnote |
| `recipe_yield` | string | Serving size (e.g. "4 servings") |
| `total_time` | string | Total cooking time (e.g. "1 hr 30 min") |
| `prep_time` | string | Preparation time |
| `cook_time` | string | Active cooking time |
| `recipe_category` | string | Meal category (e.g. "Dinner, Main Course") |
| `recipe_cuisine` | string | Cuisine style (e.g. "Mediterranean Inspired") |
| `recipe_ingredient` | array | List of ingredient strings with quantities |
| `recipe_instructions` | array | Step-by-step instructions |
| `nutrition` | string | JSON-serialized nutrition facts (calories, fat, carbs, protein, sodium, etc.) from schema.org NutritionInformation. `null` for recipes without nutrition data. |
| `aggregate_rating` | number | Average user rating (1–5 scale) |
| `rating_count` | integer | Number of user ratings |
| `keywords` | array | Tags and keywords (ingredient highlights, technique, difficulty, etc.) |
| `image_urls` | array | Full-resolution image URLs |
| `date_published` | string | ISO 8601 publication date |
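Because `nutrition` is a JSON-serialized string rather than a nested object, consumers need to decode it before use. A minimal sketch of one way to do that; the helper name `decode_nutrition` and the sample record (including the `calories` key) are hypothetical and only illustrate the shape described in the table:

```python
import json
from typing import Any, Optional


def decode_nutrition(record: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Decode the JSON-serialized `nutrition` field, or return None if it is null."""
    raw = record.get("nutrition")
    if raw is None:
        return None
    return json.loads(raw)


# Hypothetical records, for illustration only.
with_facts = {"recipe_id": "0000", "nutrition": '{"calories": 320}'}
without_facts = {"recipe_id": "0001", "nutrition": None}

print(decode_nutrition(with_facts))
print(decode_nutrition(without_facts))
```

Checking for `None` before decoding keeps recipes without nutrition data from raising a `TypeError` in `json.loads`.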
Discovery
By default the actor walks the official NYT Cooking sitemap index (https://www.nytimes.com/sitemaps/new/cooking.xml.gz), which contains monthly sub-sitemaps covering the full recipe inventory. Only /recipes/ paths are collected — article and guide pages are excluded.
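The path filter described above can be sketched as follows. This is an illustration under stated assumptions, not the actor's actual code: the function names are hypothetical, the sitemap is assumed to use the standard sitemaps.org namespace, and the sample URLs are made up. The XML is passed in as bytes, so the sketch does no network access.

```python
import gzip
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Standard sitemap protocol namespace (assumed to be what the sitemap uses).
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def maybe_gunzip(payload: bytes) -> bytes:
    """Decompress gzip payloads (magic bytes 1f 8b); pass anything else through."""
    return gzip.decompress(payload) if payload[:2] == b"\x1f\x8b" else payload


def extract_locs(xml_bytes: bytes) -> list[str]:
    """Return every <loc> value from a sitemap index or a URL set."""
    root = ET.fromstring(maybe_gunzip(xml_bytes))
    return [el.text.strip() for el in root.iterfind(".//sm:loc", SITEMAP_NS) if el.text]


def recipe_urls(xml_bytes: bytes) -> list[str]:
    """Keep only URLs whose path starts with /recipes/."""
    return [u for u in extract_locs(xml_bytes) if urlparse(u).path.startswith("/recipes/")]


# Hypothetical sub-sitemap content, for illustration only.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://cooking.nytimes.com/recipes/1-example-soup</loc></url>
  <url><loc>https://cooking.nytimes.com/guides/2-example-guide</loc></url>
</urlset>"""

print(recipe_urls(SAMPLE))
```

The same `extract_locs` helper works on the index file, where each `<loc>` points to a monthly sub-sitemap instead of a page.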
Inputs
| Input | Type | Default | Description |
|---|---|---|---|
| `maxItems` | integer | 10 | Maximum number of recipes to collect. Set to 0 for no limit (full catalog run). |
| `startUrls` | array | none | Optional list of specific NYT Cooking recipe URLs to scrape directly, bypassing sitemap discovery. Useful for targeted single-recipe or small-batch runs. |
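One way to read how the two inputs interact is sketched below. It rests on assumptions the table does not state: `startUrls` is taken to be a list of plain URL strings, and `maxItems` is taken to cap directly supplied URLs as well as discovered ones. The function name `select_urls` is hypothetical.

```python
from itertools import islice
from typing import Any, Iterable


def select_urls(actor_input: dict[str, Any], discovered: Iterable[str]) -> list[str]:
    """Pick the URLs to scrape from the input and the sitemap-discovered URLs."""
    start_urls = actor_input.get("startUrls") or []
    # Assumption: a non-empty startUrls list bypasses sitemap discovery entirely.
    source = start_urls if start_urls else discovered
    max_items = actor_input.get("maxItems", 10)  # documented default is 10
    if max_items < 0:
        raise ValueError("maxItems must be 0 (no limit) or a positive integer")
    limit = None if max_items == 0 else max_items  # 0 means no limit
    return list(islice(source, limit))


# Hypothetical discovered URLs, for illustration only.
found = [f"https://cooking.nytimes.com/recipes/{i}-example" for i in range(5)]

print(select_urls({"maxItems": 2}, found))
print(len(select_urls({"maxItems": 0}, found)))
```

`islice` with a stop of `None` consumes the whole iterable, which is why `maxItems: 0` maps cleanly onto a full catalog run.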
Data source
All data is extracted from the schema.org/Recipe JSON-LD markup that NYT Cooking embeds in every public recipe page for SEO purposes. Recipe content — including ingredients, instructions, and metadata — is publicly available. The NYT Cooking paywall only gates account-specific features (recipe box, personal notes, collections) and does not restrict access to recipe markup.
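Pulling the Recipe object out of a page can be sketched with the standard library alone. This is an illustrative approach, not necessarily how the actor does it: the class and function names are hypothetical, the sample HTML is made up, and the handling of top-level lists and `@graph` wrappers is a defensive assumption about how JSON-LD may be laid out.

```python
import json
from html.parser import HTMLParser
from typing import Any, Optional


class JsonLdCollector(HTMLParser):
    """Collect the raw text of every <script type="application/ld+json"> block."""

    def __init__(self) -> None:
        super().__init__()
        self._in_jsonld = False
        self._buf: list[str] = []
        self.blocks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buf = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            self.blocks.append("".join(self._buf))


def find_recipe(html: str) -> Optional[dict[str, Any]]:
    """Return the first JSON-LD node whose @type includes "Recipe", else None."""
    parser = JsonLdCollector()
    parser.feed(html)
    parser.close()
    for raw in parser.blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip malformed blocks rather than failing the page
        if isinstance(data, dict):
            nodes = data.get("@graph", [data])
        elif isinstance(data, list):
            nodes = data
        else:
            nodes = []
        for node in nodes:
            if not isinstance(node, dict):
                continue
            types = node.get("@type")
            types = types if isinstance(types, list) else [types]
            if "Recipe" in types:
                return node
    return None


# Hypothetical page markup, for illustration only.
SAMPLE_HTML = """<html><head>
<script type="application/ld+json">{"@type": "Recipe", "name": "Example Soup"}</script>
</head><body></body></html>"""

print(find_recipe(SAMPLE_HTML))
```

Returning `None` for pages with no Recipe node lets a caller skip non-recipe pages without special-casing exceptions.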
Usage notes
- For a full catalog run (~25K recipes), use `maxItems: 0` and allow sufficient run time.
- Nutrition data (`nutrition` field) is present on most recipes but absent on some recently published ones; the field is `null` in those cases.
- The sitemap updates frequently (new recipes appear within hours of publication). Re-running with `maxItems: 0` against the latest sub-sitemaps will catch additions.