NYT Cooking Recipe Scraper

Enumerate the complete NYT Cooking recipe catalog (~25K recipes) from the official sitemap and extract structured recipe data from the public schema.org Recipe JSON-LD embedded in each page.

What it collects

Every record contains the following fields:

Field	Type	Description
`recipe_id`	string	Unique NYT Cooking recipe identifier
`url`	string	Canonical recipe URL
`name`	string	Recipe title
`author`	string	NYT Cooking contributor byline
`description`	string	Recipe description / headnote
`recipe_yield`	string	Serving size (e.g. "4 servings")
`total_time`	string	Total cooking time (e.g. "1 hr 30 min")
`prep_time`	string	Preparation time
`cook_time`	string	Active cooking time
`recipe_category`	string	Meal category (e.g. "Dinner, Main Course")
`recipe_cuisine`	string	Cuisine style (e.g. "Mediterranean Inspired")
`recipe_ingredient`	array	List of ingredient strings with quantities
`recipe_instructions`	array	Step-by-step instructions
`nutrition`	string	JSON-serialized nutrition facts (calories, fat, carbs, protein, sodium, etc.) from schema.org NutritionInformation. `null` for recipes without nutrition data.
`aggregate_rating`	number	Average user rating (1–5 scale)
`rating_count`	integer	Number of user ratings
`keywords`	array	Tags and keywords (ingredient highlights, technique, difficulty, etc.)
`image_urls`	array	Full-resolution image URLs
`date_published`	string	ISO 8601 publication date
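
Because `nutrition` arrives as a JSON-serialized string (or `null`), consumers need to decode it before reading individual values. The following is a minimal sketch of that step. The record values and the helper name `parse_nutrition` are hypothetical and only illustrate the field shapes listed above. The keys inside the nutrition object are assumed to follow schema.org NutritionInformation naming.

```python
import json

# Hypothetical record: every value here is invented for illustration.
record = {
    "recipe_id": "0000000",
    "name": "Example Recipe",
    "recipe_ingredient": ["1 cup flour", "2 eggs"],
    "nutrition": '{"calories": "250", "proteinContent": "8 g"}',
    "aggregate_rating": 4.5,
    "rating_count": 120,
}


def parse_nutrition(rec):
    """Decode the JSON-serialized `nutrition` field; return {} when it is null."""
    raw = rec.get("nutrition")
    if raw is None:
        return {}
    return json.loads(raw)


nutrition = parse_nutrition(record)
print(nutrition.get("calories"))
```

Returning an empty dict for missing nutrition data lets downstream code call `.get()` without a separate null check.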

Discovery

By default the actor walks the official NYT Cooking sitemap index (https://www.nytimes.com/sitemaps/new/cooking.xml.gz), which contains monthly sub-sitemaps covering the full recipe inventory. Only /recipes/ paths are collected — article and guide pages are excluded.
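
The discovery step described above can be sketched roughly as follows. This is not the actor's actual code. It assumes the sitemaps use the standard sitemaps.org 0.9 namespace, and the helper names (`maybe_gunzip`, `locs`, `recipe_urls`) and the sample URLs are hypothetical. The sketch parses inline sample XML rather than fetching anything over the network.

```python
import gzip
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Assumption: the sitemaps use the standard sitemaps.org namespace.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def maybe_gunzip(data: bytes) -> bytes:
    """Decompress gzip payloads (magic bytes 1f 8b); pass other bytes through."""
    return gzip.decompress(data) if data[:2] == b"\x1f\x8b" else data


def locs(xml_bytes: bytes) -> list[str]:
    """Return every <loc> value from a sitemap or a sitemap index."""
    root = ET.fromstring(maybe_gunzip(xml_bytes))
    return [el.text.strip() for el in root.findall(".//sm:loc", NS) if el.text]


def recipe_urls(xml_bytes: bytes) -> list[str]:
    """Keep only URLs whose path starts with /recipes/."""
    return [u for u in locs(xml_bytes) if urlparse(u).path.startswith("/recipes/")]


# Hypothetical sub-sitemap content with one recipe page and one guide page.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://cooking.nytimes.com/recipes/1-example</loc></url>
  <url><loc>https://cooking.nytimes.com/guides/2-example</loc></url>
</urlset>"""

print(recipe_urls(SAMPLE))
```

The same `locs` helper works on the index file, where each `<loc>` points at a monthly sub-sitemap instead of a page.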

Inputs

Input	Type	Default	Description
`maxItems`	integer	10	Maximum number of recipes to collect. Set to 0 for no limit (full catalog run).
`startUrls`	array	—	Optional list of specific NYT Cooking recipe URLs to scrape directly, bypassing sitemap discovery. Useful for targeted single-recipe or small-batch runs.
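
As an illustration, the three typical input shapes might look like the sketch below. The recipe URL is a placeholder, and the element shape of `startUrls` (plain strings here) is an assumption; some actors expect objects such as `{"url": "..."}` instead.

```python
import json

# Small test run via sitemap discovery (matches the default limit).
sample_run = {"maxItems": 10}

# Full catalog run: 0 disables the limit.
full_run = {"maxItems": 0}

# Targeted run: bypasses sitemap discovery. The URL is a placeholder.
targeted_run = {
    "startUrls": ["https://cooking.nytimes.com/recipes/0000000-example-recipe"],
}

print(json.dumps(targeted_run, indent=2))
```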

Data source

All data is extracted from the schema.org/Recipe JSON-LD markup that NYT Cooking embeds in every public recipe page for SEO purposes. Recipe content — including ingredients, instructions, and metadata — is publicly available. The NYT Cooking paywall only gates account-specific features (recipe box, personal notes, collections) and does not restrict access to recipe markup.
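
A minimal sketch of pulling a schema.org Recipe object out of a page's JSON-LD, using only the Python standard library, is shown below. It is an illustration of the general technique, not the actor's implementation. The class and function names (`JsonLdParser`, `find_recipe`) and the sample HTML are hypothetical, and the handling of `@graph` wrappers and list-valued `@type` covers common JSON-LD layouts rather than anything confirmed about NYT Cooking pages.

```python
import json
from html.parser import HTMLParser


class JsonLdParser(HTMLParser):
    """Collect the text of every <script type="application/ld+json"> element."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buf = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buf = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buf))
            self._in_jsonld = False


def _is_recipe(node):
    """True when the node's @type is "Recipe" or a list containing it."""
    t = node.get("@type")
    return t == "Recipe" or (isinstance(t, list) and "Recipe" in t)


def find_recipe(html):
    """Return the first schema.org Recipe object found in the page, else None."""
    parser = JsonLdParser()
    parser.feed(html)
    parser.close()
    for block in parser.blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # skip malformed JSON-LD blocks
        if isinstance(data, dict):
            nodes = data.get("@graph", [data])
        elif isinstance(data, list):
            nodes = data
        else:
            continue
        for node in nodes:
            if isinstance(node, dict) and _is_recipe(node):
                return node
    return None


# Hypothetical page fragment for illustration.
SAMPLE_HTML = """<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Recipe", "name": "Example Recipe",
 "recipeIngredient": ["1 cup flour"]}
</script></head><body></body></html>"""

recipe = find_recipe(SAMPLE_HTML)
print(recipe["name"], recipe["recipeIngredient"])
```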

Usage notes

For a full catalog run (~25K recipes), set `maxItems` to `0` and allow sufficient run time.
Nutrition data (the `nutrition` field) is present on most recipes but absent on some recently published ones; the field is `null` in those cases.
The sitemap updates frequently (new recipes appear within hours of publication). Re-running with `maxItems` set to `0` will pick up additions from the latest sub-sitemaps.
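
When a re-run is used to catch additions, the new results can be folded into an earlier dataset by keying on `recipe_id`. The sketch below shows one way to do that. The function name `merge_records` and the sample records are hypothetical, and letting the fresh record replace the older one is a design choice, not documented actor behavior.

```python
def merge_records(existing, fresh):
    """Merge two runs by recipe_id.

    Returns (merged_records, newly_added_ids). Fresh records replace
    older records that share the same recipe_id.
    """
    merged = {rec["recipe_id"]: rec for rec in existing}
    added = [rec["recipe_id"] for rec in fresh if rec["recipe_id"] not in merged]
    merged.update({rec["recipe_id"]: rec for rec in fresh})
    return list(merged.values()), added


# Hypothetical records from an earlier run and a later re-run.
old_run = [{"recipe_id": "1", "name": "A"}, {"recipe_id": "2", "name": "B"}]
new_run = [{"recipe_id": "2", "name": "B (updated)"}, {"recipe_id": "3", "name": "C"}]

records, added_ids = merge_records(old_run, new_run)
print(len(records), added_ids)
```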
