OrbTop

NYT Cooking Recipe Scraper

Enumerate the complete NYT Cooking recipe catalog (~25K recipes) from the official sitemap and extract structured recipe data from the public schema.org Recipe JSON-LD embedded in each page.

What it collects

Every record contains the following fields:

Field Type Description
recipe_id string Unique NYT Cooking recipe identifier
url string Canonical recipe URL
name string Recipe title
author string NYT Cooking contributor byline
description string Recipe description / headnote
recipe_yield string Yield or number of servings (e.g. "4 servings")
total_time string Total cooking time (e.g. "1 hr 30 min")
prep_time string Preparation time
cook_time string Cooking time
recipe_category string Meal category (e.g. "Dinner, Main Course")
recipe_cuisine string Cuisine style (e.g. "Mediterranean Inspired")
recipe_ingredient array List of ingredient strings with quantities
recipe_instructions array Step-by-step instructions
nutrition string JSON-serialized nutrition facts (calories, fat, carbs, protein, sodium, etc.) from schema.org NutritionInformation. null for recipes without nutrition data.
aggregate_rating number Average user rating (1–5 scale)
rating_count integer Number of user ratings
keywords array Tags and keywords (ingredient highlights, technique, difficulty, etc.)
image_urls array Full-resolution image URLs
date_published string ISO 8601 publication date
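This README does not say which JSON-LD properties feed each field. The sketch below shows one plausible mapping, assuming the standard schema.org Recipe property names (recipeYield, recipeIngredient, recipeInstructions, aggregateRating, and so on). The helper names are hypothetical, and `recipe_id` is passed in because the README does not say how it is derived.

```python
import json
from typing import Any, Iterator, Optional


def _as_list(value: Any) -> list:
    """Normalize a JSON-LD value that may be absent, a scalar, or a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def _name(value: Any) -> Optional[str]:
    """Flatten Person/Organization objects (or lists of them) to a name string."""
    if isinstance(value, list):
        return ", ".join(filter(None, (_name(v) for v in value))) or None
    if isinstance(value, dict):
        return value.get("name")
    return value


def _join(value: Any) -> Optional[str]:
    """Render list-valued text fields as one comma-separated string."""
    if isinstance(value, list):
        return ", ".join(str(v) for v in value)
    return value


def _steps(value: Any) -> Iterator[str]:
    """Yield instruction text from strings, HowToStep, or HowToSection objects."""
    for step in _as_list(value):
        if isinstance(step, dict) and "itemListElement" in step:
            yield from _steps(step["itemListElement"])  # HowToSection: recurse into its steps
        elif isinstance(step, dict):
            yield step.get("text", "")
        else:
            yield str(step)


def to_record(recipe_id: str, url: str, node: dict) -> dict:
    """Hypothetical mapping from a schema.org Recipe node to the flat record."""
    rating = node.get("aggregateRating") or {}
    nutrition = node.get("nutrition")
    keywords = node.get("keywords")
    if isinstance(keywords, str):  # schema.org allows a comma-separated string
        keywords = [k.strip() for k in keywords.split(",") if k.strip()]
    images = [i.get("url") if isinstance(i, dict) else i for i in _as_list(node.get("image"))]
    return {
        "recipe_id": recipe_id,
        "url": url,
        "name": node.get("name"),
        "author": _name(node.get("author")),
        "description": node.get("description"),
        "recipe_yield": _join(node.get("recipeYield")),
        "total_time": node.get("totalTime"),
        "prep_time": node.get("prepTime"),
        "cook_time": node.get("cookTime"),
        "recipe_category": _join(node.get("recipeCategory")),
        "recipe_cuisine": _join(node.get("recipeCuisine")),
        "recipe_ingredient": _as_list(node.get("recipeIngredient")),
        "recipe_instructions": list(_steps(node.get("recipeInstructions"))),
        # serialized to a JSON string; None when the page has no NutritionInformation
        "nutrition": json.dumps(nutrition) if nutrition else None,
        "aggregate_rating": float(rating["ratingValue"]) if "ratingValue" in rating else None,
        "rating_count": int(rating["ratingCount"]) if "ratingCount" in rating else None,
        "keywords": _as_list(keywords),
        "image_urls": [u for u in images if u],
        "date_published": node.get("datePublished"),
    }
```

Flattening list-valued properties such as recipeCategory into one comma-joined string matches the table's example ("Dinner, Main Course"). The actual actor may normalize differently.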

Discovery

By default the actor walks the official NYT Cooking sitemap index (https://www.nytimes.com/sitemaps/new/cooking.xml.gz), which lists monthly sub-sitemaps covering the full recipe inventory. Only URLs under /recipes/ are collected; article and guide pages are excluded.
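The discovery step can be sketched with the Python standard library, assuming the index and its sub-sitemaps follow the sitemaps.org protocol (a `<sitemapindex>` of `<loc>` entries pointing at `<urlset>` files). The `fetch` callable is a hypothetical stand-in for whatever HTTP client returns raw bytes for a URL. Compressed and plain XML are both accepted, because the index is served as .xml.gz.

```python
import gzip
import xml.etree.ElementTree as ET
from typing import Callable, Iterator
from urllib.parse import urlparse

# Namespace defined by the sitemaps.org protocol
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def read_locs(data: bytes) -> list:
    """Return every <loc> value from a sitemap or sitemap index."""
    if data[:2] == b"\x1f\x8b":  # gzip magic number: decompress first
        data = gzip.decompress(data)
    root = ET.fromstring(data)
    return [loc.text.strip() for loc in root.iterfind(".//sm:loc", SITEMAP_NS) if loc.text]


def is_recipe_url(url: str) -> bool:
    """Keep only /recipes/ paths; articles and guides are dropped."""
    return urlparse(url).path.startswith("/recipes/")


def walk_sitemap_index(fetch: Callable[[str], bytes], index_url: str) -> Iterator[str]:
    """Yield recipe URLs from every sub-sitemap listed in the index."""
    for sub_sitemap in read_locs(fetch(index_url)):
        for url in read_locs(fetch(sub_sitemap)):
            if is_recipe_url(url):
                yield url
```

Making `walk_sitemap_index` a generator means a small `maxItems` value can stop the walk early without downloading every monthly sub-sitemap.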

Inputs

Input Type Default Description
maxItems integer 10 Maximum number of recipes to collect. Set to 0 for no limit (full catalog run).
startUrls array (none) Optional list of specific NYT Cooking recipe URLs to scrape directly, bypassing sitemap discovery. Useful for targeted single-recipe or small-batch runs.
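How the actor combines these two inputs internally is not documented. The sketch below shows one plausible reading: startUrls, when given, replaces sitemap discovery, and maxItems caps whichever source is used, with 0 meaning no limit. Applying maxItems to startUrls is an assumption, as is the `{"url": ...}` entry shape, which follows the common Apify request-list convention; plain strings are accepted too. `select_urls` is a hypothetical helper.

```python
from itertools import islice
from typing import Iterable, Iterator, Optional


def select_urls(
    sitemap_urls: Iterable[str],
    start_urls: Optional[list] = None,
    max_items: int = 10,
) -> Iterator[str]:
    """Pick the URL source and apply the maxItems cap (0 means no limit)."""
    if start_urls:
        # assumed Apify-style entries like {"url": "..."}; bare strings also work
        source = [u["url"] if isinstance(u, dict) else u for u in start_urls]
    else:
        source = sitemap_urls
    return iter(source) if max_items == 0 else islice(source, max_items)
```

For a full catalog run the input would be `{"maxItems": 0}`; for a targeted run, `{"startUrls": [{"url": "<recipe URL>"}]}`.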

Data source

All data is extracted from the schema.org/Recipe JSON-LD markup that NYT Cooking embeds in every public recipe page for SEO purposes. This markup, including ingredients, instructions, and metadata, is publicly available: the NYT Cooking paywall gates only account-specific features (recipe box, personal notes, collections) and does not restrict access to the recipe markup.
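Pulling the Recipe object out of a page can be sketched with the standard library's `html.parser`. The exact layout of NYT Cooking's JSON-LD is not described here, so the sketch assumes only what schema.org permits: the Recipe may appear as a top-level object, inside a list, or inside an `@graph` array, and `@type` may be a string or a list. `JsonLdCollector` and `find_recipe` are hypothetical names.

```python
import json
from html.parser import HTMLParser
from typing import Any, Iterator, Optional


class JsonLdCollector(HTMLParser):
    """Collect the raw text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_jsonld = False

    def handle_data(self, data):
        if self._in_jsonld:
            self.blocks[-1] += data  # script text may arrive in several chunks


def _candidates(node: Any) -> Iterator[dict]:
    """Yield every object in a JSON-LD value, descending into lists and @graph."""
    if isinstance(node, list):
        for item in node:
            yield from _candidates(item)
    elif isinstance(node, dict):
        yield node
        if "@graph" in node:
            yield from _candidates(node["@graph"])


def find_recipe(html: str) -> Optional[dict]:
    """Return the first schema.org Recipe object on the page, or None."""
    parser = JsonLdCollector()
    parser.feed(html)
    parser.close()
    for block in parser.blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # skip a malformed block rather than failing the whole page
        for node in _candidates(data):
            types = node.get("@type")
            if "Recipe" in (types if isinstance(types, list) else [types]):
                return node
    return None
```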

Usage notes

  • For a full catalog run (~25K recipes), use maxItems: 0 and allow sufficient run time.
  • Nutrition data (nutrition field) is present on most recipes but absent on some recently published ones; the field is null in those cases.
  • The sitemap is updated frequently, and new recipes appear in it within hours of publication. Re-running with maxItems: 0 walks the sitemap again, including the latest monthly sub-sitemaps, and so picks up these additions.
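On the consumer side, two of the notes above translate directly into code: the nutrition field is a JSON string that may be null, and additions from a re-run can be isolated by recipe_id. Both functions below are hypothetical helpers, and how you persist the previous run's IDs is left open.

```python
import json
from typing import Iterable, Optional


def parse_nutrition(record: dict) -> Optional[dict]:
    """Decode the JSON-serialized nutrition field; None when the recipe has none."""
    raw = record.get("nutrition")
    return json.loads(raw) if raw else None


def new_records(current: Iterable[dict], previous_ids: set) -> list:
    """Return records whose recipe_id did not appear in an earlier run."""
    return [r for r in current if r["recipe_id"] not in previous_ids]
```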