Penguin Random House Publisher Catalog Scraper

Scrapes the official Penguin Random House publisher catalog at penguinrandomhouse.com and extracts authoritative book metadata: title, author, ISBN, imprint, format, publication date, price, description, praise blurbs, and series information. This is primary-source publisher data that is not available from consumer review aggregators.

What data does it collect?

Each record is one book edition (one canonical detail page):

Field	Type	Description
`prh_id`	string	Penguin Random House work ID (numeric, from URL)
`title`	string	Book title
`subtitle`	string	Subtitle, if present
`author`	string	Primary author name
`contributors`	string	All contributors as JSON array: `[{"name":"...", "role":"..."}]`
`imprint`	string	Publisher imprint (e.g. Random House, Crown, Dial Press)
`format`	string	Format: Hardcover, Paperback, Ebook, or Audiobook
`isbn`	string	Primary ISBN-13
`pages`	integer	Page count
`publication_date`	string	Publication date (ISO 8601, e.g. `2024-10-01`)
`price`	number	List price in USD
`category`	string	Genre/category as JSON array of strings
`description`	string	Publisher description (about the book)
`about_the_author`	string	Author biography from the publisher
`praise`	string	Praise/endorsement blurbs as JSON array of strings
`series`	string	Series name if the book is part of a series
`related_titles`	string	Related edition ISBNs as JSON array
`cover_url`	string	Cover image URL
`product_url`	string	Full URL of the book detail page
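Because `contributors`, `category`, `praise`, and `related_titles` arrive as JSON-encoded strings, downstream code usually has to decode them before use. The following is a minimal Python sketch, assuming each output record is loaded as a plain dictionary. The `decode_record` helper and the sample values are hypothetical placeholders, not real catalog data.

```python
import json
from typing import Any

# Fields that the table documents as JSON-encoded strings.
JSON_STRING_FIELDS = ("contributors", "category", "praise", "related_titles")


def decode_record(record: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of one output record with its JSON-string fields parsed into lists."""
    decoded = dict(record)
    for field in JSON_STRING_FIELDS:
        raw = decoded.get(field)
        if isinstance(raw, str) and raw:
            decoded[field] = json.loads(raw)
        else:
            # Assumption: a missing or empty value is treated as an empty list.
            decoded[field] = []
    return decoded


# Placeholder record for illustration only.
sample = {
    "title": "Example Title",
    "isbn": "9780000000000",
    "contributors": '[{"name": "Jane Doe", "role": "Author"}]',
    "category": '["Fiction", "Mystery"]',
    "praise": "[]",
    "related_titles": '["9780000000001"]',
}

book = decode_record(sample)
for person in book["contributors"]:
    print(person["name"], "-", person["role"])  # prints: Jane Doe - Author
```

The original record is left untouched, so the string form stays available for CSV or spreadsheet export.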

How to use it

Search by keyword

Provide one or more search queries. The scraper paginates through search results, visits each book detail page, and saves the metadata. Queries can be genre names, author names, topics, or any other search terms the PRH catalog supports.

{
  "queries": ["mystery", "science fiction"],
  "maxItems": 50,
  "sp_intended_usage": "catalog research"
}
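As a rough illustration of how one `maxItems` budget can be shared across several paginated queries, the Python sketch below walks each query's result pages in order and stops once enough detail-page URLs have been gathered. It is not the scraper's actual code: `fetch_page` is a hypothetical stand-in for whatever requests a results page, and de-duplicating URLs across queries is an assumption rather than documented behaviour.

```python
from typing import Callable

# Hypothetical page fetcher: (query, page_number) -> detail-page URLs on that results page.
# An empty list signals that the query has no more results.
FetchPage = Callable[[str, int], list[str]]


def collect_detail_urls(queries: list[str], max_items: int, fetch_page: FetchPage) -> list[str]:
    """Walk each query's result pages in turn until max_items unique URLs are collected."""
    seen: set[str] = set()
    urls: list[str] = []
    for query in queries:
        page = 1
        while len(urls) < max_items:
            results = fetch_page(query, page)
            if not results:
                break  # this query is exhausted; move on to the next one
            for url in results:
                # Assumption: the same book found by two queries is only kept once.
                if url not in seen:
                    seen.add(url)
                    urls.append(url)
                    if len(urls) >= max_items:
                        break
            page += 1
        if len(urls) >= max_items:
            break  # the shared budget is spent; skip any remaining queries
    return urls


# Fake results standing in for real search pages.
FAKE_RESULTS = {
    ("mystery", 1): ["/books/1", "/books/2"],
    ("mystery", 2): ["/books/3"],
    ("science fiction", 1): ["/books/3", "/books/4", "/books/5"],
}


def fake_fetch(query: str, page: int) -> list[str]:
    return FAKE_RESULTS.get((query, page), [])


print(collect_detail_urls(["mystery", "science fiction"], 4, fake_fetch))
# ['/books/1', '/books/2', '/books/3', '/books/4']
```

Because the limit is global, a small `maxItems` may be filled entirely by the first query, as the `"maxItems": 10` example below would be for most broad genres.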

Small focused run

{
  "queries": ["romance"],
  "maxItems": 10,
  "sp_intended_usage": "spot check"
}

Input parameters

Parameter	Type	Required	Default	Description
`queries`	array	Yes	`["fiction"]`	Search terms to scrape. Each query seeds an independent paginated search.
`maxItems`	integer	Yes	5	Maximum total book records to collect across all queries.
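If you build inputs programmatically, a small validation step can mirror the defaults in the table above. This is a minimal Python sketch under stated assumptions: the validation rules (non-empty string queries, positive integer `maxItems`) are inferred from the table rather than taken from the scraper, `normalize_input` is a hypothetical name, and other keys such as `sp_intended_usage` are simply ignored.

```python
from typing import Any

# Defaults copied from the input parameters table; the validation rules are assumptions.
DEFAULT_QUERIES = ["fiction"]
DEFAULT_MAX_ITEMS = 5


def normalize_input(raw: dict[str, Any]) -> tuple[list[str], int]:
    """Apply the documented defaults plus basic type checks to an input object."""
    queries = raw.get("queries", DEFAULT_QUERIES)
    if not isinstance(queries, list) or not all(isinstance(q, str) for q in queries):
        raise ValueError("queries must be an array of strings")
    # Trim whitespace and drop blank entries (assumed behaviour, not documented).
    queries = [q.strip() for q in queries if q.strip()]
    if not queries:
        raise ValueError("queries must contain at least one non-empty search term")

    max_items = raw.get("maxItems", DEFAULT_MAX_ITEMS)
    # bool is a subclass of int in Python, so reject True/False explicitly.
    if isinstance(max_items, bool) or not isinstance(max_items, int) or max_items < 1:
        raise ValueError("maxItems must be a positive integer")
    return queries, max_items


# Unknown keys such as sp_intended_usage are ignored by this sketch.
print(normalize_input({"queries": ["romance"], "maxItems": 10, "sp_intended_usage": "spot check"}))
# (['romance'], 10)
print(normalize_input({}))
# (['fiction'], 5)
```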

Notes

Extraction uses the structured JSON-LD `Book` schema embedded in each detail page, the same data the publisher exposes for SEO. This yields authoritative `isbn`, `publisherImprint`, `datePublished`, and `offers.price` values without scraping fragile HTML.
Praise blurbs, author bios, and categories are extracted from the HTML where JSON-LD does not carry them.
The contributors, category, praise, and related_titles fields are serialised as JSON strings so they remain compatible with spreadsheet and CSV exports.
The Penguin Random House catalog covers more than 120,000 titles across all imprints (Random House, Crown, Knopf, Bantam, Viking, Penguin, and many more).
No proxy is required: the catalog is publicly accessible and not protected by anti-bot measures.
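The JSON-LD approach described in the first note can be illustrated with the Python standard library alone. This sketch assumes the detail page embeds a schema.org `Book` object in a `<script type="application/ld+json">` tag, possibly inside a list or an `@graph` array. The exact shape of PRH's markup (for example, whether `publisherImprint` is an object or a plain string, or whether `offers` is a list) is an assumption, so both forms are handled; `JsonLdCollector` and `extract_book` are hypothetical names.

```python
import json
from html.parser import HTMLParser
from typing import Any, Iterator, Optional


class JsonLdCollector(HTMLParser):
    """Collect the raw text of every <script type="application/ld+json"> element."""

    def __init__(self) -> None:
        super().__init__()
        self.blocks: list[str] = []
        self._in_jsonld = False
        self._buffer: list[str] = []

    def handle_starttag(self, tag: str, attrs: list[tuple[str, Optional[str]]]) -> None:
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data: str) -> None:
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag: str) -> None:
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def iter_nodes(data: Any) -> Iterator[dict]:
    """Yield every JSON object, descending into top-level lists and @graph arrays."""
    if isinstance(data, list):
        for item in data:
            yield from iter_nodes(item)
    elif isinstance(data, dict):
        yield data
        if "@graph" in data:
            yield from iter_nodes(data["@graph"])


def is_book(node: dict) -> bool:
    types = node.get("@type")
    return types == "Book" or (isinstance(types, list) and "Book" in types)


def extract_book(html: str) -> Optional[dict[str, Any]]:
    """Return selected fields from the first schema.org Book object on the page, or None."""
    collector = JsonLdCollector()
    collector.feed(html)
    collector.close()
    for block in collector.blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # skip malformed blocks instead of failing the whole page
        for node in filter(is_book, iter_nodes(data)):
            # Assumption: offers may be a single Offer or a list; take the first.
            offers = node.get("offers") or {}
            if isinstance(offers, list):
                offers = offers[0] if offers else {}
            # Assumption: publisherImprint may be an Organization object or a string.
            imprint = node.get("publisherImprint")
            if isinstance(imprint, dict):
                imprint = imprint.get("name")
            price = offers.get("price") if isinstance(offers, dict) else None
            return {
                "isbn": node.get("isbn"),
                "imprint": imprint,
                "publication_date": node.get("datePublished"),
                "price": float(price) if price is not None else None,
            }
    return None


# Placeholder markup for illustration only, not copied from a real PRH page.
SAMPLE_HTML = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Book", "isbn": "9780000000000",
 "publisherImprint": {"@type": "Organization", "name": "Example Imprint"},
 "datePublished": "2024-10-01",
 "offers": {"@type": "Offer", "price": "28.00", "priceCurrency": "USD"}}
</script>
</head><body></body></html>
"""

print(extract_book(SAMPLE_HTML))
```

Fields that JSON-LD does not carry, such as praise blurbs and author bios, would still need separate HTML extraction, as the second note explains.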
