Beehiiv Newsletter Scraper

Scrape posts from any beehiiv-powered newsletter. Provide a list of publication domains or subdomains; the actor discovers post URLs from each publication's sitemap and extracts title, author, publish date, excerpt, cover image, tags, and word count. Multiple newsletters can be scraped in a single run.

What it does

The actor accepts a list of beehiiv publication domains (e.g. readthepeak.com, discover.beehiiv.com) and for each domain:

Fetches <domain>/sitemap.xml to discover all public post URLs matching the /p/<slug> pattern.
Crawls each post page and extracts structured data from the embedded JSON-LD Article schema.
Yields one record per post with all metadata fields.
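
The discovery step can be sketched roughly as follows. This is a minimal illustration, not the actor's actual code: the function name `post_urls_from_sitemap`, the regular expression, and the example.com URLs are assumptions made for the example, and it parses an in-memory sitemap instead of fetching one over HTTP.

```python
import re
import xml.etree.ElementTree as ET

# XML namespace used by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Assumption: a post URL is <scheme>://<host>/p/<slug> with nothing after the slug.
POST_URL = re.compile(r"^https?://[^/]+/p/[^/?#]+/?$")


def post_urls_from_sitemap(xml_text):
    """Return the <loc> entries of a sitemap that match the /p/<slug> pattern."""
    root = ET.fromstring(xml_text)
    urls = []
    for loc in root.findall(".//sm:loc", SITEMAP_NS):
        url = (loc.text or "").strip()
        if POST_URL.match(url):
            urls.append(url)
    return urls


# Hypothetical sitemap content, used only to exercise the function.
sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/p/first-post</loc></url>
  <url><loc>https://example.com/archive</loc></url>
  <url><loc>https://example.com/p/second-post</loc></url>
</urlset>"""

print(post_urls_from_sitemap(sample))
```

A sitemap with no matching entries yields an empty list, which corresponds to the case where a publication is skipped for having no /p/ posts.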

Publications behind Cloudflare or other anti-bot protection are skipped with a warning. Free posts are scraped; paywalled posts (those whose JSON-LD sets isAccessibleForFree to false) are skipped automatically.
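
As a rough sketch of how the paywall check might work, the snippet below pulls JSON-LD blocks out of a page and inspects the Article object. The names `JsonLdCollector`, `find_article`, and `is_paywalled` are hypothetical, the sample page is invented, and accepting the string "False" alongside the JSON value false is an assumption about how the flag may be serialized.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collect the raw text of every <script type="application/ld+json"> tag."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self.blocks.append("")

    def handle_data(self, data):
        if self._in_jsonld:
            self.blocks[-1] += data

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_jsonld = False


def find_article(html):
    """Return the first JSON-LD object whose @type is Article, or None."""
    parser = JsonLdCollector()
    parser.feed(html)
    for raw in parser.blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # ignore malformed blocks
        candidates = data if isinstance(data, list) else [data]
        for item in candidates:
            if isinstance(item, dict) and item.get("@type") == "Article":
                return item
    return None


def is_paywalled(article):
    """Treat a missing flag as free; false or "False" marks a paywalled post."""
    flag = article.get("isAccessibleForFree", True)
    return str(flag).strip().lower() == "false"


# Invented page used only to exercise the functions.
page = """<html><head>
<script type="application/ld+json">
{"@type": "Article", "headline": "Hello", "isAccessibleForFree": false}
</script>
</head><body></body></html>"""

article = find_article(page)
print(article["headline"], is_paywalled(article))
```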

Input

Parameter	Type	Description
`domains`	array	List of publication domains. Accepts bare domains (`readthepeak.com`), subdomains (`mybrand.beehiiv.com`), or full URLs (`https://readthepeak.com`).
`maxItems`	integer	Maximum posts to scrape per publication (0 = unlimited). Default: 10.

Example input:

{
  "domains": ["readthepeak.com", "discover.beehiiv.com"],
  "maxItems": 50
}
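
To illustrate how the three accepted `domains` formats and the `maxItems` cap could be handled, here is a minimal sketch. The helper names `sitemap_url` and `cap_posts` are hypothetical, and always using https is an assumption of the example, not documented actor behaviour.

```python
from itertools import islice
from urllib.parse import urlsplit


def sitemap_url(entry):
    """Map a bare domain, subdomain, or full URL to its sitemap URL."""
    entry = entry.strip()
    # urlsplit only recognises the host when a scheme is present.
    if "://" not in entry:
        entry = "https://" + entry
    host = urlsplit(entry).netloc.lower()
    # Assumption: the sitemap is always requested over https.
    return f"https://{host}/sitemap.xml"


def cap_posts(urls, max_items):
    """Apply the per-publication limit; 0 means unlimited."""
    if max_items == 0:
        return list(urls)
    return list(islice(urls, max_items))


for entry in ["readthepeak.com", "mybrand.beehiiv.com", "https://readthepeak.com"]:
    print(entry, "->", sitemap_url(entry))
```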

Output

Each record contains:

Field	Description
`publication_domain`	Input domain (e.g. `readthepeak.com`)
`publication_name`	Newsletter name from JSON-LD publisher
`post_url`	Canonical post URL
`post_title`	Post headline
`post_subtitle`	Post subtitle / description
`author`	Author name
`publish_date`	ISO 8601 publish timestamp
`excerpt`	Short description (up to 300 chars)
`cover_image_url`	Cover image URL
`word_count`	Estimated word count of post body
`tags`	Comma-separated tags
`full_text`	Full post body text (empty unless `include_full_text` is set)
`scraped_at`	ISO 8601 scrape timestamp

Example output record:

{
  "publication_domain": "readthepeak.com",
  "publication_name": "The Peak",
  "post_url": "https://www.readthepeak.com/p/canadian-universities-are-falling-behind",
  "post_title": "Canadian universities are falling behind",
  "post_subtitle": "Canada's post-secondary schools are losing their edge.",
  "author": "Lucas Arender",
  "publish_date": "2026-06-02T10:00:00.000Z",
  "excerpt": "Canada's post-secondary schools are losing their edge.",
  "cover_image_url": "https://beehiiv-images-production.s3.amazonaws.com/...",
  "word_count": 291,
  "tags": "Water Cooler, Perspectives",
  "full_text": "",
  "scraped_at": "2026-06-02T20:39:48.116Z"
}
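
When consuming records downstream, the comma-separated `tags` string and the ISO 8601 timestamps usually need a small amount of parsing. The sketch below shows one way to do that; `split_tags` and `parse_timestamp` are hypothetical helper names, and splitting on commas assumes that no individual tag contains a comma.

```python
from datetime import datetime


def split_tags(value):
    """Turn "Water Cooler, Perspectives" into a list of trimmed tag names."""
    return [tag.strip() for tag in value.split(",") if tag.strip()]


def parse_timestamp(value):
    """Parse a timestamp such as 2026-06-02T10:00:00.000Z into an aware datetime."""
    # Rewriting the trailing "Z" keeps this working on Python versions before 3.11.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


# Subset of the example record above.
record = {
    "publish_date": "2026-06-02T10:00:00.000Z",
    "tags": "Water Cooler, Perspectives",
    "scraped_at": "2026-06-02T20:39:48.116Z",
}

tags = split_tags(record["tags"])
published = parse_timestamp(record["publish_date"])
scraped = parse_timestamp(record["scraped_at"])
print(tags, scraped - published)
```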

Limitations

Publications behind Cloudflare or PerimeterX (e.g. some high-traffic custom domains) are skipped with a warning. If such a publication also has a *.beehiiv.com subdomain that is not behind Cloudflare, try passing that subdomain instead.
Paywalled posts (subscriber-only) are detected via JSON-LD and automatically skipped.
Publications without a sitemap.xml or with no /p/ posts in their sitemap are skipped.
full_text extraction is best-effort: post body selectors may vary slightly across beehiiv themes, so the body text can be incomplete or empty for some publications.
