OrbTop

PR Newswire Scraper

NEWSAUTOMATION

PR Newswire Scraper

Scrape press releases from PR Newswire — the world's largest press release distribution network.

For each release, the actor collects:

  • Headline — the full press release title
  • Publication date — ISO 8601 datetime with timezone offset
  • Author / company — the organization that issued the release
  • Full body text — the complete text of the press release
  • Canonical URL — permanent link to the release page
  • Release ID — the unique numeric identifier from the URL slug

Usage

By default, the actor scrapes the All News Releases listing from newest to oldest.

To scrape a specific category listing page, pass one or more URLs via startUrls:

{
    "maxItems": 100,
    "startUrls": [
        { "url": "https://www.prnewswire.com/news-releases/financial-services-latest-news/" },
        { "url": "https://www.prnewswire.com/news-releases/life-sciences-latest-news/" }
    ]
}

Any PR Newswire category listing page that uses the /news-releases/<category>-latest-news/ URL pattern is supported.

Input

Field Type Default Description
maxItems integer 10 Maximum number of press releases to collect.
startUrls array (all releases) One or more PR Newswire category listing URLs.

Output

Each output record (saved to the dataset) includes:

Field Type Description
release_id string Numeric ID from the URL slug (e.g. 302793395)
title string Press release headline
url string Canonical URL
published_at string ISO 8601 datetime (e.g. 2026-06-07T14:30:00-04:00)
author string Publishing company or author name
summary string Short excerpt (when available from listing)
full_text string Full body text of the press release
scraped_at string ISO 8601 scrape timestamp

Performance

  • Proxy: None required — PR Newswire is directly accessible.
  • Concurrency: 5 parallel requests (polite default).
  • Coverage: Paginated scraping — all pages of a category listing are crawled.
  • Rate limiting: Automatic backoff on HTTP 429 responses.

Technical Notes

PR Newswire serves server-rendered HTML across all listing and detail pages. No JavaScript rendering or CAPTCHA solving is required. The actor uses Cheerio for fast HTML parsing and respects pagination via ?page=N query parameters.

The search feature on prnewswire.com is JavaScript-powered and not accessible to a plain HTTP crawler — use category listing URLs instead.