PR Newswire Scraper
PR Newswire Scraper
Scrape press releases from PR Newswire — the world's largest press release distribution network.
For each release, the actor collects:
- Headline — the full press release title
- Publication date — ISO 8601 datetime with timezone offset
- Author / company — the organization that issued the release
- Full body text — the complete text of the press release
- Canonical URL — permanent link to the release page
- Release ID — the unique numeric identifier from the URL slug
Usage
By default, the actor scrapes the All News Releases listing from newest to oldest.
To scrape a specific category listing page, pass one or more URLs via startUrls:
{
"maxItems": 100,
"startUrls": [
{ "url": "https://www.prnewswire.com/news-releases/financial-services-latest-news/" },
{ "url": "https://www.prnewswire.com/news-releases/life-sciences-latest-news/" }
]
}
Any PR Newswire category listing page that uses the /news-releases/<category>-latest-news/ URL pattern is supported.
Input
| Field | Type | Default | Description |
|---|---|---|---|
maxItems |
integer | 10 | Maximum number of press releases to collect. |
startUrls |
array | (all releases) | One or more PR Newswire category listing URLs. |
Output
Each output record (saved to the dataset) includes:
| Field | Type | Description |
|---|---|---|
release_id |
string | Numeric ID from the URL slug (e.g. 302793395) |
title |
string | Press release headline |
url |
string | Canonical URL |
published_at |
string | ISO 8601 datetime (e.g. 2026-06-07T14:30:00-04:00) |
author |
string | Publishing company or author name |
summary |
string | Short excerpt (when available from listing) |
full_text |
string | Full body text of the press release |
scraped_at |
string | ISO 8601 scrape timestamp |
Performance
- Proxy: None required — PR Newswire is directly accessible.
- Concurrency: 5 parallel requests (polite default).
- Coverage: Paginated scraping — all pages of a category listing are crawled.
- Rate limiting: Automatic backoff on HTTP 429 responses.
Technical Notes
PR Newswire serves server-rendered HTML across all listing and detail pages. No JavaScript rendering or CAPTCHA solving is required. The actor uses Cheerio for fast HTML parsing and respects pagination via ?page=N query parameters.
The search feature on prnewswire.com is JavaScript-powered and not accessible to a plain HTTP crawler — use category listing URLs instead.