CNSA English News Scraper

Scrape the official English-language news and announcements published by the China National Space Administration (CNSA) at cnsa.gov.cn/english.

CNSA's English mirror is the most authoritative English-language source for Chinese space agency news, and it is cited by Reuters, BBC, AP, and SpaceNews. This actor collects full article text, publish dates, images, and attachment links across all five English subchannels: News, Policies & Announcements, Intergovernmental Cooperation, International Cooperation Coordinate Commission, and Special Information.

What you get

Each scraped record contains:

Field	Description
`articleId`	Unique numeric article ID from the CNSA CMS URL
`subchannel`	Subchannel name (News, Policies and Announcement, etc.)
`title`	Full article title
`bodyHtml`	Article body as raw HTML
`bodyText`	Article body as plain text
`publishDate`	Publish date in MM/DD/YYYY format
`sourceUrl`	Canonical URL of the article detail page
`mirrorZhUrl`	Chinese-language counterpart URL (always `null`; the English CMS does not expose it)
`images`	Comma-separated absolute URLs of all images in the article body
`attachments`	Comma-separated absolute URLs of any PDF/document attachments
`scrapedAt`	ISO-8601 timestamp when the record was scraped
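
Because `images` and `attachments` arrive as comma-separated strings and `publishDate` as an MM/DD/YYYY string, downstream code usually needs to convert them. The following is a minimal sketch of that conversion; the helper names `split_urls` and `parse_publish_date` are hypothetical and not part of the actor, and the splitting assumes that no URL contains a literal comma.

```python
from datetime import date, datetime
from typing import Optional


def split_urls(value: Optional[str]) -> list[str]:
    """Split a comma-separated URL field; null or empty becomes an empty list."""
    if not value:
        return []
    # Assumption: URLs in the field never contain a literal comma.
    return [part.strip() for part in value.split(",") if part.strip()]


def parse_publish_date(value: str) -> date:
    """Parse the MM/DD/YYYY publishDate string into a date object."""
    return datetime.strptime(value, "%m/%d/%Y").date()
```

For example, `parse_publish_date(record["publishDate"])` yields a `datetime.date` that sorts correctly, which the raw MM/DD/YYYY string does not.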

How it works

The actor crawls three levels:

1. Index: Seeds the five subchannel listing pages (News, Policies, Cooperation, etc.).
2. Listing: Extracts article links from each listing page. Discovers all pagination pages from the embedded JavaScript (maxPageNum) and enqueues them automatically.
3. Article: Fetches each article detail page and extracts title, body HTML/text, date, images, and attachment links.
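
The pagination discovery in the Listing step can be illustrated with a short sketch. It is not the actor's actual code: the function name `find_max_page_num` is hypothetical, and the regular expression assumes the embedded script assigns the value in a form such as `maxPageNum = 12` or `maxPageNum: 12`, which may differ from the real page source.

```python
import re
from typing import Optional

# Assumed shapes: `maxPageNum = 12`, `maxPageNum: 12`, or a quoted number.
MAX_PAGE_RE = re.compile(r"maxPageNum\s*[=:]\s*[\"']?(\d+)")


def find_max_page_num(listing_html: str) -> Optional[int]:
    """Return the maxPageNum value found in the listing HTML, or None if absent."""
    match = MAX_PAGE_RE.search(listing_html)
    return int(match.group(1)) if match else None
```

A crawler would then enqueue one request per page up to that number. The URL pattern of the paginated listing pages is not documented here, so it is deliberately left out of the sketch.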

External links (CGTN, China Daily) that appear in the listings are skipped; only articles hosted on cnsa.gov.cn are scraped.
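
One way to express this host filter is sketched below, using only the Python standard library. The function name `is_cnsa_hosted` and the default `base_url` are assumptions made for illustration; the actor's own filtering logic may differ.

```python
from urllib.parse import urljoin, urlparse


def is_cnsa_hosted(href: str, base_url: str = "https://www.cnsa.gov.cn/english/") -> bool:
    """Return True if the link resolves to cnsa.gov.cn or one of its subdomains."""
    # Resolve relative links against the listing page before checking the host.
    host = urlparse(urljoin(base_url, href)).hostname or ""
    return host == "cnsa.gov.cn" or host.endswith(".cnsa.gov.cn")
```

Matching on the parsed hostname rather than on a substring of the URL avoids accepting look-alike hosts such as `cnsa.gov.cn.example.com`.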

Usage

Set Max Items to limit how many articles to collect. Leave it at the default (10) for a quick sample, or increase it to collect the full archive (~500 English articles).

Example input

{
  "maxItems": 50
}

Example output record

{
  "articleId": "10743249",
  "subchannel": "News",
  "title": "Chinese scientists discover two new lunar minerals",
  "bodyHtml": "<p>Chinese scientists recently discovered...</p>",
  "bodyText": "Chinese scientists recently discovered two new lunar minerals...",
  "publishDate": "04/24/2026",
  "sourceUrl": "https://www.cnsa.gov.cn/english/n6465652/n6465653/c10743249/content.html",
  "mirrorZhUrl": null,
  "images": "https://www.cnsa.gov.cn/english/n6465652/n6465653/c10743249/part/10743247.jpg",
  "attachments": null,
  "scrapedAt": "2026-05-31T08:14:23.000Z"
}
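
In the example record, `articleId` matches the `c10743249` segment of `sourceUrl`. The sketch below recovers the ID from a URL under the assumption that every article URL ends in `/c<digits>/content.html`, as this one does; the function name `article_id_from_url` is hypothetical.

```python
import re
from typing import Optional

# Assumed URL shape, taken from the example record: .../c<digits>/content.html
ARTICLE_ID_RE = re.compile(r"/c(\d+)/content\.html$")


def article_id_from_url(source_url: str) -> Optional[str]:
    """Extract the numeric article ID from a detail-page URL, or None if it does not match."""
    match = ARTICLE_ID_RE.search(source_url)
    return match.group(1) if match else None
```

This can serve as a consistency check on scraped records or as a deduplication key when merging several runs.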

Notes

The site does not require a proxy — direct datacenter egress works reliably.
Some listing items link to external publications (China Daily, Xinhua) rather than CNSA-hosted articles. These are filtered out automatically.
The mirrorZhUrl field is always null because CNSA's English CMS does not expose cross-links to the Chinese counterpart articles.
Coverage: approximately 500 English-translated articles across all five subchannels as of mid-2026.
