CASC SpaceChina Corporate News Scraper
Scrapes press releases and corporate news from SpaceChina.com, the public news portal of CASC (China Aerospace Science and Technology Corporation, 中国航天科技集团有限公司). Extracts structured articles from three news subchannels, including full body text, publication date, image URLs, PDF attachments, and automatically detected mentions of CASC subsidiaries.
What it collects
| Field | Description |
|---|---|
| `article_id` | Unique numeric ID from the article URL (e.g. 4632103) |
| `subchannel` | Source subchannel: 集团要闻, 媒体聚焦, or 专题报道 |
| `title_zh` | Article title in Chinese |
| `title_en` | English title (currently always null; a future enhancement may fill it from the english.spacechina.com mirror) |
| `body_html` | Full article body HTML |
| `body_text` | Full article body as plain text |
| `publish_date` | Publication date (ISO 8601, e.g. 2026-06-11) |
| `source_url` | Canonical article URL |
| `mentioned_subsidiaries` | CASC academy/subsidiary names detected in body (一院 through 八院, CALT, CAST, SAST) |
| `images` | Absolute URLs of embedded article images |
| `attachments` | Absolute URLs of PDF attachments (e.g. annual social-responsibility reports) |
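For downstream consumers, the field table above can be mirrored as a typed record. The following is a minimal sketch, not the actor's own code: the `Article` type, the `extract_article_id` helper, and the assumed URL shape (a path segment of the form `/c<digits>/`) are illustrative assumptions, as is storing the ID as a string.

```python
import re
from typing import List, Optional, TypedDict


class Article(TypedDict):
    """One output record, mirroring the field table (hypothetical type)."""
    article_id: str            # assumption: kept as a string of digits
    subchannel: str            # 集团要闻, 媒体聚焦, or 专题报道
    title_zh: str
    title_en: Optional[str]    # always None in this release
    body_html: str
    body_text: str
    publish_date: str          # ISO 8601 date, e.g. "2026-06-11"
    source_url: str
    mentioned_subsidiaries: List[str]
    images: List[str]
    attachments: List[str]


# Assumption: article URLs carry the ID in a "/c<digits>/" path segment.
_ID_PATTERN = re.compile(r"/c(\d+)/")


def extract_article_id(url: str) -> Optional[str]:
    """Return the numeric ID from an article URL, or None if absent."""
    match = _ID_PATTERN.search(url)
    return match.group(1) if match else None
```

If the real URL layout differs, only `_ID_PATTERN` needs to change.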
Subchannels covered
| Key | Chinese | English |
|---|---|---|
| `jtyw` | 集团要闻 | Group News: primary launch and operations press releases |
| `mjjj` | 媒体聚焦 | Media Focus: aggregated external press coverage |
| `ztbd` | 专题报道 | Special Reports: themed coverage (missions, events, policy) |
The actor crawls all pages within each selected subchannel, following the site's paginated listing structure automatically.
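The crawl described above, walking each subchannel's listing pages until they run out or the item cap is hit, can be sketched roughly as follows. This is an illustration under stated assumptions, not the actor's implementation: `crawl_subchannels` and the `fetch_listing(subchannel, page)` callable are hypothetical, and the real site may signal the last page differently than with an empty listing.

```python
from typing import Callable, Iterable, Iterator, List


def crawl_subchannels(
    subchannels: Iterable[str],
    max_items: int,
    fetch_listing: Callable[[str, int], List[str]],
) -> Iterator[str]:
    """Yield article URLs page by page until max_items is reached."""
    emitted = 0
    for key in subchannels:
        page = 1
        while True:
            # Hypothetical fetcher: returns the article URLs on one listing page.
            urls = fetch_listing(key, page)
            if not urls:  # assumption: an empty page marks the end of a subchannel
                break
            for url in urls:
                if emitted >= max_items:
                    return
                yield url
                emitted += 1
            page += 1
```

Passing the fetcher in as a parameter keeps the pagination logic testable without network access.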
Input
| Parameter | Type | Required | Description |
|---|---|---|---|
| `maxItems` | integer | Yes | Maximum number of articles to scrape. Set to a high value (or remove the cap) for a full historical crawl (~3,000+ articles across all subchannels). |
| `subchannels` | array | Yes | Which subchannels to include. Accepts any combination of `jtyw`, `mjjj`, `ztbd`. Default: all three. |
Example input
    {
      "maxItems": 100,
      "subchannels": ["jtyw"]
    }
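Before starting a run, it can help to check an input object against the parameter table. The sketch below is a hypothetical client-side check, not part of the actor: `validate_input` and `ALLOWED_SUBCHANNELS` are illustrative names, and falling back to all three subchannels when the key is omitted follows the documented default.

```python
from typing import Any, Dict, List, Tuple

ALLOWED_SUBCHANNELS = ("jtyw", "mjjj", "ztbd")


def validate_input(raw: Dict[str, Any]) -> Tuple[int, List[str]]:
    """Return (maxItems, subchannels) or raise ValueError on bad input."""
    max_items = raw.get("maxItems")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(max_items, bool) or not isinstance(max_items, int) or max_items < 1:
        raise ValueError("maxItems must be a positive integer")

    # Documented default: all three subchannels.
    subchannels = raw.get("subchannels") or list(ALLOWED_SUBCHANNELS)
    unknown = [s for s in subchannels if s not in ALLOWED_SUBCHANNELS]
    if unknown:
        raise ValueError(f"unknown subchannels: {unknown}")

    # Drop duplicates while preserving the caller's order.
    return max_items, list(dict.fromkeys(subchannels))
```

For the example input shown above, this returns `(100, ["jtyw"])`.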
Use cases
- Defense and aerospace intelligence: Track every CASC press release that mentions specific launch vehicles, academies, or programs.
- ESG / sanctions screening: Identify CASC subsidiaries (一院 through 八院) named in corporate announcements to map military-civil fusion exposure.
- Trade compliance: Monitor export-control-relevant announcements (new satellite programs, foreign partnerships, dual-use technology disclosures).
- Annual reports: The 专题报道 channel carries annual social-responsibility reports back to 2013 as PDF attachments.
- Research and journalism: Build a full-text searchable archive of CASC's public-facing communications.
Notes
- Chinese-language content: All articles are in Simplified Chinese. The `body_text` field is suitable for NLP pipelines and translation workflows.
- English mirror: The `english.spacechina.com` mirror exists but has minimal content. `title_en` is always `null` in this release.
- Subsidiary detection: The `mentioned_subsidiaries` field uses pattern-matching on the body text for the eight CASC academies and their common abbreviations. It is heuristic and may miss references using full official names.
- Historical depth: The site retains articles back to at least 2013 across all subchannels, representing the full accessible archive of CASC's public news.
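To make the subsidiary-detection note concrete, here is a rough sketch of what pattern-matching on body text for 一院 through 八院 and the abbreviations CALT, CAST, and SAST could look like. The actor's actual patterns are not published here, so the regular expressions and the `detect_subsidiaries` name are assumptions. Like the real field, this sketch does not match full official names.

```python
import re
from typing import List

# Assumption: academies appear as a numeral 一..八 followed by 院.
# The lookbehind avoids matching the tail of 十一院, 十二院, and so on.
_ACADEMY = re.compile(r"(?<!十)[一二三四五六七八]院")

# \b is unreliable next to CJK characters (they count as word characters),
# so ASCII-letter lookarounds guard the abbreviations instead.
_ABBREV = re.compile(r"(?<![A-Za-z])(?:CALT|CAST|SAST)(?![A-Za-z])")


def detect_subsidiaries(body_text: str) -> List[str]:
    """Return unique matches: academy names first, then abbreviations."""
    hits = _ACADEMY.findall(body_text) + _ABBREV.findall(body_text)
    # dict.fromkeys removes duplicates while keeping first-seen order.
    return list(dict.fromkeys(hits))
```

A text that mentions 一院 twice yields a single entry, and a word such as "BROADCAST" does not count as a mention of CAST.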