OrbTop

CASC SpaceChina Corporate News Scraper

Scrapes press releases and corporate news from SpaceChina.com — the public news portal of CASC (China Aerospace Science and Technology Corporation / 中国航天科技集团有限公司). Extracts structured articles from three news subchannels with full body text, publication date, image URLs, PDF attachments, and automatic CASC subsidiary detection.

What it collects

| Field | Description |
| --- | --- |
| article_id | Unique numeric ID from the article URL (e.g. 4632103) |
| subchannel | Source subchannel: 集团要闻, 媒体聚焦, or 专题报道 |
| title_zh | Article title in Chinese |
| title_en | English title (currently null; planned future enhancement via the english.spacechina.com mirror) |
| body_html | Full article body HTML |
| body_text | Full article body as plain text |
| publish_date | Publication date (ISO 8601, e.g. 2026-06-11) |
| source_url | Canonical article URL |
| mentioned_subsidiaries | CASC academy/subsidiary names detected in body (一院 through 八院, CALT, CAST, SAST) |
| images | Absolute URLs of embedded article images |
| attachments | Absolute URLs of PDF attachments (e.g. annual social-responsibility reports) |
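
For downstream pipelines it can help to sanity-check each record against the field list above. The following is a minimal sketch: the field names come from the table, but the Python types (for example, whether article_id arrives as a string or an integer) are assumptions, and the sample record uses placeholder values rather than real scraped data.

```python
from typing import Any, Dict, List

# Field names come from the table above; the accepted types are assumptions.
EXPECTED_FIELDS: Dict[str, tuple] = {
    "article_id": (str, int),
    "subchannel": (str,),
    "title_zh": (str,),
    "title_en": (str, type(None)),  # documented as null in this release
    "body_html": (str,),
    "body_text": (str,),
    "publish_date": (str,),
    "source_url": (str,),
    "mentioned_subsidiaries": (list,),
    "images": (list,),
    "attachments": (list,),
}


def check_record(record: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in one record (empty if none)."""
    problems = []
    for name, types in EXPECTED_FIELDS.items():
        if name not in record:
            problems.append(f"missing field: {name}")
        elif not isinstance(record[name], types):
            problems.append(f"unexpected type for {name}")
    return problems


# Illustrative record with placeholder values, not real scraped data.
sample = {
    "article_id": "4632103",
    "subchannel": "集团要闻",
    "title_zh": "示例标题",
    "title_en": None,
    "body_html": "<p>示例正文</p>",
    "body_text": "示例正文",
    "publish_date": "2026-06-11",
    "source_url": "https://example.invalid/article/4632103",
    "mentioned_subsidiaries": ["一院"],
    "images": [],
    "attachments": [],
}

print(check_record(sample))
```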

Subchannels covered

| Key | Chinese | English |
| --- | --- | --- |
| jtyw | 集团要闻 | Group News: primary launch and operations press releases |
| mjjj | 媒体聚焦 | Media Focus: aggregated external press coverage |
| ztbd | 专题报道 | Special Reports: themed coverage (missions, events, policy) |

The actor follows the site's paginated listing structure automatically, crawling every page of each selected subchannel until the maxItems limit is reached.
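
The pagination behavior described above could be sketched roughly as follows. This is not the actor's actual implementation: fetch_page is a hypothetical callable standing in for whatever request-and-parse logic retrieves the article links on one listing page, and the assumption that an empty page marks the end of a subchannel is ours.

```python
from typing import Callable, Iterable, Iterator, List

# Hypothetical signature: (subchannel key, 1-based page number) -> article URLs.
FetchPage = Callable[[str, int], List[str]]


def iter_subchannel(fetch_page: FetchPage, key: str) -> Iterator[str]:
    """Yield article URLs page by page until a page comes back empty."""
    page = 1
    while True:
        links = fetch_page(key, page)
        if not links:  # assumption: an empty page marks the end of the listing
            return
        yield from links
        page += 1


def collect_urls(fetch_page: FetchPage, subchannels: Iterable[str],
                 max_items: int) -> List[str]:
    """Collect unique article URLs across subchannels, capped at max_items."""
    seen = set()
    collected: List[str] = []
    for key in subchannels:
        for url in iter_subchannel(fetch_page, key):
            if url in seen:
                continue
            seen.add(url)
            collected.append(url)
            if len(collected) >= max_items:
                return collected
    return collected


# Stand-in data source so the sketch runs without network access.
FAKE_SITE = {
    "jtyw": [["a1", "a2"], ["a3"]],
    "mjjj": [["b1"]],
}


def fake_fetch(key: str, page: int) -> List[str]:
    pages = FAKE_SITE.get(key, [])
    return pages[page - 1] if page <= len(pages) else []


print(collect_urls(fake_fetch, ["jtyw", "mjjj"], max_items=3))
```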

Input

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| maxItems | integer | Yes | Maximum number of articles to scrape. For a full historical crawl (roughly 3,000+ articles across all subchannels), set this to a high value. |
| subchannels | array | Yes | Which subchannels to include. Accepts any combination of jtyw, mjjj, ztbd. Default: all three. |

Example input

{
  "maxItems": 100,
  "subchannels": ["jtyw"]
}
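
Before submitting an input like the one above, it can be checked locally. The sketch below is an illustration only: validate_input is a hypothetical helper, not part of the actor, and the rules it enforces (positive integer maxItems, subchannels drawn from the three documented keys, all three used when the field is omitted) are inferred from the parameter table.

```python
from typing import Any, Dict, List

VALID_SUBCHANNELS = ("jtyw", "mjjj", "ztbd")


def validate_input(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Return a normalized copy of the input or raise ValueError."""
    max_items = raw.get("maxItems")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(max_items, bool) or not isinstance(max_items, int) or max_items < 1:
        raise ValueError("maxItems must be a positive integer")

    # Assumption: omitting subchannels falls back to the documented default.
    subchannels: List[str] = raw.get("subchannels", list(VALID_SUBCHANNELS))
    if not isinstance(subchannels, list) or not subchannels:
        raise ValueError("subchannels must be a non-empty array")
    unknown = [s for s in subchannels if s not in VALID_SUBCHANNELS]
    if unknown:
        raise ValueError(f"unknown subchannels: {unknown}")

    return {"maxItems": max_items, "subchannels": subchannels}


print(validate_input({"maxItems": 100, "subchannels": ["jtyw"]}))
```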

Use cases

  • Defense and aerospace intelligence: Track every CASC press release mentioning specific launch vehicles, academies, or programs.
  • ESG / sanctions screening: Identify CASC subsidiaries (一院 through 八院) named in corporate announcements for military-civil fusion exposure mapping.
  • Trade compliance: Monitor export-control-relevant announcements (new satellite programs, foreign partnerships, dual-use technology disclosures).
  • Annual reports: The 专题报道 channel carries annual social-responsibility reports back to 2013 as PDF attachments.
  • Research and journalism: Build a full-text searchable archive of CASC's public-facing communications.
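
As an illustration of the screening use case, the sketch below filters exported records by the mentioned_subsidiaries field. The helper name and the sample records are hypothetical, and the exact strings the actor stores in that field (for example "一院" versus "CALT") are an assumption based on the field description.

```python
from typing import Any, Dict, Iterable, List


def filter_by_subsidiary(records: Iterable[Dict[str, Any]],
                         wanted: Iterable[str]) -> List[Dict[str, Any]]:
    """Keep records whose mentioned_subsidiaries overlaps the wanted names."""
    wanted_set = set(wanted)
    return [
        r for r in records
        if wanted_set & set(r.get("mentioned_subsidiaries", []))
    ]


# Placeholder records for demonstration only, not real scraped data.
records = [
    {"article_id": "1", "mentioned_subsidiaries": ["一院", "CALT"]},
    {"article_id": "2", "mentioned_subsidiaries": ["八院"]},
    {"article_id": "3", "mentioned_subsidiaries": []},
]

hits = filter_by_subsidiary(records, ["一院", "SAST"])
print([r["article_id"] for r in hits])
```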

Notes

  • Chinese-language content: All articles are in Simplified Chinese. The body_text field is suitable for NLP pipelines and translation workflows.
  • English mirror: The english.spacechina.com mirror exists but has minimal content. title_en is always null in this release.
  • Subsidiary detection: The mentioned_subsidiaries field is populated by pattern matching on the body text for the eight CASC academies and their common abbreviations. The matching is heuristic and may miss references that use full official names.
  • Historical depth: The site retains articles back to at least 2013 across all subchannels, representing the full accessible archive of CASC's public news.
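
The subsidiary detection note above describes pattern matching on body text. A rough sketch of that kind of heuristic follows; the actual patterns used by the actor are not documented here, so the regular expression, the function name, and the guard against matching inside 十一院 are all assumptions made for illustration.

```python
import re
from typing import List

# Assumed pattern: short academy names 一院 to 八院, plus the three
# abbreviations named in the field description. Lookarounds are used instead
# of \b because Chinese characters count as word characters in Python's re,
# so \b would not fire between a Chinese character and a Latin letter.
SUBSIDIARY_PATTERN = re.compile(
    r"(?<!十)[一二三四五六七八]院"
    r"|(?<![A-Za-z])(?:CALT|CAST|SAST)(?![A-Za-z])"
)


def detect_subsidiaries(body_text: str) -> List[str]:
    """Return unique matches in order of first appearance."""
    matches = (m.group(0) for m in SUBSIDIARY_PATTERN.finditer(body_text))
    return list(dict.fromkeys(matches))


# Made-up sentence for demonstration only.
text = "航天科技集团一院(CALT)与八院联合研制,CAST提供载荷,一院负责总体。"
print(detect_subsidiaries(text))
```

As the note says, a heuristic like this will not catch references written with full official names, and short forms such as 一院 can in principle also appear in unrelated contexts.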