OrbTop

CSRC China Securities Regulator Disclosure Scraper

Scrapes regulatory disclosures from the China Securities Regulatory Commission (CSRC, 中国证监会) at www.csrc.gov.cn. The CSRC is the principal national securities regulator for Chinese capital markets, analogous to the SEC in the United States. This actor covers the CSRC news and announcement sections, which include enforcement notices, policy interpretations, press conferences, and administrative actions.

What this actor does

Crawls CSRC static-HTML listing pages and extracts individual disclosure records from detail pages. Each record includes:

  • Disclosure ID, title, and canonical URL
  • Category (证监会要闻, 新闻发布会, 政策解读, and others via startUrls)
  • Publishing date and issuing office
  • Enforcement metadata: penalty type, penalty amount (CNY), case number
  • Violation summary (first substantive paragraph)
  • PDF attachment URL (when present)
  • Source listing URL
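
The enforcement metadata listed above is described elsewhere in this README as being parsed from body text with regex patterns. The actor's actual patterns are not published, so the following is only a minimal Python sketch of how such extraction might look. The two regular expressions, the helper names, and the handling of 万/亿 multipliers are all assumptions made for illustration.

```python
import re
from typing import Optional

# Assumed shape of a case reference: 〔YYYY〕 followed by a serial number and 号.
CASE_NUMBER_RE = re.compile(r"〔(\d{4})〕\s*(\d+)\s*号")

# Assumed shape of a fine: digits (optionally with thousands separators or a
# decimal part), an optional 万 (10,000) or 亿 (100,000,000) multiplier, then 元.
AMOUNT_RE = re.compile(r"(\d+(?:,\d{3})*(?:\.\d+)?)\s*(万|亿)?元")

MULTIPLIERS = {None: 1, "万": 10_000, "亿": 100_000_000}


def extract_case_number(text: str) -> Optional[str]:
    """Return the first case reference found in text, or None."""
    match = CASE_NUMBER_RE.search(text)
    if match is None:
        return None
    return f"〔{match.group(1)}〕{match.group(2)}号"


def extract_penalty_amount_cny(text: str) -> Optional[float]:
    """Return the first monetary amount found in text, converted to CNY."""
    match = AMOUNT_RE.search(text)
    if match is None:
        return None
    value = float(match.group(1).replace(",", ""))
    return value * MULTIPLIERS[match.group(2)]
```

A sketch like this only picks up the first amount in the text, so a disclosure that names several fines would need a more careful pass (for example `AMOUNT_RE.finditer`).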

Use cases

  • Compliance and AML/KYC screening — Monitor enforcement actions against firms and individuals in Chinese securities markets
  • Sanctions and regulatory intelligence — Track market bans, fines, and license revocations
  • EM equity research — Follow regulatory trends affecting listed companies, brokers, and fund managers
  • Journalism — Monitor CSRC enforcement trends (financial fraud, market manipulation, insider trading)
  • Academic research — Build longitudinal datasets of Chinese securities enforcement

Input

| Parameter | Type | Default | Description |
|---|---|---|---|
| startUrls | array | CSRC news + press + policy pages | Override the default listing URLs. Use any common_list.shtml URL from csrc.gov.cn. |
| maxItems | integer | required | Maximum number of records to scrape. Set to 0 for no limit. |

Default categories crawled

| Section | URL |
|---|---|
| 证监会要闻 (CSRC News) | http://www.csrc.gov.cn/csrc/c100028/common_list.shtml |
| 新闻发布会 (Press Conferences) | http://www.csrc.gov.cn/csrc/c100029/common_list.shtml |
| 政策解读 (Policy Interpretation) | http://www.csrc.gov.cn/csrc/c100039/common_list.shtml |

Custom categories via startUrls

Supply any CSRC listing URL to target specific sections. Pagination is handled automatically.

{
  "startUrls": [
    { "url": "http://www.csrc.gov.cn/csrc/c100028/common_list.shtml" }
  ],
  "maxItems": 100
}
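
For callers who start runs from code, the input above can be assembled and checked before it is sent. The sketch below is illustrative only: `build_run_input` and `run_actor` are hypothetical helper names, the URL check is an assumption based on the "any common_list.shtml URL from csrc.gov.cn" rule, and the actor ID must be supplied by the caller because it is not stated in this README. The `apify-client` package is imported lazily so the input builder works without it.

```python
from typing import Iterable, List, Optional
from urllib.parse import urlparse


def build_run_input(urls: Optional[Iterable[str]] = None, max_items: int = 100) -> dict:
    """Build the actor input; omit startUrls so the actor uses its defaults."""
    if max_items < 0:
        raise ValueError("maxItems must be 0 (no limit) or a positive integer")
    run_input: dict = {"maxItems": max_items}
    if urls is not None:
        checked: List[dict] = []
        for url in urls:
            parts = urlparse(url)
            host = parts.hostname or ""
            on_csrc = host == "csrc.gov.cn" or host.endswith(".csrc.gov.cn")
            if not (on_csrc and parts.path.endswith("common_list.shtml")):
                raise ValueError(f"not a CSRC listing URL: {url}")
            checked.append({"url": url})
        run_input["startUrls"] = checked
    return run_input


def run_actor(token: str, actor_id: str, run_input: dict) -> list:
    """Start a run and return its dataset items (needs network access)."""
    from apify_client import ApifyClient  # pip install apify-client

    client = ApifyClient(token)
    run = client.actor(actor_id).call(run_input=run_input)
    if run is None:
        raise RuntimeError("actor run did not return a result")
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```

Calling `build_run_input(max_items=0)` yields `{"maxItems": 0}`, which leaves the three default categories in place and removes the record limit.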

Output schema

Each item in the dataset has the following fields:

| Field | Type | Description |
|---|---|---|
| disclosure_id | string | CSRC internal ID (e.g. c1615676) |
| url | string | Detail page URL |
| title | string | Disclosure title (Chinese) |
| category | string | Section label (证监会要闻, etc.) |
| issuing_office | string | 证监会 or provincial bureau |
| publish_date | string | ISO-8601 publication date |
| effective_date | string | Effective date (where present) |
| subject_entity | string | Named subject (company/individual) |
| subject_role | string | Role of subject |
| penalty_type | string | Penalty types (pipe-delimited) |
| penalty_amount_cny | number | Fine amount in CNY |
| violation_summary | string | First paragraph of disclosure text |
| pdf_url | string | Attached PDF URL |
| case_number | string | Case reference (〔YYYY〕XX号) |
| source_url | string | Listing page URL |
| scrapedAt | string | ISO-8601 scrape timestamp |
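
Downstream code usually wants typed values rather than raw strings. The following Python sketch shows one way to normalize a few of the fields above. The `Disclosure` class and `parse_item` function are hypothetical names, and two behaviors are assumptions rather than documented guarantees: that absent values arrive as missing keys, null, or empty strings, and that `publish_date` begins with a `YYYY-MM-DD` prefix.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Disclosure:
    disclosure_id: str
    title: str
    publish_date: Optional[date]
    penalty_types: List[str]
    penalty_amount_cny: Optional[float]
    case_number: Optional[str]


def parse_item(item: dict) -> Disclosure:
    """Convert one dataset item into a typed record."""
    raw_date = item.get("publish_date")
    raw_types = item.get("penalty_type") or ""
    return Disclosure(
        disclosure_id=item["disclosure_id"],
        title=item.get("title") or "",
        # Keep only the date part in case a full timestamp is supplied.
        publish_date=date.fromisoformat(raw_date[:10]) if raw_date else None,
        # penalty_type is pipe-delimited; drop empty segments.
        penalty_types=[t.strip() for t in raw_types.split("|") if t.strip()],
        penalty_amount_cny=item.get("penalty_amount_cny"),
        case_number=item.get("case_number") or None,
    )
```

Splitting `penalty_type` into a list makes it straightforward to filter for a single penalty kind without substring matching.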

Technical notes

  • No proxy required — CSRC is a Chinese government portal accessible directly without proxy
  • Static HTML — All listing pages use static pagination (common_list_N.shtml), no JavaScript rendering needed
  • Pagination — Automatically detected from createPageHTML() calls; up to 200 pages per category
  • Enforcement detail pages — Full content is extracted from content.shtml detail pages; metadata (penalty amounts, case numbers) is parsed from body text using regex patterns
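
The pagination note above can be illustrated with a short Python sketch. It is not the actor's source code. It assumes the listing page embeds a call shaped like `createPageHTML('page_div', <total pages>, <current page>, ...)`, that the first page is `common_list.shtml`, and that page N (N ≥ 2) is `common_list_N.shtml`. The function names are hypothetical; only the 200-page cap comes from this README.

```python
import re
from typing import List

MAX_PAGES = 200  # per-category cap stated in the technical notes

# Assumed call shape: first argument is a quoted element ID, second argument
# is the total page count.
CREATE_PAGE_RE = re.compile(
    r"createPageHTML\(\s*['\"][^'\"]*['\"]\s*,\s*(\d+)\s*,"
)


def total_pages(listing_html: str) -> int:
    """Read the page count from the listing HTML, capped at MAX_PAGES."""
    match = CREATE_PAGE_RE.search(listing_html)
    if match is None:
        return 1  # no pager found: treat the listing as a single page
    return min(int(match.group(1)), MAX_PAGES)


def listing_page_urls(first_page_url: str, pages: int) -> List[str]:
    """Expand a common_list.shtml URL into the URLs of all its pages."""
    base, _, ext = first_page_url.rpartition(".")
    urls = [first_page_url]
    urls += [f"{base}_{n}.{ext}" for n in range(2, pages + 1)]
    return urls
```

Because the page URLs are derived from the first URL alone, the whole category can be enqueued after a single request to the first listing page.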
