OrbTop

Beijing Dance Academy (BDA) News & Announcements Scraper

EDUCATION, NEWS

Scrapes news articles, institutional announcements, and other published content from Beijing Dance Academy (北京舞蹈学院, bda.edu.cn), China's foremost classical-dance institution and the only national-level university dedicated exclusively to dance education.

What it does

The actor crawls BDA's WebPlus CMS across multiple content sections, extracting article metadata and optionally the full article body. Sections include:

| Section | Chinese name | Content |
| --- | --- | --- |
| xxyw | 学校要闻 | Campus news (~3,160 articles) |
| tzgg | 信息公告 | Notices & announcements (~240 articles) |
| jcdt | 基层动态 | Departmental updates (~1,600 articles) |
| xzdt | 行政动态 | Administrative news |
| sydt | 艺术实践 | Arts practice & exchange |
| mtwy | 媒体舞院 | Media coverage |

Output

Each record contains:

| Field | Description |
| --- | --- |
| id | MD5 hash of the article URL (stable dedup key) |
| page_url | Full canonical URL |
| title | Article title (Chinese) |
| category | Section slug (xxyw, tzgg, etc.) |
| publish_date | Publication date (YYYY-MM-DD) |
| body_html | Full HTML body (when scrapeBody: true) |
| body_text | Plain-text body (when scrapeBody: true) |
| attachments | Array of PDF/document URLs found in the article |
| source_url | Listing page URL that linked to this article |
| scrapedAt | ISO-8601 scrape timestamp |
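
The id field is described as an MD5 hash of the article URL. A minimal sketch of how such a key could be computed and used for deduplication follows. It assumes the URL is hashed as UTF-8 bytes, with no normalization, and rendered as a hex digest; the actor's exact procedure is not documented here, and the helper names and example URL are hypothetical.

```python
import hashlib


def article_id(page_url: str) -> str:
    """Return the hex MD5 digest of a URL (assumed: UTF-8 bytes, no normalization)."""
    return hashlib.md5(page_url.encode("utf-8")).hexdigest()


def dedupe(records: list[dict]) -> list[dict]:
    """Keep the first record seen for each URL-derived id."""
    seen: set[str] = set()
    unique: list[dict] = []
    for record in records:
        key = article_id(record["page_url"])
        if key not in seen:
            seen.add(key)
            unique.append({**record, "id": key})
    return unique


if __name__ == "__main__":
    # Hypothetical URL, used only to show the shape of the data.
    rows = [
        {"page_url": "https://www.bda.edu.cn/example/page.htm", "title": "A"},
        {"page_url": "https://www.bda.edu.cn/example/page.htm", "title": "A (repeat)"},
    ]
    print(len(dedupe(rows)))  # the repeated URL collapses to one record
```

Because the key depends only on the URL, the same article scraped in two separate runs should map to the same id, which is what makes it usable for merging datasets.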

Input options

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| maxItems | integer | 10 | Maximum total articles to return |
| sections | array | ["xxyw","tzgg"] | BDA content sections to crawl |
| scrapeBody | boolean | true | Fetch full article body (slower but richer) |
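
As an illustration of how these options combine, the sketch below merges a partial input with the documented defaults and applies some basic checks. The field names, defaults, and section slugs come from the tables in this document; the validation rules and the resolve_input helper are assumptions for illustration, not the actor's actual behavior.

```python
# Defaults as documented in the input options.
DEFAULTS = {"maxItems": 10, "sections": ["xxyw", "tzgg"], "scrapeBody": True}

# Section slugs listed in this document; the actor may accept others.
KNOWN_SECTIONS = {"xxyw", "tzgg", "jcdt", "xzdt", "sydt", "mtwy"}


def resolve_input(user_input: dict) -> dict:
    """Merge user input over the defaults and run assumed sanity checks."""
    cfg = {**DEFAULTS, **user_input}
    cfg["sections"] = list(cfg["sections"])  # avoid sharing the default list

    max_items = cfg["maxItems"]
    if isinstance(max_items, bool) or not isinstance(max_items, int) or max_items < 1:
        raise ValueError("maxItems must be a positive integer")

    unknown = set(cfg["sections"]) - KNOWN_SECTIONS
    if unknown:
        raise ValueError(f"unknown sections: {sorted(unknown)}")

    if not isinstance(cfg["scrapeBody"], bool):
        raise ValueError("scrapeBody must be a boolean")
    return cfg


if __name__ == "__main__":
    # Crawl departmental updates, metadata only, up to 200 articles.
    print(resolve_input({"maxItems": 200, "sections": ["jcdt"], "scrapeBody": False}))
```

Setting scrapeBody to false skips the per-article fetch, so a metadata-only run over a large section should finish considerably faster than a full-body run.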

Use cases

  • AI training corpus: Chinese-language dance terminology, pedagogy, and institutional content
  • Competitive intelligence: Track BDA admissions, recruitment, and program announcements
  • Research: Academic study of Chinese dance education and institutional communications
  • Media monitoring: Track BDA in Chinese media and faculty/program news

Notes

  • Some articles in the notices section (tzgg) link to external Chinese government procurement sites. Because these pages are off-domain, they are saved with listing metadata only (no body).
  • WeChat article links (mp.weixin.qq.com) are also captured as listing-only records.
  • The site is served from China but is fully accessible from US datacenter IPs — no proxy required.
  • Concurrency is kept conservative (3 concurrent requests) to avoid overloading the university's China-hosted server.
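
The off-domain handling and the concurrency limit described in these notes can be sketched as follows. This is a hedged illustration, not the actor's code: it assumes "on-domain" means bda.edu.cn or one of its subdomains, and fetch_all, its fetch argument, and the helper names are hypothetical.

```python
import asyncio
from urllib.parse import urlparse

MAX_CONCURRENCY = 3  # matches the limit stated in the notes


def is_on_domain(url: str) -> bool:
    """True for bda.edu.cn and its subdomains (assumed definition of on-domain)."""
    host = (urlparse(url).hostname or "").lower()
    return host == "bda.edu.cn" or host.endswith(".bda.edu.cn")


def should_fetch_body(url: str, scrape_body: bool = True) -> bool:
    """Off-domain links (procurement sites, WeChat) stay listing-only."""
    return scrape_body and is_on_domain(url)


async def fetch_all(urls, fetch):
    """Run the caller-supplied coroutine `fetch` with at most MAX_CONCURRENCY in flight."""
    semaphore = asyncio.Semaphore(MAX_CONCURRENCY)

    async def bounded(url):
        async with semaphore:
            return await fetch(url)

    # gather preserves the order of the input URLs in its results.
    return await asyncio.gather(*(bounded(u) for u in urls))
```

A caller would first filter article links with should_fetch_body, then pass only the remaining on-domain URLs to fetch_all together with whatever HTTP client coroutine it uses.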