OrbTop

CVMA China Veterinary Association News Scraper

Scrapes news articles, regulatory notices, and policy updates from the China Veterinary Medical Association (CVMA / 中国兽医协会) at cvma.org.cn.

What it does

This actor crawls CVMA's news column listing pages and extracts full article content including title, body text, publication date, author, and any PDF attachment URLs. It covers multiple news categories:

  • 协会新闻 (Association News, column 6847) — ~1,250 articles, 125 pages
  • 通知公告 (Notices & Announcements, column 6848) — ~780 articles, 78 pages
  • 行业动态 (Industry Dynamics, column 6849) — ~1,940 articles, 194 pages
  • 政策法规 (Policies & Regulations, column 6851) — ~200 articles, 20 pages

Additional columns can be specified via the columns input.
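
The column sizes above suggest roughly 10 articles per listing page (for example 1,250 articles over 125 pages). The following is a minimal sketch of how one might estimate the number of listing pages a run will touch. It assumes columns are crawled in the order given and that the approximate counts above still hold; the names COLUMNS, ARTICLES_PER_PAGE, and estimate_listing_pages are illustrative and not part of the actor.

```python
import math
from typing import Dict, List

# Approximate article counts copied from the list above.
COLUMNS = {
    "6847": ("协会新闻", 1250),
    "6848": ("通知公告", 780),
    "6849": ("行业动态", 1940),
    "6851": ("政策法规", 200),
}
# Assumption: inferred from the article/page counts above, not confirmed by the site.
ARTICLES_PER_PAGE = 10


def estimate_listing_pages(max_items: int, columns: List[str]) -> Dict[str, int]:
    """Estimate listing pages needed per column to collect max_items articles.

    Assumes columns are crawled in order and each is exhausted before the next.
    Columns without a known size (user-supplied extras) are skipped.
    """
    remaining = max_items
    pages: Dict[str, int] = {}
    for column_id in columns:
        if remaining <= 0:
            break
        if column_id not in COLUMNS:
            continue  # no size estimate available for this column
        available = COLUMNS[column_id][1]
        take = min(remaining, available)
        pages[column_id] = math.ceil(take / ARTICLES_PER_PAGE)
        remaining -= take
    return pages
```

Under these assumptions, a limit of 100 articles on column 6847 alone works out to 10 listing pages.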

Use cases

  • Track Chinese veterinary regulatory updates (drug licensing, certification announcements, disease alerts)
  • Monitor African swine fever (ASF), avian influenza, and pet-disease policy notices from China's national vet body
  • Feed a regulatory intelligence pipeline covering the China animal-health market
  • Companion data source to AAVMC, AAEP, ABVP, and WSAVA scrapers for international vet-pharma analysis

Input

Field     Type      Description                 Default
maxItems  integer   Maximum articles to scrape  10
columns   string[]  Column IDs to scrape        ["6847","6848","6849","6851"]

Example input

{
  "maxItems": 100,
  "columns": ["6847", "6848", "6851"]
}
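
A caller may want to apply the documented defaults and check an input object before starting a run. The sketch below is one possible way to do that; normalize_input and DEFAULT_INPUT are hypothetical names, and the rule that column IDs are purely numeric is an assumption based on the four IDs listed in this document.

```python
from typing import Optional

# Defaults as documented in the input table.
DEFAULT_INPUT = {"maxItems": 10, "columns": ["6847", "6848", "6849", "6851"]}


def normalize_input(raw: Optional[dict]) -> dict:
    """Merge a raw input object with the documented defaults and validate it."""
    merged = {**DEFAULT_INPUT, **(raw or {})}

    max_items = merged["maxItems"]
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(max_items, bool) or not isinstance(max_items, int) or max_items < 1:
        raise ValueError("maxItems must be a positive integer")

    columns = [str(c).strip() for c in merged["columns"]]
    # Assumption: column IDs are numeric strings, like the ones listed above.
    if not columns or not all(c.isdigit() for c in columns):
        raise ValueError("columns must be a non-empty list of numeric column IDs")

    return {"maxItems": max_items, "columns": columns}
```

Passing None or an empty object returns the defaults unchanged, so the same function covers runs that omit the input entirely.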

Output

Each record in the dataset contains:

Field        Type            Description
articleId    string          Numeric article ID from URL path
articleUrl   string          Full article URL
columnId     string          CVMA column ID
columnName   string          Column name in Chinese
title        string          Article title
publishedAt  string          Publication date (YYYY-MM-DD)
author       string          Publisher attribution
bodyText     string          Plain text article body
bodyHtml     string          HTML article body
attachments  string or null  Pipe-separated PDF attachment URLs; null if none
tags         string or null  Keywords if present; null otherwise
firstSeen    string          ISO-8601 timestamp of discovery
scrapedAt    string          ISO-8601 timestamp of scrape

Example record

{
  "articleId": "73649",
  "articleUrl": "https://www.cvma.org.cn/6847/202605/73649.html",
  "columnId": "6847",
  "columnName": "协会新闻",
  "title": "\"中国兽医协会宠物营养专家库\"首次线下交流会圆满召开",
  "publishedAt": "2026-05-21",
  "author": "中国兽医协会",
  "bodyText": "5月19日,由中国兽医协会主办...",
  "bodyHtml": "<section>...</section>",
  "attachments": null,
  "tags": null,
  "firstSeen": "2026-05-31T05:00:00.000Z",
  "scrapedAt": "2026-05-31T05:00:01.234Z"
}
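
Downstream code usually needs a few of these string fields in structured form. The helpers below are a minimal sketch of that post-processing: splitting the pipe-separated attachments string, reading the column and article IDs back out of articleUrl, and parsing the timestamps. The URL pattern (column ID, then a six-digit year-month, then the article ID) is inferred from the single example record and may not hold for every article; all function names are illustrative.

```python
import re
from datetime import datetime
from typing import List, Optional

# Assumption: article paths look like /<columnId>/<YYYYMM>/<articleId>.html,
# as in the example record above.
ARTICLE_URL_RE = re.compile(r"/(?P<column>\d+)/(?P<yyyymm>\d{6})/(?P<article>\d+)\.html$")


def split_attachments(value: Optional[str]) -> List[str]:
    """Turn the pipe-separated attachments field into a list (empty for null)."""
    if not value:
        return []
    return [url.strip() for url in value.split("|") if url.strip()]


def parse_article_url(url: str) -> Optional[dict]:
    """Extract columnId, yearMonth, and articleId, or None if the URL does not match."""
    match = ARTICLE_URL_RE.search(url)
    if match is None:
        return None
    return {
        "columnId": match["column"],
        "yearMonth": match["yyyymm"],
        "articleId": match["article"],
    }


def parse_timestamp(value: str) -> datetime:
    """Parse firstSeen / scrapedAt into a timezone-aware datetime."""
    # Older Python versions do not accept a trailing "Z" in fromisoformat.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))
```

Returning an empty list for a null attachments value lets consumers iterate without a separate null check.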

Notes

  • The site is a server-rendered PHP CMS: pages need no JavaScript execution, and there is no anti-bot protection.
  • Some articles in the 协会新闻 column link to external WeChat posts — these are skipped; only local site articles are scraped.
  • Regulatory notices in the 通知公告 column often include PDF attachments (group standards, drug certifications, etc.); their URLs are captured in the attachments field as a pipe-separated string.
  • The actor uses polite crawling with a concurrency of 5 and rate-limit handling.
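
The notes on skipping external links and on polite crawling can be illustrated with a short sketch. This is not the actor's actual code: it assumes local articles live on cvma.org.cn or www.cvma.org.cn, that rate limiting shows up as HTTP 429, and that a caller supplies a fetch coroutine returning a (status, body) pair. The names is_local_article, crawl, and fetch are hypothetical.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Iterable, Tuple
from urllib.parse import urlparse

# Assumption: only these hosts serve local articles; anything else is external.
LOCAL_HOSTS = {"cvma.org.cn", "www.cvma.org.cn"}
MAX_CONCURRENCY = 5  # matches the concurrency stated in the notes above

Fetch = Callable[[str], Awaitable[Tuple[int, str]]]


def is_local_article(url: str) -> bool:
    """Return True when the URL points at the CVMA site itself."""
    host = (urlparse(url).hostname or "").lower()
    return host in LOCAL_HOSTS


async def crawl(urls: Iterable[str], fetch: Fetch,
                max_retries: int = 3, base_delay: float = 1.0) -> Dict[str, str]:
    """Fetch local URLs with at most MAX_CONCURRENCY requests in flight."""
    semaphore = asyncio.Semaphore(MAX_CONCURRENCY)
    results: Dict[str, str] = {}

    async def worker(url: str) -> None:
        async with semaphore:
            for attempt in range(max_retries + 1):
                status, body = await fetch(url)
                if status == 429 and attempt < max_retries:
                    # Assumed rate-limit signal: back off exponentially, then retry.
                    await asyncio.sleep(base_delay * 2 ** attempt)
                    continue
                if status == 200:
                    results[url] = body
                return  # success, non-retryable status, or retries exhausted

    await asyncio.gather(*(worker(u) for u in urls if is_local_article(u)))
    return results
```

The worker keeps its semaphore slot while backing off, so a rate-limited request also reduces overall pressure on the site instead of freeing a slot for another request.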