ModelScope Model Catalog Scraper

Scrape the ModelScope (modelscope.cn) AI model catalog, the Alibaba-backed Chinese model registry hosting roughly 200k models. Export model IDs, tasks, frameworks, download statistics, star counts, licenses, READMEs, and full metadata for every model in the catalog.

What it does

Sweeps the ModelScope JSON API task-by-task (text-generation, image-generation, multimodal, and 26 other task categories), deduplicates across task overlaps, and optionally enriches each model record with the full README from the per-model detail endpoint.
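The deduplication step can be pictured with a minimal sketch. It assumes each task sweep yields a list of dicts carrying a model_id key, and that the first task under which a model is seen becomes its primary task; the function name merge_task_sweeps and the record shape are illustrative assumptions, not the scraper's actual internals.

```python
from typing import Dict, Iterable, List, Tuple


def merge_task_sweeps(sweeps: Iterable[Tuple[str, List[dict]]]) -> List[dict]:
    """Merge per-task result lists into one record per model_id.

    `sweeps` yields (task_slug, records) pairs; each record is a dict with at
    least a "model_id" key. Both shapes are assumptions for illustration.
    """
    merged: Dict[str, dict] = {}
    for task_slug, records in sweeps:
        for record in records:
            model_id = record["model_id"]
            if model_id not in merged:
                # First sighting: this task becomes the primary `task`.
                merged[model_id] = {**record, "task": task_slug, "_tasks": [task_slug]}
            elif task_slug not in merged[model_id]["_tasks"]:
                # Seen again under another task: only record the overlap.
                merged[model_id]["_tasks"].append(task_slug)

    results = []
    for entry in merged.values():
        # Flatten the collected task tags into the pipe-separated output form.
        entry["tasks_all"] = "|".join(entry.pop("_tasks"))
        results.append(entry)
    return results
```

Keying on model_id means a model listed under several tasks produces a single row, with the overlap preserved in tasks_all rather than as duplicate records.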

Output fields per model:

model_id — full identifier (namespace/name)
namespace, name — publisher slug and model name
chinese_name — display name in Chinese if present
task — primary task tag used for discovery
tasks_all — all task tags, pipe-separated
frameworks — ML frameworks (pytorch, tensorflow, mindspore, etc.), pipe-separated
languages — supported languages (en, zh, multilingual, etc.), pipe-separated
license — SPDX identifier (apache-2.0, mit, etc.)
downloads_30d — downloads in the last 30 days
stars — star count
last_updated, created_at — ISO-8601 timestamps
readme_text — README content, truncated to 8 KB (requires includeDetails: true)
model_size_params — parameter count label when tagged (7B, 72B, MoE-22B-A2B)
quantization_variants — available quantization types from tensor metadata, pipe-separated (requires includeDetails: true)
base_model — base model ID if this is a fine-tune
publisher_org, publisher_url — organization name and profile URL
has_demo, has_inference_api — boolean flags

Input

Field	Type	Default	Description
`tasks`	array	(all tasks)	Limit to specific task slugs (e.g. `text-generation`, `image-generation`). Leave empty to sweep all 29 canonical tasks.
`maxItems`	integer	100	Maximum number of models to return. Set to `0` for unlimited (full catalog run).
`includeDetails`	boolean	true	Fetch the per-model detail endpoint for full README text and quantization variant metadata. Disabling this speeds up runs but leaves `readme_text` and `quantization_variants` empty.
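As a rough illustration of the input shape, the sketch below builds a run input from the three fields in the table, using the documented defaults. The helper name build_run_input and its negative-value check are assumptions added for the example; only the field names, types, and defaults come from the table.

```python
import json
from typing import List, Optional


def build_run_input(tasks: Optional[List[str]] = None,
                    max_items: int = 100,
                    include_details: bool = True) -> dict:
    """Return an input object using the field names from the table above."""
    if max_items < 0:
        # Assumed validation: 0 means unlimited, so negatives are rejected here.
        raise ValueError("maxItems must be 0 (unlimited) or a positive integer")
    return {
        "tasks": list(tasks or []),  # empty list = sweep every task
        "maxItems": max_items,
        "includeDetails": include_details,
    }


if __name__ == "__main__":
    # A quick, metadata-only run limited to one task.
    run_input = build_run_input(tasks=["text-generation"], max_items=500,
                                include_details=False)
    print(json.dumps(run_input, indent=2))
```

Turning includeDetails off, as in the usage lines, trades readme_text and quantization_variants for a faster run.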

Example use cases

West+East parity datasets — pair with the HuggingFace Model Scraper to build a combined index of both Western and Chinese open-weights releases (Qwen, DeepSeek, Yi, GLM, InternLM, ERNIE, MiniMax, etc.).
Model landscape research — filter by task, framework, or license to survey which Chinese labs are publishing in specific domains.
Download trend tracking — schedule regular runs and track downloads_30d growth for specific namespaces or model families.
README content analysis — extract model cards from readme_text for NLP-based capability assessment or feature extraction.
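For the download trend use case, one possible approach is to diff downloads_30d between two scheduled runs. The sketch assumes each run has been loaded as a list of dicts with model_id, namespace, and downloads_30d keys; the function name downloads_growth is hypothetical.

```python
from typing import Dict, List, Optional


def downloads_growth(previous: List[dict],
                     current: List[dict],
                     namespace: Optional[str] = None) -> Dict[str, int]:
    """Return the change in downloads_30d per model_id between two runs.

    Models absent from the earlier run are skipped because they have no
    baseline. If `namespace` is given, only that publisher is considered.
    """
    baseline = {row["model_id"]: int(row["downloads_30d"]) for row in previous}
    growth: Dict[str, int] = {}
    for row in current:
        if namespace is not None and row.get("namespace") != namespace:
            continue
        model_id = row["model_id"]
        if model_id in baseline:
            growth[model_id] = int(row["downloads_30d"]) - baseline[model_id]
    return growth
```

Because downloads_30d is a rolling 30-day figure, the difference between two runs reflects a change in recent download rate, not a cumulative total.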

Notes

The API requires no authentication. No proxy is needed — direct access from Apify infrastructure works without restriction.
Full catalog sweeps (all tasks, includeDetails: true) are long-running. Use maxItems to cap output for targeted queries.
Array output fields (tasks_all, frameworks, languages, quantization_variants) use | as separator for flat dataset compatibility. Split on | in downstream processing.
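Splitting the pipe-separated fields downstream might look like the following sketch. It assumes each dataset row has been loaded as a dict and treats a missing or empty field as an empty list; the helper name expand_pipe_fields is illustrative.

```python
# Output fields documented as pipe-separated.
PIPE_FIELDS = ("tasks_all", "frameworks", "languages", "quantization_variants")


def expand_pipe_fields(record: dict) -> dict:
    """Return a copy of `record` with pipe-separated fields turned into lists."""
    expanded = dict(record)
    for field in PIPE_FIELDS:
        raw = record.get(field) or ""
        # Drop empty fragments so "" becomes [] rather than [""].
        expanded[field] = [part for part in raw.split("|") if part]
    return expanded
```

For example, a row with frameworks set to "pytorch|tensorflow" comes back with frameworks as a two-element list, while fields that were empty or absent become empty lists.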
