ModelScope Model Catalog Scraper
Scrape the ModelScope (modelscope.cn) AI model catalog — China's Alibaba-backed model registry hosting ~200k models. Export model IDs, tasks, frameworks, download statistics, star counts, licenses, READMEs, and full metadata for all models in the catalog.
What it does
Sweeps the ModelScope JSON API task-by-task (text-generation, image-generation, multimodal, and 26 other task categories), deduplicates across task overlaps, and optionally enriches each model record with the full README from the per-model detail endpoint.
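The deduplication step can be pictured with a minimal sketch. The scraper's internals are not documented here, so this is an assumption about one reasonable approach: the function name `dedupe_by_model_id` and the input shape (a mapping from task slug to that task's result list) are hypothetical, and only the field names `model_id`, `task`, and `tasks_all` come from the output description.

```python
from typing import Dict, Iterable, List


def dedupe_by_model_id(task_sweeps: Dict[str, Iterable[dict]]) -> List[dict]:
    """Merge per-task result lists into one record per model_id.

    Hypothetical helper: `task_sweeps` maps a task slug to the records
    returned while sweeping that task.
    """
    merged: Dict[str, dict] = {}
    for task, models in task_sweeps.items():
        for model in models:
            model_id = model["model_id"]
            if model_id not in merged:
                # First sighting: record the task that discovered the model.
                merged[model_id] = {**model, "task": task, "tasks_all": [task]}
            elif task not in merged[model_id]["tasks_all"]:
                # Seen under another task already: only extend the tag list.
                merged[model_id]["tasks_all"].append(task)
    return list(merged.values())
```

In this sketch a model listed under two tasks yields a single record whose `task` is the first task that surfaced it and whose `tasks_all` collects every task it appeared under.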
Output fields per model:
- `model_id`: full identifier (namespace/name)
- `namespace`, `name`: publisher slug and model name
- `chinese_name`: display name in Chinese if present
- `task`: primary task tag used for discovery
- `tasks_all`: all task tags, pipe-separated
- `frameworks`: ML frameworks (pytorch, tensorflow, mindspore, etc.), pipe-separated
- `languages`: supported languages (en, zh, multilingual, etc.), pipe-separated
- `license`: SPDX identifier (apache-2.0, mit, etc.)
- `downloads_30d`: downloads in the last 30 days
- `stars`: star count
- `last_updated`, `created_at`: ISO-8601 timestamps
- `readme_text`: README content, truncated to 8 KB (requires `includeDetails: true`)
- `model_size_params`: parameter count label when tagged (7B, 72B, MoE-22B-A2B)
- `quantization_variants`: available quantization types from tensor metadata, pipe-separated
- `base_model`: base model ID if this is a fine-tune
- `publisher_org`, `publisher_url`: organization name and profile URL
- `has_demo`, `has_inference_api`: boolean flags
Input
| Field | Type | Default | Description |
|---|---|---|---|
| `tasks` | array | (all tasks) | Limit to specific task slugs (e.g. `text-generation`, `image-generation`). Leave empty to sweep all 29 canonical tasks. |
| `maxItems` | integer | 100 | Maximum number of models to return. Set to 0 for unlimited (full catalog run). |
| `includeDetails` | boolean | true | Fetch the per-model detail endpoint for full README text and quantization variant metadata. Disabling this speeds up runs but leaves `readme_text` and `quantization_variants` empty. |
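As an illustration of how these three fields combine, here is a small sketch that assembles an input object with the documented defaults. The helper name `build_actor_input` and its validation are hypothetical and not part of the scraper; only the field names, types, and defaults are taken from the table.

```python
import json
from typing import List, Optional


def build_actor_input(
    tasks: Optional[List[str]] = None,
    max_items: int = 100,
    include_details: bool = True,
) -> dict:
    """Assemble an input object using the field names from the table.

    Hypothetical helper; an empty `tasks` list means "sweep all tasks"
    and `max_items=0` means "unlimited", as described in the table.
    """
    if max_items < 0:
        raise ValueError("maxItems must be 0 (unlimited) or a positive integer")
    return {
        "tasks": list(tasks or []),
        "maxItems": max_items,
        "includeDetails": include_details,
    }


# A fast, targeted run: one task, capped output, no README enrichment.
print(json.dumps(
    build_actor_input(["text-generation"], max_items=500, include_details=False),
    indent=2,
))
```

Turning `includeDetails` off, as in this example, trades the `readme_text` and `quantization_variants` fields for a shorter run.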
Example use cases
- West+East parity datasets: pair with the HuggingFace Model Scraper to build a combined index of both Western and Chinese open-weights releases (Qwen, DeepSeek, Yi, GLM, InternLM, ERNIE, MiniMax, etc.).
- Model landscape research: filter by task, framework, or license to survey which Chinese labs are publishing in specific domains.
- Download trend tracking: schedule regular runs and track `downloads_30d` growth for specific namespaces or model families.
- README content analysis: extract model cards from `readme_text` for NLP-based capability assessment or feature extraction.
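For the download trend use case, a minimal sketch of comparing two scheduled runs might look like the following. It assumes each run's dataset has been loaded as a list of dicts and that `downloads_30d` is numeric; the function name `downloads_growth` is hypothetical.

```python
from typing import Dict, List, Optional


def downloads_growth(
    previous: List[dict], current: List[dict]
) -> Dict[str, Optional[float]]:
    """Relative change in downloads_30d per model_id between two runs.

    Returns None for models that are new in `current` or whose earlier
    count was zero, since no meaningful ratio exists in those cases.
    """
    before = {row["model_id"]: row["downloads_30d"] for row in previous}
    growth: Dict[str, Optional[float]] = {}
    for row in current:
        old = before.get(row["model_id"])
        if not old:
            growth[row["model_id"]] = None
        else:
            growth[row["model_id"]] = (row["downloads_30d"] - old) / old
    return growth
```

To follow a single publisher or model family, filter both lists on `namespace` or a `model_id` prefix before calling the function.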
Notes
- The API requires no authentication. No proxy is needed — direct access from Apify infrastructure works without restriction.
- Full catalog sweeps (all tasks, `includeDetails: true`) are long-running. Use `maxItems` to cap output for targeted queries.
- Array output fields (`tasks_all`, `frameworks`, `languages`, `quantization_variants`) use `|` as separator for flat dataset compatibility. Split on `|` in downstream processing.
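Splitting the pipe-separated fields downstream can be sketched as below. The field list is taken from the note above; the helper name `split_pipe_fields` and the choice to map empty or missing values to an empty list are assumptions.

```python
PIPE_FIELDS = ("tasks_all", "frameworks", "languages", "quantization_variants")


def split_pipe_fields(record: dict) -> dict:
    """Return a copy of `record` with pipe-separated fields turned into lists."""
    out = dict(record)
    for field in PIPE_FIELDS:
        raw = out.get(field) or ""
        # Drop empty fragments so "" and missing values both become [].
        out[field] = [part.strip() for part in raw.split("|") if part.strip()]
    return out
```

The original record is left unmodified, so the flat form stays available for CSV-style exports.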