Chrome Web Store Scraper

Scrape Chrome extensions from the Chrome Web Store. Pull comprehensive extension metadata: name, rating, review count, user count, version, full manifest, permissions, category, developer info, screenshots, and website URL. Search by keyword, supply Chrome Web Store URLs, or provide specific extension IDs.

Features

Three input modes: by extension ID, by URL, or by search query
Rich data extraction: 25 fields per extension including the full parsed manifest.json
No proxy needed: the Chrome Web Store serves requests from datacenter IPs without blocking
Fast extraction: Data is embedded server-side in the HTML — no JavaScript rendering required
Permissions analysis: Extracts permissions array from the manifest for security audits

Use Cases

Extension research and competitive analysis
Security auditing — identify extensions with broad permissions (<all_urls>, webRequest, etc.)
Developer directory building
Chrome extension market research and trend tracking
Finding extensions by category or keyword
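
As a rough illustration of the security-auditing use case, the sketch below scans result records for broad permissions. The `BROAD_PERMISSIONS` set and the `flag_broad_permissions` helper are assumptions made for this example, not part of the scraper, and the second record is invented; only the `name` and `permissions` fields come from the output schema.

```python
# Permissions treated as "broad" in this sketch; the list is an assumption,
# so adjust it to match your own audit policy.
BROAD_PERMISSIONS = {"<all_urls>", "webRequest", "webRequestBlocking", "tabs", "cookies", "history"}


def flag_broad_permissions(records):
    """Return (name, sorted broad permissions) for each record that requests any."""
    flagged = []
    for record in records:
        # `permissions` may be missing or null on some records, so default to an empty list.
        requested = set(record.get("permissions") or [])
        hits = sorted(requested & BROAD_PERMISSIONS)
        if hits:
            flagged.append((record.get("name", record.get("extension_id")), hits))
    return flagged


# Two illustrative records; "Simple Notes" is a made-up extension.
records = [
    {"name": "uBlock Origin", "permissions": ["alarms", "storage", "tabs", "webRequest", "<all_urls>"]},
    {"name": "Simple Notes", "permissions": ["storage"]},
]
for name, hits in flag_broad_permissions(records):
    print(f"{name}: {', '.join(hits)}")
```

A set intersection keeps the check simple; a stricter audit might also inspect host patterns inside the `manifest` object.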

Input Configuration

Field	Type	Description
`extensionIds`	Array	List of extension IDs (32 lowercase letters, a–p) to scrape directly
`startUrls`	Array	Chrome Web Store URLs (detail pages, search pages, or category pages)
`searchQuery`	String	Search term to find extensions (e.g. `"password manager"`, `"ad blocker"`)
`maxItems`	Integer	Maximum number of records to return (0 = unlimited, default 20)

Provide one of `extensionIds`, `startUrls`, or `searchQuery`. If none is provided, the actor runs a default search for `"productivity"`.
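
The defaulting rules above could be mirrored on the caller's side roughly as follows. This is a minimal sketch of the documented behavior, not the actor's own code; `resolve_input` and the two constants are hypothetical names.

```python
DEFAULT_SEARCH_QUERY = "productivity"  # fallback search described in the text
DEFAULT_MAX_ITEMS = 20                 # documented default for maxItems


def resolve_input(raw):
    """Return a copy of the input with the documented defaults applied."""
    resolved = {
        "extensionIds": list(raw.get("extensionIds") or []),
        "startUrls": list(raw.get("startUrls") or []),
        "searchQuery": (raw.get("searchQuery") or "").strip(),
        "maxItems": raw.get("maxItems", DEFAULT_MAX_ITEMS),
    }
    if resolved["maxItems"] < 0:
        raise ValueError("maxItems must be 0 (unlimited) or a positive integer")
    # With no IDs, URLs, or query, fall back to the default search.
    if not (resolved["extensionIds"] or resolved["startUrls"] or resolved["searchQuery"]):
        resolved["searchQuery"] = DEFAULT_SEARCH_QUERY
    return resolved


print(resolve_input({}))
```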

Example: Scrape by Extension ID

{
    "extensionIds": ["cjpalhdlnbpafiamejdnhcphjbkeiagm"],
    "maxItems": 1
}

Example: Search by Keyword

{
    "searchQuery": "password manager",
    "maxItems": 20
}

Example: Specific Store URLs

{
    "startUrls": [
        "https://chromewebstore.google.com/detail/ublock-origin/cjpalhdlnbpafiamejdnhcphjbkeiagm",
        "https://chromewebstore.google.com/search/vpn"
    ],
    "maxItems": 50
}
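
When both IDs and URLs are at hand, it can help to normalize detail-page URLs to bare extension IDs before building the input. The sketch below assumes detail URLs follow the `/detail/<slug>/<id>` pattern seen in the example above; `extension_id_from_url` is a hypothetical helper, not part of the actor.

```python
import re
from urllib.parse import urlparse

# Extension IDs are 32 characters long; Chrome uses only the letters a-p.
EXTENSION_ID_RE = re.compile(r"^[a-p]{32}$")


def extension_id_from_url(url):
    """Return the extension ID from a detail-page URL, or None for other pages."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    # Detail pages look like /detail/<slug>/<id>; the last path segment is the ID.
    if len(parts) >= 2 and parts[0] == "detail" and EXTENSION_ID_RE.match(parts[-1]):
        return parts[-1]
    return None


urls = [
    "https://chromewebstore.google.com/detail/ublock-origin/cjpalhdlnbpafiamejdnhcphjbkeiagm",
    "https://chromewebstore.google.com/search/vpn",
]
for url in urls:
    print(url, "->", extension_id_from_url(url))
```

Search and category URLs return `None` here, so they would still be passed through `startUrls` unchanged.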

Output Fields

Each result record contains:

Field	Type	Description
`extension_id`	String	Unique 32-char extension ID
`url`	String	Chrome Web Store detail page URL
`name`	String	Extension display name
`short_description`	String	Short description shown in search results
`long_description`	String	Full description from the detail page
`rating`	Number	Average user rating (0–5)
`review_count`	Integer	Total number of user reviews
`user_count`	Integer	Approximate number of active users
`version`	String	Current published version
`size`	String	Extension file size (e.g. `"4.27MiB"`)
`category`	String	Primary category path (e.g. `"productivity/workflow"`)
`website_url`	String	Developer's website URL
`icon_url`	String	Extension icon URL
`header_image_url`	String	Header/marquee image URL
`promo_image_url`	String	Promotional tile image URL
`screenshots`	Array	Screenshot image URLs
`developer_email`	String	Developer contact email
`developer_name`	String	Developer display name
`developer_id`	String	Developer identifier
`manifest`	Object	Full parsed `manifest.json`
`permissions`	Array	Extension permissions list
`languages`	Array	Supported language names
`published_at`	String	Original publish date (ISO 8601)
`updated_at`	String	Last update date (ISO 8601)
`scraped_at`	String	Scrape timestamp (ISO 8601)

Sample Output

The record below is abridged; fields such as `long_description`, `manifest`, and `screenshots` are omitted for brevity.

{
    "extension_id": "cjpalhdlnbpafiamejdnhcphjbkeiagm",
    "url": "https://chromewebstore.google.com/detail/ublock-origin/cjpalhdlnbpafiamejdnhcphjbkeiagm",
    "name": "uBlock Origin",
    "short_description": "Finally, an efficient blocker. Easy on CPU and memory.",
    "rating": 4.6973,
    "review_count": 35453,
    "user_count": 14000000,
    "version": "1.71.0",
    "size": "4.27MiB",
    "category": "make_chrome_yours/privacy",
    "developer_email": "ubo@raymondhill.net",
    "developer_name": "Raymond Hill (gorhill)",
    "permissions": ["alarms", "contextMenus", "privacy", "storage", "tabs", "webRequest", "webRequestBlocking", "<all_urls>"],
    "published_at": "2014-06-24T00:52:35.000Z",
    "updated_at": "2026-05-12T05:16:59.000Z",
    "scraped_at": "2026-06-12T03:00:10.000Z"
}
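
The ISO 8601 timestamps make freshness checks straightforward. As a small sketch, the snippet below computes how many whole days passed between an extension's last update and the scrape; `parse_iso` and `days_since_update` are illustrative helpers, not part of the actor.

```python
from datetime import datetime


def parse_iso(ts):
    """Parse an ISO 8601 timestamp such as 2026-05-12T05:16:59.000Z."""
    # Older Python versions reject the trailing "Z", so swap it for an explicit UTC offset.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def days_since_update(record):
    """Whole days between the last store update and the scrape."""
    return (parse_iso(record["scraped_at"]) - parse_iso(record["updated_at"])).days


# Timestamps taken from the sample record above.
record = {
    "updated_at": "2026-05-12T05:16:59.000Z",
    "scraped_at": "2026-06-12T03:00:10.000Z",
}
print(days_since_update(record))
```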

Technical Notes

Data is extracted from server-rendered AF_initDataCallback script blocks — no browser rendering needed
Crawling uses core_crawler (CheerioCrawler) at a concurrency of 3 to respect Google's rate limits
No proxy required — datacenter IPs work fine
Memory: 512 MB is sufficient for most runs
Timeout: 4 hours default (plenty for bulk runs)
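
To illustrate the extraction approach named in the first note, here is a minimal sketch of pulling `AF_initDataCallback` payloads out of raw HTML with the standard library. The block shape in the comment is an assumption about how the page embeds its data, the sample HTML is synthetic, and `extract_init_data` is a hypothetical helper; the actor's real parser may differ.

```python
import json
import re

# Assumed shape: AF_initDataCallback({key: 'ds:0', hash: '1', data:[...], sideChannel: {}});
CALLBACK_RE = re.compile(
    r"AF_initDataCallback\(\{key:\s*'(?P<key>[^']+)'.*?data:(?P<data>.*?),\s*sideChannel:",
    re.DOTALL,
)


def extract_init_data(html):
    """Map each callback key (e.g. 'ds:0') to its decoded JSON payload."""
    blocks = {}
    for match in CALLBACK_RE.finditer(html):
        try:
            blocks[match.group("key")] = json.loads(match.group("data"))
        except json.JSONDecodeError:
            # Skip payloads that are not plain JSON rather than failing the whole page.
            continue
    return blocks


# Synthetic page fragment used only to exercise the parser.
html = """<script>AF_initDataCallback({key: 'ds:0', hash: '1', data:[["uBlock Origin", 4.6973]], sideChannel: {}});</script>"""
print(extract_init_data(html))
```

The payloads are positional nested arrays, so mapping indices to named fields such as `name` or `rating` would still be a separate step.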