OrbTop

Chrome Web Store Scraper

Scrape Chrome extensions from the Chrome Web Store. Pull comprehensive extension metadata — name, rating, review count, user count, version, full manifest, permissions, category, developer info, screenshots, and website URL. Search by keyword or provide specific extension IDs.

Features

  • Three input modes: by extension ID, by URL, or by search query
  • Rich data extraction: 25 fields per extension including the full parsed manifest.json
  • No proxy needed: the Chrome Web Store serves requests from datacenter IPs without issue
  • Fast extraction: Data is embedded server-side in the HTML — no JavaScript rendering required
  • Permissions analysis: extracts the permissions array from the manifest for security audits

Use Cases

  • Extension research and competitive analysis
  • Security auditing — identify extensions with broad permissions (<all_urls>, webRequest, etc.)
  • Developer directory building
  • Chrome extension market research and trend tracking
  • Finding extensions by category or keyword
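
For the security-auditing use case, the following is a minimal sketch of how scraped records could be screened for broad permissions. It is not part of the actor: the BROAD_PERMISSIONS set and the flag_broad_permissions helper are illustrative assumptions, and only the extension_id, name, and permissions fields documented in this README are relied on.

```python
from typing import Iterable

# Illustrative choice of "broad" permissions (an assumption, not an official list);
# adjust it to your own audit policy.
BROAD_PERMISSIONS = {"<all_urls>", "webRequest", "webRequestBlocking", "tabs"}


def flag_broad_permissions(records: Iterable[dict]) -> list:
    """Return one summary per record that requests at least one broad permission."""
    flagged = []
    for record in records:
        # The permissions field may be missing or null in a scraped record.
        requested = set(record.get("permissions") or [])
        hits = sorted(requested & BROAD_PERMISSIONS)
        if hits:
            flagged.append({
                "extension_id": record.get("extension_id"),
                "name": record.get("name"),
                "broad_permissions": hits,
            })
    return flagged


if __name__ == "__main__":
    sample = [
        {"extension_id": "a" * 32, "name": "Example A", "permissions": ["storage"]},
        {"extension_id": "b" * 32, "name": "Example B", "permissions": ["tabs", "<all_urls>"]},
    ]
    for row in flag_broad_permissions(sample):
        print(row["name"], row["broad_permissions"])
```

Keeping the policy in a single set makes it easy to tighten or relax the audit without touching the filtering logic.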

Input Configuration

Field         Type     Description
extensionIds  Array    List of extension IDs (32 lowercase letters, a-p) to scrape directly
startUrls     Array    Chrome Web Store URLs (detail pages, search pages, or category pages)
searchQuery   String   Search term to find extensions (e.g. "password manager", "ad blocker")
maxItems      Integer  Maximum number of records to return (0 = unlimited; default 20)

Provide one of extensionIds, startUrls, or searchQuery. If none are provided, the actor runs a default search for "productivity".
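
The defaulting rules above can be sketched as a small normalization step. This is a hypothetical helper, not the actor's actual code: the name resolve_input and the returned "limit" key are assumptions, while the "productivity" fallback, the default of 20, and the meaning of 0 come from this README.

```python
DEFAULT_SEARCH_QUERY = "productivity"  # fallback stated in this README
DEFAULT_MAX_ITEMS = 20                 # default stated in this README


def resolve_input(config: dict) -> dict:
    """Normalize an input object (hypothetical helper for illustration)."""
    extension_ids = list(config.get("extensionIds") or [])
    start_urls = list(config.get("startUrls") or [])
    search_query = (config.get("searchQuery") or "").strip()

    # With no source at all, fall back to the default search.
    if not (extension_ids or start_urls or search_query):
        search_query = DEFAULT_SEARCH_QUERY

    max_items = config.get("maxItems", DEFAULT_MAX_ITEMS)
    if isinstance(max_items, bool) or not isinstance(max_items, int) or max_items < 0:
        raise ValueError("maxItems must be a non-negative integer")

    return {
        "extensionIds": extension_ids,
        "startUrls": start_urls,
        "searchQuery": search_query or None,
        # 0 means unlimited; represent that as None for downstream code.
        "limit": None if max_items == 0 else max_items,
    }


if __name__ == "__main__":
    print(resolve_input({}))
    print(resolve_input({"extensionIds": ["cjpalhdlnbpafiamejdnhcphjbkeiagm"], "maxItems": 1}))
```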

Example: Scrape by Extension ID

{
    "extensionIds": ["cjpalhdlnbpafiamejdnhcphjbkeiagm"],
    "maxItems": 1
}

Example: Search by Keyword

{
    "searchQuery": "password manager",
    "maxItems": 20
}

Example: Specific Store URLs

{
    "startUrls": [
        "https://chromewebstore.google.com/detail/ublock-origin/cjpalhdlnbpafiamejdnhcphjbkeiagm",
        "https://chromewebstore.google.com/search/vpn"
    ],
    "maxItems": 50
}
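
The ID and URL input forms are related: in the detail-page URL above, the last path segment is the extension ID. A minimal sketch of validating IDs and recovering them from URLs follows. The helper names are hypothetical, and the /detail/<slug>/<id> layout is inferred from the example URL rather than from any official specification.

```python
import re
from typing import Optional
from urllib.parse import urlparse

# Chrome extension IDs are 32 characters drawn from the letters a-p.
EXTENSION_ID_RE = re.compile(r"[a-p]{32}")


def is_extension_id(value: str) -> bool:
    """True if the whole string has the shape of an extension ID."""
    return EXTENSION_ID_RE.fullmatch(value) is not None


def extension_id_from_url(url: str) -> Optional[str]:
    """Return the ID from a detail-page URL, or None for other pages."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    # Detail URLs in the example look like /detail/<slug>/<id>.
    if len(parts) >= 2 and parts[0] == "detail" and is_extension_id(parts[-1]):
        return parts[-1]
    return None


if __name__ == "__main__":
    for url in (
        "https://chromewebstore.google.com/detail/ublock-origin/cjpalhdlnbpafiamejdnhcphjbkeiagm",
        "https://chromewebstore.google.com/search/vpn",
    ):
        print(url, "->", extension_id_from_url(url))
```

Search and category URLs return None here, which gives a caller a simple way to tell listing pages apart from detail pages.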

Output Fields

Each result record contains:

Field               Type     Description
extension_id        String   Unique 32-char extension ID
url                 String   Chrome Web Store detail page URL
name                String   Extension display name
short_description   String   Short description shown in search results
long_description    String   Full description from the detail page
rating              Number   Average user rating (0–5)
review_count        Integer  Total number of user reviews
user_count          Integer  Approximate number of active users
version             String   Current published version
size                String   Extension file size (e.g. "4.27MiB")
category            String   Primary category path (e.g. "productivity/workflow")
website_url         String   Developer's website URL
icon_url            String   Extension icon URL
header_image_url    String   Header/marquee image URL
promo_image_url     String   Promotional tile image URL
screenshots         Array    Screenshot image URLs
developer_email     String   Developer contact email
developer_name      String   Developer display name
developer_id        String   Developer identifier
manifest            Object   Full parsed manifest.json
permissions         Array    Extension permissions list
languages           Array    Supported language names
published_at        String   Original publish date (ISO 8601)
updated_at          String   Last update date (ISO 8601)
scraped_at          String   Scrape timestamp (ISO 8601)

Sample Output (abridged)

{
    "extension_id": "cjpalhdlnbpafiamejdnhcphjbkeiagm",
    "url": "https://chromewebstore.google.com/detail/ublock-origin/cjpalhdlnbpafiamejdnhcphjbkeiagm",
    "name": "uBlock Origin",
    "short_description": "Finally, an efficient blocker. Easy on CPU and memory.",
    "rating": 4.6973,
    "review_count": 35453,
    "user_count": 14000000,
    "version": "1.71.0",
    "size": "4.27MiB",
    "category": "make_chrome_yours/privacy",
    "developer_email": "ubo@raymondhill.net",
    "developer_name": "Raymond Hill (gorhill)",
    "permissions": ["alarms", "contextMenus", "privacy", "storage", "tabs", "webRequest", "webRequestBlocking", "<all_urls>"],
    "published_at": "2014-06-24T00:52:35.000Z",
    "updated_at": "2026-05-12T05:16:59.000Z",
    "scraped_at": "2026-06-12T03:00:10.000Z"
}
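
As a sketch of how the ISO 8601 timestamp fields might be consumed downstream, the snippet below computes how stale an extension was at scrape time. The helper names are hypothetical; the field names and timestamp format follow the sample record above.

```python
from datetime import datetime


def parse_iso_utc(value: str) -> datetime:
    """Parse timestamps like 2026-05-12T05:16:59.000Z into aware datetimes."""
    # datetime.fromisoformat rejects a trailing "Z" before Python 3.11,
    # so rewrite it as an explicit UTC offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def days_since_update(record: dict) -> int:
    """Whole days between updated_at and scraped_at for one record."""
    updated = parse_iso_utc(record["updated_at"])
    scraped = parse_iso_utc(record["scraped_at"])
    return (scraped - updated).days


if __name__ == "__main__":
    record = {
        "updated_at": "2026-05-12T05:16:59.000Z",
        "scraped_at": "2026-06-12T03:00:10.000Z",
    }
    print(days_since_update(record))
```

A filter such as days_since_update(record) > 365 could then single out extensions that have not been updated for a year, a common signal in market research and security reviews.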

Technical Notes

  • Data is extracted from server-rendered AF_initDataCallback script blocks — no browser rendering needed
  • Built on core_crawler (CheerioCrawler) with a concurrency of 3 to respect Google rate limits
  • No proxy required — datacenter IPs work fine
  • Memory: 512 MB is sufficient for most runs
  • Timeout: 4 hours default (plenty for bulk runs)
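
The first note can be illustrated with a rough sketch of pulling AF_initDataCallback payloads out of raw HTML using only the standard library. This is not the actor's implementation. The block shape assumed by the regular expression (key: 'ds:N', then data:, then sideChannel: {}) is an assumption based on how such script blocks commonly appear on Google pages, it may change without notice, and the layout of the arrays inside each payload is not documented here.

```python
import json
import re

# Assumed block shape (not guaranteed by Google):
#   AF_initDataCallback({key: 'ds:N', ..., data:<JSON>, sideChannel: {}});
CALLBACK_RE = re.compile(
    r"AF_initDataCallback\(\{key: '(ds:\d+)'.*?data:(.*?), sideChannel: \{\}\}\);",
    re.DOTALL,
)


def extract_init_data(html: str) -> dict:
    """Map each ds:N key to its parsed data payload."""
    payloads = {}
    for key, raw in CALLBACK_RE.findall(html):
        try:
            payloads[key] = json.loads(raw)
        except json.JSONDecodeError:
            # Skip payloads that are not plain JSON instead of failing the page.
            continue
    return payloads


if __name__ == "__main__":
    html = (
        "<script>AF_initDataCallback({key: 'ds:0', hash: '1', "
        'data:[["Example Extension",4.5]], sideChannel: {}});</script>'
    )
    print(extract_init_data(html))
```

Because the payload is parsed straight from the response body, no headless browser is involved, which matches the note that browser rendering is not needed.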