OrbTop

Shein New Arrivals & Best-Sellers Trend Scraper

ECOMMERCE, AUTOMATION

Scrape Shein category pages for new arrivals and best-sellers. Tracks rank, first_seen_date, rank_movement, pricing, and flash sale data per run. Designed for fashion trend forecasting and dropship research.

Powered by Bright Data Web Unlocker, which this actor relies on to get past Shein's Halo anti-bot system.

What you get

Each record represents one product's position in a Shein category listing at the time of the run:

Field                   Description
goods_id                Shein's internal numeric product ID
product_name            Display title from the listing page
product_url             Full canonical product detail URL
image_url               Primary listing thumbnail URL
category / category_id  Human-readable category name and the numeric <id> from the URL's -c-<id> segment
rank                    1-based position in the listing
first_seen_date         ISO 8601 date on which this product first appeared in a run for this category and feed type
rank_movement           Position change vs the previous run, i.e. previous rank minus current rank (positive = climbed, negative = fell, 0 = first seen or unchanged)
sale_price              Current sale price (numeric)
retail_price            Original retail price before discount
discount_pct            Discount percentage (0–100)
currency                ISO 4217 currency code (USD, GBP, EUR, …)
is_flash_sale           true if the product is currently on a limited-time flash sale
flash_sale_price        Flash sale price if one is active, otherwise null
flash_sale_end          ISO 8601 end timestamp of the flash sale, or null if none is active
region                  Storefront region derived from the category URL (us, uk, de, …)
feed_type               Feed this product came from: new_arrival or best_seller

How it works

Shein's Halo anti-bot system blocks requests made with Playwright or through standard residential proxies. This actor instead uses Bright Data Web Unlocker, which handles TLS fingerprinting, browser challenge solving, and rotating exit nodes server-side and returns the fully rendered HTML (~4–5 MB per category page) to the actor.

Product data is extracted from an anonymous JavaScript array embedded in the page. The actor applies a mandatory content gate: any response smaller than 400 KB, containing explicit challenge markers, or lacking goods_id signals is treated as a Halo challenge shell and retried (up to 3 times).
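The content gate can be pictured as a single predicate applied to each Web Unlocker response. The sketch below is illustrative, not the actor's code: the function name is hypothetical, "400 KB" is assumed to mean 400 × 1024 bytes, and CHALLENGE_MARKERS is a placeholder because the actual marker strings are not documented here.

```python
# Minimal sketch of the content gate; names and marker strings are hypothetical.
MIN_BYTES = 400 * 1024  # assumption: "400 KB" read as KiB
CHALLENGE_MARKERS = ("__hypothetical_challenge_marker__",)  # placeholder only


def is_challenge_shell(html: str) -> bool:
    """Return True if a response should be treated as a Halo shell and retried."""
    if len(html.encode("utf-8")) < MIN_BYTES:
        return True  # too small: the Halo shell is ~355 KB
    if any(marker in html for marker in CHALLENGE_MARKERS):
        return True  # explicit challenge marker present
    if "goods_id" not in html:
        return True  # no product signals anywhere in the page
    return False
```

Checking size first is cheap and rejects most shells before any string scanning.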

Trend state (first_seen_date, previous rank) is persisted to a Key-Value store between runs, so that each run can compute rank_movement against the previous one.

Inputs

Category URLs

Shein category page URLs to scrape. Must use the -c-<id>.html URL format:

https://us.shein.com/Women-Dresses-c-1727.html
https://us.shein.com/Women-Tops-c-1738.html

Leave this field empty to use the built-in default set: Women Dresses, Tops, Pants, and Sets on us.shein.com.

Do not add ?sort= manually; the sort order is set by the Feed Type input.
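A client-side check of these URL rules might look like the following. This is a hypothetical validator, not the actor's own code: the regex is an assumption about how -c-<id>.html paths are matched, and turning the slug into a display name ("Women-Dresses" to "Women Dresses") is inferred from the example output.

```python
import re
from typing import Tuple
from urllib.parse import parse_qs, urlparse

# Hypothetical pattern for "/<Slug>-c-<id>.html" category paths.
CATEGORY_PATH = re.compile(r"^/(?P<slug>.+)-c-(?P<id>\d+)\.html$")


def parse_category_url(url: str) -> Tuple[str, str]:
    """Return (category, category_id) or raise ValueError for unsupported URLs."""
    parsed = urlparse(url)
    if "sort" in parse_qs(parsed.query):
        # Sorting is driven by the Feed Type input, so a manual ?sort= is rejected.
        raise ValueError("remove ?sort= from the URL; use Feed Type instead")
    match = CATEGORY_PATH.match(parsed.path)
    if match is None:
        raise ValueError(f"not a -c-<id>.html category URL: {url}")
    # "Women-Dresses" -> "Women Dresses", matching the example output.
    return match.group("slug").replace("-", " "), match.group("id")


print(parse_category_url("https://us.shein.com/Women-Dresses-c-1727.html"))
```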

Feed Type

Which trend feed to scrape:

  • Both New Arrivals and Best Sellers (default) — fetches each category URL twice, once with sort=7 (newest) and once with sort=8 (best-selling).
  • New Arrivals only — sorted by newest additions (sort=7).
  • Best Sellers only — sorted by sales volume / popularity (sort=8).
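The feed selection above amounts to appending a different sort code to the same category URL. A sketch, assuming only the sort=7 / sort=8 codes stated here; the helper name is hypothetical, and any other query parameters are kept while a stray sort is dropped.

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse

# Sort codes from this README; feed labels match the feed_type output field.
SORT_BY_FEED = {"new_arrival": "7", "best_seller": "8"}


def feed_urls(category_url, feeds=("new_arrival", "best_seller")):
    """Yield (feed_type, url) pairs: one fetch per selected feed."""
    parts = urlparse(category_url)
    # Keep any other query parameters, but never a user-supplied sort.
    base = [(k, v) for k, v in parse_qsl(parts.query) if k != "sort"]
    for feed in feeds:
        query = urlencode(base + [("sort", SORT_BY_FEED[feed])])
        yield feed, urlunparse(parts._replace(query=query))


for feed, url in feed_urls("https://us.shein.com/Women-Dresses-c-1727.html"):
    print(feed, url)
```

With the default "both" setting, every category URL produces two fetches, which is where the request counts under Performance and cost come from.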

Region

Shein storefront region for currency context. Automatically derived from the category URL domain if left unset.

Value  Storefront
us     us.shein.com (USD)
uk     shein.co.uk (GBP)
de     shein.de (EUR)
fr     shein.fr (EUR)
au     shein.com.au (AUD)
ca     shein.ca (CAD)
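Deriving the region from the URL can be done with a host lookup built from the table above. This is a sketch under assumptions: exact host matching, an optional leading "www.", and an explicit Region input taking precedence are all guesses about behavior the README does not spell out.

```python
from typing import Optional
from urllib.parse import urlparse

# Host -> (region, currency), copied from the table above.
REGION_BY_HOST = {
    "us.shein.com": ("us", "USD"),
    "shein.co.uk": ("uk", "GBP"),
    "shein.de": ("de", "EUR"),
    "shein.fr": ("fr", "EUR"),
    "shein.com.au": ("au", "AUD"),
    "shein.ca": ("ca", "CAD"),
}


def derive_region(category_url: str, override: Optional[str] = None) -> str:
    """Use the Region input if set, otherwise infer it from the URL host."""
    if override:
        return override
    host = (urlparse(category_url).hostname or "").lower()
    if host.startswith("www."):
        host = host[len("www."):]  # assumption: www. prefix is ignored
    try:
        return REGION_BY_HOST[host][0]
    except KeyError:
        raise ValueError(f"unknown Shein storefront host: {host}") from None
```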

Max Items

Maximum number of product records to emit across all pages and feeds combined. Set to 0 for no limit. Default: 100.
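Because the limit applies across all pages and feeds combined, it is easiest to think of it as a cap on one chained stream of records. A minimal sketch, assuming records from every category and feed are yielded through a single iterator; the function name is hypothetical.

```python
from itertools import islice
from typing import Iterable, Iterator


def cap_records(records: Iterable[dict], max_items: int = 100) -> Iterator[dict]:
    """Emit at most max_items records in total; 0 means no limit."""
    if max_items < 0:
        raise ValueError("max_items must be >= 0")
    if max_items == 0:
        return iter(records)
    # islice stops pulling once the cap is hit, so a lazy upstream
    # generator would not fetch further pages after that point.
    return islice(records, max_items)
```

If the upstream generator is lazy, hitting the cap early also saves Web Unlocker requests for pages that would never be emitted.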

Example output

{
  "goods_id": "414605698",
  "product_name": "Fashionable Digital Printed Milk Silk Fabric, Cool Summer Maxi Dress",
  "product_url": "https://us.shein.com/p-414605698.html",
  "image_url": "https://img.ltwebstatic.com/images3_spmp/...",
  "category": "Women Dresses",
  "category_id": "1727",
  "rank": 1,
  "first_seen_date": "2026-07-05",
  "rank_movement": 0,
  "sale_price": 8.93,
  "retail_price": 14.99,
  "discount_pct": 40,
  "currency": "USD",
  "is_flash_sale": false,
  "flash_sale_price": null,
  "flash_sale_end": null,
  "region": "us",
  "feed_type": "new_arrival"
}
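A downstream consumer may want to sanity-check discount_pct against the two price fields. The sketch assumes discount_pct is the whole-number percentage off retail_price; the actor may take the value straight from Shein's data, so a one-point tolerance is allowed for rounding differences. Both helper names are hypothetical.

```python
import json

# Trimmed from the example record above.
record = json.loads('{"sale_price": 8.93, "retail_price": 14.99, "discount_pct": 40}')


def implied_discount_pct(sale_price: float, retail_price: float) -> int:
    """Whole-number percentage off retail (assumed rounding rule)."""
    if retail_price <= 0:
        return 0
    return round((1 - sale_price / retail_price) * 100)


def discount_is_consistent(rec: dict, tolerance: int = 1) -> bool:
    implied = implied_discount_pct(rec["sale_price"], rec["retail_price"])
    return abs(implied - rec["discount_pct"]) <= tolerance


print(implied_discount_pct(8.93, 14.99))  # 40 (about 40.4% before rounding)
print(discount_is_consistent(record))     # True
```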

Performance and cost

Each Shein category page is fetched via Bright Data Web Unlocker, which bills per successful request (~$1.50 per 1,000 requests). A standard run with 4 categories on both feeds makes 8 Web Unlocker requests (4 categories × 2 feeds), not counting retries.
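The cost arithmetic can be written out as a small estimator. It assumes one page per category and feed and uses the approximate $1.50 per 1,000 rate quoted here; extra_requests is a hypothetical knob for retries, since challenge shells that Bright Data grades as ok would presumably be billed as successful requests too.

```python
def estimate_run(categories: int, feeds: int = 2, extra_requests: int = 0,
                 usd_per_1000: float = 1.50):
    """Return (request_count, approximate_cost_usd) for one run."""
    requests = categories * feeds + extra_requests
    return requests, requests * usd_per_1000 / 1000


requests, cost = estimate_run(4)
print(requests, round(cost, 3))  # 8 0.012
```

Scheduled daily for 30 days, the same setup comes to 240 requests, roughly $0.36 before retries.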

Memory: 512 MB. Concurrent Bright Data fetches are capped at 2 to avoid out-of-memory errors caused by the large response bodies (~5 MB each).
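A concurrency cap like this is commonly implemented with a semaphore. The sketch below uses asyncio.Semaphore as a stand-in; the actor's own mechanism and the fetch callable are assumptions, and only the limit of 2 comes from this README.

```python
import asyncio
from typing import Awaitable, Callable, List

MAX_CONCURRENT_FETCHES = 2  # documented cap on in-flight Bright Data fetches


async def fetch_all(urls: List[str], fetch: Callable[[str], Awaitable[str]],
                    limit: int = MAX_CONCURRENT_FETCHES) -> List[str]:
    """Run fetch(url) for every URL, never more than `limit` at once."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(url: str) -> str:
        async with semaphore:  # waits while `limit` fetches are in flight
            return await fetch(url)

    # gather() returns results in the same order as `urls`.
    return await asyncio.gather(*(guarded(u) for u in urls))
```

With a limit of 2, at most two ~5 MB bodies are held at the fetch stage at any moment, regardless of how many categories and feeds are queued.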

Trend tracking across runs

Run this actor on a schedule (daily or hourly) for the same category URLs. On each run it:

  1. Reads the previous run's rank state from the Key-Value store.
  2. Computes rank_movement per product.
  3. Records first_seen_date the first time a goods_id appears.
  4. Persists updated state for the next run.

The state key is category_id:feed_type:goods_id, so multiple categories and feed types are tracked independently.
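The four steps above can be sketched as one update function over an in-memory dict. Assumptions: the dict stands in for the persisted state, the whole mapping is saved as a single JSON record between runs, and products missing from the current run simply keep their last stored rank (the README does not say how the actor handles that case).

```python
from typing import Dict, List

State = Dict[str, dict]  # "category_id:feed_type:goods_id" -> stored trend state


def update_trend(state: State, category_id: str, feed_type: str,
                 ranked_goods_ids: List[str], today: str) -> List[dict]:
    """Compute rank_movement and first_seen_date, then update state in place."""
    rows = []
    for rank, goods_id in enumerate(ranked_goods_ids, start=1):  # 1-based rank
        key = f"{category_id}:{feed_type}:{goods_id}"
        previous = state.get(key)
        if previous is None:
            first_seen, movement = today, 0  # first sighting
        else:
            first_seen = previous["first_seen_date"]
            movement = previous["rank"] - rank  # positive = climbed
        state[key] = {"first_seen_date": first_seen, "rank": rank}
        rows.append({"goods_id": goods_id, "rank": rank,
                     "first_seen_date": first_seen, "rank_movement": movement})
    return rows


state: State = {}
update_trend(state, "1727", "new_arrival", ["a", "b", "c"], "2026-07-05")
print(update_trend(state, "1727", "new_arrival", ["c", "a", "d"], "2026-07-06"))
```

In the second call, "c" climbs from rank 3 to 1 (movement 2), "a" falls one place (movement -1), and "d" is new (movement 0, first seen on the second date).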

Anti-bot notes

  • Halo challenge shell: Shein serves a ~355 KB JavaScript challenge page that Bright Data's server-side grader sometimes marks as ok. The actor's content gate (at least 400 KB and goods_id present) rejects these false positives and retries.
  • Retry limit: Up to 3 Bright Data retries per request. A page that fails every retry is skipped with a warning, and the run continues.
  • Concurrency: inn_max_conc: 2 limits peak concurrent BD fetches. Do not raise this above 4 without increasing memory.
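The retry behavior in these notes can be sketched as a loop around the fetch and the content gate. This is illustrative only: fetch and is_challenge_shell are injected callables, the logger name is hypothetical, and "up to 3 retries" is interpreted as one initial attempt plus at most three retries.

```python
import logging
from typing import Callable, Optional

log = logging.getLogger("shein_trends")  # hypothetical logger name


def fetch_with_retries(url: str, fetch: Callable[[str], str],
                       is_challenge_shell: Callable[[str], bool],
                       max_retries: int = 3) -> Optional[str]:
    """Return page HTML, or None if every attempt fails (page is skipped)."""
    attempts = 1 + max_retries  # assumption: initial try plus up to 3 retries
    for attempt in range(1, attempts + 1):
        try:
            html = fetch(url)
        except Exception as exc:  # any fetch error counts as a failed attempt
            log.info("attempt %d/%d for %s failed: %s", attempt, attempts, url, exc)
            continue
        if not is_challenge_shell(html):
            return html
        log.info("attempt %d/%d for %s returned a challenge shell", attempt, attempts, url)
    log.warning("skipping %s after %d attempts; the run continues", url, attempts)
    return None
```

Returning None instead of raising lets the caller drop one page and keep processing the remaining categories and feeds.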