Shein New Arrivals & Best-Sellers Trend Scraper
Scrape Shein category pages for new arrivals and best-sellers. Tracks rank, first_seen_date, rank_movement, pricing, and flash sale data per run. Designed for fashion trend forecasting and dropship research.
Powered by Bright Data Web Unlocker, the only method found to reliably get through Shein's Halo anti-bot system.
What you get
Each record represents one product's position in a Shein category listing at the time of the run:
| Field | Description |
|---|---|
| `goods_id` | Shein's internal numeric product ID |
| `product_name` | Display title from the listing page |
| `product_url` | Full canonical product detail URL |
| `image_url` | Primary listing thumbnail URL |
| `category` / `category_id` | Human-readable name + numeric `-c-<id>` extracted from the URL |
| `rank` | 1-based position in the listing |
| `first_seen_date` | ISO 8601 date this product first appeared in a run for this category + feed type |
| `rank_movement` | Position delta vs the previous run (positive = climbed, negative = fell, 0 = first seen or unchanged) |
| `sale_price` | Current sale price (numeric) |
| `retail_price` | Original retail price before discount |
| `discount_pct` | Discount percentage (0–100) |
| `currency` | ISO 4217 currency code (USD, GBP, EUR, …) |
| `is_flash_sale` | `true` if currently on a limited-time flash sale |
| `flash_sale_price` | Flash sale price if active, `null` otherwise |
| `flash_sale_end` | ISO 8601 end timestamp of the flash sale, `null` if not active |
| `region` | Storefront region derived from the category URL (`us`, `uk`, `de`, …) |
| `feed_type` | Which feed this product came from: `new_arrival` or `best_seller` |
How it works
Shein's Halo anti-bot system blocks Playwright and standard residential proxies. This actor therefore uses Bright Data Web Unlocker, which handles TLS fingerprinting, browser challenge solving, and rotating exit nodes server-side and returns the full rendered HTML (~4–5 MB per category page) to the actor.
Product data is extracted from an anonymous JavaScript array embedded in the page. The actor applies a mandatory content gate: any response that is smaller than 400 KB, contains explicit challenge markers, or contains no `goods_id` values is treated as a Halo challenge shell and retried (up to 3 times).
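The following Python sketch shows one way the content gate and retry loop could work. The 400 KB threshold and the retry count come from this README. The challenge-marker string and the `goods_id` regex are hypothetical placeholders, because the actor's real markers are not documented here. The sketch also assumes that "up to 3 times" means three retries after the first attempt.

```python
import re
from typing import Callable, Optional

# 400 KB threshold from this README (treating 1 KB as 1,024 bytes is an assumption).
MIN_BODY_BYTES = 400 * 1024
# Hypothetical placeholder: the actor's real challenge markers are not documented.
CHALLENGE_MARKERS = ("__hypothetical_halo_challenge__",)
# Hypothetical pattern for "goods_id present" in the embedded JavaScript array.
GOODS_ID_PATTERN = re.compile(r'"goods_id"\s*:\s*"?\d+')
MAX_RETRIES = 3  # retries after the first attempt (assumed interpretation)


def passes_content_gate(html: str) -> bool:
    """Return True only if the body looks like a real listing page."""
    if len(html.encode("utf-8")) < MIN_BODY_BYTES:
        return False  # too small: e.g. the ~355 KB Halo challenge shell
    if any(marker in html for marker in CHALLENGE_MARKERS):
        return False  # explicit challenge marker found
    return GOODS_ID_PATTERN.search(html) is not None  # product data present


def fetch_with_gate(fetch: Callable[[str], str], url: str) -> Optional[str]:
    """Try the first attempt plus up to MAX_RETRIES retries."""
    for _ in range(1 + MAX_RETRIES):
        html = fetch(url)
        if passes_content_gate(html):
            return html
    return None  # caller logs a warning and skips this page
```

Here `fetch` is passed in as a parameter, so the gate logic does not depend on how the Bright Data request is actually made.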
Trend state (first_seen_date, previous rank) is persisted to the run's Key-Value store so that consecutive runs compute accurate rank_movement deltas.
Inputs
Category URLs
Shein category page URLs to scrape. Must use the -c-<id>.html URL format:
https://us.shein.com/Women-Dresses-c-1727.html
https://us.shein.com/Women-Tops-c-1738.html
Leave this field empty to use the built-in default set: Women Dresses, Tops, Pants, and Sets on us.shein.com.
Do not add ?sort= manually. The sort direction is controlled by the Feed Type input.
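Below is a minimal sketch of validating a category URL and extracting `category_id`. It assumes the human-readable category name is derived from the URL slug (`Women-Dresses` becomes `Women Dresses`). That matches the example output but is not confirmed. `parse_category_url` is a hypothetical helper name.

```python
import re
from urllib.parse import parse_qs, urlparse

# Last path segment must look like "<Slug>-c-<id>.html".
CATEGORY_SEGMENT = re.compile(r"(.+)-c-(\d+)\.html")


def parse_category_url(url: str) -> tuple[str, str]:
    """Return (category, category_id) for a Shein category URL."""
    parsed = urlparse(url)
    if "sort" in parse_qs(parsed.query, keep_blank_values=True):
        # Sort order comes from the Feed Type input, never from the URL.
        raise ValueError(f"remove ?sort= from {url}")
    segment = parsed.path.rsplit("/", 1)[-1]
    match = CATEGORY_SEGMENT.fullmatch(segment)
    if match is None:
        raise ValueError(f"not a -c-<id>.html category URL: {url}")
    slug, category_id = match.groups()
    # Assumption: the display name is the slug with hyphens turned into spaces.
    return slug.replace("-", " "), category_id
```

For example, `parse_category_url("https://us.shein.com/Women-Dresses-c-1727.html")` gives `("Women Dresses", "1727")`.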
Feed Type
Which trend feed to scrape:
- Both New Arrivals and Best Sellers (default): fetches each category URL twice, once with `sort=7` (newest) and once with `sort=8` (best-selling).
- New Arrivals only: sorted by newest additions (`sort=7`).
- Best Sellers only: sorted by sales volume / popularity (`sort=8`).
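The following sketch turns a category URL into one URL per feed. It assumes the `sort` value is simply added as a query parameter. The values 7 and 8 come from this README. `feed_urls` and the `FEED_SORT` mapping are hypothetical names.

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse

# sort values documented in this README: 7 = newest, 8 = best-selling.
FEED_SORT = {"new_arrival": "7", "best_seller": "8"}


def feed_urls(category_url: str, feed_types: list[str]) -> list[tuple[str, str]]:
    """Return (feed_type, url) pairs, one fetch per category per feed."""
    parsed = urlparse(category_url)
    pairs = []
    for feed_type in feed_types:
        query = dict(parse_qsl(parsed.query))
        query["sort"] = FEED_SORT[feed_type]  # overrides any existing sort
        pairs.append((feed_type, urlunparse(parsed._replace(query=urlencode(query)))))
    return pairs
```

Under this sketch, the default "Both" option would correspond to `feed_types=["new_arrival", "best_seller"]`, which produces two fetches per category URL.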
Region
Shein storefront region for currency context. Automatically derived from the category URL domain if left unset.
| Value | Storefront |
|---|---|
| `us` | us.shein.com (USD) |
| `uk` | shein.co.uk (GBP) |
| `de` | shein.de (EUR) |
| `fr` | shein.fr (EUR) |
| `au` | shein.com.au (AUD) |
| `ca` | shein.ca (CAD) |
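Deriving the region from the URL domain can be sketched as a lookup over the storefront table. Two behaviours here are assumptions of this sketch, not documented actor behaviour: stripping a leading `www.`, and raising an error for unknown hosts.

```python
from typing import Optional
from urllib.parse import urlparse

# Host -> region, copied from the storefront table above.
STOREFRONT_REGIONS = {
    "us.shein.com": "us",
    "shein.co.uk": "uk",
    "shein.de": "de",
    "shein.fr": "fr",
    "shein.com.au": "au",
    "shein.ca": "ca",
}


def derive_region(category_url: str, region: Optional[str] = None) -> str:
    """Use the explicit Region input if set, else map the URL's host."""
    if region:
        return region
    host = (urlparse(category_url).hostname or "").lower()
    if host.startswith("www."):
        host = host[len("www."):]  # assumption: the www. prefix is optional
    if host not in STOREFRONT_REGIONS:
        raise ValueError(f"unknown Shein storefront: {host!r}")  # sketch's choice
    return STOREFRONT_REGIONS[host]
```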
Max Items
Maximum number of product records to emit across all pages and feeds combined. Set to 0 for no limit. Default: 100.
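Max Items works like a single cap applied to one combined stream of records from every page and feed. Here is a sketch using `itertools.islice`, where `cap_items` is a hypothetical helper:

```python
from itertools import islice
from typing import Iterable, Iterator


def cap_items(records: Iterable[dict], max_items: int = 100) -> Iterator[dict]:
    """Yield at most max_items records in total; 0 means no limit."""
    if max_items == 0:
        yield from records
    else:
        yield from islice(records, max_items)
```

In this sketch, if the record stream is produced lazily, pages beyond the cap are never requested, which also avoids unnecessary Bright Data requests.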
Example output
```json
{
  "goods_id": "414605698",
  "product_name": "Fashionable Digital Printed Milk Silk Fabric, Cool Summer Maxi Dress",
  "product_url": "https://us.shein.com/p-414605698.html",
  "image_url": "https://img.ltwebstatic.com/images3_spmp/...",
  "category": "Women Dresses",
  "category_id": "1727",
  "rank": 1,
  "first_seen_date": "2026-07-05",
  "rank_movement": 0,
  "sale_price": 8.93,
  "retail_price": 14.99,
  "discount_pct": 40,
  "currency": "USD",
  "is_flash_sale": false,
  "flash_sale_price": null,
  "flash_sale_end": null,
  "region": "us",
  "feed_type": "new_arrival"
}
```
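The pricing fields in the example are consistent with `discount_pct` being the discount of `sale_price` relative to `retail_price`, rounded to a whole percent: (1 − 8.93 / 14.99) × 100 ≈ 40.4, which rounds to 40. This README does not say whether the actor computes the value itself or reads Shein's displayed discount. The hypothetical helper below is only a consistency check.

```python
def discount_pct(sale_price: float, retail_price: float) -> int:
    """Whole-percent discount of sale_price vs retail_price, clamped to 0-100."""
    if retail_price <= 0:
        return 0  # guard against division by zero
    pct = round((1 - sale_price / retail_price) * 100)
    return max(0, min(100, pct))
```

With the example record, `discount_pct(8.93, 14.99)` returns 40.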
Performance and cost
Each Shein category page is fetched via Bright Data Web Unlocker. Billing is per successful Web Unlocker request (~$1.50 per 1,000 requests). A standard run with 4 categories on both feeds makes 8 BD requests (4 categories × 2 feeds, one page each).
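Here is a rough cost estimate. It assumes one page per category per feed, as in the example above, and uses the quoted ~$1.50 per 1,000 requests. Challenge shells that Bright Data reports as successful may also be billed, so real costs can be somewhat higher. `estimate_requests` and `estimate_cost` are hypothetical helpers.

```python
PRICE_PER_1000 = 1.50  # approximate USD figure quoted in this README


def estimate_requests(categories: int, feeds: int, pages_per_feed: int = 1) -> int:
    """One Web Unlocker request per page, per feed, per category."""
    return categories * feeds * pages_per_feed


def estimate_cost(requests: int, price_per_1000: float = PRICE_PER_1000) -> float:
    """Approximate USD cost, ignoring retries."""
    return requests * price_per_1000 / 1000
```

For the default run, `estimate_requests(4, 2)` gives 8 requests, and `estimate_cost(8)` gives about $0.012.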
Memory: 512 MB. Concurrent BD fetches are capped at 2 to prevent OOM on large response bodies (~5 MB each).
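A cap on concurrent fetches maps naturally onto an `asyncio.Semaphore`. This is a Python sketch only. The actor's implementation language and its fetch function are not documented, so `fetch` is passed in as a parameter and `fetch_all` is a hypothetical name.

```python
import asyncio
from typing import Awaitable, Callable

MAX_CONCURRENT_FETCHES = 2  # cap quoted in this README


async def fetch_all(
    urls: list[str],
    fetch: Callable[[str], Awaitable[str]],
    limit: int = MAX_CONCURRENT_FETCHES,
) -> list[str]:
    """Fetch every URL, keeping at most `limit` requests in flight at once."""
    semaphore = asyncio.Semaphore(limit)

    async def bounded(url: str) -> str:
        async with semaphore:  # waits while `limit` fetches are active
            return await fetch(url)

    # gather returns results in the same order as the input URLs.
    return list(await asyncio.gather(*(bounded(url) for url in urls)))
```

At most `limit` response bodies (~5 MB each) are held by in-flight fetches at any moment, which is what keeps peak memory bounded.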
Trend tracking across runs
Run this actor on a schedule (daily or hourly) for the same category URLs. On each run it:
- Reads the previous run's rank state from the Key-Value store.
- Computes `rank_movement` per product.
- Records `first_seen_date` the first time a `goods_id` appears.
- Persists updated state for the next run.
The state key is category_id:feed_type:goods_id, so multiple categories and feed types are tracked independently.
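The state update can be sketched with a plain dict standing in for the Key-Value store record. Only the `category_id:feed_type:goods_id` key format comes from this README. The per-key value shape (`{"rank": ..., "first_seen_date": ...}`) and the helper names are assumptions.

```python
def state_key(category_id: str, feed_type: str, goods_id: str) -> str:
    """Key format from this README: category_id:feed_type:goods_id."""
    return f"{category_id}:{feed_type}:{goods_id}"


def apply_trend_state(records: list[dict], state: dict, today: str) -> dict:
    """Fill first_seen_date and rank_movement in place; return the new state."""
    new_state = dict(state)  # keep entries for products not seen this run
    for rec in records:
        key = state_key(rec["category_id"], rec["feed_type"], rec["goods_id"])
        prev = state.get(key)
        if prev is None:
            rec["first_seen_date"] = today
            rec["rank_movement"] = 0  # first seen
        else:
            rec["first_seen_date"] = prev["first_seen_date"]
            # Positive = climbed (e.g. rank 5 -> 2 gives +3).
            rec["rank_movement"] = prev["rank"] - rec["rank"]
        new_state[key] = {"rank": rec["rank"], "first_seen_date": rec["first_seen_date"]}
    return new_state
```

This sketch keeps entries for products that drop out of the listing. A product that later reappears therefore keeps its original `first_seen_date`, and its movement is measured against its last known rank. The actor may handle that case differently.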
Anti-bot notes
- Halo challenge shell: Shein serves a ~355 KB JavaScript challenge page that Bright Data's server-side grader sometimes marks as `ok`. The actor's content gate (size ≥ 400 KB and `goods_id` present) rejects these false positives and retries.
- Retry limit: up to 3 BD retries per request. A page that fails all retries is skipped with a warning, and the run continues.
- Concurrency: `inn_max_conc: 2` limits peak concurrent BD fetches. Do not raise this above 4 without increasing memory.