Website Image Scraper
Extract every image URL from any website. Give it a URL, get back a list of images with alt text, dimensions, srcset candidates, and CSS background-image URLs. Works on both static and JavaScript-rendered pages.
What it does
Points a Playwright browser at your URL, lets the page fully render, then pulls every image it can find — <img> tags, <picture>/<source> elements, lazy-load data-src attributes, and background-image CSS rules. Returns one record per image with the metadata that's actually useful.
Optionally follows internal links up to a configurable depth, so you can audit images across an entire section of a site rather than just one page.
Output
Each record contains:
| Field | Description |
|---|---|
| image_url | Absolute URL of the image |
| page_url | URL of the page where the image was found |
| alt_text | Alt text (empty string if none) |
| width | Width attribute value (empty string if not set in HTML) |
| height | Height attribute value (empty string if not set in HTML) |
| srcset | Raw srcset attribute value |
| srcset_urls | Comma-separated absolute URLs parsed from srcset |
| loading | Loading attribute (lazy, eager, or empty string) |
| source_tag | Source element: img, source, or css-background |
| scraped_at | ISO 8601 timestamp |
Input
| Parameter | Type | Default | Description |
|---|---|---|---|
| url | string | — | Website URL to extract images from |
| maxItems | integer | 200 | Maximum image records to return. Set to 0 for unlimited |
| crawlLinks | boolean | false | Follow internal links to scrape images from multiple pages |
| maxDepth | integer | 1 | Max depth for internal link crawling (1–3) |
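As a rough illustration of how these parameters fit together, the sketch below builds an input object and checks it against the ranges in the table. The helper name build_input and its error messages are hypothetical and not part of the actor; only the four parameter names, defaults, and ranges come from the table above.

```python
def build_input(url, max_items=200, crawl_links=False, max_depth=1):
    """Build an input dict using the parameter names from the table above.

    Hypothetical helper for illustration; the checks mirror the documented
    defaults and ranges, not the actor's own validation.
    """
    if not url:
        raise ValueError("url is required")
    if max_items < 0:
        raise ValueError("maxItems must be 0 (unlimited) or a positive integer")
    if not 1 <= max_depth <= 3:
        raise ValueError("maxDepth must be between 1 and 3")
    return {
        "url": url,
        "maxItems": max_items,
        "crawlLinks": crawl_links,
        "maxDepth": max_depth,
    }


# Crawl one level of internal links and keep every image found.
run_input = build_input("https://books.toscrape.com", max_items=0, crawl_links=True)
```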
Usage notes
On JavaScript-rendered sites: Images loaded via lazy-loading or by React, Vue, or Angular hydration are captured in most cases. The actor waits for the page's network activity to go idle before extracting, which catches most, though not necessarily all, dynamic content.
On srcset: Multi-resolution images are captured as both the raw srcset string and a parsed comma-separated list of absolute URLs. The image_url field holds the primary src value.
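For readers who want to reproduce the srcset_urls field from the raw srcset value, here is a minimal sketch using only the standard library. The function name parse_srcset is hypothetical, and joining without a space after the comma is an assumption; this is not the actor's own parser. It splits naively on commas, so URLs that themselves contain commas would be split incorrectly.

```python
from urllib.parse import urljoin


def parse_srcset(srcset: str, page_url: str) -> str:
    """Return the candidate URLs in a srcset as comma-separated absolute URLs."""
    urls = []
    for candidate in srcset.split(","):
        parts = candidate.strip().split()
        if not parts:
            continue  # skip empty candidates, e.g. after a trailing comma
        # The first token is the URL; an optional second token is the
        # descriptor (such as 2x or 640w), which is dropped here.
        urls.append(urljoin(page_url, parts[0]))
    return ",".join(urls)
```

Relative candidates are resolved against the page URL, so "/img/a-640.jpg 640w" found on https://example.com/page becomes https://example.com/img/a-640.jpg.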
On CSS backgrounds: The actor reads the computed background-image style of every visible element. Inline data URIs are filtered out, so the output contains only linkable image URLs.
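The filtering step can be sketched as follows: given one computed background-image value, pull out each url(...) entry and drop data URIs. The function name and the regular expression are illustrative assumptions, not the actor's code, and the sketch does not cover reading computed styles from the browser.

```python
import re
from urllib.parse import urljoin

# Matches url("..."), url('...') and url(...) inside a CSS value.
_URL_RE = re.compile(r"""url\(\s*(['"]?)(.*?)\1\s*\)""", re.IGNORECASE)


def background_image_urls(css_value: str, page_url: str) -> list[str]:
    """Return linkable image URLs from a background-image value."""
    urls = []
    for _quote, raw in _URL_RE.findall(css_value):
        if raw.lower().startswith("data:"):
            continue  # inline data URIs are not linkable, so skip them
        urls.append(urljoin(page_url, raw))
    return urls
```

Gradients and the value none contain no url(...) entry, so they yield an empty list.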
On depth: With crawlLinks: true and maxDepth: 1, the actor crawls the start page plus any internal pages linked from it. Each page is visited only once. Fan-out is capped at 30 new links per page to avoid runaway crawls.
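The depth, visit-once, and fan-out rules can be pictured with a small breadth-first sketch. It assumes that "internal" means "same host as the start URL" and that pages are visited in breadth-first order; neither is stated above. The get_links callback is a hypothetical stand-in for whatever extracts links from a rendered page.

```python
from collections import deque
from urllib.parse import urlparse

MAX_NEW_LINKS_PER_PAGE = 30  # fan-out cap described above


def crawl_order(start_url, get_links, max_depth=1):
    """Return the pages a depth-limited crawl would visit, in order."""
    start_host = urlparse(start_url).netloc
    seen = {start_url}                # ensures each page is queued only once
    queue = deque([(start_url, 0)])   # the start page sits at depth 0
    visited = []
    while queue:
        url, depth = queue.popleft()
        visited.append(url)
        if depth >= max_depth:
            continue  # do not follow links beyond the configured depth
        added = 0
        for link in get_links(url):
            if added >= MAX_NEW_LINKS_PER_PAGE:
                break
            # Assumption: a link is internal when its host matches the start URL.
            if urlparse(link).netloc != start_host or link in seen:
                continue
            seen.add(link)
            queue.append((link, depth + 1))
            added += 1
    return visited
```

With max_depth=1 this visits the start page and the internal pages it links to, and stops there.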
Example output
{
"image_url": "https://books.toscrape.com/media/cache/fe/72/fe72f0532301ec28892ae79a629a293c.jpg",
"page_url": "https://books.toscrape.com",
"alt_text": "A Light in the Attic",
"width": "",
"height": "",
"srcset": "",
"srcset_urls": "",
"loading": "",
"source_tag": "img",
"scraped_at": "2026-06-10T12:00:00.000Z"
}
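One way to use these records for an audit is sketched below: it flags img records that lack alt text or explicit dimensions. The function name and the choice of checks are hypothetical. It relies only on the documented convention that unset values are empty strings, and it skips source and css-background records on the assumption that alt text and dimensions are meaningful mainly for img tags.

```python
def audit_record(record: dict) -> list[str]:
    """Return a list of issues found in one image record."""
    issues = []
    if record["source_tag"] != "img":
        return issues  # assumption: only <img> records are checked
    if not record["alt_text"]:
        issues.append("missing alt text")
    if not record["width"] or not record["height"]:
        issues.append("missing width/height")
    return issues


example = {
    "image_url": "https://books.toscrape.com/media/cache/fe/72/fe72f0532301ec28892ae79a629a293c.jpg",
    "page_url": "https://books.toscrape.com",
    "alt_text": "A Light in the Attic",
    "width": "",
    "height": "",
    "source_tag": "img",
}
print(audit_record(example))
```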
Questions or issues?
Use the feedback fields in the input form or reach out at actor-support@orbtop.com.