Website Image Scraper

Extract image URLs from any website. Give it a URL and get back a list of images with alt text, dimensions, srcset candidates, and CSS background-image URLs. Works on both static and JavaScript-rendered pages.

What it does

Points a Playwright browser at your URL, lets the page fully render, then pulls every image it can find — <img> tags, <picture>/<source> elements, lazy-load data-src attributes, and background-image CSS rules. Returns one record per image with the metadata that's actually useful.
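To illustrate the tag-level part of that extraction, here is a minimal sketch that works on static HTML only. It is not the actor's code: it uses Python's standard `html.parser` instead of a rendered Playwright DOM, so it sees no JavaScript-inserted images and no CSS. The class name `ImageTagParser` is hypothetical, and preferring `src` over `data-src` is an assumption.

```python
from __future__ import annotations

from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageTagParser(HTMLParser):
    """Hypothetical helper: collect <img> and <source> tags from static HTML."""

    def __init__(self, page_url: str) -> None:
        super().__init__()
        self.page_url = page_url
        self.records: list[dict[str, str]] = []

    def handle_starttag(self, tag: str, attrs: list[tuple[str, str | None]]) -> None:
        if tag not in ("img", "source"):
            return
        # Boolean attributes arrive with value None; normalize to "".
        a = {name: (value or "") for name, value in attrs}
        # Assumption: use src first, then fall back to the lazy-load data-src.
        src = a.get("src") or a.get("data-src", "")
        if not src and not a.get("srcset"):
            return  # nothing linkable on this tag
        self.records.append({
            "image_url": urljoin(self.page_url, src) if src else "",
            "page_url": self.page_url,
            "alt_text": a.get("alt", ""),
            "width": a.get("width", ""),
            "height": a.get("height", ""),
            "srcset": a.get("srcset", ""),
            "loading": a.get("loading", ""),
            "source_tag": tag,
        })
```

Feed it a page with `parser.feed(html)` and read `parser.records`. A `<source>` inside `<picture>` usually carries only `srcset`, so its `image_url` stays empty here.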

Optionally follows internal links up to a configurable depth, so you can audit images across an entire section of a site rather than just one page.

Output

Each record contains:

Field	Description
`image_url`	Absolute URL of the image
`page_url`	URL of the page where the image was found
`alt_text`	Alt text (empty string if none)
`width`	Width attribute value (empty string if not set in HTML)
`height`	Height attribute value (empty string if not set in HTML)
`srcset`	Raw `srcset` attribute value
`srcset_urls`	Comma-separated absolute URLs parsed from srcset
`loading`	Loading attribute (`lazy`, `eager`, or empty string)
`source_tag`	Source element: `img`, `source`, or `css-background`
`scraped_at`	ISO 8601 timestamp of when the record was scraped
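If you consume the output from Python, one way to type a record is sketched below. `ImageRecord` and `iso_timestamp` are hypothetical names, not part of the actor; the timestamp helper reproduces the millisecond, `Z`-suffixed format shown in the example output.

```python
from __future__ import annotations

from datetime import datetime, timezone
from typing import Literal, TypedDict


class ImageRecord(TypedDict):
    """Hypothetical typing of one output record (fields as in the table)."""
    image_url: str
    page_url: str
    alt_text: str
    width: str        # string, "" when the HTML attribute is absent
    height: str
    srcset: str
    srcset_urls: str  # comma-separated absolute URLs
    loading: str      # "lazy", "eager", or ""
    source_tag: Literal["img", "source", "css-background"]
    scraped_at: str   # ISO 8601, UTC


def iso_timestamp(now: datetime | None = None) -> str:
    """Format a time in UTC like 2026-06-10T12:00:00.000Z."""
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    return now.isoformat(timespec="milliseconds").replace("+00:00", "Z")
```

Keeping `width` and `height` as strings matches the table: they hold the raw attribute value, not a measured size.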

Input

Parameter	Type	Default	Description
`url`	string	—	Website URL to extract images from
`maxItems`	integer	200	Maximum number of image records to return. Set to `0` for unlimited.
`crawlLinks`	boolean	false	Follow internal links to scrape images from multiple pages
`maxDepth`	integer	1	Maximum link depth when `crawlLinks` is enabled (1–3)
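The defaults and limits in the input table can be sketched as a small normalization step. This is an illustration only: the function names are hypothetical, and clamping `maxDepth` into 1–3 (rather than rejecting out-of-range values) and rejecting negative `maxItems` are assumptions about how the actor treats bad input.

```python
from __future__ import annotations

from itertools import islice
from typing import Any, Iterable


def normalize_input(raw: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical: apply the documented defaults to a raw input object."""
    url = str(raw.get("url", "")).strip()
    if not url:
        raise ValueError("url is required")
    max_items = int(raw.get("maxItems", 200))
    if max_items < 0:  # assumption: negative values are invalid
        raise ValueError("maxItems must be >= 0")
    # Assumption: out-of-range depths are clamped into 1..3.
    max_depth = min(max(int(raw.get("maxDepth", 1)), 1), 3)
    return {
        "url": url,
        "maxItems": max_items,
        "crawlLinks": bool(raw.get("crawlLinks", False)),
        "maxDepth": max_depth,
    }


def limit_records(records: Iterable[dict], max_items: int) -> list[dict]:
    """maxItems == 0 means unlimited; otherwise keep the first max_items."""
    if max_items == 0:
        return list(records)
    return list(islice(records, max_items))
```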

Usage notes

On JavaScript-rendered sites: Images added through lazy-loading or through React, Vue, or Angular hydration are captured in most cases. The actor waits for the page's network activity to go idle before extracting, which catches most dynamically inserted content.

On srcset: Multi-resolution images are captured as both the raw srcset string and a parsed comma-separated list of absolute URLs. The image_url field holds the primary src value.
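Turning a raw srcset into absolute URLs can be sketched as follows. It is a simplified take on the HTML srcset parsing rules (a candidate URL is a run of non-whitespace, trailing commas end a candidate, descriptors such as `480w` or `2x` run to the next comma; parentheses inside descriptors are not handled). The function name is hypothetical, and joining the result with `","` for `srcset_urls` is an assumption about the exact separator.

```python
from __future__ import annotations

from urllib.parse import urljoin


def parse_srcset_urls(srcset: str, base_url: str) -> list[str]:
    """Hypothetical: return the absolute candidate URLs from a srcset string."""
    urls: list[str] = []
    pos, n = 0, len(srcset)
    while pos < n:
        # Skip whitespace and commas between candidates.
        while pos < n and (srcset[pos].isspace() or srcset[pos] == ","):
            pos += 1
        if pos >= n:
            break
        # The URL is the next run of non-whitespace characters.
        start = pos
        while pos < n and not srcset[pos].isspace():
            pos += 1
        url = srcset[start:pos]
        if url.endswith(","):
            url = url.rstrip(",")  # trailing commas: no descriptors follow
        else:
            # Skip descriptors (e.g. "480w", "2x") up to the next comma.
            while pos < n and srcset[pos] != ",":
                pos += 1
        if url:
            urls.append(urljoin(base_url, url))
    return urls


srcset_urls = ",".join(
    parse_srcset_urls("/img/a-480.jpg 480w, /img/a-960.jpg 960w", "https://example.com/p/")
)
```

Splitting on whitespace instead of on every comma keeps URLs that legitimately contain commas intact.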

On CSS backgrounds: The actor walks all visible elements and reads each one's computed background-image style. Inline data URIs are filtered out, so the output contains only linkable image URLs.
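The filtering step can be sketched on the string a browser returns for a computed `background-image` (for example from `getComputedStyle(el).backgroundImage`), which may list several layers such as `url("...")` and gradients. The function name and regex are hypothetical; this is not the actor's code.

```python
from __future__ import annotations

import re
from urllib.parse import urljoin

# Matches url("..."), url('...') or unquoted url(...).
_URL_RE = re.compile(r"""url\(\s*(?:"([^"]*)"|'([^']*)'|([^)\s]*))\s*\)""", re.IGNORECASE)


def extract_background_urls(value: str, base_url: str) -> list[str]:
    """Hypothetical: linkable image URLs from a background-image value."""
    urls: list[str] = []
    for m in _URL_RE.finditer(value):
        url = m.group(1) or m.group(2) or m.group(3) or ""
        # Drop empty values and inline data URIs; keep only linkable URLs.
        if not url or url.lower().startswith("data:"):
            continue
        urls.append(urljoin(base_url, url))
    return urls
```

Gradients and `none` produce no matches, so only `url(...)` layers reach the output.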

On depth: With crawlLinks: true and maxDepth: 1, the actor crawls the start page plus the internal pages it links to. Each page is visited only once, and fan-out is capped at 30 new links per page to avoid runaway crawls.
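The depth, visit-once, and fan-out rules can be sketched as a breadth-first crawl. Assumptions: "internal" means the same host as the start URL, fragments are stripped before comparing URLs, and `get_links` is a hypothetical callable standing in for loading a page and reading its links.

```python
from __future__ import annotations

from collections import deque
from typing import Callable, Iterable
from urllib.parse import urldefrag, urljoin, urlparse


def crawl_order(
    start_url: str,
    get_links: Callable[[str], Iterable[str]],
    max_depth: int,
    fan_out: int = 30,
) -> list[str]:
    """Hypothetical: pages in the order a BFS crawl would visit them."""
    host = urlparse(start_url).netloc
    visited = {start_url}
    order: list[str] = []
    queue = deque([(start_url, 0)])
    while queue:
        url, depth = queue.popleft()
        order.append(url)
        if depth >= max_depth:
            continue  # pages at max depth are scraped but not expanded
        added = 0
        for href in get_links(url):
            link = urldefrag(urljoin(url, href)).url
            # Skip external hosts and pages already seen.
            if urlparse(link).netloc != host or link in visited:
                continue
            visited.add(link)
            queue.append((link, depth + 1))
            added += 1
            if added >= fan_out:
                break  # cap new links per page
    return order
```

With `max_depth=1` this yields the start page followed by the internal pages it links to, matching the behavior described above.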

Example output

{
  "image_url": "https://books.toscrape.com/media/cache/fe/72/fe72f0532301ec28892ae79a629a293c.jpg",
  "page_url": "https://books.toscrape.com",
  "alt_text": "A Light in the Attic",
  "width": "",
  "height": "",
  "srcset": "",
  "srcset_urls": "",
  "loading": "",
  "source_tag": "img",
  "scraped_at": "2026-06-10T12:00:00.000Z"
}

Questions or issues?

Use the feedback fields in the input form or reach out at actor-support@orbtop.com.