OrbTop

Website Image Scraper

Extract every image URL from any website. Give it a URL, get back a list of images with alt text, dimensions, srcset candidates, and CSS background-image URLs. Works on both static and JavaScript-rendered pages.

What it does

The actor points a Playwright browser at your URL, lets the page fully render, then pulls every image it can find: <img> tags, <picture>/<source> elements, lazy-load data-src attributes, and background-image CSS rules. It returns one record per image, with the fields listed under Output.

Optionally follows internal links up to a configurable depth, so you can audit images across an entire section of a site rather than just one page.

Output

Each record contains:

| Field | Description |
| --- | --- |
| image_url | Absolute URL of the image |
| page_url | URL of the page where the image was found |
| alt_text | Alt text (empty string if none) |
| width | Width attribute value (empty string if not set in HTML) |
| height | Height attribute value (empty string if not set in HTML) |
| srcset | Raw srcset attribute value |
| srcset_urls | Comma-separated absolute URLs parsed from srcset |
| loading | Loading attribute (lazy, eager, or empty string) |
| source_tag | Source element: img, source, or css-background |
| scraped_at | ISO 8601 timestamp |

Input

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| url | string | | Website URL to extract images from |
| maxItems | integer | 200 | Maximum number of image records to return. Set to 0 for unlimited |
| crawlLinks | boolean | false | Follow internal links to scrape images from multiple pages |
| maxDepth | integer | 1 | Maximum depth for internal link crawling (1–3) |
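
As a rough illustration of how these parameters fit together, the sketch below builds an input object from the documented defaults and checks the documented ranges. The helper name build_input and the http(s) check on url are assumptions made for this example; they are not part of the actor.

```python
from typing import Any, Dict

# Defaults copied from the parameter list above.
DEFAULTS: Dict[str, Any] = {"maxItems": 200, "crawlLinks": False, "maxDepth": 1}


def build_input(url: str, **overrides: Any) -> Dict[str, Any]:
    """Hypothetical helper: merge overrides onto the defaults and validate them."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    # Assumption: the actor expects an absolute http(s) URL.
    if not url.startswith(("http://", "https://")):
        raise ValueError("url must be an absolute http(s) URL")
    run_input = {"url": url, **DEFAULTS, **overrides}
    if run_input["maxItems"] < 0:
        raise ValueError("maxItems must be 0 (unlimited) or a positive integer")
    if not 1 <= run_input["maxDepth"] <= 3:
        raise ValueError("maxDepth must be between 1 and 3")
    return run_input


# Crawl two levels deep, keeping the default cap of 200 records.
print(build_input("https://books.toscrape.com", crawlLinks=True, maxDepth=2))
```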

Usage notes

On JavaScript-rendered sites: Images added by lazy-loading or by React, Vue, or Angular hydration are captured once they appear in the rendered DOM. The actor waits for the page network to idle before extracting, which catches most, though not necessarily all, dynamic content.
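
For readers who want to see what waiting for network idle looks like in practice, here is a minimal sketch using Playwright's Python sync API. It is not the actor's source code; the function name rendered_image_urls is hypothetical, and it only covers <img> elements.

```python
from typing import List


def rendered_image_urls(url: str) -> List[str]:
    """Load a page in headless Chromium and return the resolved URL of each <img>."""
    # Imported lazily so this sketch can be loaded without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        try:
            page = browser.new_page()
            # "networkidle" resolves once there have been no network
            # connections for at least 500 ms.
            page.goto(url, wait_until="networkidle")
            # currentSrc is the candidate the browser actually selected,
            # whether it came from src or srcset.
            return page.eval_on_selector_all(
                "img",
                "els => els.map(e => e.currentSrc || e.src).filter(Boolean)",
            )
        finally:
            browser.close()
```

Running it requires the playwright package and a Chromium build installed with `playwright install chromium`.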

On srcset: Multi-resolution images are captured as both the raw srcset string and a parsed comma-separated list of absolute URLs. The image_url field holds the primary src value.
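
The relationship between srcset and srcset_urls can be sketched as follows. This is a simplified parser written for illustration, assuming candidates are separated by commas and that no URL itself contains a comma; the function names and the exact separator used when joining are assumptions.

```python
from typing import List
from urllib.parse import urljoin


def parse_srcset(srcset: str, page_url: str) -> List[str]:
    """Return the absolute URL of each srcset candidate, in order."""
    urls = []
    for candidate in srcset.split(","):
        parts = candidate.split()  # e.g. ["img/a-480.jpg", "480w"]
        if parts:
            # Drop the width/density descriptor and resolve against the page URL.
            urls.append(urljoin(page_url, parts[0]))
    return urls


def srcset_urls_field(srcset: str, page_url: str) -> str:
    """Join the parsed URLs into one comma-separated string."""
    return ",".join(parse_srcset(srcset, page_url))


print(parse_srcset("img/a-480.jpg 480w, /img/a-960.jpg 960w", "https://example.com/shop/"))
# ['https://example.com/shop/img/a-480.jpg', 'https://example.com/img/a-960.jpg']
```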

On CSS backgrounds: Computed background-image styles are walked across all visible elements. Inline data URIs are filtered out, so the output contains only linkable image URLs.
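
To make the filtering step concrete, the sketch below pulls url(...) references out of a computed background-image value and drops data URIs. The regular expression and the function name background_image_urls are assumptions for this example, not the actor's implementation.

```python
import re
from typing import List
from urllib.parse import urljoin

# Matches url(...) with optional single or double quotes around the target.
_URL_RE = re.compile(r"""url\(\s*(['"]?)(.*?)\1\s*\)""", re.IGNORECASE)


def background_image_urls(css_value: str, page_url: str) -> List[str]:
    """Return absolute, linkable URLs found in a background-image value."""
    urls = []
    for _quote, raw in _URL_RE.findall(css_value):
        # Skip empty targets and inline data URIs, which cannot be linked to.
        if raw and not raw.lower().startswith("data:"):
            urls.append(urljoin(page_url, raw))
    return urls


value = 'linear-gradient(rgba(0,0,0,.5), rgba(0,0,0,.5)), url("/img/hero.jpg"), url(data:image/png;base64,AAAA)'
print(background_image_urls(value, "https://example.com/page"))
# ['https://example.com/img/hero.jpg']
```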

On depth: With crawlLinks: true and maxDepth: 1, the actor crawls the start page plus any internal pages linked from it. Each page is only visited once. Fan-out is capped at 30 new links per page to avoid runaway crawls.
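
The depth and fan-out rules can be sketched as a breadth-first traversal. This is an illustration under stated assumptions: the callback get_internal_links stands in for whatever extracts same-site links from a page, and the function name crawl_order is hypothetical.

```python
from collections import deque
from typing import Callable, Iterable, List


def crawl_order(
    start_url: str,
    get_internal_links: Callable[[str], Iterable[str]],
    max_depth: int = 1,
    fan_out: int = 30,
) -> List[str]:
    """Return pages in visit order: each page once, depth-limited, fan-out capped."""
    visited = {start_url}
    order: List[str] = []
    queue = deque([(start_url, 0)])
    while queue:
        url, depth = queue.popleft()
        order.append(url)
        if depth >= max_depth:
            continue  # pages at the depth limit are scraped but not expanded
        added = 0
        for link in get_internal_links(url):
            if added >= fan_out:
                break  # cap on new links queued from a single page
            if link not in visited:
                visited.add(link)
                queue.append((link, depth + 1))
                added += 1
    return order


# Toy link graph standing in for a real site.
graph = {"a": ["b", "c", "a"], "b": ["d"], "c": ["b"]}
print(crawl_order("a", lambda u: graph.get(u, []), max_depth=1))
# ['a', 'b', 'c']
```

With max_depth=1 the start page sits at depth 0 and its linked pages at depth 1, which matches the behaviour described above: page d is only reached when max_depth is raised to 2.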

Example output

{
  "image_url": "https://books.toscrape.com/media/cache/fe/72/fe72f0532301ec28892ae79a629a293c.jpg",
  "page_url": "https://books.toscrape.com",
  "alt_text": "A Light in the Attic",
  "width": "",
  "height": "",
  "srcset": "",
  "srcset_urls": "",
  "loading": "",
  "source_tag": "img",
  "scraped_at": "2026-06-10T12:00:00.000Z"
}
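
Since the introduction mentions auditing images, here is one way records in this shape might be post-processed: listing images that have no alt text. The helper name images_missing_alt and the second sample record are made up for the example.

```python
from typing import Dict, List


def images_missing_alt(records: List[Dict[str, str]]) -> List[str]:
    """Return image_url for every record whose alt_text is empty or whitespace."""
    return [r["image_url"] for r in records if not r.get("alt_text", "").strip()]


# Two abbreviated records; the second one is invented for illustration.
records = [
    {
        "image_url": "https://books.toscrape.com/media/cache/fe/72/fe72f0532301ec28892ae79a629a293c.jpg",
        "alt_text": "A Light in the Attic",
    },
    {"image_url": "https://example.com/banner.png", "alt_text": ""},
]
print(images_missing_alt(records))
# ['https://example.com/banner.png']
```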

Questions or issues?

Use the feedback fields in the input form or reach out at actor-support@orbtop.com.