OrbTop

Website Contact Extractor

LEAD GENERATION


Extract emails, phone numbers, physical addresses, and social media links from any website. Supply a list of URLs and get one structured record per domain, ready for lead generation, outreach, or contact research.

What it does

For each website in your input list, the actor:

  1. Fetches the homepage
  2. Probes common contact sub-pages (/contact, /contact-us, /about, /about-us, /impressum, /legal)
  3. Extracts and deduplicates all contact data across those pages
  4. Returns one row per domain
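A minimal Python sketch of steps 1 and 2 for a single site, plus the page count that ends up in `pages_crawled`. It assumes a hypothetical `fetch` callable that returns a page's HTML or raises on a 404 or network error. The names `CONTACT_PATHS`, `candidate_pages` and `crawl_domain` are illustrative, not the actor's code, and the probing order is an assumption.

```python
from typing import Callable, List, Tuple
from urllib.parse import urlsplit

# Sub-pages named in this README; probing them in this order is an assumption.
CONTACT_PATHS = ["/contact", "/contact-us", "/about", "/about-us", "/impressum", "/legal"]


def homepage_of(start_url: str) -> str:
    """Reduce an input URL to scheme + host, e.g. https://example.com."""
    if "://" not in start_url:
        start_url = "https://" + start_url  # assume HTTPS for bare domains
    parts = urlsplit(start_url)
    return f"{parts.scheme}://{parts.netloc}"


def candidate_pages(start_url: str) -> List[str]:
    """Homepage first (step 1), then each common contact sub-page (step 2)."""
    home = homepage_of(start_url)
    return [home] + [home + path for path in CONTACT_PATHS]


def crawl_domain(start_url: str, fetch: Callable[[str], str]) -> Tuple[List[str], int]:
    """Fetch every candidate page and skip any that fail.

    `fetch` is a hypothetical callable: it returns HTML or raises on 404/network errors.
    Returns the HTML of the pages that loaded and their count (pages_crawled).
    """
    pages: List[str] = []
    for url in candidate_pages(start_url):
        try:
            pages.append(fetch(url))
        except Exception:
            continue  # failed pages contribute no data
    return pages, len(pages)
```

Passing `fetch` in as a parameter keeps the crawl logic independent of any particular HTTP client, so it can be exercised without network access.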

Output fields

Field          Description
url            Homepage URL (scheme + host)
domain         Domain name with any leading www. removed (e.g. example.com)
emails         Comma-separated email addresses found on the site
phones         Comma-separated phone numbers found on the site
social_links   Comma-separated social media profile URLs
address        Physical address, if found (from JSON-LD schema.org markup or heuristic footer detection)
pages_crawled  Number of pages successfully crawled for this domain
scraped_at     Time the record was scraped, as an ISO 8601 timestamp
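One way to assemble a record with these fields, sketched in Python. `build_record` and `join_field` are hypothetical helpers; the sketch assumes the extracted values arrive as plain strings and that `address` is passed through unchanged (possibly `None`) when nothing was found.

```python
from datetime import datetime, timezone
from typing import Iterable, Optional
from urllib.parse import urlsplit


def join_field(values: Iterable[str]) -> str:
    """Strip, drop empties, dedupe in first-seen order, then comma-join."""
    cleaned = (v.strip() for v in values)
    return ", ".join(dict.fromkeys(v for v in cleaned if v))


def build_record(homepage: str, emails: Iterable[str], phones: Iterable[str],
                 social_links: Iterable[str], address: Optional[str],
                 pages_crawled: int) -> dict:
    host = urlsplit(homepage).hostname or ""
    domain = host[len("www."):] if host.startswith("www.") else host
    # UTC with millisecond precision and a trailing Z, as in the example output.
    scraped_at = datetime.now(timezone.utc).isoformat(timespec="milliseconds").replace("+00:00", "Z")
    return {
        "url": homepage,
        "domain": domain,
        "emails": join_field(emails),
        "phones": join_field(phones),
        "social_links": join_field(social_links),
        "address": address,  # passed through unchanged; None if nothing was found (assumption)
        "pages_crawled": pages_crawled,
        "scraped_at": scraped_at,
    }
```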

Input

Field      Type     Description
startUrls  Array    List of website URLs to extract contact info from, each given as an object with a url key
maxItems   Integer  Maximum number of domain records to return (0 = no limit)
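A sketch of how the input could be normalised before crawling: each `startUrls` entry is reduced to its homepage, duplicate domains are collapsed to one entry, and `maxItems` caps the count (0 meaning no limit). `plan_domains` is hypothetical, and applying the limit before crawling, rather than while emitting results, is an assumption.

```python
from typing import Dict, List
from urllib.parse import urlsplit


def plan_domains(start_urls: List[Dict[str, str]], max_items: int = 0) -> List[str]:
    """One homepage per distinct domain, in input order, capped by max_items (0 = no limit)."""
    seen = set()
    homepages: List[str] = []
    for entry in start_urls:
        raw = entry["url"]
        parts = urlsplit(raw if "://" in raw else "https://" + raw)
        host = parts.hostname or ""  # urlsplit already lowercases the hostname
        domain = host[len("www."):] if host.startswith("www.") else host
        if not domain or domain in seen:
            continue  # duplicate domains collapse to a single record
        seen.add(domain)
        homepages.append(f"{parts.scheme}://{parts.netloc}")
        if max_items and len(homepages) >= max_items:
            break
    return homepages
```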

Example input

{
    "startUrls": [
        { "url": "https://example.com" },
        { "url": "https://another-company.com" }
    ],
    "maxItems": 10
}

Example output

{
    "url": "https://example.com",
    "domain": "example.com",
    "emails": "hello@example.com, support@example.com",
    "phones": "+1 800 555 0100",
    "social_links": "https://linkedin.com/company/example, https://twitter.com/example",
    "address": "123 Main St, San Francisco, CA 94105, US",
    "pages_crawled": 7,
    "scraped_at": "2026-06-04T10:00:00.000Z"
}
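The list-valued fields are flat strings, so downstream code typically splits them again. A small Python helper, assuming items are separated by commas as in the example and that no single email, phone number or URL contains a comma; `address` is left alone because addresses themselves contain commas. `explode` and `split_field` are hypothetical names.

```python
from typing import Any, Dict, List

# Fields that hold comma-separated lists; address is excluded because it contains commas itself.
LIST_FIELDS = ("emails", "phones", "social_links")


def split_field(value: str) -> List[str]:
    """'a, b' -> ['a', 'b']; an empty or missing field -> []."""
    return [part.strip() for part in (value or "").split(",") if part.strip()]


def explode(record: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of an output record with the list fields as Python lists."""
    out = dict(record)
    for key in LIST_FIELDS:
        out[key] = split_field(record.get(key) or "")
    return out
```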

Notes

  • Email extraction prioritises <a href="mailto:..."> links (clean, unambiguous) before falling back to a regex sweep of page text.
  • Phone extraction requires at least 7 digits and rejects date-like strings, version numbers, and decimal sequences to minimise false positives.
  • Social link detection covers LinkedIn, X/Twitter, Facebook, Instagram, YouTube, TikTok, GitHub, Pinterest, WhatsApp, Telegram, Medium, Reddit, and Snapchat.
  • The physical address is read from JSON-LD schema.org/PostalAddress markup first, then from common CSS selectors (footer address, [itemprop="address"], .address).
  • Sub-pages that return 404 or other errors are silently skipped; only successfully loaded pages contribute data.
  • Duplicate domains in startUrls are collapsed to one record.
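A sketch of the email strategy from the first note, using only the standard library. The notes do not say whether the regex sweep runs only when no `mailto:` links exist or always supplements them; this version assumes the former, and `EMAIL_RE` is a deliberately simple pattern chosen for illustration, not the actor's.

```python
import re
from html.parser import HTMLParser
from typing import List

# Deliberately simple pattern (assumption); real address syntax is broader.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


class MailtoCollector(HTMLParser):
    """Collects addresses from <a href="mailto:..."> links plus all visible text."""

    def __init__(self) -> None:
        super().__init__()
        self.mailto: List[str] = []
        self.text: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        if href.lower().startswith("mailto:"):
            # Drop the scheme and any ?subject=... query string.
            addr = href[len("mailto:"):].split("?", 1)[0].strip()
            if addr:
                self.mailto.append(addr.lower())

    def handle_data(self, data):
        self.text.append(data)


def extract_emails(html: str) -> List[str]:
    parser = MailtoCollector()
    parser.feed(html)
    found = list(parser.mailto)
    if not found:  # regex sweep only when no mailto links exist (assumption)
        found = [m.lower() for m in EMAIL_RE.findall(" ".join(parser.text))]
    return list(dict.fromkeys(found))  # dedupe, keep first-seen order
```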
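The phone filter from the second note can be approximated as below. Only the 7-digit minimum comes from the notes; the candidate pattern and the exact shapes treated as dates, version numbers and decimals (`DATE_RE`, `VERSION_RE`, `DECIMAL_RE`) are assumptions. Dotted numbers such as 415.555.0100 are deliberately not treated as versions.

```python
import re
from typing import List

# Candidate runs: optional "+", optional "(", then digits mixed with spaces, dots,
# dashes, slashes or parentheses, ending on a digit. The exact pattern is an assumption.
CANDIDATE_RE = re.compile(r"\+?\(?\d[\d\s().\-/]{5,}\d")
# Heuristic shapes to reject (assumptions): dates, dotted versions, plain decimals.
DATE_RE = re.compile(r"^\d{1,4}[./-]\d{1,2}[./-]\d{1,4}$")   # 2024-01-15, 15.01.2024
VERSION_RE = re.compile(r"^\d{1,2}(\.\d{1,3}){2,}$")         # 12.345.678.9
DECIMAL_RE = re.compile(r"^\d+\.\d+$")                       # 3.14159265


def looks_like_phone(candidate: str) -> bool:
    digits = re.sub(r"\D", "", candidate)
    if len(digits) < 7:  # minimum digit count from the notes
        return False
    return not (DATE_RE.match(candidate) or VERSION_RE.match(candidate)
                or DECIMAL_RE.match(candidate))


def extract_phones(text: str) -> List[str]:
    found: List[str] = []
    for match in CANDIDATE_RE.finditer(text):
        candidate = match.group().strip()
        if looks_like_phone(candidate) and candidate not in found:
            found.append(candidate)
    return found
```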
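Social links can be recognised by hostname. The host list below, including `wa.me` for WhatsApp and `t.me` for Telegram, is an assumption about how the listed networks might be matched, not the actor's actual list; `social_links` is a hypothetical helper that takes the `href` values already pulled from a page.

```python
from typing import Iterable, List
from urllib.parse import urlsplit

# Hostnames for the networks listed in the notes; the exact variants are assumptions.
SOCIAL_HOSTS = {
    "linkedin.com", "twitter.com", "x.com", "facebook.com", "instagram.com",
    "youtube.com", "youtu.be", "tiktok.com", "github.com", "pinterest.com",
    "wa.me", "whatsapp.com", "t.me", "telegram.me", "medium.com", "reddit.com",
    "snapchat.com",
}


def is_social(url: str) -> bool:
    host = (urlsplit(url).hostname or "").lower()
    # Accept the bare host or any subdomain of it (www., m., uk., ...).
    return any(host == h or host.endswith("." + h) for h in SOCIAL_HOSTS)


def social_links(hrefs: Iterable[str]) -> List[str]:
    out: List[str] = []
    for href in hrefs:
        clean = href.split("#", 1)[0].rstrip("/")  # drop fragments and trailing slashes
        if is_social(clean) and clean not in out:
            out.append(clean)
    return out
```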
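A sketch of the JSON-LD step from the address note: find `application/ld+json` script blocks, search them for a schema.org `PostalAddress`, and join its parts in the layout of the example output. Locating the blocks with a regex rather than a full HTML parser is a simplification, the helper names are hypothetical, and the CSS-selector fallback is not shown.

```python
import json
import re
from typing import Any, Optional

# Simplification: locate JSON-LD blocks with a regex instead of a full HTML parser.
JSONLD_RE = re.compile(
    r'<script[^>]*type=["\']application/ld\+json["\'][^>]*>(.*?)</script>',
    re.IGNORECASE | re.DOTALL,
)


def _is_postal(node: dict) -> bool:
    t = node.get("@type")
    return t == "PostalAddress" or (isinstance(t, list) and "PostalAddress" in t)


def _find_postal_address(node: Any) -> Optional[dict]:
    """Depth-first search for the first object typed PostalAddress."""
    if isinstance(node, dict):
        if _is_postal(node):
            return node
        children = list(node.values())
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = _find_postal_address(child)
        if found is not None:
            return found
    return None


def _text(value: Any) -> str:
    if isinstance(value, dict):  # e.g. addressCountry given as a Country object
        value = value.get("name")
    return str(value).strip() if value else ""


def address_from_jsonld(html: str) -> Optional[str]:
    for block in JSONLD_RE.findall(html):
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # ignore malformed markup
        addr = _find_postal_address(data)
        if addr is None:
            continue
        region_postal = " ".join(
            p for p in (_text(addr.get("addressRegion")), _text(addr.get("postalCode"))) if p)
        parts = [_text(addr.get("streetAddress")), _text(addr.get("addressLocality")),
                 region_postal, _text(addr.get("addressCountry"))]
        joined = ", ".join(p for p in parts if p)
        if joined:
            return joined
    return None
```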

Pricing

Pay per website processed: a fee is charged when the run starts, plus a per-record fee for each domain record returned.
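The two charges combine into a simple per-run cost. The fee symbols are placeholders, since the actual rates are not stated here; n counts the domain records returned, so duplicate domains in startUrls, which collapse to one record, add at most one record.

```latex
C_{\text{run}} = f_{\text{start}} + n \cdot f_{\text{record}},
\qquad n \le \texttt{maxItems} \text{ when } \texttt{maxItems} > 0
```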