Website Contact Extractor
LEAD GENERATION
Website Contact Extractor
Extract emails, phone numbers, physical addresses, and social media links from any website. Supply a list of URLs and get one structured record per domain — ready for lead generation, outreach, or contact research.
What it does
For each website in your input list, the actor:
- Fetches the homepage
- Probes common contact sub-pages (
/contact,/contact-us,/about,/about-us,/impressum,/legal) - Extracts and deduplicates all contact data across those pages
- Returns one row per domain
Output fields
| Field | Description |
|---|---|
url |
Homepage URL (scheme + host) |
domain |
Domain name without www (e.g. example.com) |
emails |
Comma-separated email addresses found on the site |
phones |
Comma-separated phone numbers found on the site |
social_links |
Comma-separated social media profile URLs |
address |
Physical address if found (JSON-LD schema.org or heuristic footer detection) |
pages_crawled |
Number of pages successfully crawled for this domain |
scraped_at |
ISO-8601 timestamp |
Input
| Field | Type | Description |
|---|---|---|
startUrls |
Array | List of website URLs to extract contact info from |
maxItems |
Integer | Maximum number of domain records to return (0 = no limit) |
Example input
{
"startUrls": [
{ "url": "https://example.com" },
{ "url": "https://another-company.com" }
],
"maxItems": 10
}
Example output
{
"url": "https://example.com",
"domain": "example.com",
"emails": "hello@example.com, support@example.com",
"phones": "+1 800 555 0100",
"social_links": "https://linkedin.com/company/example, https://twitter.com/example",
"address": "123 Main St, San Francisco, CA 94105, US",
"pages_crawled": 7,
"scraped_at": "2026-06-04T10:00:00.000Z"
}
Notes
- Email extraction prioritises
<a href="mailto:...">links (clean, unambiguous) before falling back to a regex sweep of page text. - Phone extraction requires at least 7 digits and rejects date-like strings, version numbers, and decimal sequences to minimise false positives.
- Social links covers LinkedIn, X/Twitter, Facebook, Instagram, YouTube, TikTok, GitHub, Pinterest, WhatsApp, Telegram, Medium, Reddit, and Snapchat.
- Physical address is read from JSON-LD
schema.org/PostalAddressmarkup first, then from common CSS selectors (footer address,[itemprop="address"],.address). - Sub-pages that return 404 or other errors are silently skipped — only successfully loaded pages contribute data.
- Duplicate domains in
startUrlsare collapsed to one record.
Pricing
Pay per website processed. Charged at run start plus a per-record fee when results are returned.