Law Firm Website Contact Scraper

Extract attorney profiles, contact info, practice areas, education, and bios directly from law firm websites. Provide a list of law firm website URLs and get structured attorney data ready for CRM import, lead generation, or legal directory enrichment.

What It Does

This actor crawls law firm websites and extracts detailed attorney profiles from each attorney's bio page. It is built for the most common law firm site architectures (WordPress, custom CMS, and server-rendered React/Next.js) and uses a multi-layer extraction approach:

JSON-LD / schema.org/Person — detects and parses structured attorney data when available (name, jobTitle, email, telephone, image, LinkedIn)
Heuristic CSS selectors — covers common WordPress attorney theme patterns and major law firm CMS templates
Pattern matching fallbacks — extracts mailto links, tel: links, and office location from visible page content
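The JSON-LD layer and the mailto:/tel: fallback can be approximated with the Python standard library alone. This sketch is not the actor's code. The helper names (`JsonLdCollector`, `iter_nodes`, `extract_profile`), the regular expressions, and the "first Person on the page wins" rule are assumptions. It also assumes LinkedIn appears in the schema.org `sameAs` property, and it omits the CSS-selector layer because that depends on site-specific templates.

```python
import json
import re
from html.parser import HTMLParser

# Hypothetical fallback patterns: capture the target of mailto: and tel: hrefs.
MAILTO_RE = re.compile(r"""href=["']mailto:([^"'?]+)""", re.IGNORECASE)
TEL_RE = re.compile(r"""href=["']tel:([^"']+)""", re.IGNORECASE)


class JsonLdCollector(HTMLParser):
    """Collect the raw text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self._inside = False
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._inside = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.blocks[-1] += data


def iter_nodes(data):
    """Walk top-level lists and @graph arrays, yielding every JSON-LD object."""
    if isinstance(data, list):
        for item in data:
            yield from iter_nodes(item)
    elif isinstance(data, dict):
        yield data
        yield from iter_nodes(data.get("@graph", []))


def is_person(node):
    types = node.get("@type")
    return "Person" in (types if isinstance(types, list) else [types])


def extract_profile(html):
    profile = {"attorney_name": None, "title": None, "email": None,
               "phone": None, "linkedin_url": None}
    collector = JsonLdCollector()
    collector.feed(html)

    # Layer 1: schema.org/Person in JSON-LD (first named Person found wins).
    for raw in collector.blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip malformed blocks instead of failing the whole page
        for node in iter_nodes(data):
            if not is_person(node) or profile["attorney_name"]:
                continue
            profile["attorney_name"] = node.get("name")
            profile["title"] = node.get("jobTitle")
            email = node.get("email")
            profile["email"] = email.removeprefix("mailto:") if email else None
            profile["phone"] = node.get("telephone")
            same_as = node.get("sameAs") or []
            if isinstance(same_as, str):
                same_as = [same_as]
            profile["linkedin_url"] = next(
                (url for url in same_as if "linkedin.com" in url), None)

    # Pattern-matching fallback: mailto:/tel: links anywhere in the markup.
    if not profile["email"] and (m := MAILTO_RE.search(html)):
        profile["email"] = m.group(1)
    if not profile["phone"] and (m := TEL_RE.search(html)):
        profile["phone"] = m.group(1)
    return profile
```

Each layer only fills fields that are still empty, so richer structured data always takes precedence over scraped links.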

Use Cases

Lead generation — build prospect lists of attorneys at target firms
Directory enrichment — supplement FindLaw, Martindale, or Avvo data with direct-from-source bios and contact details
Recruitment — identify attorneys by practice area and location for headhunting
Market research — map firm headcount, practice area mix, and office locations
CRM import — import attorney records with email, phone, title, and LinkedIn directly

Input

Field	Description
`urls`	Required. List of law firm website URLs. Each entry can be the firm's homepage (the actor navigates to the attorneys/team page) or the attorneys listing page itself (e.g. `https://www.example.com/lawyers/`).
`maxItems`	Maximum number of attorney records to return across all input URLs. Default: 10. Set to 0 for no limit.
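Because `maxItems` counts records across all input URLs rather than per URL, it behaves like one global cap on a single combined stream of records. A minimal sketch of those semantics, assuming records from every URL flow through one iterator; `cap_records` is a hypothetical helper, not part of the actor:

```python
from itertools import islice
from typing import Iterable, Iterator


def cap_records(records: Iterable[dict], max_items: int = 10) -> Iterator[dict]:
    """Apply one global limit to records from all input URLs; 0 means no limit."""
    if max_items == 0:
        return iter(records)
    return islice(records, max_items)
```

Under this reading, a run with `"maxItems": 100` over two large firms stops after 100 records in total, however those records happen to be split between the firms.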

Example Input

{
  "urls": [
    "https://www.gibsondunn.com/lawyers/",
    "https://www.lw.com/en/people"
  ],
  "maxItems": 100
}

Output

Each record represents one attorney:

Field	Description
`attorney_name`	Full name
`title`	Professional title (Partner, Associate, Of Counsel, etc.)
`email`	Email address
`phone`	Primary phone number
`direct_phone`	Direct line (if separate from the main number)
`practice_areas`	Practice areas, pipe-separated
`education`	Educational background, pipe-separated
`bar_admissions`	Bar admissions, pipe-separated
`bio`	Biography text (up to 2,000 characters)
`firm_name`	Law firm name
`office_location`	Office city/location
`attorney_page_url`	URL of the attorney bio page
`headshot_url`	URL of the attorney headshot image
`linkedin_url`	LinkedIn profile URL

Example Output

{
  "attorney_name": "Jane Smith",
  "title": "Partner",
  "email": "jsmith@examplelaw.com",
  "phone": "+1 212.555.1234",
  "direct_phone": null,
  "practice_areas": "Mergers & Acquisitions | Private Equity | Capital Markets",
  "education": "Harvard Law School, J.D. | Yale University, B.A.",
  "bar_admissions": "New York | California",
  "bio": "Jane Smith is a partner in the firm's M&A practice...",
  "firm_name": "Example Law Firm",
  "office_location": "New York",
  "attorney_page_url": "https://www.examplelaw.com/people/jane-smith/",
  "headshot_url": "https://www.examplelaw.com/wp-content/uploads/jane-smith.jpg",
  "linkedin_url": "https://www.linkedin.com/in/janesmith/"
}
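For CRM or database imports that expect real lists, the pipe-separated fields can be split back apart. A minimal Python sketch, assuming the ` | ` separator shown in the example output and that no individual value contains a literal `|`; `split_pipe_fields` is a hypothetical helper:

```python
PIPE_FIELDS = ("practice_areas", "education", "bar_admissions")


def split_pipe_fields(record: dict) -> dict:
    """Return a copy of an attorney record with pipe-separated fields as lists."""
    out = dict(record)
    for field in PIPE_FIELDS:
        value = out.get(field)
        # null or empty strings become empty lists
        out[field] = [part.strip() for part in value.split("|") if part.strip()] if value else []
    return out
```

Applied to the record above, `practice_areas` becomes `["Mergers & Acquisitions", "Private Equity", "Capital Markets"]`; all other fields pass through unchanged.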

How It Works

The actor uses a two-level hierarchical crawl (listing pages, then individual bio pages), carried out in three steps:

Home/Listing detection — if the input URL is a homepage, the actor navigates to the attorneys listing page via site navigation links. If the URL is already a listing page (contains /people/, /lawyers/, /attorneys/, etc.), it skips directly to discovery.
Attorney discovery — scans the listing page for links to individual bio pages and handles pagination for large firms.
Bio page extraction — visits each bio page and extracts the full attorney profile using the extraction cascade described above.
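The first two steps can be sketched with the standard library. This illustrates the idea under stated assumptions and is not the actor's implementation: the path markers come from the examples in step 1, while the rules that a listing URL ends in one of those segments, that a bio URL sits one slug below it, and that pagination links carry a `page=` query parameter are guesses that many, but not all, firm sites follow. All function names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse

# Path segments taken from the README's examples; real sites use many more.
LISTING_SEGMENTS = ("people", "lawyers", "attorneys")


class LinkCollector(HTMLParser):
    """Collect every <a href> on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href")
        if tag == "a" and href:
            self.hrefs.append(href)


def _segments(url):
    return [s for s in urlparse(url).path.lower().split("/") if s]


def is_listing_url(url):
    """A listing page's path ends in a listing segment, e.g. /en/people."""
    segs = _segments(url)
    return bool(segs) and segs[-1] in LISTING_SEGMENTS


def is_bio_url(url):
    """A bio page sits one slug below a listing segment, e.g. /people/jane-smith/."""
    segs = _segments(url)
    return len(segs) >= 2 and segs[-2] in LISTING_SEGMENTS


def _same_site_links(page_url, html):
    """Yield absolute, fragment-free links that stay on the same host."""
    collector = LinkCollector()
    collector.feed(html)
    host = urlparse(page_url).netloc
    for href in collector.hrefs:
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        if urlparse(absolute).netloc == host:
            yield absolute


def find_listing_url(home_url, html):
    """Step 1: from a homepage, follow the first link that looks like a listing page."""
    return next((u for u in _same_site_links(home_url, html) if is_listing_url(u)), None)


def discover(listing_url, html):
    """Step 2: return (bio page URLs, pagination URLs) found on a listing page."""
    listing_path = urlparse(listing_url).path.rstrip("/")
    bios, pages = [], []
    for url in _same_site_links(listing_url, html):
        parts = urlparse(url)
        if is_bio_url(url) and url not in bios:
            bios.append(url)
        elif (parts.path.rstrip("/") == listing_path and "page=" in parts.query
              and url not in pages):
            pages.append(url)
    return bios, pages
```

Pagination URLs returned by `discover` would be fed back into `discover`, and each bio URL would go on to step 3.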

Site Compatibility

Tested with major law firm website architectures:

WordPress attorney themes (most common)
Custom CMS (large firms with bespoke systems)
Server-rendered React/Next.js (works without browser rendering)

Sites that render attorney content only through client-side JavaScript may return incomplete data for some fields.

Notes

Data quality depends on how well the target site uses schema.org/Person markup. Sites with JSON-LD yield the richest data.
The bio field is truncated at 2,000 characters.
Array fields (practice areas, education, bar admissions) are returned as pipe-separated strings for easy spreadsheet import.
Outside of JSON-LD markup, email and phone extraction relies on mailto: and tel: HTML links; for firms that display contact details as plain text or images, these fields may be empty.
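The bio truncation and pipe-separation notes can be sketched as a small normalization step applied to each record before output. This is a minimal sketch, and several details are assumptions rather than documented behavior: whitespace in the bio is collapsed before the 2,000-character cut, duplicate list entries are dropped, and an empty list becomes `null`. `join_pipe` and `truncate_bio` are hypothetical names.

```python
from typing import Iterable, Optional

BIO_LIMIT = 2000  # characters, per the notes above


def join_pipe(values: Iterable[str]) -> Optional[str]:
    """Join list values with ' | ' for spreadsheet-friendly output."""
    cleaned = [v.strip() for v in values if v and v.strip()]
    unique = list(dict.fromkeys(cleaned))  # drop repeats, keep first-seen order
    return " | ".join(unique) if unique else None


def truncate_bio(text: Optional[str], limit: int = BIO_LIMIT) -> Optional[str]:
    """Collapse runs of whitespace, then cut the bio to at most `limit` characters."""
    if not text:
        return None
    return " ".join(text.split())[:limit]
```

For example, `join_pipe(["New York", " California ", "New York"])` yields `"New York | California"`, which matches the `bar_admissions` format in the example output.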