OrbTop

Law Firm Website Contact Scraper

Extract attorney profiles, contact info, practice areas, education, and bios directly from law firm websites. Provide a list of law firm website URLs and get structured attorney data ready for CRM import, lead generation, or legal directory enrichment.

What It Does

This actor crawls law firm websites and extracts detailed attorney profiles from each attorney's bio page. It is built for the common law firm website architectures (WordPress, custom CMS, and server-rendered React/Next.js sites) and uses a multi-layer extraction approach, with each layer acting as a fallback for the one before it:

  1. JSON-LD / schema.org/Person — detects and parses structured attorney data when available (name, jobTitle, email, telephone, image, LinkedIn)
  2. Heuristic CSS selectors — covers common WordPress attorney theme patterns and major law firm CMS templates
  3. Pattern matching fallbacks — extracts mailto links, tel: links, and office location from visible page content
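The layered approach above can be illustrated with a minimal sketch. It is not the actor's actual implementation: the function and class names are hypothetical, only the standard library is used, and layer 2 (CSS selectors) is left out because it depends on site-specific markup. The sketch assumes JSON-LD takes priority and that mailto: and tel: links only fill fields that JSON-LD left empty.

```python
import json
from html.parser import HTMLParser


class ProfileParser(HTMLParser):
    """Hypothetical helper: collects JSON-LD script bodies and mailto:/tel: hrefs."""

    def __init__(self):
        super().__init__()
        self.jsonld_blocks = []
        self.mailto = []
        self.tel = []
        self._in_jsonld = False
        self._buf = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "script" and attrs.get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buf = []
        elif tag == "a":
            href = attrs.get("href") or ""
            if href.lower().startswith("mailto:"):
                # Drop any ?subject=... query that follows the address.
                self.mailto.append(href[7:].split("?")[0])
            elif href.lower().startswith("tel:"):
                self.tel.append(href[4:])

    def handle_data(self, data):
        if self._in_jsonld:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.jsonld_blocks.append("".join(self._buf))
            self._in_jsonld = False


def find_person(node):
    """Return the first schema.org Person object inside a parsed JSON-LD value."""
    if isinstance(node, list):
        for item in node:
            found = find_person(item)
            if found:
                return found
    elif isinstance(node, dict):
        types = node.get("@type")
        types = types if isinstance(types, list) else [types]
        if "Person" in types:
            return node
        return find_person(node.get("@graph", []))
    return None


def extract_profile(html):
    """Layer 1 (JSON-LD) first, then layer 3 (mailto:/tel: links) as a fallback."""
    parser = ProfileParser()
    parser.feed(html)
    person = None
    for block in parser.jsonld_blocks:
        try:
            person = find_person(json.loads(block))
        except json.JSONDecodeError:
            continue  # malformed JSON-LD: skip it and rely on the fallbacks
        if person:
            break
    person = person or {}
    return {
        "attorney_name": person.get("name"),
        "title": person.get("jobTitle"),
        "email": person.get("email") or (parser.mailto[0] if parser.mailto else None),
        "phone": person.get("telephone") or (parser.tel[0] if parser.tel else None),
    }
```

Calling extract_profile on a bio page's HTML returns a partial record using the output field names documented below; fields that no layer can fill stay None.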

Use Cases

  • Lead generation — build prospect lists of attorneys at target firms
  • Directory enrichment — supplement FindLaw, Martindale, or Avvo data with direct-from-source bios and contact details
  • Recruitment — identify attorneys by practice area and location for headhunting
  • Market research — map firm headcount, practice area mix, and office locations
  • CRM import — import attorney records with email, phone, title, and LinkedIn directly

Input

Field      Description
urls       Required. List of law firm website URLs. Each URL can be the homepage (the actor navigates to the attorneys/team page) or the attorneys listing page itself (e.g. https://www.example.com/lawyers/).
maxItems   Maximum number of attorney records to return across all input URLs. Default: 10. Set to 0 for no limit.
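The maxItems semantics (a single cap shared by all input URLs, with 0 meaning no limit) can be sketched as follows. The function name cap_records is hypothetical, and rejecting negative values is an assumption; the source does not say how the actor treats them.

```python
from itertools import islice


def cap_records(records, max_items=10):
    """Return an iterator over at most max_items records; 0 means no limit."""
    if max_items < 0:
        # Assumption: negative values are treated as invalid input.
        raise ValueError("maxItems must be 0 or a positive integer")
    limit = None if max_items == 0 else max_items
    return islice(records, limit)
```

Because the cap applies to the combined stream of records, a firm listed early in urls can use up the whole budget before later URLs contribute anything.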

Example Input

{
  "urls": [
    "https://www.gibsondunn.com/lawyers/",
    "https://www.lw.com/en/people"
  ],
  "maxItems": 100
}

Output

Each record represents one attorney:

Field               Description
attorney_name       Full name
title               Professional title (Partner, Associate, Of Counsel, etc.)
email               Email address
phone               Primary phone number
direct_phone        Direct line (if separate from main)
practice_areas      Practice areas, pipe-separated
education           Educational background, pipe-separated
bar_admissions      Bar admissions, pipe-separated
bio                 Biography text (up to 2,000 characters)
firm_name           Law firm name
office_location     Office city/location
attorney_page_url   URL of the attorney bio page
headshot_url        URL of the attorney headshot image
linkedin_url        LinkedIn profile URL

Example Output

{
  "attorney_name": "Jane Smith",
  "title": "Partner",
  "email": "jsmith@examplelaw.com",
  "phone": "+1 212.555.1234",
  "direct_phone": null,
  "practice_areas": "Mergers & Acquisitions | Private Equity | Capital Markets",
  "education": "Harvard Law School, J.D. | Yale University, B.A.",
  "bar_admissions": "New York | California",
  "bio": "Jane Smith is a partner in the firm's M&A practice...",
  "firm_name": "Example Law Firm",
  "office_location": "New York",
  "attorney_page_url": "https://www.examplelaw.com/people/jane-smith/",
  "headshot_url": "https://www.examplelaw.com/wp-content/uploads/jane-smith.jpg",
  "linkedin_url": "https://www.linkedin.com/in/janesmith/"
}
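For downstream code that needs real lists rather than pipe-separated strings, a record like the one above can be post-processed with a small helper. This is a sketch only: split_pipe_fields is a hypothetical name, and it assumes the separator is a pipe with optional surrounding spaces, as in the example, and that a missing value arrives as null or an empty string.

```python
PIPE_FIELDS = ("practice_areas", "education", "bar_admissions")


def split_pipe_fields(record):
    """Return a copy of the record with pipe-separated fields turned into lists."""
    out = dict(record)
    for field in PIPE_FIELDS:
        value = record.get(field)
        if value:
            out[field] = [part.strip() for part in value.split("|") if part.strip()]
        else:
            out[field] = []  # null, empty, or absent field
    return out
```

Fields outside PIPE_FIELDS are copied through unchanged, so the result still carries the full record.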

How It Works

The actor uses a two-level hierarchical crawl (listing page first, then individual bio pages), carried out in three steps:

  1. Home/Listing detection — if the input URL is a homepage, the actor navigates to the attorneys listing page via site navigation links. If the URL is already a listing page (contains /people/, /lawyers/, /attorneys/, etc.), it skips directly to discovery.
  2. Attorney discovery — scans the listing page for links to individual bio pages and handles pagination for large firms.
  3. Bio page extraction — visits each bio page and extracts the full attorney profile using the extraction cascade described above.
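Steps 1 and 2 above can be sketched with the standard library. The names below are hypothetical and the rules are assumptions, not the actor's actual logic: a URL counts as a listing page when its last path segment is one of the listed keywords, and a bio link is any same-host link exactly one path segment below the listing page. Pagination and homepage navigation are not covered.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# The source names these segments and adds "etc."; extend as needed.
LISTING_SEGMENTS = ("people", "lawyers", "attorneys")


def is_listing_url(url):
    """True when the last path segment looks like an attorney listing page."""
    segments = [s for s in urlparse(url).path.lower().split("/") if s]
    return bool(segments) and segments[-1] in LISTING_SEGMENTS


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def discover_bio_links(listing_url, html):
    """Return de-duplicated absolute URLs one path segment below the listing page."""
    base = urlparse(listing_url)
    base_path = base.path.rstrip("/") + "/"
    collector = LinkCollector()
    collector.feed(html)
    seen, links = set(), []
    for href in collector.hrefs:
        absolute = urljoin(listing_url, href)
        parsed = urlparse(absolute)
        if parsed.netloc != base.netloc or not parsed.path.startswith(base_path):
            continue  # off-site link or outside the listing section
        rest = parsed.path[len(base_path):].strip("/")
        if rest and "/" not in rest and absolute not in seen:
            seen.add(absolute)
            links.append(absolute)
    return links
```

Checking only the last path segment keeps a bio URL such as https://www.examplelaw.com/people/jane-smith/ from being mistaken for a listing page, even though it also contains /people/.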

Site Compatibility

Tested with major law firm website architectures:

  • WordPress attorney themes (most common)
  • Custom CMS (large firms with bespoke systems)
  • Server-rendered React/Next.js (works without browser rendering)

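Sites that render their content only in the browser via client-side JavaScript may return incomplete data for some fields, because the actor reads the server-delivered HTML without browser rendering.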

Notes

  • Data quality depends on how well the target site uses schema.org/Person markup. Sites with JSON-LD yield the richest data.
  • The bio field is truncated at 2,000 characters.
  • Multi-value fields (practice_areas, education, bar_admissions) are returned as pipe-separated strings rather than arrays, for easy spreadsheet import.
  • Email and phone extraction relies on mailto: and tel: HTML links. Firms that display contact info as plain text or images may leave the email, phone, and direct_phone fields empty.