OrbTop

CA Data Breach Notification Scraper

Scrapes the California Attorney General's SB 24 Data Breach Notification Registry — the authoritative public feed of California data breach notices under Cal. Civ. Code § 1798.82. Over 5,000 breach filings since 2012, updated continuously as companies file new SB 24 notices.

What it does

The AG breach list is a server-rendered Drupal Views table. All records are delivered in a single HTML response — no pagination, no JavaScript required. This actor:

  1. Fetches the listing page and extracts all breach records (organization, breach date(s), reported date, report URL)
  2. Optionally enriches each record by fetching the individual report detail page for the consumer notification letter text and the sample notice PDF link
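
The listing step above can be sketched with Python's standard-library HTML parser. This is a minimal illustration, not the actor's actual code: the three-column order, the link living in the first cell, the `/ecrime/databreach/reports/` path, and the sample row ("Example Corp") are all assumptions made for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

BASE_URL = "https://oag.ca.gov"  # report_url values live on oag.ca.gov


class BreachTableParser(HTMLParser):
    """Collects one record per <tr> that has three <td> cells and a link."""

    def __init__(self):
        super().__init__()
        self.records = []
        self._cells = None  # cell texts of the current row, None outside a row
        self._text = None   # text fragments of the current cell, None outside a cell
        self._href = None   # first link seen in the current row

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._cells, self._href = [], None
        elif tag == "td" and self._cells is not None:
            self._text = []
        elif tag == "a" and self._text is not None and self._href is None:
            self._href = dict(attrs).get("href")

    def handle_data(self, data):
        if self._text is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._text is not None:
            # Collapse runs of whitespace inside the cell
            self._cells.append(" ".join("".join(self._text).split()))
            self._text = None
        elif tag == "tr" and self._cells is not None:
            # Header rows use <th>, so they yield zero cells and are skipped
            if len(self._cells) == 3 and self._href:
                name, breach_dates, reported = self._cells
                self.records.append({
                    "organization_name": name,
                    "breach_dates": breach_dates,
                    "reported_date": reported,
                    "report_url": urljoin(BASE_URL, self._href),
                })
            self._cells = None


def parse_listing(html):
    parser = BreachTableParser()
    parser.feed(html)
    return parser.records


# Made-up sample markup; the real page structure may differ.
SAMPLE_HTML = """
<table>
<thead><tr><th>Organization Name</th><th>Date(s) of Breach</th><th>Reported Date</th></tr></thead>
<tbody><tr>
<td><a href="/ecrime/databreach/reports/sb24-625166">Example Corp</a></td>
<td>01/02/2024, 01/05/2024</td><td>02/01/2024</td>
</tr></tbody>
</table>
"""
records = parse_listing(SAMPLE_HTML)
```

Because the whole registry arrives in one response, a single pass like this is enough for the listing; only the optional detail enrichment needs further requests.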

Output fields

| Field | Description |
|---|---|
| organization_name | Company or entity that filed the breach notification |
| breach_dates | Date(s) of the breach, comma-separated when multiple dates reported |
| reported_date | Date the notice was posted to the AG list |
| report_url | Full URL to the report detail page on oag.ca.gov |
| report_id | SB 24 report identifier (e.g. sb24-625166) |
| notice_letter_text | Consumer notification letter body text (requires fetch_details: true) |
| sample_notice_pdf_url | URL to the sample notice PDF (requires fetch_details: true) |
| affected_individuals | Number of affected individuals, derived from notice text |
| breach_type | Derived keyword: ransomware, phishing, vendor, unauthorized_access, etc. |
| first_seen | ISO-8601 timestamp when this record was first scraped |
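
To illustrate how the derived fields could be computed, here is a hedged sketch. The keyword list, the regular expressions, and the function names are hypothetical; the actor's real derivation rules are not documented here and may differ.

```python
import re

# Hypothetical keyword map, checked in order; first match wins.
BREACH_KEYWORDS = [
    ("ransomware", ("ransomware",)),
    ("phishing", ("phishing",)),
    ("vendor", ("vendor", "third-party", "service provider")),
    ("unauthorized_access", ("unauthorized access", "unauthorized party")),
]


def extract_report_id(report_url):
    """Pull an identifier such as sb24-625166 out of a report URL."""
    match = re.search(r"sb24-\d+", report_url)
    return match.group(0) if match else None


def classify_breach(notice_text):
    """Return the first matching breach_type label, or None if nothing matches."""
    lowered = notice_text.lower()
    for label, needles in BREACH_KEYWORDS:
        if any(needle in lowered for needle in needles):
            return label
    return None


def extract_affected(notice_text):
    """Find a count such as '12,345 individuals'; None when no count is stated."""
    match = re.search(
        r"(\d{1,3}(?:,\d{3})+|\d+)\s+(?:individuals|customers|residents|people)",
        notice_text,
        re.IGNORECASE,
    )
    return int(match.group(1).replace(",", "")) if match else None
```

Heuristics of this kind miss notices that never state a count or that describe the incident in unusual wording, so consumers should treat derived fields as possibly empty.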

Input options

| Parameter | Type | Default | Description |
|---|---|---|---|
| maxItems | integer | 10 | Maximum number of breach records to return. Leave empty for all records (~5,000+) |
| fetch_details | boolean | false | When true, fetches each report's detail page for notice text and PDF links |

Use cases

  • Cyber-insurance underwriting — weekly delta runs to identify newly reported breaches for risk modeling
  • Breach litigation — CA is the #1 breach class-action jurisdiction; plaintiff/defense firms track new filings
  • Threat intelligence — identify breach patterns by type (ransomware, phishing, vendor) and affected sector
  • Compliance monitoring — enterprises monitoring whether their vendors have filed breach notices
  • Journalism & research — structured access to the complete regulatory breach history

Scheduling

The CA AG list updates whenever a new SB 24 notice is filed (typically several per week). Recommended run cadence for delta monitoring: daily or weekly. Use the first_seen field as a change-tracking cursor — filter for records where first_seen is after your last run timestamp.
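
The cursor filter described above might look like the following sketch. It assumes `first_seen` parses with `datetime.fromisoformat` once a trailing `Z` is normalized, and it treats timestamps without an offset as UTC; both are assumptions, and `new_since` is a hypothetical helper name.

```python
from datetime import datetime, timezone


def parse_ts(value):
    """Parse an ISO-8601 string into a timezone-aware datetime."""
    # fromisoformat rejects a trailing "Z" before Python 3.11
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    # Assumption: a timestamp without an offset is UTC
    return parsed if parsed.tzinfo else parsed.replace(tzinfo=timezone.utc)


def new_since(records, last_run):
    """Keep only records whose first_seen is strictly after the last run."""
    cutoff = parse_ts(last_run)
    return [r for r in records if parse_ts(r["first_seen"]) > cutoff]
```

Store the start time of each run and pass it as `last_run` on the next one, so that records scraped while a run is in progress are not skipped.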

Technical notes

  • No proxy required — CA gov site is datacenter-accessible with no anti-bot protection
  • Full listing response is ~3.4 MB (5,000+ rows in one HTML page)
  • Detail fetch mode runs at concurrency 5 with rate limiting enabled
  • Memory: 512 MB (sufficient for the full listing parse)
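
For readers curious how a detail fetch capped at concurrency 5 with rate limiting can be structured, here is a minimal asyncio sketch. It is not the actor's implementation: `fetch_one` is a hypothetical coroutine supplied by the caller, and the `MIN_INTERVAL` spacing is an assumed value.

```python
import asyncio

CONCURRENCY = 5      # matches the documented detail-fetch concurrency
MIN_INTERVAL = 0.2   # assumed minimum spacing between request starts, in seconds


async def fetch_all(urls, fetch_one, concurrency=CONCURRENCY, min_interval=MIN_INTERVAL):
    """Run fetch_one(url) for every URL, at most `concurrency` at a time."""
    slots = asyncio.Semaphore(concurrency)  # caps requests in flight
    pace = asyncio.Lock()                   # serializes request starts

    async def worker(url):
        async with slots:
            # Holding the lock while sleeping spaces successive starts
            # at least min_interval apart.
            async with pace:
                await asyncio.sleep(min_interval)
            return await fetch_one(url)

    # gather returns results in the same order as the input URLs
    return await asyncio.gather(*(worker(url) for url in urls))
```

The semaphore bounds load on the server while the lock smooths bursts, which is a common pairing when scraping a site that has no published rate limit.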