OrbTop

NYC DOB & HPD Violations, Permits & Property Data Scraper

REAL ESTATE, BUSINESS

Scrape NYC building violations, permits, and property data from 7 NYC Open Data Socrata datasets, joined on BIN/BBL into a single per-building risk profile.

What This Scraper Does

This actor queries 7 Socrata SODA v2 endpoints on the NYC Open Data portal (data.cityofnewyork.us) and merges the results on the NYC Building Identification Number (BIN) and Borough-Block-Lot (BBL) to produce one building risk profile per address. No browser rendering or authentication is required; all data comes from the public JSON API.

Datasets Covered

Dataset | Endpoint (dataset ID) | What It Contains
DOB Violations | 3h2n-5cm9 | Active/dismissed building violations from the NYC Department of Buildings
ECB Violations | 6bgk-3dad | Environmental Control Board violations with severity and penalty amounts
HPD Housing Violations | wvxf-dwi5 | Housing maintenance violations (class A/B/C) from Housing Preservation & Development
DOB Permit Issuance | ipu4-2q9a | Construction and alteration permits with contractor and owner details
DOB Job Application Filings | ic3t-wcy2 | Job application filings with cost estimates and work descriptions
PLUTO Property Tax Lots | 64uk-42ks | Property ownership, year built, floor count, assessed value
LL84 Energy Benchmarking | usc3-8zwd | Annual energy benchmarking data (site/source EUI, ENERGY STAR score, GHG emissions)
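As a rough illustration of how these dataset IDs map onto Socrata resource URLs, the sketch below builds a query URL that filters one dataset by a list of BINs. The short dictionary keys, the function name, and the `bin_column` parameter are assumptions made for this example; the BIN column is named differently from dataset to dataset, so check each dataset's schema before relying on it.

```python
from urllib.parse import urlencode

# Dataset IDs copied from the table above; the short keys are illustrative only.
DATASETS = {
    "dob_violations": "3h2n-5cm9",
    "ecb_violations": "6bgk-3dad",
    "hpd_violations": "wvxf-dwi5",
    "dob_permits": "ipu4-2q9a",
    "dob_jobs": "ic3t-wcy2",
    "pluto": "64uk-42ks",
    "ll84": "usc3-8zwd",
}

BASE_URL = "https://data.cityofnewyork.us/resource"


def build_query_url(dataset_key, bin_column, bins, limit=1000):
    """Build a SODA URL selecting rows whose BIN column matches any of `bins`.

    `bin_column` is an assumption supplied by the caller, because the
    column name varies between datasets.
    """
    for b in bins:
        if not str(b).isdigit():
            # Reject anything non-numeric so it cannot alter the SoQL clause.
            raise ValueError(f"BIN must be numeric: {b!r}")
    quoted = ", ".join(f"'{b}'" for b in bins)
    params = {"$where": f"{bin_column} in ({quoted})", "$limit": limit}
    return f"{BASE_URL}/{DATASETS[dataset_key]}.json?{urlencode(params)}"


url = build_query_url("dob_violations", "bin", ["1007611"])
```

The resulting URL can be fetched with any HTTP client; the response is a JSON array of row objects.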

Use Cases

  • PropTech / Lender due diligence: Join violations + permits + ownership into a single building risk profile
  • Real estate underwriting: Identify buildings with active hazardous violations or outstanding ECB penalties
  • Contractor lead generation: Find buildings with expired permits or open violations needing remediation
  • ESG analysis: Pull LL84 energy benchmarking data alongside property profiles
  • Compliance monitoring: Track building violation status changes over time

Input Configuration

Field | Type | Description
maxItems | integer | Maximum buildings to return (default: 10)
binList | array | Explicit BIN list for direct lookup (overrides date-range mode)
bblList | array | Explicit BBL list, resolved to BINs via the LL84 dataset
dateFrom | string | Start date for violation/permit filter, YYYY-MM-DD (date-range mode)
dateTo | string | End date for violation/permit filter, YYYY-MM-DD (optional, defaults to today)
datasets | array | Which datasets to include (leave empty for all 7)
socrataAppToken | string | Optional Socrata app token for higher rate limits
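The defaults described above could be applied along the following lines. This is a minimal sketch, not the actor's own code: the `normalize_input` helper and the dataset key names in `ALL_DATASET_KEYS` are hypothetical, since the listing does not state which values the datasets field accepts.

```python
from datetime import date

# Hypothetical keys; the accepted values for `datasets` are not documented here.
ALL_DATASET_KEYS = [
    "dob_violations", "ecb_violations", "hpd_violations",
    "dob_permits", "dob_jobs", "pluto", "ll84",
]


def normalize_input(raw, today=None):
    """Fill in documented defaults and validate the date fields."""
    today = today or date.today()
    cfg = {
        "maxItems": int(raw.get("maxItems", 10)),           # default: 10
        "binList": [str(b).strip() for b in raw.get("binList") or []],
        "bblList": [str(b).strip() for b in raw.get("bblList") or []],
        "dateFrom": raw.get("dateFrom"),
        "dateTo": raw.get("dateTo") or today.isoformat(),   # defaults to today
        "datasets": list(raw.get("datasets") or ALL_DATASET_KEYS),  # empty = all 7
        "socrataAppToken": raw.get("socrataAppToken"),
    }
    if cfg["maxItems"] < 1:
        raise ValueError("maxItems must be at least 1")
    for key in ("dateFrom", "dateTo"):
        if cfg[key] is not None:
            date.fromisoformat(cfg[key])  # raises ValueError unless YYYY-MM-DD
    return cfg


cfg = normalize_input({"binList": [1007611]}, today=date(2024, 1, 2))
```

Passing `today` explicitly keeps the function deterministic, which makes it easier to test.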

Operating Modes

BIN lookup mode (recommended for targeted queries): Provide binList with one or more BIN numbers. The actor fetches all matching records across all requested datasets for those BINs.

BBL lookup mode: Provide bblList with 10-digit Borough-Block-Lot strings. The actor resolves BINs via the LL84 dataset then fetches all datasets.
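A 10-digit BBL is made up of a 1-digit borough code, a 5-digit block, and a 4-digit lot. The sketch below splits and validates such a string before it is used for lookup; the `parse_bbl` helper is hypothetical and is shown only to illustrate the format.

```python
def parse_bbl(bbl):
    """Split a 10-digit BBL string into borough, block, and lot numbers."""
    s = str(bbl).strip()
    if len(s) != 10 or not (s.isascii() and s.isdigit()):
        raise ValueError(f"BBL must be exactly 10 digits: {bbl!r}")
    boro, block, lot = int(s[0]), int(s[1:6]), int(s[6:])
    if not 1 <= boro <= 5:
        # NYC borough codes run from 1 (Manhattan) to 5 (Staten Island).
        raise ValueError(f"Borough code must be 1-5: {boro}")
    return {"boro": boro, "block": block, "lot": lot}


parts = parse_bbl("1004990020")
# parts == {"boro": 1, "block": 499, "lot": 20}
```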

Date-range mode (when neither binList nor bblList is provided): The actor seeds from DOB violations filtered by date range, collects unique BINs, then fans out to all requested datasets. Use maxItems to cap the number of buildings returned.
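The mode selection and the BIN-collection step of date-range mode might look roughly like this. Both helpers are hypothetical. The precedence of binList over bblList when both are supplied is an assumption, as is the `bin` field name on the seed rows.

```python
def select_mode(cfg):
    """Pick an operating mode from the input fields.

    Assumption: binList wins if both binList and bblList are given.
    """
    if cfg.get("binList"):
        return "bin"
    if cfg.get("bblList"):
        return "bbl"
    return "date_range"


def collect_unique_bins(seed_rows, max_items, bin_field="bin"):
    """Collect unique BINs from seed rows in first-seen order, capped at max_items."""
    seen, ordered = set(), []
    for row in seed_rows:
        b = row.get(bin_field)
        if b and b not in seen:
            seen.add(b)
            ordered.append(b)
            if len(ordered) >= max_items:
                break
    return ordered


seed = [{"bin": "1007611"}, {"bin": "1007611"}, {"bin": "3000001"}, {"bin": "4000002"}]
bins = collect_unique_bins(seed, max_items=2)
# bins == ["1007611", "3000001"]
```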

Output Schema

Each record represents one unique building (BIN). Fields from all datasets are flattened onto the record. When multiple violations or permits exist for a building, only the most recent row is included per dataset (plus composite counts).
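One way to picture the flattening step is shown below: for each dataset, keep only the most recent row and copy its fields onto the building record under a dataset prefix. This is a sketch under stated assumptions, not the actor's code. The helper names are hypothetical, and dates are assumed to be ISO YYYY-MM-DD strings so that plain string comparison orders them correctly.

```python
def most_recent(rows, date_field):
    """Return the row with the latest ISO date string, or None if none is dated."""
    dated = [r for r in rows if r.get(date_field)]
    return max(dated, key=lambda r: r[date_field]) if dated else None


def flatten_building(bin_id, per_dataset):
    """Flatten the latest row of each dataset onto one record.

    `per_dataset` maps a field prefix to a (rows, date_field) pair.
    """
    record = {"bin": bin_id}
    for prefix, (rows, date_field) in per_dataset.items():
        latest = most_recent(rows, date_field)
        if latest:
            for key, value in latest.items():
                record[f"{prefix}_{key}"] = value
    return record


record = flatten_building("1007611", {
    "dob_violation": (
        [
            {"issue_date": "1983-08-31", "type": "LANDMK-LANDMARK"},
            {"issue_date": "2021-05-01", "type": "EXAMPLE-TYPE"},
        ],
        "issue_date",
    ),
})
# record["dob_violation_issue_date"] == "2021-05-01"
```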

Key composite fields:

  • active_violation_count — Total open violations across DOB, ECB, and HPD
  • open_permit_count — Active/issued permits
  • total_penalties_outstanding — Sum of ECB penalty amounts
  • building_risk_score — Composite 0–100 score (weighted by violation count, severity, and penalties)
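The composite fields can be sketched as follows. The actual scoring formula is not published in this listing, so the weights below are invented purely for illustration, as are the `is_open` and `severity` keys, which assume the rows have already been normalized from each dataset's own status fields.

```python
def composite_fields(violations, permits, ecb_penalties):
    """Compute illustrative composite fields for one building.

    The weights (2 per open violation, 10 per hazardous one, 1 per
    $1,000 of penalties) are hypothetical, not the actor's formula.
    """
    open_violations = [v for v in violations if v.get("is_open")]
    hazardous = sum(1 for v in open_violations if v.get("severity") == "hazardous")
    penalties = round(sum(float(p) for p in ecb_penalties), 2)
    raw = 2 * len(open_violations) + 10 * hazardous + penalties / 1000
    return {
        "active_violation_count": len(open_violations),
        "open_permit_count": sum(1 for p in permits if p.get("is_open")),
        "total_penalties_outstanding": penalties,
        "building_risk_score": round(min(100.0, raw)),  # clamp to 0-100
    }


fields = composite_fields(
    violations=[
        {"is_open": True, "severity": "hazardous"},
        {"is_open": True},
        {"is_open": False},
    ],
    permits=[{"is_open": True}],
    ecb_penalties=[2500, 500],
)
# fields["building_risk_score"] == 17  (2*2 + 10*1 + 3000/1000)
```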

Rate Limits

By default, the NYC Socrata API returns at most 1,000 rows per query, and anonymous requests are throttled more strictly than requests that carry an app token. For large runs, provide a Socrata app token (socrataAppToken), available through free registration at data.cityofnewyork.us. The actor waits 300 ms between requests to respect the crawl-delay in robots.txt.
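Result sets larger than one page can be walked with the SODA `$limit` and `$offset` parameters, pausing between requests. The sketch below assumes a 1,000-row page size and a 0.3-second delay to match the figures above. The function names are hypothetical, and the `get` and `sleep` parameters exist only so the loop can be exercised without network access.

```python
import json
import time
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def http_get_json(url, app_token=None):
    """Fetch a URL and decode JSON, sending X-App-Token when a token is given."""
    headers = {"X-App-Token": app_token} if app_token else {}
    with urlopen(Request(url, headers=headers), timeout=30) as resp:
        return json.load(resp)


def fetch_all(resource_url, where=None, page_size=1000, delay=0.3,
              app_token=None, get=http_get_json, sleep=time.sleep):
    """Page through a Socrata resource until a short page signals the end."""
    rows, offset = [], 0
    while True:
        # A stable $order keeps pages from overlapping between requests.
        params = {"$limit": page_size, "$offset": offset, "$order": ":id"}
        if where:
            params["$where"] = where
        page = get(f"{resource_url}?{urlencode(params)}", app_token)
        rows.extend(page)
        if len(page) < page_size:
            return rows
        offset += page_size
        sleep(delay)  # pause between requests
```

A call such as `fetch_all("https://data.cityofnewyork.us/resource/3h2n-5cm9.json", where="bin = '1007611'")` would then return every matching row as a list of dictionaries.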

Example Output

{
  "bin": "1007611",
  "bbl": "1004990020",
  "boro": "1",
  "house_number": "94",
  "street": "PRINCE STREET",
  "zip": "10012",
  "lat": 40.7245729,
  "lng": -73.9987806,
  "dob_violation_number": "V083183LANDMK84-0109",
  "dob_violation_type": "LANDMK-LANDMARK",
  "dob_violation_category": "V-DOB VIOLATION - ACTIVE",
  "dob_violation_issue_date": "1983-08-31",
  "pluto_owner": "94 PRINCE STREET CORP",
  "pluto_year_built": 1858,
  "pluto_num_floors": 5,
  "pluto_units_total": 3,
  "pluto_building_class": "K4",
  "pluto_assessed_value": 1051200,
  "active_violation_count": 4,
  "open_permit_count": 0,
  "total_penalties_outstanding": 0,
  "building_risk_score": 12
}