OrbTop

LEAD GENERATION · AUTOMATION · DEVELOPER TOOLS

Bulk Address Parser & Normalizer (US / CA / Cayman)

Parse free-form address strings into structured {street, city, state, zip, country} records. Sixteen parse patterns cover US, Canadian, and Cayman addresses, with optional Nominatim geocoding and optional normalisation of embedded phone numbers.


Address Parser Features

  • Sixteen parse patterns, including US standard, no-comma, multi-location, state-name, state-code, PO Box, unit prefix / suffix, suite, directional, Canadian standard, Canadian postal, Cayman, flex-zip, and a regex fallback.
  • State helpers convert between full name, two-letter code, and URL slug, in either direction.
  • PO Box, unit, suite, and directional detection out of the box.
  • Optional OpenStreetMap Nominatim geocode adds {lat, lon, displayName}. Self-host the endpoint when you need more than 1 req/sec.
  • Optional phone normaliser detects an embedded phone number and emits it in canonical form.
  • Pure CPU on the parse path. Geocoded rows trigger a separate premium event so you only pay for what you geocode.
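
The state helpers listed above can be pictured with a small lookup sketch. This is only an illustration: the function names, the three-entry table, and the slug format (lowercase, hyphen-separated) are assumptions, not the actor's actual helpers.

```python
from typing import Optional

# Illustrative subset only; a real table would cover every state and province.
_STATES = [("Illinois", "IL"), ("New York", "NY"), ("Ontario", "ON")]

def _slugify(name: str) -> str:
    # Assumed slug format: lowercase words joined by hyphens.
    return "-".join(name.lower().split())

_BY_NAME = {name.lower(): code for name, code in _STATES}
_BY_CODE = {code: name for name, code in _STATES}
_BY_SLUG = {_slugify(name): code for name, code in _STATES}

def name_to_code(name: str) -> Optional[str]:
    # Collapse repeated whitespace and ignore case before the lookup.
    return _BY_NAME.get(" ".join(name.lower().split()))

def code_to_name(code: str) -> Optional[str]:
    return _BY_CODE.get(code.strip().upper())

def code_to_slug(code: str) -> Optional[str]:
    name = code_to_name(code)
    return _slugify(name) if name else None

def slug_to_code(slug: str) -> Optional[str]:
    return _BY_SLUG.get(slug.strip().lower())
```

Keeping one source list and deriving each lookup from it means the three directions cannot drift out of sync.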

Who Uses Address Parser Data?

  • CRM / sales-ops teams — normalise free-form addresses before deduping. Catches the records that look identical until you read them carefully.
  • Lead enrichment pipelines — promote text into typed fields: state, stateName, lat/lon, normalised phone.
  • Dataset prep engineers — turn scraped seller, agent, or vendor blobs into clean structured rows for warehouse loads.
  • Form validation backends — run user-typed addresses through real-world parser logic instead of a regex you'll regret.
  • Real estate and logistics — normalise property addresses across MLS exports, county records, and broker CSVs.

How Address Parser Works

  1. Pass in a list of free-form address strings. Country defaults to US; pass defaultCountry for CA or KY.
  2. Each string runs through the AddressManager pattern ladder. The first pattern that matches wins; the matched label lands in patternMatched.
  3. If includePhone is on, the actor also scans the raw blob for a phone-shaped substring and normalises it.
  4. If geocode is on and the parse succeeded, the actor hits Nominatim (1 req/sec by default) and adds lat, lon, and displayName.
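
A "first match wins" ladder like the one in step 2 might look roughly like the sketch below. The two regexes are simplified stand-ins written for this example, and the `parse` function and `LADDER` name are hypothetical. Only the labels `us-standard` and `canadian-postal` come from this page.

```python
import re

# Ordered list of (label, pattern). Order matters: the first match wins.
# These regexes are illustrative, not the actor's real patterns.
LADDER = [
    ("canadian-postal", re.compile(
        r"^(?P<street>[^,]+),\s*(?P<city>[^,]+),\s*(?P<state>[A-Z]{2})\s+"
        r"(?P<zip>[A-Z]\d[A-Z]\s?\d[A-Z]\d)$")),
    ("us-standard", re.compile(
        r"^(?P<street>[^,]+),\s*(?P<city>[^,]+),\s*(?P<state>[A-Z]{2})\s+"
        r"(?P<zip>\d{5}(?:-\d{4})?)$")),
]

def parse(raw: str) -> dict:
    text = raw.strip()
    for label, pattern in LADDER:
        match = pattern.match(text)
        if match:
            # Stop at the first rung that matches and record its label.
            return {"raw": raw, "parsed": match.groupdict(),
                    "valid": True, "patternMatched": label}
    # No rung matched; this sketch reports the row as unparsed.
    return {"raw": raw, "parsed": None, "valid": False, "patternMatched": None}
```

For example, `parse("500 University Ave, Toronto, ON M5G 1V7")` stops at the first rung, while `parse("123 Main St, Springfield, IL 62701")` fails the Canadian postal check and falls through to the second.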

Input

{
  "addresses": [
    "123 Main St, Springfield, IL 62701",
    "500 University Ave, Toronto, ON M5G 1V7",
    "PO Box 1234, George Town, KY1-1107"
  ],
  "defaultCountry": "US",
  "geocode": false,
  "returnUnparseable": true,
  "includePhone": false,
  "maxItems": 15
}

Field              Type     Default      Description
addresses          array    required     Free-form address strings to parse and normalise.
defaultCountry     enum     US           Country fallback when the parser cannot detect from input. US, CA, or KY.
geocode            boolean  false        Enable Nominatim lookup. Adds 1 req/sec rate limit. Premium event when a hit lands.
returnUnparseable  boolean  true         Include rows that failed to parse. When false, only valid=true rows are emitted.
includePhone       boolean  false        Detect and normalise embedded phone numbers, emit phoneNormalized.
nominatimEndpoint  string   OSM default  BYO Nominatim host. Required when geocode=true and you need more than 1 req/sec.
maxItems           integer  15           Hard cap on addresses processed per run.
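
If you assemble the input programmatically, a small builder that applies the documented defaults and rejects obvious mistakes can save a wasted run. The sketch below is an assumption-laden helper, not part of the actor. `build_input` is a hypothetical name, and raising an error when the batch exceeds maxItems or the address list is empty is a choice made for this example.

```python
from typing import Any

# Defaults copied from the field table above.
DEFAULTS = {
    "defaultCountry": "US",
    "geocode": False,
    "returnUnparseable": True,
    "includePhone": False,
    "maxItems": 15,
}
COUNTRIES = {"US", "CA", "KY"}
OPTIONAL_KEYS = set(DEFAULTS) | {"nominatimEndpoint"}

def build_input(addresses: list, **overrides: Any) -> dict:
    unknown = set(overrides) - OPTIONAL_KEYS
    if unknown:
        raise ValueError(f"unknown input fields: {sorted(unknown)}")
    if not addresses:
        raise ValueError("addresses is required")
    run_input = {"addresses": list(addresses), **DEFAULTS, **overrides}
    if run_input["defaultCountry"] not in COUNTRIES:
        raise ValueError("defaultCountry must be US, CA, or KY")
    max_items = run_input["maxItems"]
    if not isinstance(max_items, int) or max_items < 1:
        raise ValueError("maxItems must be a positive integer")
    if len(addresses) > max_items:
        # Assumption for this sketch: fail loudly rather than rely on the cap.
        raise ValueError("more addresses than maxItems; raise maxItems or split the batch")
    return run_input
```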

Geocode + phone example

{
  "addresses": ["Acme Corp, 123 Main St, Suite 100, Springfield, IL 62701, (415) 555-1234"],
  "defaultCountry": "US",
  "geocode": true,
  "includePhone": true,
  "maxItems": 10
}
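
To give a feel for what includePhone does with a blob like the one above, here is a minimal sketch of North American phone detection. The regex and the `normalize_phone` name are assumptions for illustration. The page does not say what the canonical form is, so E.164 (`+1` followed by ten digits) is assumed here.

```python
import re
from typing import Optional

# Optional +1 / 1 prefix, then 3-3-4 digits with optional (), space, dot, or dash.
# The lookarounds keep the match from starting or ending inside a longer digit run.
_NANP = re.compile(
    r"(?<!\d)(?:\+?1[\s.-]?)?\(?(\d{3})\)?[\s.-]?(\d{3})[\s.-]?(\d{4})(?!\d)"
)

def normalize_phone(blob: str) -> Optional[str]:
    match = _NANP.search(blob)
    if match is None:
        return None  # nothing phone-shaped in the blob
    # Assumed canonical form: E.164 with the +1 country code.
    return "+1" + "".join(match.groups())
```

Street numbers and five-digit ZIP codes do not satisfy the 3-3-4 shape, so a plain address with no phone number returns None in this sketch.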

Address Parser Output Fields

{
  "raw": "123 Main St, Suite 100, Springfield, IL 62701",
  "parsed": {
    "street": "123 Main St Suite 100",
    "city": "Springfield",
    "state": "IL",
    "stateName": "Illinois",
    "zip": "62701",
    "country": "US"
  },
  "valid": true,
  "patternMatched": "us-multi-location",
  "geo": "{\"lat\":39.7817,\"lon\":-89.6501,\"displayName\":\"Springfield, ...\"}",
  "phoneNormalized": null,
  "country": "US",
  "normalizedAt": "2026-04-30T12:00:00Z",
  "status": "success",
  "errorMsg": null
}

Field            Type     Description
raw              string   The original input address string.
parsed           object   {street, city, state, stateName, zip, country}. Null when valid=false.
valid            boolean  True when the minimum required fields (city, state, zip) were parsed.
patternMatched   string   Which AddressManager pattern fired (e.g. us-standard, canadian-postal, fallback).
geo              string   JSON string {lat, lon, displayName} when geocode=true; null otherwise.
phoneNormalized  string   Normalised phone (when includePhone=true and a number is present).
country          string   ISO2 country code (US, CA, KY).
normalizedAt     string   ISO timestamp when the row was processed.
status           string   success, unparseable, or error.
errorMsg         string   Error message when status=error; null on success.
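
Because geo is delivered as a JSON string rather than a nested object, downstream code has to decode it before reading lat and lon. The sketch below shows one way to split output rows into usable records and rows to triage. `split_rows` and the flattened record shape are hypothetical choices for this example.

```python
import json

def split_rows(rows):
    """Return (usable, failed) lists built from an iterable of output rows."""
    usable, failed = [], []
    for row in rows:
        if row.get("status") == "success" and row.get("valid"):
            # geo is a JSON string or null, so decode it before use.
            geo = json.loads(row["geo"]) if row.get("geo") else {}
            usable.append({
                **row["parsed"],
                "lat": geo.get("lat"),
                "lon": geo.get("lon"),
                "phone": row.get("phoneNormalized"),
            })
        else:
            # Keep enough context to triage unparseable and errored rows.
            failed.append({
                "raw": row.get("raw"),
                "status": row.get("status"),
                "errorMsg": row.get("errorMsg"),
            })
    return usable, failed
```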

Pricing

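Three events: a flat actor start plus two per-row events. Pure-CPU parses are cheap. Geocoded rows trigger a separate premium event because Nominatim adds an HTTP round-trip per record. The volume figures below are consistent with a geocoded row billing at the geocode rate in place of the parse rate, not on top of it.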

Event                   Price
Actor start             $0.10
Per parsed address      $0.0005
Per geocoded address    $0.001

Volume              No geocode    Geocoded
100 addresses       $0.15         $0.20
1,000 addresses     $0.60         $1.10
10,000 addresses    $5.10         $10.10
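
The volume figures can be reproduced with a few lines of arithmetic. The sketch below assumes, as the table implies, that a geocoded row bills at the geocode rate instead of the parse rate. `estimate_cost` is a hypothetical helper, and Decimal is used so the cents come out exact.

```python
from decimal import Decimal

# Prices copied from the event table above.
ACTOR_START = Decimal("0.10")
PER_PARSED = Decimal("0.0005")
PER_GEOCODED = Decimal("0.001")

def estimate_cost(total: int, geocoded: int = 0) -> Decimal:
    """Estimated run cost in USD for `total` rows, `geocoded` of them geocoded."""
    if not 0 <= geocoded <= total:
        raise ValueError("geocoded must be between 0 and total")
    # Assumption: each row bills once, at either the parse or the geocode rate.
    parsed_only = total - geocoded
    return ACTOR_START + PER_PARSED * parsed_only + PER_GEOCODED * geocoded
```

For example, `estimate_cost(1000)` gives 0.60 and `estimate_cost(1000, 1000)` gives 1.10, matching the 1,000-address row.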

Limits

  • maxItems caps the number of addresses processed per run. Override the schema default of 15 for production batches.
  • The Apify console tester has a 5-minute timeout. Pure-CPU parses finish well inside that, but geocode mode at 1 req/sec fits only about 300 addresses in the window.
  • Nominatim's public endpoint enforces 1 req/sec. Geocode mode therefore caps at roughly 3,500 addresses per 1-hour run on the default endpoint (3,600 requests is the theoretical ceiling, before per-request overhead). Self-host or BYO via nominatimEndpoint for higher throughput.
  • Country detection covers US, Canada, and Cayman. Other ISO regions fall through to the regex fallback and may emit valid=false.
  • Phone normaliser is best-effort — it expects North American formats. Numbers that don't match the regex are silently skipped.
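
The 1 req/sec limit is easy to reason about with a small throttle sketch. This is not the actor's implementation. `throttled` and `max_requests` are hypothetical names, and `fetch` stands in for whatever geocoding call you make. The clock and sleep functions are injectable so the pacing can be checked without waiting.

```python
import time

def max_requests(window_seconds: float, min_interval: float = 1.0) -> int:
    """Upper bound on requests that fit in a window at one per min_interval."""
    return int(window_seconds // min_interval)

def throttled(items, fetch, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
    """Yield (item, fetch(item)) pairs, spacing calls at least min_interval apart."""
    last_call = None
    for item in items:
        if last_call is not None:
            wait = min_interval - (clock() - last_call)
            if wait > 0:
                sleep(wait)  # pause only for the remaining part of the interval
        last_call = clock()
        yield item, fetch(item)
```

Here `max_requests(3600)` returns 3600 and `max_requests(300)` returns 300, which is where the per-hour and per-5-minute ceilings mentioned above come from.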

Related Actors

  • DNS Domain Audit — pair with address parser when enriching contact records that include both addresses and email domains.
  • Structured Data Validator Pro — for parsing addresses out of HTML before normalising them.
  • SSL & Security Headers Checker — same utility-actor shape for site-health workflows.

Need More Features?

Need extra countries, alternate state-helper outputs, or a different geocode backend? File an issue or get in touch.

Why Use Address Parser?

  • Cheap on the hot path — $0.0005 per parsed row. Cleaning a million-record CRM costs less than the meeting where you'd discuss it.
  • Sixteen patterns, one row out — the parser handles the realistic mess of US / Canadian / Cayman addresses and tells you which pattern fired, so unparseable rows are easy to triage.
  • Geocode is opt-in and pay-per-hit — Nominatim only fires when you ask for it, and only successful geocodes bill at the premium rate.

Built by OrbTop.