OrbTop

Structured Data Validator (JSON-LD / OG)

Structured Data Validator Pro (JSON-LD, Open Graph, Schema.org)

Extract and validate structured data from any URL — JSON-LD, Open Graph, Twitter Cards, microdata, RDFa, and meta tags — in one pass. Local schema.org validation, Google rich-result eligibility check, and an AI-discovery readiness score. Pure HTTP, no browser.


Structured Data Validator Features

  • Extracts six structured-data formats per URL: JSON-LD, Open Graph, Twitter Cards, microdata, RDFa, and meta tags.
  • Validates JSON-LD blocks against a bundled schema.org rule set with required-field gates per type (Article, Recipe, Product, Event, FAQPage, HowTo, VideoObject).
  • Flags Google rich-result eligibility — true when any block satisfies the relevant rich-result requirement set.
  • Scores AI-discovery readiness on a 0-100 scale, built from the signals reported in the output: JSON-LD presence, Article schema, FAQ markup, HowTo markup, and Open Graph tags.
  • Detects and lists every schema.org @type found across all formats.
  • Optional raw-HTML dump to the Apify key-value store (KVS) for offline debugging.
  • Pure HTTP fetch via CheerioCrawler — no browser, no proxy by default. The cheap default.
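
To make the extraction step concrete, here is a minimal sketch of pulling JSON-LD, Open Graph, and Twitter Card data out of static HTML. It is not the actor's code: the actor uses CheerioCrawler, while this sketch uses Python's standard-library html.parser, covers only three of the six formats, and the names StructuredDataParser and extract are hypothetical.

```python
import json
from html.parser import HTMLParser


class StructuredDataParser(HTMLParser):
    """Collects JSON-LD blocks and og:* / twitter:* meta tags from static HTML."""

    def __init__(self):
        super().__init__()
        self.json_ld = []       # parsed JSON-LD objects
        self.open_graph = {}    # og:* key -> content
        self.twitter_card = {}  # twitter:* key -> content
        self._in_json_ld = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "script" and (attr.get("type") or "").lower() == "application/ld+json":
            self._in_json_ld = True
            self._buffer = []
        elif tag == "meta":
            # Open Graph usually uses property=, Twitter Cards usually use name=.
            key = attr.get("property") or attr.get("name") or ""
            content = attr.get("content") or ""
            if key.startswith("og:"):
                self.open_graph[key] = content
            elif key.startswith("twitter:"):
                self.twitter_card[key] = content

    def handle_data(self, data):
        if self._in_json_ld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self._in_json_ld = False
            try:
                self.json_ld.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # malformed block: skipped in this sketch


def extract(html):
    """Return the three extracted formats as one dict (hypothetical row shape)."""
    parser = StructuredDataParser()
    parser.feed(html)
    return {
        "jsonLd": parser.json_ld,
        "openGraph": parser.open_graph,
        "twitterCard": parser.twitter_card,
    }
```

Because the parser only sees the HTML returned by the HTTP request, markup injected later by client-side JavaScript never reaches it, which is the same trade-off the actor makes.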

Who Uses Structured Data Audits?

  • SEO teams — audit rich-result eligibility across a sitemap before chasing rank changes that turn out to be markup bugs.
  • Content engineering — verify JSON-LD blocks ship with every article, product, or recipe page.
  • AI / LLM-discovery auditors — score how well a site speaks to AI crawlers, since LLMs lean heavily on structured data.
  • Migration QA — diff structured-data coverage before and after a CMS swap or template refactor.
  • Competitive research — see exactly which schema.org types competitors mark up, and which ones they miss.

How Structured Data Validator Works

  1. Pass in a list of URLs. The actor processes at most maxItems URLs per run (default 5, maximum 15) to stay inside the Apify tester's 5-minute timeout.
  2. CheerioCrawler fetches the static HTML over plain HTTP. The handler runs the selected extractors (all six by default) in parallel.
  3. JSON-LD blocks are validated against the bundled schema.org rule set. Each issue is recorded with severity, path, type, and message.
  4. The actor flags Google rich-result eligibility and computes the AI-discovery readiness score, then emits one row per URL.
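
The validation step can be pictured as a required-field gate per type. The sketch below is illustrative only: the REQUIRED_FIELDS table is hypothetical and is neither the actor's bundled rule set nor Google's actual requirements, and the path format inside the brackets is an assumption. Only the overall issue shape, severity then path then type then message, comes from this document.

```python
# Hypothetical gates for illustration; the real rule set is bundled with the actor.
REQUIRED_FIELDS = {
    "Article": ["headline", "author"],
    "Product": ["name", "offers"],
    "FAQPage": ["mainEntity"],
}


def validate_block(block, index=0):
    """Return issue strings shaped like '<severity> [<path>] (<type>) <message>'."""
    issues = []
    types = block.get("@type", [])
    if isinstance(types, str):
        types = [types]  # @type may be a single string or a list
    if not types:
        issues.append(f"warn [jsonLd[{index}]] (unknown) missing @type")
    for schema_type in types:
        # Types without an entry parse fine but skip required-field checks.
        for field in REQUIRED_FIELDS.get(schema_type, []):
            value = block.get(field)
            if value in ("", None, [], {}):
                issues.append(
                    f"error [jsonLd[{index}].{field}] ({schema_type}) "
                    "required field is missing"
                )
    return issues
```

A type with no entry in the table, such as Organization here, yields no issues, mirroring the behavior described for types outside the covered set.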

Input

{
  "urls": [
    "https://schema.org/Article",
    "https://www.apify.com"
  ],
  "maxItems": 5,
  "extractWhich": ["json-ld", "open-graph", "twitter-cards", "microdata", "rdfa", "meta-tags"],
  "validateAgainst": "schema.org",
  "includeRawHtml": false
}

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| urls | array | required | URLs to extract and validate structured data from. |
| maxItems | integer | 5 | Hard cap on URLs per run. Range 1-15. |
| extractWhich | array | all six | Formats to extract: json-ld, open-graph, twitter-cards, microdata, rdfa, meta-tags. |
| validateAgainst | enum | schema.org | Validation rule set. schema.org runs the bundled gates; none skips validation. |
| includeRawHtml | boolean | false | Save the fetched HTML to KVS and link via rawHtmlKvsKey on each row. |
| proxyConfiguration | object | none | Optional. Default is no proxy. |
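
Since maxItems tops out at 15, a longer URL list has to be split across runs. The helper below is a sketch of one way to prepare those inputs. The function name build_run_inputs and its parameters are hypothetical; the field names and allowed values are taken from the input table above.

```python
EXTRACTORS = ("json-ld", "open-graph", "twitter-cards", "microdata", "rdfa", "meta-tags")
MAX_ITEMS_LIMIT = 15  # upper bound of the documented 1-15 range


def build_run_inputs(urls, extract_which=EXTRACTORS, batch_size=MAX_ITEMS_LIMIT):
    """Split urls into per-run input objects of at most batch_size URLs each."""
    if not 1 <= batch_size <= MAX_ITEMS_LIMIT:
        raise ValueError(f"batch_size must be between 1 and {MAX_ITEMS_LIMIT}")
    unknown = set(extract_which) - set(EXTRACTORS)
    if unknown:
        raise ValueError(f"unknown extractors: {sorted(unknown)}")

    inputs = []
    for start in range(0, len(urls), batch_size):
        batch = list(urls[start:start + batch_size])
        inputs.append({
            "urls": batch,
            "maxItems": len(batch),  # never exceeds the documented cap
            "extractWhich": list(extract_which),
            "validateAgainst": "schema.org",
            "includeRawHtml": False,
        })
    return inputs
```

Each returned dict can be submitted as the input of one run; 32 URLs, for example, become three inputs of 15, 15, and 2 URLs.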

Structured Data Validator Output Fields

{
  "url": "https://www.apify.com",
  "finalUrl": "https://www.apify.com/",
  "jsonLd": [
    "{\"@context\":\"https://schema.org\",\"@type\":\"Organization\",\"name\":\"Apify\"}"
  ],
  "openGraph": {
    "og:title": "Apify - The Web Scraping Platform",
    "og:type": "website",
    "og:url": "https://apify.com/",
    "og:image": "https://apify.com/img/social.png"
  },
  "twitterCard": { "twitter:card": "summary_large_image" },
  "microdata": [],
  "rdfa": [],
  "metaTags": { "viewport": "width=device-width, initial-scale=1", "robots": "index, follow" },
  "validationErrors": [],
  "schemaTypes": ["Organization"],
  "googleRichResultEligible": false,
  "aiDiscoveryReadiness": {
    "hasJsonLd": true,
    "hasArticleSchema": false,
    "hasFAQ": false,
    "hasHowTo": false,
    "hasOpenGraph": true,
    "score": 60
  },
  "rawHtmlKvsKey": "",
  "status": "success",
  "errorMsg": "",
  "extractedAt": "2026-04-30T12:00:00Z"
}

| Field | Type | Description |
| --- | --- | --- |
| url | string | Audited URL. |
| finalUrl | string | URL after redirects. |
| jsonLd | array | Parsed JSON-LD blocks as JSON-stringified objects (CSV/Excel safe). |
| openGraph | object | All og:* meta tags flattened into a single object. |
| twitterCard | object | All twitter:* meta tags flattened into a single object. |
| microdata | array | itemscope/itemtype blocks as JSON-stringified objects. |
| rdfa | array | property/typeof/resource blocks as JSON-stringified objects. |
| metaTags | object | All <meta name> and <meta http-equiv> tags as a flat object. |
| validationErrors | array | Issues formatted as <severity> [<path>] (<type>) <message>. |
| schemaTypes | array | Detected schema.org types (e.g. Article, Recipe, Product). |
| googleRichResultEligible | boolean | True when any block satisfies a Google rich-result requirement set. |
| aiDiscoveryReadiness | object | {hasJsonLd, hasArticleSchema, hasFAQ, hasHowTo, hasOpenGraph, score 0-100}. |
| rawHtmlKvsKey | string | KVS key for raw HTML when includeRawHtml=true (else empty). |
| status | string | success, not_found, or error. |
| errorMsg | string | Error message on failure (empty on success). |
| extractedAt | string | ISO timestamp. |
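
Because jsonLd entries arrive as JSON strings rather than nested objects, consumers need one decoding step before reading fields. A minimal sketch, assuming a row shaped like the sample above; decode_json_ld is a hypothetical helper name.

```python
import json


def decode_json_ld(row):
    """Turn the JSON-stringified jsonLd entries of one output row into dicts."""
    blocks = []
    for raw in row.get("jsonLd", []):
        try:
            blocks.append(json.loads(raw))
        except (TypeError, json.JSONDecodeError):
            continue  # skip entries that are not valid JSON strings
    return blocks


row = {
    "url": "https://www.apify.com",
    "jsonLd": [
        "{\"@context\":\"https://schema.org\",\"@type\":\"Organization\",\"name\":\"Apify\"}"
    ],
}
for block in decode_json_ld(row):
    print(block["@type"], block.get("name"))
```

The same approach applies to the microdata and rdfa arrays, which the table describes as JSON-stringified as well.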

Pricing

Token charge, functionally free. Apify rejects pay-per-event (PPE) events priced at exactly $0, so the per-record price is set to the smallest practical floor.

| Event | Price |
| --- | --- |
| Actor start | $0.10 |
| Per record | $0.0001 |

| Volume | Cost |
| --- | --- |
| 100 records | $0.11 |
| 1,000 records | $0.20 |
| 10,000 records | $1.10 |
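
The volume figures are consistent with one actor-start charge plus the per-record price. The sketch below reproduces that arithmetic. The runs parameter is an assumption for the case where each run is billed its own start event, which the pricing table does not spell out.

```python
from decimal import Decimal

ACTOR_START = Decimal("0.10")   # per actor start, from the pricing table
PER_RECORD = Decimal("0.0001")  # per record, from the pricing table


def estimate_cost(records, runs=1):
    """Estimated charge in USD: start fee per run plus per-record fee."""
    return ACTOR_START * runs + PER_RECORD * records


for volume in (100, 1_000, 10_000):
    print(volume, estimate_cost(volume))
```

Decimal is used instead of float so that sub-cent prices add up exactly.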

This actor is the cheap discovery primitive that pairs with paid downstream actors. Audit liberally.


Limits

  • maxItems defaults to 5 and cannot exceed 15 per run, a ceiling sized for the Apify tester's 5-minute timeout.
  • The schema.org validator covers the common Google-rich-result types (Article, Recipe, Product, Event, FAQPage, HowTo, VideoObject). Other types parse but skip required-field validation.
  • The actor uses HTTP fetch only. Sites that require JS rendering for structured data won't surface anything — pair with a render crawler upstream.
  • includeRawHtml=true writes one KVS entry per URL. KVS quotas apply.
  • Validation severity is not a separate output field. Each validationErrors string starts with error, warn, or info, so downstream filters can match on that prefix.
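
Filtering on the severity prefix can be done with a plain string split. This is a sketch under the assumption that the severity is the first space-delimited token of each string, as the documented issue format suggests; split_by_severity is a hypothetical helper name and the sample strings are invented for illustration.

```python
def split_by_severity(validation_errors):
    """Group issue strings by their leading severity token."""
    buckets = {"error": [], "warn": [], "info": []}
    for issue in validation_errors:
        severity = issue.split(" ", 1)[0]
        # Unexpected prefixes get their own bucket instead of being dropped.
        buckets.setdefault(severity, []).append(issue)
    return buckets


issues = [
    "error [jsonLd[0].headline] (Article) required field is missing",
    "warn [jsonLd[1]] (unknown) missing @type",
]
blocking = split_by_severity(issues)["error"]
```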

Related Actors

  • Sitemap Walker Pro — feed discovered URLs straight into this validator for site-wide structured-data audits.
  • SSL & Security Headers Checker — pair for full SEO + security audits per URL.
  • Angular SSR State Extractor — for sites where the structured data lives inside Angular's TransferState payload.

Need More Features?

Need additional schema.org types, custom validation rules, or a render-crawler variant? File an issue or get in touch.

Why Use Structured Data Validator Pro?

  • Functionally free: $0.0001 per record on top of the $0.10 actor-start charge. Audit your whole sitemap and barely move the needle.
  • Six formats, one pass — JSON-LD, Open Graph, Twitter Cards, microdata, RDFa, and meta tags in a single dataset row. Most tools cover one, maybe two.
  • AI-discovery score baked in — rich-result eligibility plus an LLM-readiness score, so you know how the site reads to both Google and Claude.

Built by OrbTop.