Structured Data Validator (JSON-LD / OG)
Structured Data Validator Pro (JSON-LD, Open Graph, Schema.org)
Extract and validate structured data from any URL — JSON-LD, Open Graph, Twitter Cards, microdata, RDFa, and meta tags — in one pass. Local schema.org validation, Google rich-result eligibility check, and an AI-discovery readiness score. Pure HTTP, no browser.
Structured Data Validator Features
- Extracts six structured-data formats per URL: JSON-LD, Open Graph, Twitter Cards, microdata, RDFa, and meta tags.
- Validates JSON-LD blocks against a bundled schema.org rule set with required-field gates per type (Article, Recipe, Product, Event, FAQPage, HowTo, VideoObject).
- Flags Google rich-result eligibility — true when any block satisfies the relevant rich-result requirement set.
- Scores AI-discovery readiness on a 0-100 scale, weighted toward the signals LLM crawlers actually use.
- Detects and lists every schema.org `@type` found across all formats.
- Optional raw-HTML dump to KVS for offline debugging.
- Pure HTTP fetch via CheerioCrawler — no browser, no proxy by default. The cheap default.
Who Uses Structured Data Audits?
- SEO teams — audit rich-result eligibility across a sitemap before chasing rank changes that turn out to be markup bugs.
- Content engineering — verify JSON-LD blocks ship with every article, product, or recipe page.
- AI / LLM-discovery auditors — score how well a site speaks to AI crawlers, since LLMs lean heavily on structured data.
- Migration QA — diff structured-data coverage before and after a CMS swap or template refactor.
- Competitive research — see exactly which schema.org types competitors mark up, and which ones they miss.
How Structured Data Validator Works
- Pass in a list of URLs. The actor processes up to `maxItems` of them per run (default 5, hard cap 15) to stay inside the Apify tester's 5-minute timeout.
- CheerioCrawler fetches the static HTML over plain HTTP. The handler runs all six extractors in parallel.
- JSON-LD blocks are validated against the bundled schema.org rule set. Each issue is recorded with severity, path, type, and message.
- The actor flags Google rich-result eligibility and computes the AI-discovery readiness score, then emits one row per URL.
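The extraction and validation steps above can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the actor's implementation: `JsonLdExtractor`, `validate_html`, the `REQUIRED_FIELDS` table, and the `jsonLd[0].image` path notation are all assumptions made for the example, and the bundled rule set is certainly larger than the single placeholder gate shown here.

```python
import json
from html.parser import HTMLParser

# Placeholder gate for illustration only; not the actor's bundled rule set.
REQUIRED_FIELDS = {
    "Recipe": ["name", "image"],
}


class JsonLdExtractor(HTMLParser):
    """Collects the text of every <script type="application/ld+json"> element."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            self.blocks.append("".join(self._buffer))


def validate_html(html):
    """Return (schema_types, issues) for the JSON-LD blocks found in html."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    types, issues = [], []
    for index, raw in enumerate(parser.blocks):
        try:
            block = json.loads(raw)
        except json.JSONDecodeError as exc:
            issues.append(f"error [jsonLd[{index}]] (unknown) invalid JSON: {exc.msg}")
            continue
        if not isinstance(block, dict):
            continue  # top-level arrays and @graph are left out of this sketch
        schema_type = block.get("@type")
        if not isinstance(schema_type, str):
            continue  # multi-typed blocks are left out of this sketch
        types.append(schema_type)
        for field in REQUIRED_FIELDS.get(schema_type, []):
            if field not in block:
                issues.append(
                    f"error [jsonLd[{index}].{field}] ({schema_type}) missing required field"
                )
    return types, issues
```

Types without an entry in the gate table still show up in the returned type list but produce no issues, which mirrors the "parse but skip required-field validation" behaviour described under Limits.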
Input
```json
{
  "urls": [
    "https://schema.org/Article",
    "https://www.apify.com"
  ],
  "maxItems": 5,
  "extractWhich": ["json-ld", "open-graph", "twitter-cards", "microdata", "rdfa", "meta-tags"],
  "validateAgainst": "schema.org",
  "includeRawHtml": false
}
```
| Field | Type | Default | Description |
|---|---|---|---|
| `urls` | array | required | URLs to extract and validate structured data from. |
| `maxItems` | integer | 5 | Hard cap on URLs per run. Range 1-15. |
| `extractWhich` | array | all six | Formats to extract: json-ld, open-graph, twitter-cards, microdata, rdfa, meta-tags. |
| `validateAgainst` | enum | `schema.org` | Validation rule set. `schema.org` runs the bundled gates; `none` skips validation. |
| `includeRawHtml` | boolean | false | Save the fetched HTML to KVS and link via `rawHtmlKvsKey` on each row. |
| `proxyConfiguration` | object | none | Optional. Default is no proxy. |
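Before submitting a run, it can help to check an input object against the constraints in the table. The helper below is a hypothetical client-side sketch (`build_input` is not part of the actor, and the actor performs its own validation); it only encodes the documented defaults, the 1-15 range, and the allowed enum values, and it assumes an empty `urls` list should be rejected.

```python
ALLOWED_FORMATS = ("json-ld", "open-graph", "twitter-cards", "microdata", "rdfa", "meta-tags")
ALLOWED_RULE_SETS = ("schema.org", "none")


def build_input(urls, max_items=5, extract_which=None,
                validate_against="schema.org", include_raw_html=False):
    """Return an input dict shaped like the JSON example, or raise ValueError."""
    if not urls:
        raise ValueError("urls is required and must not be empty")
    if not 1 <= max_items <= 15:
        raise ValueError("maxItems must be in the range 1-15")
    # Default to all six formats, as the table documents.
    formats = list(ALLOWED_FORMATS) if extract_which is None else list(extract_which)
    unknown = [name for name in formats if name not in ALLOWED_FORMATS]
    if unknown:
        raise ValueError(f"unknown extractWhich values: {unknown}")
    if validate_against not in ALLOWED_RULE_SETS:
        raise ValueError("validateAgainst must be 'schema.org' or 'none'")
    return {
        "urls": list(urls),
        "maxItems": max_items,
        "extractWhich": formats,
        "validateAgainst": validate_against,
        "includeRawHtml": include_raw_html,
    }
```

Calling `build_input(["https://www.apify.com"], extract_which=["json-ld"])` yields a payload that extracts only JSON-LD while keeping every other field at its documented default.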
Structured Data Validator Output Fields
```json
{
  "url": "https://www.apify.com",
  "finalUrl": "https://www.apify.com/",
  "jsonLd": [
    "{\"@context\":\"https://schema.org\",\"@type\":\"Organization\",\"name\":\"Apify\"}"
  ],
  "openGraph": {
    "og:title": "Apify - The Web Scraping Platform",
    "og:type": "website",
    "og:url": "https://apify.com/",
    "og:image": "https://apify.com/img/social.png"
  },
  "twitterCard": { "twitter:card": "summary_large_image" },
  "microdata": [],
  "rdfa": [],
  "metaTags": { "viewport": "width=device-width, initial-scale=1", "robots": "index, follow" },
  "validationErrors": [],
  "schemaTypes": ["Organization"],
  "googleRichResultEligible": false,
  "aiDiscoveryReadiness": {
    "hasJsonLd": true,
    "hasArticleSchema": false,
    "hasFAQ": false,
    "hasHowTo": false,
    "hasOpenGraph": true,
    "score": 60
  },
  "rawHtmlKvsKey": "",
  "status": "success",
  "errorMsg": "",
  "extractedAt": "2026-04-30T12:00:00Z"
}
```
| Field | Type | Description |
|---|---|---|
| `url` | string | Audited URL. |
| `finalUrl` | string | URL after redirects. |
| `jsonLd` | array | Parsed JSON-LD blocks as JSON-stringified objects (CSV/Excel safe). |
| `openGraph` | object | All `og:*` meta tags flattened into a single object. |
| `twitterCard` | object | All `twitter:*` meta tags flattened into a single object. |
| `microdata` | array | `itemscope`/`itemtype` blocks as JSON-stringified objects. |
| `rdfa` | array | `property`/`typeof`/`resource` blocks as JSON-stringified objects. |
| `metaTags` | object | All `<meta name>` and `<meta http-equiv>` tags as a flat object. |
| `validationErrors` | array | Issues formatted as `<severity> [<path>] (<type>) <message>`. |
| `schemaTypes` | array | Detected schema.org types (e.g. Article, Recipe, Product). |
| `googleRichResultEligible` | boolean | True when any block satisfies a Google rich-result requirement set. |
| `aiDiscoveryReadiness` | object | `{hasJsonLd, hasArticleSchema, hasFAQ, hasHowTo, hasOpenGraph, score 0-100}`. |
| `rawHtmlKvsKey` | string | KVS key for raw HTML when `includeRawHtml=true` (else empty). |
| `status` | string | `success`, `not_found`, or `error`. |
| `errorMsg` | string | Error message on failure (empty on success). |
| `extractedAt` | string | ISO timestamp. |
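Because `jsonLd`, `microdata`, and `rdfa` arrive as JSON-stringified objects, downstream code usually needs to parse them back before inspecting fields. A minimal sketch, assuming a dataset row has already been loaded as a Python dict (`decode_row` is a hypothetical helper name, not something the actor provides):

```python
import json


def decode_row(row):
    """Return a copy of row with the stringified structured-data blocks parsed."""
    decoded = dict(row)
    for key in ("jsonLd", "microdata", "rdfa"):
        # Each entry is a JSON string; parse it into a regular object.
        decoded[key] = [json.loads(item) for item in row.get(key, [])]
    return decoded
```

With the sample row above, `decode_row(row)["jsonLd"][0]["name"]` evaluates to `"Apify"`, while the original row is left unchanged.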
Pricing
Token charge, functionally free. Apify rejects pay-per-event (PPE) prices of exactly $0, so the per-record price is the smallest practical floor.
| Event | Price |
|---|---|
| Actor start | $0.10 |
| Per record | $0.0001 |
| Volume | Total cost (one actor start + records) |
|---|---|
| 100 records | $0.11 |
| 1,000 records | $0.20 |
| 10,000 records | $1.10 |
This actor is the cheap discovery primitive that pairs with paid downstream actors. Audit liberally.
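The volume figures follow directly from the two event prices: one actor start plus the per-record charge. The sketch below reproduces that arithmetic with `Decimal` to avoid floating-point drift. The `runs` parameter is an assumption: it models the start fee being charged once per run, which matters if a large URL list has to be split across several runs because of the 15-URL cap.

```python
from decimal import Decimal

ACTOR_START = Decimal("0.10")   # price of the "Actor start" event
PER_RECORD = Decimal("0.0001")  # price of the "Per record" event


def run_cost(records, runs=1):
    """Total cost in USD for `records` rows spread over `runs` actor starts."""
    return ACTOR_START * runs + PER_RECORD * records
```

For example, `run_cost(1000)` gives `Decimal("0.2000")`, matching the $0.20 row in the table.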
Limits
- `maxItems` caps at 15 per run (default 5), sized for the Apify tester's 5-minute timeout.
- The schema.org validator covers the common Google-rich-result types (Article, Recipe, Product, Event, FAQPage, HowTo, VideoObject). Other types parse but skip required-field validation.
- The actor uses HTTP fetch only. Sites that require JS rendering for structured data won't surface anything, so pair with a render crawler upstream.
- `includeRawHtml=true` writes one KVS entry per URL. KVS quotas apply.
- Validation severity is internal: `validationErrors` strings start with `error`, `warn`, or `info` for downstream filtering.
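Since each `validationErrors` string starts with its severity and follows the `<severity> [<path>] (<type>) <message>` layout, filtering downstream can be done with a small parser. This is a sketch under assumptions: the helper names are hypothetical, the regular expression is inferred from the documented layout rather than taken from the actor, and the sample strings in the usage note are made up for illustration.

```python
import re

# severity, then [path], then (type), then the free-text message
ISSUE_PATTERN = re.compile(r"^(error|warn|info) \[(.*?)\] \((.*?)\) (.*)$")


def parse_issue(text):
    """Split one validationErrors string into its four parts, or return None."""
    match = ISSUE_PATTERN.match(text)
    if match is None:
        return None
    severity, path, schema_type, message = match.groups()
    return {"severity": severity, "path": path, "type": schema_type, "message": message}


def issues_with_severity(row, severity="error"):
    """Return the parsed issues of one dataset row that have the given severity."""
    parsed = (parse_issue(text) for text in row.get("validationErrors", []))
    return [issue for issue in parsed if issue is not None and issue["severity"] == severity]
```

Given a row whose `validationErrors` holds a hypothetical `error [jsonLd[0].image] (Recipe) missing required field` and a `warn` entry, `issues_with_severity(row)` returns only the first, parsed into its four fields.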
Related Actors
- Sitemap Walker Pro — feed discovered URLs straight into this validator for site-wide structured-data audits.
- SSL & Security Headers Checker — pair for full SEO + security audits per URL.
- Angular SSR State Extractor — for sites where the structured data lives inside Angular's TransferState payload.
Need More Features?
Need additional schema.org types, custom validation rules, or a render-crawler variant? File an issue or get in touch.
Why Use Structured Data Validator Pro?
- Functionally free — $0.0001 per record. Audit your whole sitemap and barely move the needle.
- Six formats, one pass — JSON-LD, Open Graph, Twitter Cards, microdata, RDFa, and meta tags in a single dataset row. Most tools cover one, maybe two.
- AI-discovery score baked in — rich-result eligibility plus an LLM-readiness score, so you know how the site reads to both Google and Claude.
Built by OrbTop.