OrbTop

LAVCA News, Deals & Features Scraper

Extract LAVCA's full archive of 6,400+ posts (deal cases, member profiles, newsletters, and industry news) via the public WordPress REST API. Every record ships with resolved author and category names, plus lightweight named-entity extraction of company, investor, and LATAM country mentions from the post body.


What does this actor do?

LAVCA (Latin American Venture Capital Association) is the authoritative research and advocacy body for private capital in Latin America. Its content archive spans 15+ years of deal cases, member profiles, entrepreneur spotlights, newsletters, and weekly industry news, which makes it a reference narrative dataset for LATAM venture capital.

This actor queries LAVCA's public WordPress REST API, paginates through the full post catalog, and returns structured records with:

  • Full post metadata (title, slug, dates, permalink, excerpt, full HTML body)
  • Resolved author name and category labels
  • Yoast SEO JSON block (canonical URL, OG tags, article schema)
  • Light named-entity extraction: companies mentioned, investor/fund names, and LATAM country mentions detected in the body text
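
The pagination step described above can be sketched as follows. This is a minimal illustration, not the actor's actual code: the `BASE_URL` route, the `Fetcher` callable, and the function names are assumptions introduced here. It relies only on standard WordPress REST behaviour (the `page` and `per_page` query parameters and the `X-WP-TotalPages` response header).

```python
from typing import Callable, Iterator, Optional
from urllib.parse import urlencode

# Assumed base URL: the standard WordPress posts route on the LAVCA site.
BASE_URL = "https://www.lavca.org/wp-json/wp/v2/posts"

# Hypothetical fetcher: takes a URL, returns (decoded JSON list, response headers).
Fetcher = Callable[[str], tuple[list[dict], dict[str, str]]]


def page_url(page: int, per_page: int = 100) -> str:
    """Build the URL for one page; WordPress caps per_page at 100."""
    return BASE_URL + "?" + urlencode({"per_page": per_page, "page": page})


def iter_posts(fetch: Fetcher, max_items: Optional[int] = None) -> Iterator[dict]:
    """Yield posts page by page until the archive or max_items is exhausted."""
    page, total_pages, yielded = 1, 1, 0
    while page <= total_pages:
        posts, headers = fetch(page_url(page))
        # WordPress reports the number of pages in the X-WP-TotalPages header.
        total_pages = int(headers.get("X-WP-TotalPages", total_pages))
        for post in posts:
            if max_items is not None and yielded >= max_items:
                return
            yield post
            yielded += 1
        page += 1
```

Injecting the fetcher keeps the loop independent of any particular HTTP client, so it can be exercised with a stub before being pointed at the live API.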

Who needs this data?

  • LATAM expansion consultants — map deal flow and M&A activity by country and sector
  • Regional fund analysts — track portfolio companies and co-investors across deal cases
  • Academic researchers — longitudinal analysis of LATAM venture capital narrative trends
  • Competitive intelligence teams — monitor LAVCA's coverage of specific funds and portfolio companies

Input configuration

| Field | Type | Default | Description |
|---|---|---|---|
| maxItems | integer | 10 | Maximum records to return. Leave blank for the full archive (~6,400 posts). |
| categories | array | ["industry-news"] | Filter by category slug(s). Valid values: industry-news, lavca-in-the-news, newsletters, press-releases, private-capital-update, venture-bulletin. Leave empty for all. |
| dateFrom | string | | Filter posts published on or after this date (YYYY-MM-DD). |
| dateTo | string | | Filter posts published on or before this date (YYYY-MM-DD). |
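
To make the input semantics concrete, here is a hedged sketch of how the four fields could be validated and applied. The `ActorInput` class and the `parse_input` and `in_range` helpers are hypothetical names introduced for illustration. The fallback for unrecognised slugs follows the behaviour described in the usage notes, and the date check is done client-side against `date_published`, which may differ from how the actor filters internally.

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import Optional

# Category slugs listed as valid in the input configuration.
VALID_SLUGS = {
    "industry-news", "lavca-in-the-news", "newsletters",
    "press-releases", "private-capital-update", "venture-bulletin",
}


@dataclass
class ActorInput:
    max_items: Optional[int]   # None means the full archive
    categories: list[str]      # empty list means all categories
    date_from: Optional[date]
    date_to: Optional[date]


def _parse_day(value: Optional[str]) -> Optional[date]:
    """Parse a YYYY-MM-DD string; blank or missing values become None."""
    return datetime.strptime(value, "%Y-%m-%d").date() if value else None


def parse_input(raw: dict) -> ActorInput:
    """Validate a raw input dict and convert it to typed values."""
    max_items = raw.get("maxItems")
    if max_items is not None and max_items < 1:
        raise ValueError("maxItems must be a positive integer")
    categories = list(raw.get("categories") or [])
    # Assumption: an unrecognised slug disables the filter (all posts fetched).
    if any(slug not in VALID_SLUGS for slug in categories):
        categories = []
    return ActorInput(max_items, categories,
                      _parse_day(raw.get("dateFrom")), _parse_day(raw.get("dateTo")))


def in_range(published: str, cfg: ActorInput) -> bool:
    """Inclusive date check against a record's ISO 8601 date_published."""
    day = datetime.fromisoformat(published).date()
    if cfg.date_from and day < cfg.date_from:
        return False
    if cfg.date_to and day > cfg.date_to:
        return False
    return True
```

Both bounds are inclusive, matching the "on or after" and "on or before" wording of dateFrom and dateTo.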

Output schema

Each record in the dataset:

| Field | Type | Description |
|---|---|---|
| post_id | integer | WordPress post ID |
| title | string | Post title (HTML stripped) |
| slug | string | URL slug |
| date_published | string | Publication timestamp (ISO 8601) |
| date_modified | string | Last-modified timestamp (ISO 8601) |
| permalink | string | Canonical post URL |
| excerpt | string | Short excerpt (HTML stripped) |
| content_html | string | Full post body as raw HTML |
| author_id | integer | WordPress author user ID |
| author_name | string | Resolved author display name |
| categories | array | Category names (e.g. ["Industry News"]) |
| tags | array | Tag IDs (LAVCA does not use tags) |
| featured_media_url | string | URL to the featured image media endpoint, or empty string |
| yoast_schema_jsonld | string | Yoast SEO head JSON block serialized as a string |
| companies_mentioned | array | Company names extracted from the post body |
| investor_funds_mentioned | array | Investor/fund names extracted from the post body |
| countries_mentioned | array | LATAM country names detected in the post body |
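
Two fields in this schema need a decoding step before use: content_html is raw HTML and yoast_schema_jsonld is JSON serialized as a string. The sketch below shows one way to handle both with the Python standard library. The helper names `body_text` and `yoast_fields` are hypothetical and not part of the actor.

```python
import json
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect text nodes and drop tags; entities are decoded by the parser."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def body_text(record: dict) -> str:
    """Return content_html as plain text with whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(record.get("content_html", ""))
    parser.close()
    # Joining with a space keeps adjacent paragraphs from fusing together.
    return " ".join(" ".join(parser.parts).split())


def yoast_fields(record: dict) -> dict:
    """Decode the serialized Yoast block; an empty string yields an empty dict."""
    raw = record.get("yoast_schema_jsonld") or ""
    return json.loads(raw) if raw else {}
```

Joining text nodes with a space is a simplification: inline tags that split a word would introduce a stray space, so a stricter extractor may be preferable for text analysis.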

Example output

{
  "post_id": 32274,
  "title": "Explorador Capital, Terra Oil Investments, Amos Global Energy and Others Acquire Hydrocarbon Concessions in Santa Cruz",
  "slug": "explorador-capital-terra-oil-...",
  "date_published": "2026-05-21T20:32:48",
  "date_modified": "2026-05-21T20:32:48",
  "permalink": "https://www.lavca.org/explorador-capital-...",
  "excerpt": "Explorador Capital, Terra Oil Investments, Amos Global Energy and others acquired hydrocarbon concessions in Argentina.",
  "content_html": "<p>...</p>",
  "author_id": 309,
  "author_name": "Vicki Jacobson",
  "categories": ["Industry News"],
  "tags": [],
  "featured_media_url": "",
  "yoast_schema_jsonld": "{\"title\":\"...\",\"og_type\":\"article\",...}",
  "companies_mentioned": [],
  "investor_funds_mentioned": ["Explorador Capital", "Terra Oil Investments"],
  "countries_mentioned": ["Argentina"]
}
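
Records shaped like the example above lend themselves to simple aggregation, for instance counting how many posts mention each country or fund. The following sketch assumes the dataset has already been loaded as a list of dicts. The function name `mention_counts` is hypothetical.

```python
from collections import Counter
from typing import Iterable


def mention_counts(records: Iterable[dict], field: str) -> Counter:
    """Count the number of posts that mention each name in the given array field."""
    counts: Counter = Counter()
    for record in records:
        # A set counts each name once per post, even if the list repeats it.
        counts.update(set(record.get(field) or []))
    return counts
```

For example, `mention_counts(records, "countries_mentioned").most_common(5)` would list the five most frequently covered countries in a loaded dataset.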

Usage notes

  • A full archive run (~6,400 posts) takes approximately 15–20 minutes and uses 512 MB of memory.
  • The categories filter maps slugs to WP category IDs at runtime. An unrecognised slug causes the run to fetch all posts instead.
  • featured_media_url returns the WP media API endpoint URL (not a direct image URL) when a featured image exists.
  • NER patterns target fund/company names with common corporate suffixes and an explicit LATAM country list. They may miss uncommon name formats.
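
The suffix-and-country-list approach in the last note can be illustrated with a small regex sketch. The suffix list, the country list, and the patterns here are assumptions chosen for illustration; the actor's actual patterns are not documented in this README and are likely broader.

```python
import re

# Hypothetical suffix and country lists; the actor's real lists may differ.
FUND_SUFFIXES = ("Capital", "Ventures", "Partners", "Investments", "Fund")
COUNTRIES = ("Argentina", "Brazil", "Chile", "Colombia", "Mexico", "Peru", "Uruguay")

# One to three capitalised words followed by a known corporate suffix.
_CAP_WORD = r"[A-Z][A-Za-z0-9&'-]*"
FUND_RE = re.compile(
    r"\b(?:" + _CAP_WORD + r" ){1,3}(?:" + "|".join(FUND_SUFFIXES) + r")\b"
)
COUNTRY_RE = re.compile(r"\b(?:" + "|".join(COUNTRIES) + r")\b")


def _unique(items: list[str]) -> list[str]:
    """Deduplicate while preserving first-seen order."""
    return list(dict.fromkeys(items))


def extract_entities(text: str) -> dict:
    """Return fund and country mentions found in plain text."""
    return {
        "investor_funds_mentioned": _unique(FUND_RE.findall(text)),
        "countries_mentioned": _unique(COUNTRY_RE.findall(text)),
    }
```

Run against the excerpt in the example output, this sketch finds "Explorador Capital" and "Terra Oil Investments" but not "Amos Global Energy", because "Energy" is not in the assumed suffix list. That is the kind of miss the note above warns about.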