OrbTop

Aijobs.net AI & ML Job Listings Scraper

Scrape AI, ML, data science, and related engineering job listings from aijobs.net, a job board dedicated to the AI industry. The actor extracts complete job records, including salary range, seniority, remote policy, tech stack tags, company info, and apply URL. Discovery is sitemap-driven, which gives complete, reproducible coverage of the full job inventory.

What data does it extract?

Each record includes:

| Field | Description |
| --- | --- |
| job_id | Numeric job ID |
| job_slug | URL slug for the job |
| title | Job title |
| company_name | Hiring company |
| company_profile_url | Company page on aijobs.net |
| company_logo_url | Company logo URL |
| location_city | City or region |
| location_country | Country |
| is_remote | Boolean: remote-eligible |
| remote_policy | remote / hybrid / onsite |
| remote_region_restriction | US-only, EU-only, global, or null |
| employment_type | full-time / contract / part-time |
| seniority | entry / mid / senior / staff / principal |
| salary_min_usd | Minimum salary (USD, annualized) |
| salary_max_usd | Maximum salary (USD, annualized) |
| salary_currency | Currency code (USD, AUD, GBP, etc.) |
| salary_raw | Raw salary string from the page |
| equity_offered | yes if equity/stock mentioned |
| posted_at | Approximate posting date (ISO date) |
| role_category | Inferred: ai-engineer, ml-engineer, data-scientist, research, mlops, data-engineer, other |
| tech_tags | Comma-separated tech skills (PyTorch, LangChain, RAG, etc.) |
| description_markdown | Full job description text |
| apply_url | Direct apply link |
| is_featured | Boolean: featured listing |
| profile_url | Canonical job page URL |
| scraped_at | Scrape timestamp (ISO) |

How does it work?

The actor walks the aijobs.net/jobs-sitemap.xml sitemap (~48,000 job URLs) to discover job detail pages. Each job page is fetched over plain HTTP; no browser or CAPTCHA handling is needed because the site is publicly accessible. Fields are extracted with cheerio HTML selectors that target the site's Bootstrap badge elements, link anchors, and text nodes.
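The discovery step can be illustrated with a short sketch. This is not the actor's own code (the actor uses cheerio); it is a minimal Python illustration that assumes the sitemap follows the standard sitemaps.org format. The function name `extract_job_urls` and the sample URLs are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def extract_job_urls(sitemap_xml: str) -> list[str]:
    """Return every <loc> URL in a sitemap document, in document order."""
    root = ET.fromstring(sitemap_xml)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


# Hypothetical two-entry sitemap; the URL paths are made up for illustration.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://aijobs.net/job/111-example-ml-engineer/</loc></url>
  <url><loc>https://aijobs.net/job/222-example-data-scientist/</loc></url>
</urlset>"""

job_urls = extract_job_urls(SAMPLE_SITEMAP)
```

Each URL returned this way would then be fetched and parsed individually, which is why a sitemap-driven run covers the same inventory every time it is repeated.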

The maxItems input controls how many records to collect. Set to 0 for a full run (all ~48k jobs).

Input

{
  "maxItems": 100
}

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| maxItems | integer | 10 | Maximum number of job records to collect. Set to 0 for unlimited. |
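
The "0 means unlimited" convention can be sketched as follows. This is an illustration of the semantics only, not the actor's implementation; the helper name `apply_max_items` is hypothetical, and rejecting negative values is an assumption, since the source does not say how they are handled.

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def apply_max_items(items: Iterable[T], max_items: int) -> Iterator[T]:
    """Yield at most max_items items; a value of 0 means no limit."""
    if max_items < 0:
        # Assumption: negative values are treated as invalid input.
        raise ValueError("maxItems must be 0 or a positive integer")
    if max_items == 0:
        return iter(items)
    return islice(items, max_items)


capped = list(apply_max_items(range(5), 2))     # first two items only
unlimited = list(apply_max_items(range(5), 0))  # all five items
```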

Use cases

  • AI talent market research — track demand across roles (AI engineer, ML engineer, MLOps, research scientist) over time
  • Salary benchmarking — compare compensation by role, seniority, and region
  • RecOps & ATS enrichment — bulk-import active job listings for sourcing workflows
  • Remote work signal monitoring — filter by remote_policy and remote_region_restriction for location-agnostic hiring intelligence
  • Tech stack trend analysis — aggregate tech_tags to identify the fastest-rising skills (LangChain, RAG, Mamba, etc.)
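
Two of the use cases above, tech stack trends and salary benchmarking, could be sketched against the output records like this. The sample records are invented for illustration, the function names are hypothetical, and using the midpoint of the USD salary range is an assumption rather than a recommended methodology.

```python
from collections import Counter, defaultdict
from statistics import median


def count_tech_tags(records):
    """Count how often each tag appears in the comma-separated tech_tags field."""
    counts = Counter()
    for rec in records:
        tags = rec.get("tech_tags") or ""
        counts.update(t.strip() for t in tags.split(",") if t.strip())
    return counts


def median_salary_midpoint(records):
    """Median of the USD range midpoint per (role_category, seniority).

    Records whose USD salary fields are null are skipped.
    """
    groups = defaultdict(list)
    for rec in records:
        low, high = rec.get("salary_min_usd"), rec.get("salary_max_usd")
        if low is None or high is None:
            continue
        key = (rec.get("role_category"), rec.get("seniority"))
        groups[key].append((low + high) / 2)
    return {key: median(values) for key, values in groups.items()}


# Invented sample records using field names from the table above.
SAMPLE_RECORDS = [
    {"role_category": "ml-engineer", "seniority": "senior",
     "salary_min_usd": 150000, "salary_max_usd": 190000,
     "tech_tags": "PyTorch, RAG"},
    {"role_category": "ml-engineer", "seniority": "senior",
     "salary_min_usd": 170000, "salary_max_usd": 210000,
     "tech_tags": "PyTorch, LangChain"},
    {"role_category": "data-scientist", "seniority": "mid",
     "salary_min_usd": None, "salary_max_usd": None,
     "tech_tags": "RAG"},
]

tag_counts = count_tech_tags(SAMPLE_RECORDS)
salary_medians = median_salary_midpoint(SAMPLE_RECORDS)
```

Running the tag count on successive scrapes and comparing the results is one way to approximate a trend line for a given skill.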

Notes

  • Salary values are annualized. Non-USD salaries (AUD, GBP, EUR) keep their original currency in salary_currency and the raw string in salary_raw; salary_min_usd and salary_max_usd are null for non-USD postings.
  • posted_at is estimated from the "Published Xd ago" label on each job page. It is accurate to the day for recent posts.
  • Expired jobs (the Apply button shows "Expired") remain in the sitemap and are still scraped; for these, apply_url points to the job detail page in its expired state.
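
The posted_at estimate described in the notes can be sketched as a subtraction from the scrape date. This is an assumed reconstruction, not the actor's code: the function name `estimate_posted_at` is hypothetical, and only the "Published Xd ago" form named above is handled. Any other label wording returns None.

```python
import re
from datetime import date, timedelta
from typing import Optional

# Matches labels such as "Published 3d ago" and captures the day count.
_DAYS_AGO = re.compile(r"Published\s+(\d+)d\s+ago")


def estimate_posted_at(label: str, scraped_on: date) -> Optional[str]:
    """Return an ISO date estimated from the label, or None if it does not match."""
    match = _DAYS_AGO.search(label)
    if match is None:
        return None
    days = int(match.group(1))
    return (scraped_on - timedelta(days=days)).isoformat()


posted = estimate_posted_at("Published 3d ago", date(2024, 5, 10))
```

Because the label carries only a whole number of days, the estimate cannot be more precise than one day, and it depends on the date the page was scraped.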