OrbTop

arXiv Scraper

Export preprints and papers from arXiv.org, the leading open-access repository of more than 2.5 million scientific papers across physics, mathematics, computer science, biology, economics, and quantitative finance.

This actor queries the official arXiv Atom API (export.arxiv.org/api/query), the interface arXiv supports for programmatic data access. It needs no HTML scraping, no JavaScript rendering, and no account.
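As a rough illustration of what a request to that endpoint looks like, the Python sketch below assembles a query URL. The https scheme and the helper name build_query_url are assumptions made for this example; search_query, start, and max_results are query parameters of the arXiv API. This is not the actor's own code.

```python
from urllib.parse import urlencode

# Endpoint named in the text above; the https scheme is an assumption.
API_URL = "https://export.arxiv.org/api/query"


def build_query_url(search_query: str, start: int = 0, max_results: int = 10) -> str:
    """Return a GET URL for one page of arXiv API results (hypothetical helper)."""
    params = {
        "search_query": search_query,
        "start": start,
        "max_results": max_results,
    }
    # urlencode percent-encodes reserved characters such as ":".
    return f"{API_URL}?{urlencode(params)}"


print(build_query_url("cat:cs.AI", max_results=5))
# https://export.arxiv.org/api/query?search_query=cat%3Acs.AI&start=0&max_results=5
```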

What you get

Each result includes:

  • arxiv_id — the canonical short ID (e.g. 2301.12345)
  • abs_url — link to the abstract page
  • pdf_url — direct PDF download link
  • title — full paper title
  • abstract — complete abstract / summary
  • authors — comma-separated author names
  • primary_category — primary subject category (e.g. cs.AI)
  • categories — all subject categories, comma-separated
  • published — original submission date (ISO 8601)
  • updated — date of the latest version
  • comment — author notes (page count, conference, etc.) if available
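To make the field list concrete, here is a minimal sketch of how one Atom entry could be mapped onto these fields with Python's standard library. The sample XML is made up for illustration, parse_entry is a hypothetical helper, and stripping the version suffix to get the short ID is an assumption about how arxiv_id is derived; the actor's real mapping may differ.

```python
import re
import xml.etree.ElementTree as ET

# XML namespaces used by the Atom feed and by arXiv's extension elements.
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "arxiv": "http://arxiv.org/schemas/atom",
}

# Invented sample entry, for illustration only.
SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom" xmlns:arxiv="http://arxiv.org/schemas/atom">
  <entry>
    <id>http://arxiv.org/abs/2301.12345v2</id>
    <updated>2023-02-10T12:00:00Z</updated>
    <published>2023-01-30T18:00:00Z</published>
    <title>An Example
      Title</title>
    <summary>  An example abstract. </summary>
    <author><name>A. Author</name></author>
    <author><name>B. Author</name></author>
    <arxiv:comment>10 pages</arxiv:comment>
    <link title="pdf" href="http://arxiv.org/pdf/2301.12345v2" rel="related" type="application/pdf"/>
    <arxiv:primary_category term="cs.AI"/>
    <category term="cs.AI"/>
    <category term="cs.LG"/>
  </entry>
</feed>"""


def parse_entry(entry: ET.Element) -> dict:
    """Map one <entry> element onto the output fields listed above."""

    def text(path):
        node = entry.find(path, NS)
        if node is None or not node.text:
            return None
        return " ".join(node.text.split())  # collapse line breaks and padding

    abs_url = text("atom:id")
    pdf_link = entry.find("atom:link[@title='pdf']", NS)
    primary = entry.find("arxiv:primary_category", NS)
    return {
        # Assumption: the short ID is the abs URL tail without its version suffix.
        "arxiv_id": re.sub(r"v\d+$", "", abs_url.rsplit("/abs/", 1)[-1]),
        "abs_url": abs_url,
        "pdf_url": pdf_link.get("href") if pdf_link is not None else None,
        "title": text("atom:title"),
        "abstract": text("atom:summary"),
        "authors": ", ".join(a.text for a in entry.findall("atom:author/atom:name", NS)),
        "primary_category": primary.get("term") if primary is not None else None,
        "categories": ", ".join(c.get("term") for c in entry.findall("atom:category", NS)),
        "published": text("atom:published"),
        "updated": text("atom:updated"),
        "comment": text("arxiv:comment"),
    }


row = parse_entry(ET.fromstring(SAMPLE).find("atom:entry", NS))
print(row["arxiv_id"], "|", row["title"], "|", row["authors"])
```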

Search query syntax

The searchQuery field supports arXiv's full query language:

Pattern        Example                                       Meaning
Plain keyword  machine learning                              Search across paper metadata (title, abstract, authors, and so on), not the PDF full text
Title          ti:attention                                  Papers with "attention" in the title
Author         au:Hinton                                     Papers with an author name matching "Hinton"
Abstract       abs:transformer                               Papers with "transformer" in the abstract
Category       cat:cs.AI                                     Papers in the cs.AI category
Boolean        cat:cs.LG AND ti:diffusion                    Category AND title filter
Date range     submittedDate:[202301010000 TO 202312312359]  Papers submitted in 2023 (timestamps are YYYYMMDDHHMM in GMT)

See the arXiv query language reference for the full syntax.
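Query expressions like those in the table can also be composed programmatically. The sketch below uses two hypothetical helpers, submitted_between and all_of, which are not part of the actor or of arXiv's API; they only build the string passed as searchQuery.

```python
from datetime import date


def submitted_between(start: date, end: date) -> str:
    """Build a submittedDate range covering whole days from start to end."""
    # Timestamps use the YYYYMMDDHHMM form shown in the table above.
    return f"submittedDate:[{start:%Y%m%d}0000 TO {end:%Y%m%d}2359]"


def all_of(*clauses: str) -> str:
    """Join clauses so that every one of them must match."""
    return " AND ".join(clauses)


query = all_of("ti:diffusion", submitted_between(date(2024, 1, 1), date(2024, 12, 31)))
print(query)
# ti:diffusion AND submittedDate:[202401010000 TO 202412312359]
```

Generating the date clause from date objects avoids hand-typing twelve-digit timestamps, which is an easy place to slip.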

Common arXiv categories

Category        Field
cs.AI           Artificial Intelligence
cs.LG           Machine Learning
cs.CL           Computation and Language (NLP)
cs.CV           Computer Vision
physics.hep-th  High Energy Physics Theory
math.CO         Combinatorics
q-bio.NC        Neurons and Cognition
econ.GN         General Economics

Input parameters

Parameter    Type     Default        Description
searchQuery  string   (required)     arXiv query expression
maxItems     integer  50             Maximum number of papers to return
sortBy       string   submittedDate  Sort field: relevance, lastUpdatedDate, or submittedDate
sortOrder    string   descending     Sort direction: ascending or descending
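The defaults and allowed values in this table can be expressed as a small validation step. The following is a sketch under the assumption that input arrives as a plain dictionary; normalize_input is a hypothetical helper and the actor may validate its input differently.

```python
# Allowed values and defaults taken from the parameter table above.
ALLOWED_SORT_BY = {"relevance", "lastUpdatedDate", "submittedDate"}
ALLOWED_SORT_ORDER = {"ascending", "descending"}
DEFAULTS = {"maxItems": 50, "sortBy": "submittedDate", "sortOrder": "descending"}


def normalize_input(raw: dict) -> dict:
    """Fill in defaults and reject values the table does not allow."""
    query = (raw.get("searchQuery") or "").strip()
    if not query:
        raise ValueError("searchQuery is required")

    cfg = {**DEFAULTS, **raw, "searchQuery": query}
    if not isinstance(cfg["maxItems"], int) or cfg["maxItems"] < 1:
        raise ValueError("maxItems must be a positive integer")
    if cfg["sortBy"] not in ALLOWED_SORT_BY:
        raise ValueError(f"sortBy must be one of {sorted(ALLOWED_SORT_BY)}")
    if cfg["sortOrder"] not in ALLOWED_SORT_ORDER:
        raise ValueError(f"sortOrder must be one of {sorted(ALLOWED_SORT_ORDER)}")
    return cfg


print(normalize_input({"searchQuery": "au:LeCun"}))
```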

Usage examples

Fetch the 100 most recent cs.AI papers:

{
  "searchQuery": "cat:cs.AI",
  "maxItems": 100,
  "sortBy": "submittedDate",
  "sortOrder": "descending"
}

Find papers by a specific author:

{
  "searchQuery": "au:LeCun",
  "maxItems": 50,
  "sortBy": "relevance"
}

Search for diffusion model papers from 2024:

{
  "searchQuery": "ti:diffusion AND submittedDate:[202401010000 TO 202412312359]",
  "maxItems": 200
}

Technical notes

  • Uses the arXiv Atom API, arXiv's official programmatic interface
  • Pagination is handled automatically; set maxItems to any number
  • Rate-limited to ~1 request/second per arXiv usage guidelines
  • No authentication required
  • Results span all of arXiv's subject areas (2.5M+ papers total)
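The pagination and rate-limiting notes above can be sketched as follows. The page size of 100 is an assumption, the one-second delay mirrors the ~1 request/second figure stated above, and fetch_page is a hypothetical callable standing in for the real HTTP request; this is an illustration, not the actor's implementation.

```python
import time


def page_offsets(max_items: int, page_size: int = 100):
    """Yield (start, size) pairs that together cover max_items results."""
    start = 0
    while start < max_items:
        yield start, min(page_size, max_items - start)
        start += page_size


def fetch_all(fetch_page, max_items, page_size=100, delay=1.0, sleep=time.sleep):
    """Collect up to max_items results, pausing between page requests.

    fetch_page(start, size) is a hypothetical callable returning a list.
    """
    results = []
    for i, (start, size) in enumerate(page_offsets(max_items, page_size)):
        if i > 0:
            sleep(delay)  # throttle every request after the first
        page = fetch_page(start, size)
        results.extend(page)
        if len(page) < size:  # a short page means the results ran out
            break
    return results[:max_items]


print(list(page_offsets(250)))
# [(0, 100), (100, 100), (200, 50)]
```

Passing sleep as a parameter keeps the throttle easy to stub out in tests.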