OrbTop

SCOTUS Oyez Oral Arguments Scraper

Scrape Supreme Court oral argument transcripts, case metadata, and speaker-attributed speech from Oyez (the IIT Chicago-Kent canonical SCOTUS index). Covers 5,000+ cases dating back to 1955 with full transcripts showing justice and advocate turns, audio links, decisions, votes, and advocate details.

What You Get

Each output record contains:

  • Case metadata: docket number, case name, parties, term year, lower court, manner of jurisdiction, citation
  • Dates: date argued and date decided (ISO 8601)
  • People: which justices heard the case, which decided it, and all advocates with their roles
  • Decisions: majority/dissenting vote counts, winning party, decision type
  • Oral argument sessions: title, audio MP3 URL, duration (when fetchTranscripts is enabled)
  • Full transcripts: speaker-attributed segments with justice/advocate name, role, spoken text, and start/end timestamps in seconds
  • Opinion announcements: post-decision audio and transcripts
  • Written opinions: author, type (majority, concurring, dissenting), Justia link
  • Case summaries: facts of the case, conclusion (HTML-stripped)
  • Cross-references: Justia URL, Oyez canonical URL, Oyez API URL

Input Options

Field | Description | Default
termStart | First SCOTUS term year to scrape (e.g. 2020) | 2020
termEnd | Last SCOTUS term year (inclusive). Omit to scrape only termStart | 2023
docketNumber | Scrape a single case by docket number (e.g. 19-1392). Requires termStart. | (none)
caseName | Filter cases whose name contains this string (case-insensitive, e.g. Dobbs) | (none)
fetchTranscripts | Fetch full speaker-attributed transcripts for each oral argument | true
maxItems | Maximum case records to return (0 = unlimited) | 15
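
The rules for termStart, termEnd, and docketNumber can be sketched in Python as follows. This only illustrates the behavior described in the table and is not the scraper's actual code: `resolve_terms` and `validate_input` are hypothetical helper names, and rejecting a termEnd earlier than termStart is an assumption.

```python
from typing import List, Optional


def resolve_terms(term_start: int, term_end: Optional[int] = None) -> List[int]:
    """Return the term years to scrape; the range is inclusive."""
    if term_end is None:
        # termEnd omitted: scrape only termStart.
        return [term_start]
    if term_end < term_start:
        # Assumption: a reversed range is treated as an input error.
        raise ValueError("termEnd must not be earlier than termStart")
    return list(range(term_start, term_end + 1))


def validate_input(options: dict) -> None:
    """Check the one documented dependency: docketNumber requires termStart."""
    if "docketNumber" in options and "termStart" not in options:
        raise ValueError("docketNumber requires termStart")


print(resolve_terms(2020, 2023))  # [2020, 2021, 2022, 2023]
print(resolve_terms(2021))        # [2021]
```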

Example: Single Case with Transcript

{
  "termStart": 2021,
  "docketNumber": "19-1392",
  "fetchTranscripts": true,
  "maxItems": 1
}

Returns Dobbs v. Jackson Women's Health Organization with the full Oyez-normalized transcript, audio link, advocate list, and decision metadata.

Example: Full Recent Term (No Transcripts)

{
  "termStart": 2022,
  "termEnd": 2022,
  "fetchTranscripts": false,
  "maxItems": 0
}

Returns all ~62 October Term 2022 cases with metadata, advocate lists, and decision records. No transcripts are fetched.

Example: Named Case Search (Bruen)

{
  "termStart": 2021,
  "termEnd": 2021,
  "caseName": "New York",
  "fetchTranscripts": true,
  "maxItems": 5
}
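
A case-insensitive substring filter of the kind caseName describes might look like the sketch below. It is an illustration only: `matches_case_name` and `filter_cases` are hypothetical names, the case names are sample strings, and applying maxItems after the name filter is an assumption.

```python
from typing import List


def matches_case_name(case_name: str, query: str) -> bool:
    """True when query occurs anywhere in case_name, ignoring case."""
    return query.casefold() in case_name.casefold()


def filter_cases(names: List[str], query: str, max_items: int = 0) -> List[str]:
    """Keep matching names; max_items == 0 means no limit."""
    matched = [n for n in names if matches_case_name(n, query)]
    return matched if max_items == 0 else matched[:max_items]


# Sample case names for illustration.
cases = [
    "New York State Rifle & Pistol Association, Inc. v. Bruen",
    "Dobbs v. Jackson Women's Health Organization",
]
print(filter_cases(cases, "new york", max_items=5))
```

With these sample names, the query "New York" keeps only the Bruen case. A filter on "Dobbs" would be needed to pick up the other one.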

Transcript Format

When fetchTranscripts is enabled, oral_argument_transcript contains a JSON-serialized array:

[
  {
    "speaker": "John G. Roberts, Jr.",
    "role": "Chief Justice of the United States",
    "text": "We will hear argument this morning in Case 19-1392...",
    "start_sec": 0.08,
    "end_sec": 9.2,
    "section": 0
  },
  {
    "speaker": "Scott Stewart",
    "role": "Counsel for Petitioner",
    "text": "Mr. Chief Justice, and may it please the Court...",
    "start_sec": 9.2,
    "end_sec": 45.1,
    "section": 0
  }
]

Audio MP3 links are in oral_argument_sessions[].media_url.
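
Because the transcript arrives as a JSON string, it has to be parsed before use. The following Python sketch decodes it and totals speaking time per speaker, using the field names from the sample above. `speaking_time` is a hypothetical helper, and the inline `raw` string stands in for a record's oral_argument_transcript value.

```python
import json
from collections import defaultdict
from typing import Dict

# Stand-in for record["oral_argument_transcript"], copied from the sample above.
raw = """[
  {"speaker": "John G. Roberts, Jr.", "role": "Chief Justice of the United States",
   "text": "We will hear argument this morning in Case 19-1392...",
   "start_sec": 0.08, "end_sec": 9.2, "section": 0},
  {"speaker": "Scott Stewart", "role": "Counsel for Petitioner",
   "text": "Mr. Chief Justice, and may it please the Court...",
   "start_sec": 9.2, "end_sec": 45.1, "section": 0}
]"""


def speaking_time(raw_transcript: str) -> Dict[str, float]:
    """Sum (end_sec - start_sec) per speaker, rounded to 2 decimals."""
    totals: Dict[str, float] = defaultdict(float)
    for segment in json.loads(raw_transcript):
        totals[segment["speaker"]] += segment["end_sec"] - segment["start_sec"]
    return {speaker: round(seconds, 2) for speaker, seconds in totals.items()}


for speaker, seconds in speaking_time(raw).items():
    print(f"{speaker}: {seconds:.2f} s")
```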

Data Source & Attribution

Data is sourced from the Oyez Project, operated by IIT Chicago-Kent College of Law. Oyez's normalized case metadata and annotation work (justice attribution, summaries) is licensed CC BY-NC 4.0: any reuse requires attribution, and commercial use of Oyez's annotated content is not permitted under that license. The underlying oral argument audio and verbatim transcripts are US federal government works and are in the public domain.

Performance

  • Rate-limited to 2 requests/second out of respect for Oyez's non-profit infrastructure
  • Memory: 256 MB (no browser)
  • Typical run: ~1-2 seconds per case without transcripts; ~3-5 seconds per case with transcripts (one additional API call per oral argument session)
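
One simple way to hold a client to 2 requests/second is to enforce a minimum gap between consecutive requests. The sketch below shows that technique. It is an assumption about how such a cap could be implemented, not a description of this scraper's internals, and `RateLimiter` is a hypothetical class. The clock and sleep functions are injectable so the behavior can be checked without real waiting.

```python
import time
from typing import Callable, Optional


class RateLimiter:
    """Enforce a minimum interval between calls to wait()."""

    def __init__(
        self,
        max_per_second: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = 1.0 / max_per_second
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None  # time the previous request was released

    def wait(self) -> float:
        """Block until the next request is allowed; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self._last + self.min_interval - now)
            if delay > 0:
                self._sleep(delay)
        self._last = now + delay
        return delay


limiter = RateLimiter(max_per_second=2)  # at most one request every 0.5 s
limiter.wait()  # call before each HTTP request
```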

Why Oyez?

Oyez is the only public source with speaker-attributed SCOTUS oral argument transcripts aligned to audio. The Court's own site (supremecourt.gov) publishes raw transcript PDFs, but without structured speaker attribution or audio alignment. Oyez normalizes these into a structured JSON API with justice/advocate identification, making it the canonical source for constitutional law research, legal-tech RAG pipelines, and AI training datasets.