OrbTop

TokyoDev Scraper - Japan Tech Job Listings & Companies

JOBS · LEAD GENERATION

TokyoDev Job & Company Scraper

Scrapes tech job listings and company profiles from TokyoDev.com, the primary English-language job board for developers targeting Japan's tech industry. Jobs come back with titles, salaries, remote policies, Japanese language requirements, visa sponsorship signals, and technology tags; company profiles include descriptions and tech stacks. The site currently lists roughly 182 job listings and 232 company pages.


TokyoDev Scraper Features

  • Scrapes job listings, company profiles, or both via a single scrapeMode selector
  • Extracts the Japanese language requirement per listing as a true/false value instead of leaving it buried in the description text
  • Captures remote policy per job: fully-remote, partially-remote, or no-remote
  • Returns apply-from-abroad eligibility where disclosed — useful for candidates outside Japan
  • Collects technology and skill tags per listing (Ruby, Python, React, etc.)
  • Filters by remote policy, seniority level, or Japanese language requirement before saving
  • Accepts specific TokyoDev URLs directly, skipping sitemap discovery for targeted runs
  • Uses a residential proxy to get past Cloudflare protection on every page except the sitemap

Who Uses TokyoDev Data?

  • Recruiters — Pull structured Japan tech listings with remote and language filters already applied, not raw HTML to parse
  • Job aggregators — Ingest English-language Japan tech jobs with consistent field structure across listings
  • Market researchers — Analyze salary trends, remote policy distribution, and Japanese language demand across the Japan tech sector
  • HR analytics teams — Build datasets tracking which companies are hiring, what seniority levels are in demand, and what tech stacks are common
  • Candidate matching platforms — Filter by japanese_required and apply_from_abroad to surface realistic options for international applicants

How TokyoDev Scraper Works

  1. Fetches /sitemap.xml, which is served without a Cloudflare challenge, and classifies its URLs into job listing pages and company profile pages
  2. Applies mode filter (jobs, companies, or both) and optional filters for remote policy, seniority, and Japanese language requirement
  3. Loads each target page in a Playwright browser with a residential proxy and anti-detection fingerprinting to get past Cloudflare
  4. Extracts fields from the page's JSON-LD structured data, falling back to the rendered HTML for fields the schema does not cover
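Steps 1 and 2 can be sketched with the Python standard library. This is a minimal illustration, not the scraper's source: the function names are hypothetical, the URL shapes (`/companies/<slug>` for profiles, `/companies/<slug>/jobs/<slug>` for jobs) are inferred from the example `job_url` values in this README, and the sketch assumes /sitemap.xml is a single `<urlset>` rather than a sitemap index. Fetching and rendering pages (steps 3 and 4) is out of scope here.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def classify_sitemap_urls(sitemap_xml: str) -> dict[str, list[str]]:
    """Split sitemap <loc> entries into job listing and company profile URLs.

    Assumed path shapes (inferred from this README's examples):
      /companies/<slug>              -> company profile
      /companies/<slug>/jobs/<slug>  -> job listing
    Any other URL (articles, index pages, etc.) is ignored.
    """
    root = ET.fromstring(sitemap_xml)
    result: dict[str, list[str]] = {"jobs": [], "companies": []}
    for loc in root.findall("sm:url/sm:loc", SITEMAP_NS):
        url = (loc.text or "").strip()
        parts = [p for p in urlparse(url).path.split("/") if p]
        if len(parts) == 4 and parts[0] == "companies" and parts[2] == "jobs":
            result["jobs"].append(url)
        elif len(parts) == 2 and parts[0] == "companies":
            result["companies"].append(url)
    return result


def select_targets(classified: dict[str, list[str]], scrape_mode: str) -> list[str]:
    """Apply the scrapeMode selector: "jobs", "companies", or "both"."""
    if scrape_mode == "both":
        return classified["jobs"] + classified["companies"]
    return list(classified[scrape_mode])
```

Classifying by path depth keeps the sketch independent of page content, which matters because only the sitemap is reachable without a browser.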

Input

{
  "scrapeMode": "jobs",
  "remotePolicy": "fully-remote",
  "japaneseRequired": "no-japanese-required",
  "maxItems": 50,
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"]
  }
}
| Field | Type | Default | Description |
|---|---|---|---|
| scrapeMode | string | "both" | What to scrape: "jobs", "companies", or "both" |
| searchUrls | array | | Optional: specific TokyoDev URLs to scrape. Skips sitemap discovery. |
| remotePolicy | string | "" | Filter by remote policy: "fully-remote", "partially-remote", "no-remote", or empty for all |
| seniority | string | "" | Filter by seniority: "intern", "junior", "intermediate", "senior", or empty for all |
| japaneseRequired | string | "" | Filter by Japanese language: "japanese-required", "no-japanese-required", or empty for all |
| maxItems | integer | 50 | Maximum number of results to return |
| proxyConfiguration | object | RESIDENTIAL | Proxy settings; a residential proxy is required for Cloudflare bypass |
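If you build run inputs programmatically, a small client-side check against the documented options can catch typos before a run starts. The sketch below is a hypothetical helper, not part of the actor; the allowed values and defaults are copied from the input table, and the host check assumes searchUrls entries are plain URL strings on tokyodev.com.

```python
from typing import Any
from urllib.parse import urlparse

# Allowed values and defaults, copied from the input table.
ALLOWED = {
    "scrapeMode": {"jobs", "companies", "both"},
    "remotePolicy": {"", "fully-remote", "partially-remote", "no-remote"},
    "seniority": {"", "intern", "junior", "intermediate", "senior"},
    "japaneseRequired": {"", "japanese-required", "no-japanese-required"},
}
DEFAULTS: dict[str, Any] = {
    "scrapeMode": "both",
    "remotePolicy": "",
    "seniority": "",
    "japaneseRequired": "",
    "maxItems": 50,
}
# Assumption: targeted URLs live on these hosts (based on the example URLs).
TOKYODEV_HOSTS = {"tokyodev.com", "www.tokyodev.com"}


def normalize_input(raw: dict[str, Any]) -> dict[str, Any]:
    """Fill in documented defaults and reject values outside the documented options."""
    merged = {**DEFAULTS, **raw}
    for field, allowed in ALLOWED.items():
        if merged[field] not in allowed:
            raise ValueError(f"{field}={merged[field]!r}; expected one of {sorted(allowed)}")
    max_items = merged["maxItems"]
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(max_items, bool) or not isinstance(max_items, int) or max_items < 1:
        raise ValueError("maxItems must be a positive integer")
    for url in merged.get("searchUrls") or []:
        if urlparse(url).hostname not in TOKYODEV_HOSTS:
            raise ValueError(f"not a TokyoDev URL: {url}")
    return merged
```

For example, `normalize_input({"scrapeMode": "jobs"})` returns the input with `maxItems` set to 50 and the three filters set to empty strings.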

TokyoDev Scraper Output Fields

Job Listings

{
  "job_title": "Senior Rails Engineer",
  "company_name": "TableCheck",
  "company_url": "https://www.tablecheck.com",
  "location": "Tokyo",
  "job_type": "full-time",
  "seniority": "senior",
  "remote_policy": "partially-remote",
  "japanese_required": false,
  "apply_from_abroad": true,
  "salary_range": "8000000-14000000 JPY",
  "description": "TableCheck is looking for a senior Rails engineer...",
  "requirements": ["5+ years Rails experience", "Experience with PostgreSQL"],
  "tags": ["Ruby", "Rails", "PostgreSQL", "React"],
  "apply_url": "https://www.tablecheck.com/jobs/apply/rails-engineer",
  "posted_date": "2025-03-20",
  "job_url": "https://www.tokyodev.com/companies/tablecheck/jobs/senior-rails-engineer"
}
| Field | Type | Description |
|---|---|---|
| job_title | string | Job title |
| company_name | string | Hiring company name |
| company_url | string | Company website URL |
| location | string | Job location (e.g. Tokyo, Remote, Osaka) |
| job_type | string | Employment type: full-time, contract, intern |
| seniority | string | Seniority level: junior, intermediate, senior |
| remote_policy | string | Remote work policy: fully-remote, partially-remote, no-remote |
| japanese_required | boolean | Whether Japanese language proficiency is required |
| apply_from_abroad | boolean | Whether candidates can apply from outside Japan |
| salary_range | string | Salary range, if disclosed |
| description | string | Full job description text |
| requirements | array | Job requirements and qualifications |
| tags | array | Technology and skill tags (e.g. Ruby, Python, React) |
| apply_url | string | Direct URL to apply for the position |
| posted_date | string | Date the job was posted |
| job_url | string | Full TokyoDev job listing URL |
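Because salary_range is a string, salary trend analysis needs a parse step first. A minimal sketch, assuming the `<min>-<max> <CURRENCY>` shape from the example record ("8000000-14000000 JPY"); anything else, including an undisclosed salary, returns None, and other formats that may appear on the site would need additional patterns. `parse_salary_range` and `Salary` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional


class Salary(NamedTuple):
    minimum: int
    maximum: int
    currency: str


# Matches the shape in the example record, e.g. "8000000-14000000 JPY".
# Assumption: other shapes (single figures, thousands separators) are not handled.
_SALARY_RE = re.compile(r"^\s*(\d+)\s*-\s*(\d+)\s+([A-Z]{3})\s*$")


def parse_salary_range(value: Optional[str]) -> Optional[Salary]:
    """Return (minimum, maximum, currency), or None if missing or unrecognized."""
    if not value:
        return None
    match = _SALARY_RE.match(value)
    if match is None:
        return None
    low, high = int(match.group(1)), int(match.group(2))
    # Normalize order in case a range is written high-to-low.
    return Salary(min(low, high), max(low, high), match.group(3))
```

Returning None for unrecognized strings, rather than raising, lets you run the parser over a whole dataset and count how many salaries it could not interpret.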

Company Profiles

When scrapeMode is "companies" or "both", company records are included in the same dataset. Company records populate company_name, company_url, description, location, tags, and job_url (set to the company profile URL). Job-specific fields are null.

{
  "company_name": "Mercari",
  "company_url": "https://www.mercari.com",
  "location": "Tokyo",
  "description": "Mercari is Japan's largest marketplace app...",
  "tags": ["Go", "Kotlin", "Swift", "React", "Kubernetes"],
  "job_url": "https://www.tokyodev.com/companies/mercari"
}
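Since jobs and companies share one dataset, a consumer has to tell the record types apart before joining them. One approach, sketched below, relies on the rule stated above that job-specific fields are null on company records and treats a null or missing `job_title` as the company marker. `split_and_join` is a hypothetical helper name.

```python
from collections import defaultdict
from typing import Any


def split_and_join(records: list[dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """Group a mixed dataset into {company_name: {"profile": ..., "jobs": [...]}}.

    Assumption: a record with job_title null/absent is a company profile,
    per the rule that job-specific fields are null on company records.
    """
    # The lambda builds a fresh {"profile", "jobs"} dict for each new company.
    grouped: dict[str, dict[str, Any]] = defaultdict(lambda: {"profile": None, "jobs": []})
    for record in records:
        name = record.get("company_name")
        if not name:
            continue  # cannot join a record without a company name
        if record.get("job_title") is None:
            grouped[name]["profile"] = record
        else:
            grouped[name]["jobs"].append(record)
    return dict(grouped)
```

Companies with listings but no scraped profile (for example, a "jobs"-only run) keep `profile` set to None, so the join never drops job records.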

🔍 FAQ

How do I scrape TokyoDev.com?

TokyoDev Scraper handles sitemap discovery automatically. Set scrapeMode to "jobs", "companies", or "both", apply any filters you need, configure the residential proxy, and run it. For targeted runs, paste specific TokyoDev URLs into searchUrls to skip the sitemap phase entirely.
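To start a targeted run from code instead of the Apify Console, one option is the official `apify-client` Python package. This is a sketch under assumptions: `ACTOR_ID` is a hypothetical placeholder (use the ID shown on the actor's page), and `searchUrls` is assumed to take plain URL strings.

```python
from typing import Any

# Hypothetical placeholder: replace with the actor ID from its Apify page.
ACTOR_ID = "<username>/tokyodev-scraper"


def build_targeted_input(urls: list[str], max_items: int = 50) -> dict[str, Any]:
    """Run input for a targeted run; setting searchUrls skips sitemap discovery."""
    return {
        "searchUrls": urls,  # assumption: plain URL strings
        "maxItems": max_items,
        "proxyConfiguration": {"useApifyProxy": True, "apifyProxyGroups": ["RESIDENTIAL"]},
    }


def run_scraper(token: str, run_input: dict[str, Any]) -> list[dict[str, Any]]:
    """Start the actor, wait for it to finish, and return its dataset items."""
    from apify_client import ApifyClient  # pip install apify-client

    client = ApifyClient(token)
    run = client.actor(ACTOR_ID).call(run_input=run_input)
    if run is None:
        raise RuntimeError("actor run did not return run details")
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```

Usage: `run_scraper("<APIFY_TOKEN>", build_targeted_input(["https://www.tokyodev.com/companies/mercari"]))`. The client import sits inside `run_scraper` so the input builder can be used without the package installed.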

Does TokyoDev Scraper need proxies?

Yes. TokyoDev serves a Cloudflare managed challenge on all page routes except the sitemap. The scraper loads pages in a Playwright browser with a residential proxy and anti-detection fingerprinting to get through. Because /sitemap.xml is served without a challenge, the scraper uses it for URL discovery without consuming proxy budget.

What data can I get from TokyoDev.com?

TokyoDev Scraper returns job titles, companies, locations, employment types, seniority levels, remote policies, Japanese language requirements, apply-from-abroad flags, salary ranges, descriptions, requirements lists, technology tags, apply URLs, and posting dates. Company profiles include the company description, location, and tech stack tags.

Can I filter for jobs that don't require Japanese?

Set japaneseRequired to "no-japanese-required". TokyoDev Scraper applies the filter before saving records, so only matching results land in the dataset — you don't have to filter downstream.
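The three filters can be pictured as a single predicate over job records. This is an illustration, not the actor's code: `matches_filters` is a hypothetical name, an empty string means "no filter" (mirroring the input defaults), and excluding records whose japanese_required is null when a language filter is set is an assumed design choice.

```python
from typing import Any


def matches_filters(
    record: dict[str, Any],
    remote_policy: str = "",
    seniority: str = "",
    japanese_required: str = "",
) -> bool:
    """True if a job record passes every non-empty filter."""
    if remote_policy and record.get("remote_policy") != remote_policy:
        return False
    if seniority and record.get("seniority") != seniority:
        return False
    if japanese_required:
        wanted = japanese_required == "japanese-required"
        # Assumption: an unknown (null) requirement never matches a language filter.
        if record.get("japanese_required") is not wanted:
            return False
    return True
```

The same predicate is handy if you later want to re-slice a dataset that was collected with all filters left empty.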

How much does TokyoDev Scraper cost to run?

TokyoDev Scraper uses pay-per-event pricing. Because every page is loaded in a browser through a residential proxy, the cost per record is higher than for plain HTTP scrapers. Scraping the full board (~182 jobs plus ~232 companies, about 414 pages) typically costs a few dollars, depending on proxy consumption.


Need More Features?

Need scheduled runs, webhook delivery, or fields not currently extracted? File an issue or get in touch.

Why Use TokyoDev Scraper?

  • Structured language and remote data: japanese_required and remote_policy are extracted as typed fields rather than left in the description text, so your filters work without NLP preprocessing
  • Dual-mode output: jobs and company profiles in a single run with a shared schema, so you can join them by company_name without running two separate scrapers
  • Cloudflare-resilient by design: a residential proxy with browser fingerprinting handles Cloudflare without manual intervention, and the sitemap bypass keeps URL discovery cheap