OrbTop

Matrix Message Scraper

SOCIAL MEDIA, AUTOMATION

Scrape public rooms and message history from any Matrix homeserver — matrix.org, Element, or self-hosted. Returns room metadata and message events in clean JSON.

Matrix is a federated open protocol with 80 million accounts across 100,000+ homeservers. It is the protocol behind Element, the chat platform used by Mozilla, KDE, the French government, and a long list of open-source projects. Most public rooms are unencrypted and fully readable. This scraper gets you into that data.

Matrix Message Scraper Features

  • Discovers public rooms across any Matrix homeserver without an account — name, topic, member count, join rules, aliases, and avatar URL
  • Filters by keyword — narrow public room results to a specific topic before collecting
  • Scrapes message history from one or more rooms using your Matrix access token — works on any public or joined room
  • Extracts full event metadata — sender ID, display name, timestamp, message type, reply threading, edits, and media URLs
  • Handles cursor pagination — fetches complete message history across thousands of events without manual cursor management
  • Federation-aware — query a different homeserver's public room list by pointing the homeserver URL at any Matrix server
  • No proxies required. The Matrix REST API is accessible from standard IPs.
  • Two distinct modes: room discovery (no auth) and message scraping (access token required)

What Can You Do With Matrix Data?

  • Open-source community researchers — map the Matrix ecosystem, identify active projects, track migration from Slack/Discord
  • Academic linguists — large-scale multilingual conversation corpora without Twitter's data access restrictions
  • Compliance teams — archive message history for organizations that adopted Element as their primary chat platform
  • Privacy advocates — analyze the federated network structure and federation patterns across homeservers
  • Content moderation tooling — build training datasets from public room conversations
  • Developer tools — monitor Matrix rooms for mentions, keywords, or bot triggers

How It Works

  1. Pick a mode. Discover Public Rooms lists rooms on the homeserver by keyword or returns all rooms. Scrape Room Messages fetches message history for specific room IDs.
  2. Configure the homeserver. Defaults to matrix.org. Point it at any Matrix homeserver URL for federated queries.
  3. Provide a token for messages. The Discover mode needs no credentials. For messages, get your access token from Element: Settings → Help & About → Access Token.
  4. Run. The scraper handles cursor pagination — the Matrix API returns results in batches with next-page tokens, and this actor follows them until it hits your item limit.
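
The pagination loop in step 4 can be sketched roughly as follows. This is a minimal illustration, not the actor's actual code: `iter_messages` and `fake_fetch` are hypothetical names, and the page fetcher is injected so the loop can be shown without network access. The `chunk` and `end` keys are the ones the Matrix Client-Server API uses in a `/messages` response.

```python
from typing import Callable, Iterator, Optional

# A page fetcher takes the current pagination token (None for the first page)
# and returns the decoded JSON body of one /messages response.
PageFetcher = Callable[[Optional[str]], dict]


def iter_messages(fetch_page: PageFetcher, max_items: int) -> Iterator[dict]:
    """Yield up to max_items m.room.message events, following `end` tokens."""
    token: Optional[str] = None
    yielded = 0
    while yielded < max_items:
        page = fetch_page(token)
        for event in page.get("chunk", []):
            # Skip membership changes, encrypted payloads, and other event types.
            if event.get("type") != "m.room.message":
                continue
            yield event
            yielded += 1
            if yielded >= max_items:
                return
        next_token = page.get("end")
        # A missing or unchanged `end` token means there is nothing more to fetch.
        if not next_token or next_token == token:
            return
        token = next_token


def fake_fetch(token: Optional[str]) -> dict:
    """Hypothetical stand-in for an HTTP call, returning three canned pages."""
    pages = {
        None: {"chunk": [{"type": "m.room.message", "event_id": "$1"},
                         {"type": "m.room.member", "event_id": "$2"}], "end": "t1"},
        "t1": {"chunk": [{"type": "m.room.message", "event_id": "$3"}], "end": "t2"},
        "t2": {"chunk": [{"type": "m.room.message", "event_id": "$4"}]},
    }
    return pages[token]


ids = [e["event_id"] for e in iter_messages(fake_fetch, max_items=3)]
print(ids)
```

In a real run, `fetch_page` would wrap an authenticated HTTP request. The loop itself only depends on the shape of the response.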

Matrix Message Scraper Input

{
  "action": "Discover Public Rooms",
  "homeserver": "https://matrix.org",
  "searchTerm": "linux",
  "maxItems": 100
}

{
  "action": "Scrape Room Messages",
  "homeserver": "https://matrix.org",
  "accessToken": "syt_...",
  "roomIds": ["!abcdef:matrix.org", "!ghijkl:kde.org"],
  "maxMessagesPerRoom": 500,
  "maxItems": 1000,
  "messageDirection": "b"
}
| Field | Type | Default | Description |
| --- | --- | --- | --- |
| action | string | Discover Public Rooms | What to scrape. Options: Discover Public Rooms, Scrape Room Messages |
| homeserver | string | https://matrix.org | Matrix homeserver base URL |
| accessToken | string | | Matrix access token. Required for Scrape Room Messages. |
| roomIds | array | | List of Matrix room IDs (e.g. !abcdef:matrix.org). Used with Scrape Room Messages. |
| searchTerm | string | | Keyword filter for Discover Public Rooms. Leave blank to return all rooms. |
| maxItems | integer | 10 | Maximum number of records to return across all rooms |
| maxMessagesPerRoom | integer | 100 | Maximum messages per room. Used with Scrape Room Messages. |
| messageDirection | string | b | Pagination direction. b = backward (reverse-chronological, newest first), f = forward (chronological, oldest first) |
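
To make the interaction between these fields concrete, here is a small sketch of how an input object could be checked before a run. It is an assumption about reasonable validation, not a description of what the actor does internally. `normalize_input` is a hypothetical helper, and the defaults are copied from the field list above.

```python
VALID_ACTIONS = ("Discover Public Rooms", "Scrape Room Messages")

# Defaults taken from the input field list; fields without a default are omitted.
DEFAULTS = {
    "action": "Discover Public Rooms",
    "homeserver": "https://matrix.org",
    "maxItems": 10,
    "maxMessagesPerRoom": 100,
    "messageDirection": "b",
}


def normalize_input(raw: dict) -> dict:
    """Fill in defaults and reject combinations the input rules do not allow."""
    cfg = {**DEFAULTS, **raw}
    if cfg["action"] not in VALID_ACTIONS:
        raise ValueError(f"unknown action: {cfg['action']!r}")
    if cfg["messageDirection"] not in ("b", "f"):
        raise ValueError("messageDirection must be 'b' or 'f'")
    if cfg["action"] == "Scrape Room Messages":
        # Message scraping is the only mode that needs credentials and room IDs.
        if not cfg.get("accessToken"):
            raise ValueError("accessToken is required for Scrape Room Messages")
        if not cfg.get("roomIds"):
            raise ValueError("roomIds must list at least one room")
    return cfg


print(normalize_input({"searchTerm": "linux", "maxItems": 100}))
```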

Matrix Message Scraper Output

Mode 1: Discover Public Rooms

{
  "record_type": "room",
  "room_id": "!L58ME6ufiP49v97UIOBIpvWKEgj4912JmECPuDzlvCI",
  "room_name": "Matrix HQ",
  "room_topic": "The Official Matrix HQ — chat about Matrix here! | https://matrix.org",
  "room_canonical_alias": "#matrix:matrix.org",
  "room_member_count": 5451,
  "room_is_encrypted": false,
  "room_join_rule": "public",
  "room_world_readable": true,
  "room_guest_can_join": true,
  "room_avatar_url": "mxc://matrix.org/DRevoaEiuzbkOznknySKuMmE",
  "room_type": null,
  "homeserver": "https://matrix.org"
}
| Field | Type | Description |
| --- | --- | --- |
| record_type | string | room for this mode |
| room_id | string | Unique Matrix room ID (e.g. !abcdef:matrix.org) |
| room_name | string | Display name of the room |
| room_topic | string | Room topic or description |
| room_canonical_alias | string | Canonical room alias (e.g. #matrix:matrix.org) |
| room_member_count | integer | Number of joined members |
| room_is_encrypted | boolean | Whether end-to-end encryption is enabled |
| room_join_rule | string | Join rule: public, invite, knock, or restricted |
| room_world_readable | boolean | Whether history is visible without joining |
| room_guest_can_join | boolean | Whether guests can join |
| room_avatar_url | string | mxc:// URL for the room avatar |
| room_type | string | Room type (m.space for Matrix Spaces, null for regular rooms) |
| homeserver | string | Homeserver URL used for the query |
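
One possible use of these records is to feed the discovery output into the message-scraping mode. The sketch below is illustrative only: `pick_scrapable_rooms` and its `min_members` threshold are hypothetical, and it relies solely on the output fields listed above.

```python
def pick_scrapable_rooms(records: list, min_members: int = 0) -> list:
    """Return room IDs of public, unencrypted rooms, largest rooms first."""
    rooms = [
        r for r in records
        if r.get("record_type") == "room"
        and r.get("room_join_rule") == "public"
        and not r.get("room_is_encrypted")
        and (r.get("room_member_count") or 0) >= min_members
    ]
    rooms.sort(key=lambda r: r.get("room_member_count") or 0, reverse=True)
    return [r["room_id"] for r in rooms]


# Made-up sample records shaped like the Discover Public Rooms output.
sample = [
    {"record_type": "room", "room_id": "!small:example.org", "room_member_count": 12,
     "room_join_rule": "public", "room_is_encrypted": False},
    {"record_type": "room", "room_id": "!big:example.org", "room_member_count": 5451,
     "room_join_rule": "public", "room_is_encrypted": False},
    {"record_type": "room", "room_id": "!secret:example.org", "room_member_count": 900,
     "room_join_rule": "invite", "room_is_encrypted": True},
]
print(pick_scrapable_rooms(sample, min_members=10))
```

The resulting list can be passed as `roomIds` in a Scrape Room Messages run.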

Mode 2: Scrape Room Messages

{
  "record_type": "message",
  "event_id": "$example_event_id:matrix.org",
  "event_type": "m.room.message",
  "event_sender": "@alice:matrix.org",
  "event_sender_display_name": "Alice",
  "event_origin_server_ts": 1715000000000,
  "event_content_msgtype": "m.text",
  "event_content_body": "Hey everyone, anyone know the status of the new spec?",
  "event_content_formatted_body": null,
  "event_content_url": null,
  "event_reply_to": null,
  "event_thread_id": null,
  "event_edits": null,
  "event_room_id": "!L58ME6ufiP49v97UIOBIpvWKEgj4912JmECPuDzlvCI"
}
| Field | Type | Description |
| --- | --- | --- |
| record_type | string | message for this mode |
| event_id | string | Unique Matrix event ID (e.g. $eventid:matrix.org) |
| event_type | string | Event type (only m.room.message events are saved) |
| event_sender | string | Matrix user ID of the sender (e.g. @user:matrix.org) |
| event_sender_display_name | string | Display name at time of event (may be null) |
| event_origin_server_ts | integer | Event timestamp in milliseconds since Unix epoch |
| event_content_msgtype | string | Message subtype: m.text, m.image, m.file, m.video, m.audio, m.notice |
| event_content_body | string | Plain-text body or filename for media messages |
| event_content_formatted_body | string | HTML-formatted body (when present) |
| event_content_url | string | mxc:// media URL for image/file/video/audio messages |
| event_reply_to | string | Event ID being replied to (m.in_reply_to), if any |
| event_thread_id | string | Thread root event ID (m.thread relation), if any |
| event_edits | string | Event ID being edited by this event (m.replace relation), if any |
| event_room_id | string | Room ID the event belongs to |
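
As a rough illustration of working with these fields, the sketch below converts the millisecond timestamp to a UTC datetime and groups replies under the event they answer. `event_time` and `reply_index` are hypothetical helper names, and the sample records are made up.

```python
from collections import defaultdict
from datetime import datetime, timezone


def event_time(record: dict) -> datetime:
    """Convert event_origin_server_ts (milliseconds since epoch) to UTC."""
    return datetime.fromtimestamp(record["event_origin_server_ts"] / 1000, tz=timezone.utc)


def reply_index(records: list) -> dict:
    """Map each replied-to event ID to the IDs of the events replying to it."""
    children = defaultdict(list)
    for record in records:
        parent = record.get("event_reply_to")
        if parent:
            children[parent].append(record["event_id"])
    return dict(children)


messages = [
    {"event_id": "$a", "event_origin_server_ts": 1715000000000, "event_reply_to": None},
    {"event_id": "$b", "event_origin_server_ts": 1715000060000, "event_reply_to": "$a"},
    {"event_id": "$c", "event_origin_server_ts": 1715000120000, "event_reply_to": "$a"},
]
print(event_time(messages[0]).isoformat())
print(reply_index(messages))
```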

🔍 FAQ

How do I scrape Matrix rooms? Matrix Message Scraper connects to the Matrix Client-Server API at your chosen homeserver. For public room discovery, no credentials are needed — just set action to Discover Public Rooms and run. For message history, get your access token from Element (Settings → Help & About → Access Token) and supply a list of room IDs.

How much does Matrix Message Scraper cost to run? Matrix Message Scraper charges $0.10 per run start plus $0.001 per record. Scraping 1,000 messages from a set of rooms costs roughly $1.10. Public room discovery is inexpensive — 100 rooms is about $0.20 total.
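
The pricing arithmetic above can be written out as a short sketch. The two rates come from the answer above, while `estimate_cost` is a hypothetical helper and not part of the actor.

```python
from decimal import Decimal

RUN_START_FEE = Decimal("0.10")    # charged once per run
PER_RECORD_FEE = Decimal("0.001")  # charged per returned record


def estimate_cost(records: int) -> Decimal:
    """Estimate the cost in USD of one run that returns `records` records."""
    if records < 0:
        raise ValueError("records must be non-negative")
    total = RUN_START_FEE + PER_RECORD_FEE * records
    return total.quantize(Decimal("0.01"))


print(estimate_cost(1000))  # 1.10
print(estimate_cost(100))   # 0.20
```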

What data can I get from matrix.org without logging in? The public room directory. Matrix Message Scraper returns room ID, name, topic, member count, canonical alias, join rule, and avatar URL for all publicly listed rooms — no access token needed. Message history requires authentication.

Can Matrix Message Scraper scrape rooms on homeservers other than matrix.org? Yes. Set homeserver to any Matrix server URL (e.g. https://matrix.mozilla.org, https://matrix.kde.org, or a self-hosted instance). Public room discovery and authenticated message scraping work on any standard Matrix homeserver.
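
For readers curious what pointing the actor at another homeserver means at the HTTP level, here is a sketch of how the two Client-Server API URLs could be built. The endpoint paths and the `dir`, `from`, `limit`, and `since` query parameters come from the Matrix specification. The helper names are hypothetical, and whether the actor builds its requests this way is an assumption.

```python
from typing import Optional
from urllib.parse import quote, urlencode


def public_rooms_url(homeserver: str, limit: int = 100, since: Optional[str] = None) -> str:
    """URL for one page of the public room directory (no access token needed)."""
    params = {"limit": limit}
    if since:
        params["since"] = since
    return f"{homeserver.rstrip('/')}/_matrix/client/v3/publicRooms?{urlencode(params)}"


def messages_url(homeserver: str, room_id: str, direction: str = "b",
                 limit: int = 100, from_token: Optional[str] = None) -> str:
    """URL for one page of room history; the token goes in an Authorization header."""
    params = {"dir": direction, "limit": limit}
    if from_token:
        params["from"] = from_token
    # Room IDs contain '!' and ':', which must be percent-encoded in the path.
    encoded_room = quote(room_id, safe="")
    base = homeserver.rstrip("/")
    return f"{base}/_matrix/client/v3/rooms/{encoded_room}/messages?{urlencode(params)}"


print(public_rooms_url("https://matrix.kde.org/"))
print(messages_url("https://matrix.org", "!abcdef:matrix.org"))
```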

Does it scrape encrypted rooms? No. Encrypted room messages are stored as opaque ciphertext — without the private keys they're unreadable. Matrix Message Scraper skips encrypted event payloads. Most public rooms don't use encryption.

Need More Features?

Need custom filters, incremental scraping across runs, or support for a different homeserver configuration? File a request or get in touch.

Why Use Matrix Message Scraper?

  • First Matrix scraper on Apify — zero competing actors means no stale or abandoned alternatives
  • No proxies, no browser overhead — pure REST API access, which keeps costs low and runs fast
  • Federation-aware — one actor queries any homeserver, not just matrix.org