OrbTop

Mattermost Message Scraper

Export messages from any Mattermost workspace — self-hosted or cloud. Provide your instance URL and a personal access token to scrape teams, channels, and posts. Supports filtering by team name, channel name, date range, and direct messages.

What it does

The actor connects to the Mattermost REST API v4 to:

  1. Fetch all teams accessible to the authenticated user
  2. For each team, fetch all accessible channels
  3. For each channel, paginate through posts and save them to the dataset
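
The three steps above can be sketched as a nested loop over the Mattermost REST API v4 endpoints for teams, channels, and posts. This is a minimal illustration, not the actor's actual source: `iter_posts` and the injected `get_json` helper are hypothetical names, and `get_json(path, params)` is assumed to perform an authenticated GET and return the decoded JSON body.

```python
from typing import Any, Callable, Dict, Iterator, Optional

# Hypothetical helper type: get_json(path, params) -> decoded JSON body.
GetJson = Callable[[str, Optional[Dict[str, Any]]], Any]


def iter_posts(get_json: GetJson, per_page: int = 200) -> Iterator[Dict[str, Any]]:
    """Walk teams -> channels -> posts and yield one flat dict per post."""
    # Step 1: teams the authenticated user belongs to.
    for team in get_json("/api/v4/users/me/teams", None):
        # Step 2: channels the user can access in that team.
        channels = get_json(f"/api/v4/users/me/teams/{team['id']}/channels", None)
        for channel in channels:
            # Step 3: page through the channel's posts.
            page = 0
            while True:
                body = get_json(
                    f"/api/v4/channels/{channel['id']}/posts",
                    {"page": page, "per_page": per_page},
                )
                order = body.get("order") or []
                for post_id in order:
                    post = body["posts"][post_id]
                    yield {
                        "team_name": team["name"],
                        "channel_name": channel["name"],
                        "post_id": post["id"],
                        "post_message": post.get("message", ""),
                    }
                if len(order) < per_page:
                    break  # a short page means the channel is exhausted
                page += 1
```

Injecting `get_json` keeps the traversal independent of any particular HTTP client, so it can be exercised against canned responses before pointing it at a real server.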

Each output record contains the full context of a message: which team and channel it came from, who posted it, when it was posted, the message content, reactions, and file attachment IDs.

Who is it for

  • Enterprise and government teams using self-hosted Mattermost for compliance archival and audit trails
  • Migration projects moving from Mattermost to another platform (Slack, Discord, Teams)
  • Knowledge base export — converting channel history to searchable documents or AI training data
  • Analytics and reporting on team communication patterns

Getting started

Step 1: Generate a personal access token

  1. Log in to your Mattermost instance
  2. Go to Account Settings > Security > Personal Access Tokens
  3. Click Create Token, give it a description, and copy the token

Note: Some Mattermost instances require an administrator to enable personal access tokens. Contact your Mattermost admin if the option is not visible in Account Settings.
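
Before running the actor, it can help to confirm that the token works. The sketch below builds a request to the API v4 `/users/me` endpoint with the token sent as a Bearer credential. `build_me_request` is a hypothetical helper name, and the example only constructs the request; sending it is left to the caller.

```python
import urllib.request


def build_me_request(instance_url: str, token: str) -> urllib.request.Request:
    """Build a GET request for the authenticated user's own profile."""
    # Strip a trailing slash so the joined URL has no double slash.
    url = instance_url.rstrip("/") + "/api/v4/users/me"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


# Usage (performs a network call, so it is left commented out):
# with urllib.request.urlopen(build_me_request(url, token)) as resp:
#     print(resp.status)  # a 401 error here usually points to a bad or disabled token
```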

Step 2: Run the actor

Set the following inputs:

Input | Description
--- | ---
Instance URL | Base URL of your Mattermost server (e.g. https://mattermost.yourcompany.com)
Personal Access Token | The token generated in Step 1
Team Names | Optional list of team slugs to scrape (leave empty for all teams)
Channel Names | Optional list of channel names to scrape within the selected teams
Max Items | Maximum number of posts to return
Since Date | Only fetch posts on or after this date (ISO 8601)
Until Date | Only fetch posts on or before this date (ISO 8601)
Include Direct Messages | Set to true to also scrape direct (D) and group (G) message channels

Example input

{
  "instanceUrl": "https://mattermost.yourcompany.com",
  "accessToken": "your-personal-access-token",
  "teamNames": ["engineering", "product"],
  "channelNames": ["general", "random"],
  "sinceDate": "2024-01-01",
  "maxItems": 1000
}
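
The date inputs are ISO 8601 strings, while Mattermost reports post times in milliseconds since the Unix epoch. The sketch below shows one way the two can be compared. `iso_to_ms` and `in_window` are hypothetical helpers, and treating a value without a timezone as UTC is an assumption; the actor may interpret such values differently.

```python
from datetime import datetime, timezone
from typing import Optional


def iso_to_ms(value: str) -> int:
    """Convert an ISO 8601 date or datetime string to epoch milliseconds."""
    dt = datetime.fromisoformat(value)
    if dt.tzinfo is None:
        # Assumption: values without an offset are interpreted as UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    return int(dt.timestamp() * 1000)


def in_window(create_at_ms: int,
              since: Optional[str] = None,
              until: Optional[str] = None) -> bool:
    """Return True if a post timestamp falls inside the optional date window."""
    if since is not None and create_at_ms < iso_to_ms(since):
        return False
    if until is not None and create_at_ms > iso_to_ms(until):
        return False
    return True
```

Note that `datetime.fromisoformat` only accepts a trailing `Z` from Python 3.11 onward; on older versions, write the offset as `+00:00`.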

Output

Each dataset record represents a single Mattermost post:

Field | Type | Description
--- | --- | ---
instance_url | string | Base URL of the Mattermost instance
team_id | string | Team UUID
team_name | string | Team URL slug
team_display_name | string | Team display name
channel_id | string | Channel UUID
channel_name | string | Channel URL slug
channel_display_name | string | Channel display name
channel_type | string | O=public, P=private, D=direct, G=group
channel_header | string | Channel header text
channel_purpose | string | Channel purpose text
post_id | string | Post UUID
post_root_id | string | Parent post UUID (empty for root posts, set for thread replies)
post_create_at | integer | Creation timestamp in milliseconds since Unix epoch
post_update_at | integer | Last update timestamp in ms
post_edit_at | integer | Last edit timestamp in ms (0 if never edited)
post_user_id | string | Author's user UUID
post_user_username | string | Author's username
post_message | string | Post body text (Markdown)
post_type | string | Empty for regular posts; system_* for system messages
post_hashtags | string | Space-separated hashtags
post_reactions | string | JSON array of emoji reactions [{emoji_name, user_id, create_at}]
post_file_ids | string | JSON array of file attachment IDs
post_props | string | JSON object with post metadata (attachments, overrides, etc.)

Example record

{
  "instance_url": "https://community.mattermost.com",
  "team_id": "rcgiyftm7jyrxnma1osd8zswby",
  "team_name": "core",
  "team_display_name": "Contributors",
  "channel_id": "zw43c5ttrjyu9dg7jnudwuz6bw",
  "channel_name": "town-square",
  "channel_display_name": "Town Square",
  "channel_type": "O",
  "channel_header": "Welcome to Mattermost",
  "channel_purpose": "The default channel for the team",
  "post_id": "3yd1q7gcrinbdmdq5qmnggd4xy",
  "post_root_id": "",
  "post_create_at": 1748063442000,
  "post_update_at": 1748063442000,
  "post_edit_at": 0,
  "post_user_id": "nkb43bj3h3ga8p4m3n5rwhmtia",
  "post_user_username": "john.doe",
  "post_message": "Hello everyone!",
  "post_type": "",
  "post_hashtags": "",
  "post_reactions": "[]",
  "post_file_ids": "[]",
  "post_props": "{}"
}
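
Because post_reactions, post_file_ids, and post_props are stored as JSON strings, and timestamps are in milliseconds, downstream code usually needs a small decoding step. The sketch below is one way to do that; `decode_record` and the added `post_created` and `is_reply` keys are hypothetical names, not part of the actor's output.

```python
import json
from datetime import datetime, timezone

# Fields that the output table describes as JSON encoded strings.
JSON_FIELDS = ("post_reactions", "post_file_ids", "post_props")


def decode_record(record: dict) -> dict:
    """Return a copy of a dataset record with JSON fields and timestamps decoded."""
    decoded = dict(record)
    for field in JSON_FIELDS:
        decoded[field] = json.loads(record[field])
    # Milliseconds since the Unix epoch -> timezone-aware UTC datetime.
    decoded["post_created"] = datetime.fromtimestamp(
        record["post_create_at"] / 1000, tz=timezone.utc
    )
    # A non-empty post_root_id marks a thread reply.
    decoded["is_reply"] = bool(record["post_root_id"])
    return decoded
```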

Filtering tips

  • Use the Team Names and Channel Names filters to narrow the scope and speed up the run. Neither list needs to be exhaustive: channels in teams that do not match Team Names are excluded automatically.
  • Use Since Date / Until Date to limit to a specific time window (e.g. last quarter for compliance reports).
  • Set Include Direct Messages to true only when you need DM history. Note: your token must have access to the DM channels.
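
As a rough illustration of how these filters might combine, the sketch below decides whether a single channel is kept. `keep_channel` is a hypothetical function, and two behaviours are assumptions rather than documented facts: an empty list means "no restriction", and direct or group channels are governed only by the Include Direct Messages flag.

```python
from typing import Sequence


def keep_channel(team_name: str,
                 channel_name: str,
                 channel_type: str,
                 team_names: Sequence[str] = (),
                 channel_names: Sequence[str] = (),
                 include_dms: bool = False) -> bool:
    """Decide whether a channel passes the team, channel, and DM filters."""
    # Assumption: D (direct) and G (group) channels depend only on the DM flag.
    if channel_type in ("D", "G"):
        return include_dms
    # Assumption: an empty filter list matches everything.
    if team_names and team_name not in team_names:
        return False
    if channel_names and channel_name not in channel_names:
        return False
    return True
```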

Rate limits and performance

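Rate limits depend on how the server is configured. Self-hosted Mattermost ships with API rate limiting disabled by default; when an administrator enables it, the per-second limit and burst size are set in the server's rate limit settings, so check with your admin before a large run. The actor requests 200 posts per page and caches resolved usernames locally to minimize API calls.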
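
The username cache can be pictured as simple memoization: each user ID is looked up once and reused for every later post by the same author. The sketch below is illustrative only. `UsernameCache` and the injected `fetch_user` callable are hypothetical; `fetch_user(user_id)` is assumed to return the user object from the API v4 `/users/{user_id}` endpoint.

```python
from typing import Callable, Dict


class UsernameCache:
    """Resolve user IDs to usernames, calling the API at most once per ID."""

    def __init__(self, fetch_user: Callable[[str], dict]) -> None:
        self._fetch_user = fetch_user
        self._names: Dict[str, str] = {}

    def username(self, user_id: str) -> str:
        # Only hit the API when the ID has not been seen before.
        if user_id not in self._names:
            user = self._fetch_user(user_id)
            self._names[user_id] = user.get("username", "")
        return self._names[user_id]
```

In a busy channel most posts come from a small set of authors, so this kind of cache keeps user lookups proportional to the number of distinct authors rather than the number of posts.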

For very large workspaces (millions of posts), set Max Items to limit the run, then resume by setting Since Date to the date of the last run.

Self-hosted vs. cloud instances

The actor works identically with both self-hosted Mattermost and Mattermost cloud — just set instanceUrl to your instance's base URL.

For the public Mattermost community server, use https://community.mattermost.com.