OrbTop

Synthetic Financial Data Generator

Generate realistic synthetic financial transaction data for ML training, fintech testing, and data pipeline development. Produces bank-statement-quality records with category-aware amounts, temporal spending patterns, running balances, and configurable fraud labels.

What it does

This actor generates synthetic financial transactions that mimic real banking data. No web scraping is involved -- all data is computed locally using statistical models.

Each transaction includes:

  • Account details -- holder name, account type (checking, savings, credit, investment), account ID
  • Transaction data -- amount, date, category, merchant name, merchant category code (MCC), description
  • Running balance -- accurate per-account balance tracking across all transactions
  • Fraud labels (optional) -- binary fraud flag, fraud type classification, fraud score

Categories and amount distributions

Transactions are distributed across 12 transaction categories (spending, income, and transfers), each with a realistic amount range:

Category        Range              Distribution
Groceries       $15 -- $250        Log-normal (mean $65)
Rent            $800 -- $3,500     Normal (mean $1,500)
Salary          $2,000 -- $8,000   Normal (mean $4,500)
Dining          $8 -- $120         Log-normal (mean $35)
Coffee          $3 -- $9           Normal (mean $5.50)
Shopping        $10 -- $500        Log-normal (mean $75)
Transport       $2 -- $100         Log-normal (mean $25)
Utilities       $40 -- $350        Normal (mean $150)
Entertainment   $5 -- $80          Log-normal (mean $25)
Healthcare      $15 -- $600        Log-normal (mean $120)
Subscriptions   $5 -- $50          Normal (mean $15)
Transfers       $50 -- $2,000      Log-normal (mean $500)
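
To make the table concrete, here is a minimal Python sketch of how a category amount could be drawn from a distribution and kept inside its range. It is not the actor's actual code: the spread parameters (sigma, rel_std) and the choice to clamp out-of-range values instead of resampling are assumptions, since the table only gives ranges and means.

```python
import math
import random

# Subset of the table above: (min, max, distribution, mean).
CATEGORIES = {
    "groceries": (15.0, 250.0, "lognormal", 65.0),
    "rent": (800.0, 3500.0, "normal", 1500.0),
    "coffee": (3.0, 9.0, "normal", 5.50),
}


def sample_amount(category, rng, sigma=0.5, rel_std=0.2):
    """Draw one amount for a category and clamp it to the documented range.

    sigma (log-space spread) and rel_std (standard deviation as a fraction
    of the mean) are assumed values, not taken from the source.
    """
    low, high, dist, mean = CATEGORIES[category]
    if dist == "lognormal":
        # Pick mu so the unclamped log-normal has the requested mean:
        # E[X] = exp(mu + sigma^2 / 2).
        mu = math.log(mean) - sigma ** 2 / 2
        value = rng.lognormvariate(mu, sigma)
    else:
        value = rng.gauss(mean, rel_std * mean)
    # Clamp to the category range and round to cents.
    return round(min(max(value, low), high), 2)


rng = random.Random(42)
print([sample_amount("groceries", rng) for _ in range(3)])
```

Clamping piles a little probability mass onto the range limits; resampling until the value falls inside the range would avoid that at the cost of a loop.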

Temporal patterns

  • Weekday/weekend bias -- coffee and transport spike on weekdays; dining and entertainment spike on weekends
  • Recurring transactions -- salary deposits (1st and 15th), rent (1st), utilities (15th), subscriptions (variable day)
  • Seasonal multipliers -- spending increases in November (1.15x) and December (1.30x), dips in January (0.85x)
  • Time-of-day realism -- coffee purchases at 6-11 AM, dining at 11 AM-10 PM, salary at 8 AM
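
The patterns above could be encoded roughly as follows. This is a sketch under stated assumptions: the seasonal multipliers and recurring days come from the list, but the size of the weekday/weekend boost (1.5) and all function names are hypothetical, and months not listed are assumed to use a multiplier of 1.0.

```python
from datetime import date

# Monthly multipliers quoted above; unlisted months are assumed to be 1.0.
SEASONAL = {11: 1.15, 12: 1.30, 1: 0.85}

# Fixed recurring days quoted above (subscriptions vary, so they are omitted).
RECURRING_DAYS = {"salary": (1, 15), "rent": (1,), "utilities": (15,)}

# The source gives the direction of the bias, not its size.
WEEKDAY_HEAVY = {"coffee", "transport"}
WEEKEND_HEAVY = {"dining", "entertainment"}


def seasonal_multiplier(d):
    """Spending multiplier for the month of date d."""
    return SEASONAL.get(d.month, 1.0)


def day_weight(category, d, boost=1.5):
    """Relative likelihood of a category on date d (boost is an assumption)."""
    is_weekend = d.weekday() >= 5  # Monday is 0, so 5 and 6 are Sat and Sun
    if category in WEEKDAY_HEAVY:
        return 1.0 if is_weekend else boost
    if category in WEEKEND_HEAVY:
        return boost if is_weekend else 1.0
    return 1.0


def recurring_due(d):
    """Categories with a fixed recurring transaction on date d."""
    return [c for c, days in RECURRING_DAYS.items() if d.day in days]


print(recurring_due(date(2025, 10, 1)))  # prints ['salary', 'rent']
```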

Fraud injection

When enabled, a configurable percentage of transactions are flagged as fraudulent with:

  • Fraud types: card_stolen, account_takeover, card_not_present, synthetic_identity
  • Anomaly pattern: fraudulent amounts are 2-8x the normal category maximum
  • Fraud score: 0.7-1.0 for fraudulent transactions, 0.0-0.3 for legitimate ones
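
A rough Python sketch of the labeling rules above, assuming fraud is chosen independently per transaction and that the multiplier and scores are drawn uniformly from the stated ranges (the source does not say which distribution is used). The function name and the use of unsigned amounts are illustrative choices.

```python
import random

FRAUD_TYPES = [
    "card_stolen",
    "account_takeover",
    "card_not_present",
    "synthetic_identity",
]


def label_transaction(amount, category_max, fraud_rate_pct, rng):
    """Return (amount, fraud_fields) for one transaction.

    amount and category_max are unsigned magnitudes; applying the debit
    sign is left to the caller.
    """
    if rng.random() < fraud_rate_pct / 100.0:
        # Fraudulent amounts are 2-8x the category maximum.
        amount = round(category_max * rng.uniform(2.0, 8.0), 2)
        return amount, {
            "is_fraudulent": True,
            "fraud_type": rng.choice(FRAUD_TYPES),
            "fraud_score": round(rng.uniform(0.7, 1.0), 2),
        }
    return amount, {
        "is_fraudulent": False,
        "fraud_type": None,
        "fraud_score": round(rng.uniform(0.0, 0.3), 2),
    }


rng = random.Random(7)
print(label_transaction(65.42, 250.0, 2, rng))
```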

Input

Field               Type      Default   Description
maxItems            integer   100       Number of transactions to generate
numAccounts         integer   5         Number of unique financial accounts
currency            string    USD       Currency code (USD, EUR, GBP, JPY, CAD, AUD)
dateRangeMonths     integer   6         Months of history to generate
fraudRate           number    2         Percentage of fraudulent transactions (0-100)
includeFraudLabels  boolean   true      Include fraud detection fields in output
seed                integer   0         Random seed for reproducible output
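
As an illustration, the input could be assembled and sanity-checked in Python before it is submitted. The build_input helper below is hypothetical and not part of the actor; it only applies the defaults and the constraints listed in the table (field names, the six currency codes, and the 0-100 range for fraudRate).

```python
import json

# Defaults copied from the input table above.
DEFAULTS = {
    "maxItems": 100,
    "numAccounts": 5,
    "currency": "USD",
    "dateRangeMonths": 6,
    "fraudRate": 2,
    "includeFraudLabels": True,
    "seed": 0,
}
CURRENCIES = {"USD", "EUR", "GBP", "JPY", "CAD", "AUD"}


def build_input(**overrides):
    """Merge overrides into the defaults and check the documented limits."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown input fields: {sorted(unknown)}")
    cfg = {**DEFAULTS, **overrides}
    if cfg["currency"] not in CURRENCIES:
        raise ValueError(f"unsupported currency: {cfg['currency']}")
    if not 0 <= cfg["fraudRate"] <= 100:
        raise ValueError("fraudRate must be between 0 and 100")
    return cfg


# 1,000 transactions, 5% fraud, fixed seed for repeatable output.
print(json.dumps(build_input(maxItems=1000, fraudRate=5, seed=42), indent=2))
```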

Output

Each transaction record contains:

{
  "transaction_id": "397b9202-8ace-4fc4-9fa2-464893c3bc34",
  "account_id": "ACCT-0001",
  "account_holder": "Brenda Upton",
  "account_type": "checking",
  "currency": "USD",
  "date": "2025-10-03T09:25:27.000Z",
  "amount": -65.42,
  "type": "debit",
  "category": "groceries",
  "merchant_name": "Whole Foods",
  "merchant_category_code": "5411",
  "balance_after": 4231.58,
  "is_recurring": false,
  "description": "Whole Foods - groceries purchase",
  "is_fraudulent": false,
  "fraud_type": null,
  "fraud_score": 0.12
}

When includeFraudLabels is false, the is_fraudulent, fraud_type, and fraud_score fields are omitted.
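
One way to sanity-check a downloaded dataset is to replay the running balance per account. The sketch below assumes that amount is signed (negative for debits, as in the sample record) and that balance_after equals the previous balance on the same account plus amount; neither rule is stated explicitly in the source. The second record is made up for the example.

```python
def find_balance_breaks(records, tolerance=0.005):
    """Return transaction_ids whose balance_after does not follow from the
    previous record on the same account.

    Records are ordered by their ISO 8601 date strings, which sort
    chronologically when they all share the same format.
    """
    last_balance = {}
    breaks = []
    for rec in sorted(records, key=lambda r: r["date"]):
        acct = rec["account_id"]
        if acct in last_balance:
            expected = last_balance[acct] + rec["amount"]
            if abs(expected - rec["balance_after"]) > tolerance:
                breaks.append(rec["transaction_id"])
        last_balance[acct] = rec["balance_after"]
    return breaks


records = [
    # Trimmed copy of the sample record above.
    {"transaction_id": "397b9202-8ace-4fc4-9fa2-464893c3bc34",
     "account_id": "ACCT-0001", "date": "2025-10-03T09:25:27.000Z",
     "amount": -65.42, "balance_after": 4231.58},
    # Hypothetical follow-up transaction on the same account.
    {"transaction_id": "example-2",
     "account_id": "ACCT-0001", "date": "2025-10-04T07:40:00.000Z",
     "amount": -5.50, "balance_after": 4226.08},
]
print(find_balance_breaks(records))  # prints []
```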

Use cases

  • ML model training -- fraud detection, transaction categorization, anomaly detection
  • Fintech testing -- payment processing pipelines, accounting software, budgeting apps
  • Data pipeline development -- ETL workflows, data warehouse testing, API mocking
  • Demo data -- realistic financial dashboards and reports

Reproducibility

Set the seed parameter to any positive integer to get identical output across runs that use the same input settings. This is useful for:

  • Consistent test fixtures
  • Reproducible ML training datasets
  • Deterministic integration tests
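
The idea behind seeded generation can be shown with Python's standard library. This is only an analogy for how a seed makes output repeatable, not the actor's implementation; generate_amounts is a hypothetical stand-in for the generator.

```python
import random


def generate_amounts(seed, n=5):
    """Produce n pseudo-random coffee-sized amounts from a private generator."""
    rng = random.Random(seed)  # own generator, unaffected by global random state
    return [round(rng.uniform(3.0, 9.0), 2) for _ in range(n)]


# The same seed reproduces the same sequence, so it can back a test fixture.
first_run = generate_amounts(42)
second_run = generate_amounts(42)
assert first_run == second_run
print(first_run == second_run)  # prints True
```

Using a dedicated random.Random instance instead of the module-level functions keeps the sequence stable even if other code in the same process also draws random numbers.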

Performance

  • Sub-second generation for 1,000 transactions
  • 256 MB memory sufficient for up to 50,000 transactions
  • No network requests -- pure computation