Synthetic Financial Data Generator
Generate realistic synthetic financial data for ML model training, fintech QA pipelines, and data platform development. The generator produces bank-statement-quality transactions with category-aware amounts, temporal spending patterns, running balances, and configurable fraud labels, with no real user data required. Because the output contains no real customer records, it can be shared across teams, committed to repositories, and embedded in demos.
What it does
This actor generates synthetic financial transactions that mimic real banking data. No web scraping is involved -- all data is computed locally using statistical models.
Each transaction includes:
- Account details -- holder name, account type (checking, savings, credit, investment), account ID
- Transaction data -- amount, date, category, merchant name, merchant category code (MCC), description
- Running balance -- accurate per-account balance tracking across all transactions
- Fraud labels (optional) -- binary fraud flag, fraud type classification, fraud score
Categories and amount distributions
Transactions are distributed across 12 categories (spending, income, and transfers) with realistic amount ranges:
| Category | Range | Distribution |
|---|---|---|
| Groceries | $15 -- $250 | Log-normal (mean $65) |
| Rent | $800 -- $3,500 | Normal (mean $1,500) |
| Salary | $2,000 -- $8,000 | Normal (mean $4,500) |
| Dining | $8 -- $120 | Log-normal (mean $35) |
| Coffee | $3 -- $9 | Normal (mean $5.50) |
| Shopping | $10 -- $500 | Log-normal (mean $75) |
| Transport | $2 -- $100 | Log-normal (mean $25) |
| Utilities | $40 -- $350 | Normal (mean $150) |
| Entertainment | $5 -- $80 | Log-normal (mean $25) |
| Healthcare | $15 -- $600 | Log-normal (mean $120) |
| Subscriptions | $5 -- $50 | Normal (mean $15) |
| Transfers | $50 -- $2,000 | Log-normal (mean $500) |
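To make the table concrete, here is a minimal sketch of how amounts with these shapes could be drawn. It is not the actor's actual implementation: the function name `sample_amount`, the log-normal `sigma` of 0.6, and the normal standard deviation of one sixth of the range are all assumptions chosen for illustration. Only the ranges, distribution families, and means come from the table.

```python
import math
import random

# (min, max, distribution, mean) copied from the category table above.
CATEGORIES = {
    "groceries": (15.0, 250.0, "lognormal", 65.0),
    "rent": (800.0, 3500.0, "normal", 1500.0),
    "coffee": (3.0, 9.0, "normal", 5.50),
    "transfers": (50.0, 2000.0, "lognormal", 500.0),
}


def sample_amount(category: str, rng: random.Random, sigma: float = 0.6) -> float:
    """Draw one amount for a category and clamp it to the documented range."""
    low, high, dist, mean = CATEGORIES[category]
    if dist == "lognormal":
        # Pick mu so the unclamped distribution has the requested mean:
        # E[X] = exp(mu + sigma^2 / 2). The sigma value is an assumption.
        mu = math.log(mean) - sigma ** 2 / 2
        value = rng.lognormvariate(mu, sigma)
    else:
        # Assumed spread: one sixth of the range, so most draws land inside it.
        value = rng.gauss(mean, (high - low) / 6)
    # Clamp outliers to the range and round to cents.
    return round(min(max(value, low), high), 2)


rng = random.Random(42)
print([sample_amount("groceries", rng) for _ in range(5)])
```

Clamping keeps every value inside the published range but shifts the realized mean slightly away from the nominal one, so a real generator might resample out-of-range draws instead.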
Temporal patterns
- Weekday/weekend bias -- coffee and transport spike on weekdays; dining and entertainment spike on weekends
- Recurring transactions -- salary deposits (1st and 15th), rent (1st), utilities (15th), subscriptions (variable day)
- Seasonal multipliers -- spending increases in November (1.15x) and December (1.30x), dips in January (0.85x)
- Time-of-day realism -- coffee purchases at 6-11 AM, dining at 11 AM-10 PM, salary at 8 AM
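The seasonal and recurring rules above can be expressed as small lookup tables. The sketch below is illustrative only: the helper names are hypothetical, months without a listed multiplier are assumed to use 1.0, and applying the multiplier to any base amount is a simplification (the actor may restrict it to discretionary spending).

```python
from datetime import date

# Multipliers quoted in the list above; all other months are assumed to be 1.0.
SEASONAL_MULTIPLIER = {11: 1.15, 12: 1.30, 1: 0.85}

# Day-of-month schedule for the fixed recurring categories listed above.
# Subscriptions are omitted because their day varies.
RECURRING_DAYS = {"salary": (1, 15), "rent": (1,), "utilities": (15,)}


def seasonal_amount(base_amount: float, when: date) -> float:
    """Scale a base amount by the multiplier for the transaction's month."""
    return round(base_amount * SEASONAL_MULTIPLIER.get(when.month, 1.0), 2)


def recurring_categories(when: date) -> list[str]:
    """Return the recurring categories that fall due on a given date."""
    return [cat for cat, days in RECURRING_DAYS.items() if when.day in days]


print(seasonal_amount(100.0, date(2025, 12, 20)))  # 130.0
print(recurring_categories(date(2025, 10, 1)))     # ['salary', 'rent']
```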
Fraud injection
When enabled, a configurable percentage of transactions are flagged as fraudulent with:
- Fraud types: card_stolen, account_takeover, card_not_present, synthetic_identity
- Anomaly pattern: fraudulent amounts are 2-8x the normal category maximum
- Fraud score: 0.7-1.0 for fraudulent transactions, 0.0-0.3 for legitimate ones
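As a rough illustration of these rules, the sketch below labels a single transaction. The function `label_transaction` and its parameters are hypothetical, amounts are treated as unsigned magnitudes, and uniform draws are an assumption; the source only specifies the fraud types, the 2-8x multiplier, and the two score ranges.

```python
import random

FRAUD_TYPES = ["card_stolen", "account_takeover", "card_not_present", "synthetic_identity"]


def label_transaction(amount: float, category_max: float,
                      fraud_rate_pct: float, rng: random.Random) -> dict:
    """Return the amount and fraud fields for one transaction."""
    if rng.random() * 100 < fraud_rate_pct:
        return {
            # Fraudulent amounts are 2-8x the category's normal maximum.
            "amount": round(category_max * rng.uniform(2, 8), 2),
            "is_fraudulent": True,
            "fraud_type": rng.choice(FRAUD_TYPES),
            "fraud_score": round(rng.uniform(0.7, 1.0), 2),
        }
    return {
        "amount": amount,
        "is_fraudulent": False,
        "fraud_type": None,
        "fraud_score": round(rng.uniform(0.0, 0.3), 2),
    }


rng = random.Random(7)
# Groceries example: normal maximum of $250, 2% fraud rate.
print(label_transaction(65.42, 250.0, 2, rng))
```

Because the two score ranges do not overlap, any threshold between 0.3 and 0.7 separates the classes perfectly under this scheme, which is worth keeping in mind when using the score as a model feature.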
Input
| Field | Type | Default | Description |
|---|---|---|---|
| maxItems | integer | 100 | Number of transactions to generate |
| numAccounts | integer | 5 | Number of unique financial accounts |
| currency | string | USD | Currency code (USD, EUR, GBP, JPY, CAD, AUD) |
| dateRangeMonths | integer | 6 | Months of history to generate |
| fraudRate | number | 2 | Percentage of fraudulent transactions (0-100) |
| includeFraudLabels | boolean | true | Include fraud detection fields in output |
| seed | integer | 0 | Random seed for reproducible output |
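When building the input programmatically, it can help to merge overrides onto the documented defaults and check the two constrained fields before starting a run. The helper below is a hypothetical client-side convenience, not part of the actor; `build_input`, `DEFAULTS`, and `CURRENCIES` are names invented for this sketch, and the values are copied from the table above.

```python
import json

# Defaults and allowed currencies as documented in the input table.
DEFAULTS = {
    "maxItems": 100,
    "numAccounts": 5,
    "currency": "USD",
    "dateRangeMonths": 6,
    "fraudRate": 2,
    "includeFraudLabels": True,
    "seed": 0,
}
CURRENCIES = {"USD", "EUR", "GBP", "JPY", "CAD", "AUD"}


def build_input(**overrides) -> dict:
    """Merge overrides onto the defaults and validate constrained fields."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown input fields: {sorted(unknown)}")
    run_input = {**DEFAULTS, **overrides}
    if run_input["currency"] not in CURRENCIES:
        raise ValueError(f"unsupported currency: {run_input['currency']}")
    if not 0 <= run_input["fraudRate"] <= 100:
        raise ValueError("fraudRate must be between 0 and 100")
    return run_input


# Serialize to the JSON object the actor expects as input.
print(json.dumps(build_input(maxItems=1000, fraudRate=5, seed=42), indent=2))
```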
Output
Each transaction record contains:
{
  "transaction_id": "397b9202-8ace-4fc4-9fa2-464893c3bc34",
  "account_id": "ACCT-0001",
  "account_holder": "Brenda Upton",
  "account_type": "checking",
  "currency": "USD",
  "date": "2025-10-03T09:25:27.000Z",
  "amount": -65.42,
  "type": "debit",
  "category": "groceries",
  "merchant_name": "Whole Foods",
  "merchant_category_code": "5411",
  "balance_after": 4231.58,
  "is_recurring": false,
  "description": "Whole Foods - groceries purchase",
  "is_fraudulent": false,
  "fraud_type": null,
  "fraud_score": 0.12
}
When includeFraudLabels is false, the is_fraudulent, fraud_type, and fraud_score fields are omitted.
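One way to sanity-check a downloaded dataset is to verify that the running balance reconciles within each account. The sketch below assumes that `amount` is signed (negative for debits, as in the sample record) and that each `balance_after` equals the previous balance plus the current amount; `check_balances` is a hypothetical helper, and the field names are taken from the record above.

```python
from collections import defaultdict


def check_balances(records: list[dict], tolerance: float = 0.005) -> list[str]:
    """Return transaction_ids whose balance_after breaks the running total."""
    by_account = defaultdict(list)
    for rec in records:
        by_account[rec["account_id"]].append(rec)

    mismatches = []
    for recs in by_account.values():
        # ISO 8601 UTC timestamps in a single format sort chronologically as strings.
        recs.sort(key=lambda r: r["date"])
        for prev, cur in zip(recs, recs[1:]):
            expected = prev["balance_after"] + cur["amount"]
            if abs(expected - cur["balance_after"]) > tolerance:
                mismatches.append(cur["transaction_id"])
    return mismatches


records = [
    {"transaction_id": "t1", "account_id": "ACCT-0001",
     "date": "2025-10-03T09:25:27.000Z", "amount": -65.42, "balance_after": 4231.58},
    {"transaction_id": "t2", "account_id": "ACCT-0001",
     "date": "2025-10-04T08:10:00.000Z", "amount": -35.00, "balance_after": 4196.58},
]
print(check_balances(records))  # []
```

The tolerance of half a cent absorbs floating-point rounding; a stricter check could convert amounts to integer cents first.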
Use cases
Synthetic financial data lets you build and test financial software without exposing real customer records:
- ML model training -- fraud detection, transaction categorization, anomaly detection, and credit scoring models can be trained on labeled synthetic financial data without privacy risk
- Fintech QA -- payment processing pipelines, accounting software, and budgeting apps need realistic transactions for integration tests
- Data pipeline development -- ETL workflows, data warehouse testing, and API mocking all benefit from a reproducible synthetic financial data fixture
- Fraud model training -- configurable fraud rate and four fraud-type labels make this a purpose-built source of labeled synthetic financial fraud data
- Demo data -- realistic financial dashboards and investor reports that can be shared publicly
FAQ
Is synthetic financial data safe to use in production environments?
Yes. Because synthetic financial data is statistically generated rather than derived from real accounts, it contains no PII and is not subject to the data-sharing restrictions that apply to real customer records. It can be committed to repos, passed to third-party vendors, and embedded in product demos.
How realistic is the synthetic financial data?
Each category (groceries, rent, salary, dining, etc.) is sampled from a log-normal or normal distribution with a category-specific mean and amount range. Temporal patterns mirror real banking data: salary deposits on the 1st and 15th, weekend dining spikes, and a seasonal November/December uplift. The output passes basic financial-data sanity checks used in model evaluation.
Can I use this alongside other synthetic data generators?
Yes. If you need synthetic financial data combined with synthetic customer profiles or synthetic e-commerce orders, pair this actor with the Synthetic Dataset Generator or the Synthetic E-commerce Data Generator.
Reproducibility
Set the seed parameter to any positive integer to get identical output across runs. This is useful for:
- Consistent test fixtures
- Reproducible ML training datasets
- Deterministic integration tests
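The general mechanism behind seeded reproducibility can be shown with Python's standard library. This is a generic sketch rather than the actor's code: `generate_amounts` is a hypothetical stand-in for the generator, and the only point is that a private, seeded random generator yields the same sequence on every run.

```python
import random


def generate_amounts(seed: int, n: int = 5) -> list[float]:
    """Produce n pseudo-random amounts that depend only on the seed."""
    rng = random.Random(seed)  # private generator; global random state is untouched
    return [round(rng.uniform(3, 9), 2) for _ in range(n)]


# Same seed, same data: safe to use as a test fixture.
assert generate_amounts(42) == generate_amounts(42)
# A different seed gives a different dataset.
assert generate_amounts(42) != generate_amounts(43)
```

In an integration test, pinning the seed in the run input plays the same role: assertions can target exact record values instead of only schema and ranges.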
Performance
- Sub-second generation for 1,000 transactions
- 256 MB memory sufficient for up to 50,000 transactions
- No network requests -- pure computation