Use Cases and Examples
This guide provides real-world examples and complete workflows for common DataFlood Suite use cases.
E-Commerce Platform Testing
Scenario
Generate realistic test data for an e-commerce platform including products, customers, orders, and transactions.
Step 1: Create Product Schema
Using DataFloodEditor:
- Open the Model Editor
- Create the following schema structure:
{
"type": "object",
"properties": {
"productId": {
"type": "string",
"pattern": "^PROD-[0-9]{6}$"
},
"name": {
"type": "string",
"stringModel": {
"patterns": ["Llll Llll", "Llll"],
"valueFrequency": {
"Wireless Mouse": 10,
"Gaming Keyboard": 8,
"USB Hub": 6,
"Monitor Stand": 5
}
}
},
"category": {
"type": "string",
"enum": ["Electronics", "Accessories", "Computers", "Gaming"]
},
"price": {
"type": "number",
"minimum": 9.99,
"maximum": 999.99,
"histogram": {
"bins": [
{"rangeStart": 9.99, "rangeEnd": 49.99, "frequency": 0.4},
{"rangeStart": 50, "rangeEnd": 199.99, "frequency": 0.35},
{"rangeStart": 200, "rangeEnd": 499.99, "frequency": 0.2},
{"rangeStart": 500, "rangeEnd": 999.99, "frequency": 0.05}
]
}
},
"inStock": {
"type": "boolean"
},
"stockQuantity": {
"type": "integer",
"minimum": 0,
"maximum": 500
}
},
"required": ["productId", "name", "category", "price", "inStock"]
}
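The histogram bins above bias generated prices toward cheaper products. As an illustration of how histogram-driven sampling works in general (a minimal Python sketch, not DataFlood's actual implementation), a generator can pick a bin weighted by frequency and then draw uniformly within it:
import random

# Bins from the price property above: (rangeStart, rangeEnd, frequency).
bins = [
    (9.99, 49.99, 0.40),
    (50.00, 199.99, 0.35),
    (200.00, 499.99, 0.20),
    (500.00, 999.99, 0.05),
]

def sample_price(rng):
    # Choose a bin weighted by frequency, then draw uniformly within it.
    lo, hi, _ = rng.choices(bins, weights=[b[2] for b in bins])[0]
    return round(rng.uniform(lo, hi), 2)

rng = random.Random(42)
print([sample_price(rng) for _ in range(5)])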
Step 2: Create Customer Schema
{
"type": "object",
"properties": {
"customerId": {
"type": "string",
"pattern": "^CUST-[0-9]{8}$"
},
"email": {
"type": "string",
"format": "email",
"stringModel": {
"patterns": ["llll.llll@llll.com"],
"entropyScore": 3.5
}
},
"firstName": {
"type": "string",
"stringModel": {
"patterns": ["Llll"],
"valueFrequency": {
"John": 15, "Jane": 12, "Michael": 10,
"Sarah": 11, "Robert": 9, "Emily": 10
}
}
},
"lastName": {
"type": "string",
"stringModel": {
"patterns": ["Llll", "Ll'Llll"],
"valueFrequency": {
"Smith": 20, "Johnson": 15, "Williams": 12,
"Brown": 10, "Jones": 8, "Garcia": 7
}
}
},
"registrationDate": {
"type": "string",
"format": "date"
},
"tier": {
"type": "string",
"enum": ["Bronze", "Silver", "Gold", "Platinum"],
"enumProbabilities": [0.5, 0.3, 0.15, 0.05]
}
}
}
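The stringModel patterns use a character-class notation ("Llll", "llll.llll@llll.com", "dddd"). Assuming, as the examples in this guide suggest, that L stands for an uppercase letter, l for a lowercase letter, and d for a digit, a pattern expander might look like this (a hypothetical sketch of the notation, not DataFlood's implementation):
import random
import string

def expand(pattern, rng):
    # Assumed notation: 'L' = uppercase, 'l' = lowercase, 'd' = digit;
    # any other character is emitted literally.
    table = {
        "L": string.ascii_uppercase,
        "l": string.ascii_lowercase,
        "d": string.digits,
    }
    return "".join(rng.choice(table[c]) if c in table else c for c in pattern)

rng = random.Random(1)
print(expand("Llll", rng))                # a first-name-shaped string
print(expand("llll.llll@llll.com", rng))  # an email-shaped string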
Step 3: Create Order Schema with Relationships
{
"type": "object",
"properties": {
"orderId": {
"type": "string",
"pattern": "^ORD-[0-9]{10}$"
},
"customerId": {
"type": "string",
"pattern": "^CUST-[0-9]{8}$"
},
"orderDate": {
"type": "string",
"format": "date-time"
},
"items": {
"type": "array",
"minItems": 1,
"maxItems": 10,
"items": {
"type": "object",
"properties": {
"productId": {
"type": "string",
"pattern": "^PROD-[0-9]{6}$"
},
"quantity": {
"type": "integer",
"minimum": 1,
"maximum": 5
},
"unitPrice": {
"type": "number",
"minimum": 9.99,
"maximum": 999.99
}
}
}
},
"totalAmount": {
"type": "number",
"minimum": 9.99,
"maximum": 9999.99
},
"status": {
"type": "string",
"enum": ["pending", "processing", "shipped", "delivered", "cancelled"],
"enumProbabilities": [0.1, 0.2, 0.3, 0.35, 0.05]
}
}
}
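Because customerId and productId here reuse the patterns from the customer and product schemas, generated orders are shape-compatible but not automatically linked; the transaction configuration in Step 4 handles the actual linking. A quick referential-integrity check over the files generated in Step 5 (a minimal sketch, assuming the outputs are JSON arrays) might look like:
import json

with open("customer-data.json") as f:
    known = {c["customerId"] for c in json.load(f)}

with open("order-data.json") as f:
    orders = json.load(f)

orphans = [o["orderId"] for o in orders if o["customerId"] not in known]
print(f"{len(orphans)} of {len(orders)} orders reference unknown customers")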
Step 4: Design Time-Based Sequence
Using Tides Editor:
{
"name": "E-commerce Daily Flow",
"startTime": "2024-01-01T00:00:00Z",
"endTime": "2024-01-01T23:59:59Z",
"intervalMs": 60000,
"steps": [
{
"stepId": "products",
"modelPath": "./products.json",
"startOffset": 0,
"documentsPerInterval": 2,
"generationProbability": 0.1
},
{
"stepId": "customers",
"modelPath": "./customers.json",
"startOffset": 0,
"documentsPerInterval": 5,
"generationProbability": 0.3
},
{
"stepId": "orders",
"modelPath": "./orders.json",
"startOffset": 3600000,
"documentsPerInterval": 10,
"generationProbability": 0.8
}
],
"transactions": [
{
"transactionId": "customer-orders",
"parentStepId": "customers",
"childSteps": [
{
"stepId": "orders",
"minCount": 0,
"maxCount": 5,
"additionalDelayMs": 300000
}
],
"triggerProbability": 0.7,
"linkingStrategy": "InjectParentId",
"parentIdField": "customerId"
}
]
}
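Before running a sequence, it helps to estimate the volume it will produce. For the orders step above (10 documents per 60-second interval, 0.8 generation probability, starting one hour into the 24-hour window):
# Back-of-the-envelope volume estimate for the "orders" step.
interval_s = 60
window_s = 24 * 3600
start_offset_s = 3_600_000 // 1000          # startOffset is in milliseconds
intervals = (window_s - start_offset_s) // interval_s
expected_orders = intervals * 10 * 0.8
print(expected_orders)                      # 11040.0 expected orders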
Step 5: Generate Test Data
Using CLI:
## Generate initial catalog
DataFlood generate products.json --count 1000 --output product-data.json
## Generate customers
DataFlood generate customers.json --count 500 --output customer-data.json
## Execute sequence for orders
DataFlood sequence ecommerce-sequence.json --output order-data.json
Using API:
## Start FloodGate API server
FloodGate api
## Generate via API
curl -X POST "http://localhost:5000/api/serving/sequence/ecommerce-sequence" \
-H "Content-Type: application/json" \
-d '{"maxDocuments": 10000}'
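The same call from Python using requests (equivalent to the curl command above):
import requests

resp = requests.post(
    "http://localhost:5000/api/serving/sequence/ecommerce-sequence",
    json={"maxDocuments": 10000},
)
resp.raise_for_status()
print(resp.status_code)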
IoT Sensor Network Simulation
Scenario
Simulate a network of IoT sensors generating temperature, humidity, and motion data with realistic patterns and anomalies.
Sensor Schema
{
"type": "object",
"properties": {
"sensorId": {
"type": "string",
"pattern": "^SENSOR-[A-Z]{2}-[0-9]{4}$"
},
"timestamp": {
"type": "string",
"format": "date-time"
},
"location": {
"type": "object",
"properties": {
"building": {
"type": "string",
"enum": ["A", "B", "C", "D"]
},
"floor": {
"type": "integer",
"minimum": 1,
"maximum": 10
},
"room": {
"type": "string",
"pattern": "^[0-9]{3}$"
}
}
},
"temperature": {
"type": "number",
"minimum": 15.0,
"maximum": 30.0,
"histogram": {
"bins": [
{"rangeStart": 15, "rangeEnd": 18, "frequency": 0.1},
{"rangeStart": 18, "rangeEnd": 22, "frequency": 0.6},
{"rangeStart": 22, "rangeEnd": 25, "frequency": 0.25},
{"rangeStart": 25, "rangeEnd": 30, "frequency": 0.05}
]
}
},
"humidity": {
"type": "number",
"minimum": 30,
"maximum": 70
},
"motion": {
"type": "boolean"
},
"batteryLevel": {
"type": "number",
"minimum": 0,
"maximum": 100
}
}
}
Anomaly Detection Test Data
Create a sequence that includes normal and anomalous readings:
{
"name": "Sensor Network with Anomalies",
"intervalMs": 5000,
"steps": [
{
"stepId": "normal-readings",
"modelPath": "./sensor-normal.json",
"documentsPerInterval": 20,
"generationProbability": 0.95,
"weight": 9.0
},
{
"stepId": "anomalies",
"modelPath": "./sensor-anomaly.json",
"documentsPerInterval": 1,
"generationProbability": 0.05,
"weight": 1.0,
"customProperties": {
"anomalyType": "temperature_spike",
"severity": "high"
}
}
]
}
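With these settings the anomaly share is small by construction, which is what the arithmetic below confirms:
# Expected mix per 5-second interval for the sequence above.
normal = 20 * 0.95    # ≈ 19 normal readings
anomaly = 1 * 0.05    # ≈ 0.05 anomalous readings
print(f"anomaly share ≈ {anomaly / (normal + anomaly):.2%}")  # ≈ 0.26%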
Generate Streaming Data
## Generate continuous stream
DataFlood sequence sensor-sequence.json \
--max-docs 100000 \
--format jsonl \
--output sensor-stream.jsonl
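Once generated, the stream can be scanned line by line. A quick sanity check on anomaly injection (a sketch assuming the JSONL output above) that flags readings outside the modeled 15-30 °C range:
import json

with open("sensor-stream.jsonl") as f:
    for line in f:
        reading = json.loads(line)
        t = reading.get("temperature")
        if t is not None and not (15.0 <= t <= 30.0):
            print(reading["sensorId"], t)  # out-of-range reading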
Banking Transaction System
Scenario
Generate banking transactions with realistic patterns, including accounts, transactions, and fraud detection test cases.
Account Schema
{
"type": "object",
"properties": {
"accountNumber": {
"type": "string",
"pattern": "^[0-9]{10}$"
},
"accountType": {
"type": "string",
"enum": ["checking", "savings", "credit"]
},
"balance": {
"type": "number",
"minimum": 0,
"maximum": 1000000,
"histogram": {
"bins": [
{"rangeStart": 0, "rangeEnd": 1000, "frequency": 0.3},
{"rangeStart": 1000, "rangeEnd": 10000, "frequency": 0.4},
{"rangeStart": 10000, "rangeEnd": 100000, "frequency": 0.25},
{"rangeStart": 100000, "rangeEnd": 1000000, "frequency": 0.05}
]
}
},
"customerId": {
"type": "string",
"pattern": "^[0-9]{8}$"
},
"openDate": {
"type": "string",
"format": "date"
}
}
}
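The balance histogram is heavily skewed toward small accounts. The mean balance it implies can be estimated from bin midpoints (a rough approximation that assumes values are spread uniformly within each bin):
# Mean balance implied by the histogram above, via bin midpoints.
bins = [(0, 1_000, 0.30), (1_000, 10_000, 0.40),
        (10_000, 100_000, 0.25), (100_000, 1_000_000, 0.05)]
mean = sum((lo + hi) / 2 * f for lo, hi, f in bins)
print(f"≈ ${mean:,.0f}")  # ≈ $43,600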
Transaction Schema with Patterns
{
"type": "object",
"properties": {
"transactionId": {
"type": "string",
"format": "uuid"
},
"accountNumber": {
"type": "string",
"pattern": "^[0-9]{10}$"
},
"transactionType": {
"type": "string",
"enum": ["deposit", "withdrawal", "transfer", "payment", "fee"],
"enumProbabilities": [0.2, 0.3, 0.25, 0.2, 0.05]
},
"amount": {
"type": "number",
"histogram": {
"bins": [
{"rangeStart": 0.01, "rangeEnd": 100, "frequency": 0.5},
{"rangeStart": 100, "rangeEnd": 500, "frequency": 0.3},
{"rangeStart": 500, "rangeEnd": 2000, "frequency": 0.15},
{"rangeStart": 2000, "rangeEnd": 10000, "frequency": 0.05}
]
}
},
"merchantCategory": {
"type": "string",
"enum": ["grocery", "gas", "restaurant", "retail", "online", "atm", "other"],
"enumProbabilities": [0.25, 0.15, 0.15, 0.2, 0.15, 0.05, 0.05]
},
"timestamp": {
"type": "string",
"format": "date-time"
},
"location": {
"type": "object",
"properties": {
"city": {
"type": "string",
"stringModel": {
"valueFrequency": {
"New York": 20,
"Los Angeles": 15,
"Chicago": 10,
"Houston": 8
}
}
},
"country": {
"type": "string",
"enum": ["USA", "Canada", "Mexico"]
}
}
}
}
}
Fraud Test Cases
Create specific patterns for fraud testing:
## Normal transactions
DataFlood generate transaction-normal.json \
--count 10000 \
--seed 42 \
--output normal-transactions.json
## Suspicious patterns (rapid transactions)
DataFlood sequence fraud-rapid-sequence.json \
--output fraud-rapid.json
## Unusual amounts
DataFlood generate transaction-unusual.json \
--entropy 5.0 \
--count 100 \
--output fraud-amounts.json
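The normal and fraudulent outputs can then be merged into a single labeled test set. A minimal sketch (the label field and output file name are illustrative choices, not DataFlood conventions):
import json
import random

with open("normal-transactions.json") as f:
    normal = [dict(t, label=0) for t in json.load(f)]
with open("fraud-amounts.json") as f:
    fraud = [dict(t, label=1) for t in json.load(f)]

dataset = normal + fraud
random.Random(42).shuffle(dataset)  # fixed seed keeps runs reproducible

with open("fraud-test-set.json", "w") as f:
    json.dump(dataset, f)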
Healthcare Records
Scenario
Generate synthetic patient records for healthcare system testing while maintaining privacy.
Patient Schema
{
"type": "object",
"properties": {
"patientId": {
"type": "string",
"pattern": "^PAT-[0-9]{10}$"
},
"demographics": {
"type": "object",
"properties": {
"age": {
"type": "integer",
"minimum": 0,
"maximum": 120,
"histogram": {
"bins": [
{"rangeStart": 0, "rangeEnd": 18, "frequency": 0.2},
{"rangeStart": 18, "rangeEnd": 65, "frequency": 0.6},
{"rangeStart": 65, "rangeEnd": 120, "frequency": 0.2}
]
}
},
"gender": {
"type": "string",
"enum": ["M", "F", "O"],
"enumProbabilities": [0.49, 0.49, 0.02]
}
}
},
"conditions": {
"type": "array",
"items": {
"type": "string",
"enum": ["diabetes", "hypertension", "asthma", "arthritis", "none"],
"enumProbabilities": [0.1, 0.15, 0.08, 0.07, 0.6]
},
"minItems": 0,
"maxItems": 3
},
"lastVisit": {
"type": "string",
"format": "date"
}
}
}
Appointment Generation
Appointments are generated on a 15-minute interval (900,000 ms):
{
"name": "Healthcare Appointments",
"intervalMs": 900000, // 15 minutes
"steps": [
{
"stepId": "routine-appointments",
"modelPath": "./appointment-routine.json",
"documentsPerInterval": 4,
"generationProbability": 0.8
},
{
"stepId": "emergency-visits",
"modelPath": "./appointment-emergency.json",
"documentsPerInterval": 1,
"generationProbability": 0.1
}
]
}
Log File Generation
Scenario
Generate application logs for testing log analysis and monitoring systems.
Application Log Schema
{
"type": "object",
"properties": {
"timestamp": {
"type": "string",
"format": "date-time"
},
"level": {
"type": "string",
"enum": ["DEBUG", "INFO", "WARN", "ERROR", "FATAL"],
"enumProbabilities": [0.3, 0.5, 0.15, 0.04, 0.01]
},
"service": {
"type": "string",
"enum": ["api", "database", "cache", "queue", "worker"],
"enumProbabilities": [0.4, 0.2, 0.15, 0.15, 0.1]
},
"message": {
"type": "string",
"stringModel": {
"patterns": [
"Request processed successfully",
"Database connection established",
"Cache miss for key: XXXX",
"Queue message processed",
"Error: Connection timeout"
],
"entropyScore": 2.5
}
},
"requestId": {
"type": "string",
"format": "uuid"
},
"duration": {
"type": "integer",
"minimum": 1,
"maximum": 5000,
"histogram": {
"bins": [
{"rangeStart": 1, "rangeEnd": 100, "frequency": 0.7},
{"rangeStart": 100, "rangeEnd": 1000, "frequency": 0.25},
{"rangeStart": 1000, "rangeEnd": 5000, "frequency": 0.05}
]
}
}
}
}
Generate Log Stream
## Generate logs in JSONL format for streaming
DataFlood generate app-log.json \
--count 100000 \
--format jsonl \
--output app-logs.jsonl
## Or use sequence for time-based patterns
DataFlood sequence log-sequence.json \
--format jsonl \
--max-docs 1000000 \
--output logs-stream.jsonl
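A quick distribution check on the result: the observed level frequencies should roughly match the enumProbabilities in the schema (a sketch assuming the JSONL output above):
import json
from collections import Counter

levels = Counter()
with open("app-logs.jsonl") as f:
    for line in f:
        levels[json.loads(line)["level"]] += 1
print(levels.most_common())  # expect roughly 50% INFO, 30% DEBUG, ...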
API Testing Data
Scenario
Generate test data for API endpoints including various HTTP methods and response codes.
API Request Schema
{
"type": "object",
"properties": {
"method": {
"type": "string",
"enum": ["GET", "POST", "PUT", "DELETE", "PATCH"],
"enumProbabilities": [0.5, 0.25, 0.15, 0.05, 0.05]
},
"endpoint": {
"type": "string",
"stringModel": {
"patterns": [
"/api/users/dddd",
"/api/products/dddd",
"/api/orders/dddd",
"/api/search"
]
}
},
"headers": {
"type": "object",
"properties": {
"authorization": {
"type": "string",
"pattern": "^Bearer [A-Za-z0-9]{32}$"
},
"contentType": {
"type": "string",
"enum": ["application/json", "application/xml", "text/plain"]
}
}
},
"body": {
"type": "object",
"additionalProperties": true
},
"responseCode": {
"type": "integer",
"enum": [200, 201, 204, 400, 401, 403, 404, 500],
"enumProbabilities": [0.6, 0.1, 0.05, 0.08, 0.05, 0.02, 0.05, 0.05]
}
}
}
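Generated request documents can be replayed against a system under test. A hypothetical harness (the base URL and input file name are placeholders, not DataFlood conventions):
import json
import requests

BASE_URL = "http://localhost:8080"  # placeholder: your test environment

with open("api-requests.json") as f:  # output of `DataFlood generate`
    for req in json.load(f):
        headers = {}
        auth = req.get("headers", {}).get("authorization")
        if auth:
            headers["Authorization"] = auth
        resp = requests.request(
            req["method"],
            BASE_URL + req["endpoint"],
            json=req.get("body"),
            headers=headers,
        )
        print(req["method"], req["endpoint"], resp.status_code)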
Best Practices for Use Cases
1. Start with Real Data
- Collect sample data from production (anonymized)
- Use DataFloodEditor import to analyze patterns
- Refine statistical models based on analysis
2. Incremental Development
- Start with simple schemas
- Add complexity gradually
- Test each iteration
- Validate against requirements
3. Realistic Distributions
- Use histograms for numeric values
- Set appropriate enum probabilities
- Configure string patterns from samples
- Test edge cases
4. Sequence Design
- Model real-world timing patterns
- Include peak and off-peak periods
- Add appropriate delays between related events
- Test with short time ranges first
5. Performance Testing
## Small batch for validation
DataFlood generate schema.json --count 10
## Medium batch for testing
DataFlood generate schema.json --count 1000
## Large batch for performance
DataFlood generate schema.json --count 100000 --format csv
6. Data Validation
- Always validate generated data (see the sketch after this list)
- Check constraint satisfaction
- Verify relationships
- Test with downstream systems
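For schema-level validation, standard JSON Schema tooling works on the schemas in this guide, since conforming validators ignore extension keywords such as stringModel and histogram. A sketch using the third-party jsonschema package:
import json
from jsonschema import validate, ValidationError

with open("schema.json") as f:
    schema = json.load(f)
with open("data.json") as f:
    documents = json.load(f)

failures = 0
for doc in documents:
    try:
        validate(instance=doc, schema=schema)
    except ValidationError:
        failures += 1
print(f"{failures} of {len(documents)} documents failed validation")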
Integration Examples
Python Integration
import io
import json
import pandas as pd
import requests

## Load the DataFlood model to send with the request
with open("schema.json") as f:
    schema = json.load(f)

## Generate data via API
response = requests.post(
    "http://localhost:5000/api/documentgenerator/generate-simple",
    json={"schema": schema, "count": 1000, "format": "csv"},
)

## Load into pandas
df = pd.read_csv(io.StringIO(response.text))
Database Loading
## Generate CSV
DataFlood generate schema.json --format csv --count 10000 --output data.csv
## Load into PostgreSQL (client-side copy)
psql -c "\copy table_name FROM 'data.csv' CSV HEADER"
## Load into MySQL (comma-delimited, skip the header row)
mysql -e "LOAD DATA LOCAL INFILE 'data.csv' INTO TABLE table_name FIELDS TERMINATED BY ',' IGNORE 1 LINES"
Streaming Pipeline
## Generate streaming data
DataFlood sequence stream-sequence.json --format jsonl --output - | \
kafka-console-producer --topic test-data --broker-list localhost:9092
Conclusion
These use cases demonstrate the flexibility of DataFlood Suite for generating realistic test data across various domains. Key takeaways:
- Model real patterns: Generate DataFlood models from actual data
- Design relationships: Create realistic dependencies between entities
- Time-based generation: Model temporal patterns accurately
- Test thoroughly: Validate data meets requirements
- Scale appropriately: Start small, scale up gradually
For more examples and detailed configurations, refer to the component-specific documentation.