Use Cases and Examples

This guide provides real-world examples and complete workflows for common DataFlood Suite use cases.

E-Commerce Platform Testing

Scenario

Generate realistic test data for an e-commerce platform including products, customers, orders, and transactions.

Step 1: Create Product Schema

Using DataFloodEditor:

  1. Open Model Editor
  2. Create schema structure:
{
  "type": "object",
  "properties": {
    "productId": {
      "type": "string",
      "pattern": "^PROD-[0-9]{6}$"
    },
    "name": {
      "type": "string",
      "stringModel": {
        "patterns": ["Llll Llll", "Llll"],
        "valueFrequency": {
          "Wireless Mouse": 10,
          "Gaming Keyboard": 8,
          "USB Hub": 6,
          "Monitor Stand": 5
        }
      }
    },
    "category": {
      "type": "string",
      "enum": ["Electronics", "Accessories", "Computers", "Gaming"]
    },
    "price": {
      "type": "number",
      "minimum": 9.99,
      "maximum": 999.99,
      "histogram": {
        "bins": [
          {"rangeStart": 9.99, "rangeEnd": 49.99, "frequency": 0.4},
          {"rangeStart": 50, "rangeEnd": 199.99, "frequency": 0.35},
          {"rangeStart": 200, "rangeEnd": 499.99, "frequency": 0.2},
          {"rangeStart": 500, "rangeEnd": 999.99, "frequency": 0.05}
        ]
      }
    },
    "inStock": {
      "type": "boolean"
    },
    "stockQuantity": {
      "type": "integer",
      "minimum": 0,
      "maximum": 500
    }
  },
  "required": ["productId", "name", "category", "price", "inStock"]
}
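Once documents are generated, they can be spot-checked against the schema's hard constraints. A minimal sketch using only the standard library (the sample record is illustrative, not actual DataFlood output):

```python
import re

## Hard constraints taken from the product schema above
def validate_product(doc):
    errors = []
    for field in ["productId", "name", "category", "price", "inStock"]:
        if field not in doc:
            errors.append(f"missing required field: {field}")
    if "productId" in doc and not re.fullmatch(r"PROD-[0-9]{6}", doc["productId"]):
        errors.append("productId does not match ^PROD-[0-9]{6}$")
    if "price" in doc and not (9.99 <= doc["price"] <= 999.99):
        errors.append("price out of range")
    if "category" in doc and doc["category"] not in {"Electronics", "Accessories", "Computers", "Gaming"}:
        errors.append("unknown category")
    return errors

sample = {
    "productId": "PROD-000123",
    "name": "Wireless Mouse",
    "category": "Electronics",
    "price": 24.99,
    "inStock": True,
}
print(validate_product(sample))  # → []
```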

Step 2: Create Customer Schema

{
  "type": "object",
  "properties": {
    "customerId": {
      "type": "string",
      "pattern": "^CUST-[0-9]{8}$"
    },
    "email": {
      "type": "string",
      "format": "email",
      "stringModel": {
        "patterns": ["llll.llll@llll.com"],
        "entropyScore": 3.5
      }
    },
    "firstName": {
      "type": "string",
      "stringModel": {
        "patterns": ["Llll"],
        "valueFrequency": {
          "John": 15, "Jane": 12, "Michael": 10,
          "Sarah": 11, "Robert": 9, "Emily": 10
        }
      }
    },
    "lastName": {
      "type": "string",
      "stringModel": {
        "patterns": ["Llll", "Ll'Llll"],
        "valueFrequency": {
          "Smith": 20, "Johnson": 15, "Williams": 12,
          "Brown": 10, "Jones": 8, "Garcia": 7
        }
      }
    },
    "registrationDate": {
      "type": "string",
      "format": "date"
    },
    "tier": {
      "type": "string",
      "enum": ["Bronze", "Silver", "Gold", "Platinum"],
      "enumProbabilities": [0.5, 0.3, 0.15, 0.05]
    }
  }
}
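The valueFrequency counts above act as relative sampling weights. A sketch of the implied behavior (an assumption about how DataFlood draws weighted values, using Python's `random.choices`):

```python
import random

## Weights copied from the firstName stringModel above
weights = {"John": 15, "Jane": 12, "Michael": 10, "Sarah": 11, "Robert": 9, "Emily": 10}
rng = random.Random(42)  # fixed seed for reproducibility
names = rng.choices(list(weights), weights=list(weights.values()), k=10_000)

## The observed share of each name should approach weight / total
total = sum(weights.values())
observed = names.count("John") / len(names)
print(f"John: observed {observed:.3f}, expected {weights['John'] / total:.3f}")
```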

Step 3: Create Order Schema with Relationships

{
  "type": "object",
  "properties": {
    "orderId": {
      "type": "string",
      "pattern": "^ORD-[0-9]{10}$"
    },
    "customerId": {
      "type": "string",
      "pattern": "^CUST-[0-9]{8}$"
    },
    "orderDate": {
      "type": "string",
      "format": "date-time"
    },
    "items": {
      "type": "array",
      "minItems": 1,
      "maxItems": 10,
      "items": {
        "type": "object",
        "properties": {
          "productId": {
            "type": "string",
            "pattern": "^PROD-[0-9]{6}$"
          },
          "quantity": {
            "type": "integer",
            "minimum": 1,
            "maximum": 5
          },
          "unitPrice": {
            "type": "number",
            "minimum": 9.99,
            "maximum": 999.99
          }
        }
      }
    },
    "totalAmount": {
      "type": "number",
      "minimum": 9.99,
      "maximum": 9999.99
    },
    "status": {
      "type": "string",
      "enum": ["pending", "processing", "shipped", "delivered", "cancelled"],
      "enumProbabilities": [0.1, 0.2, 0.3, 0.35, 0.05]
    }
  }
}
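Note that customerId in orders is generated independently from the customers file, so standalone generation gives no referential integrity; the transaction linking in Step 4 below addresses this. A quick post-generation check (inline sample with hypothetical IDs):

```python
## Verify every order points at a known customer
customers = [{"customerId": "CUST-00000001"}, {"customerId": "CUST-00000002"}]
orders = [
    {"orderId": "ORD-0000000001", "customerId": "CUST-00000001"},
    {"orderId": "ORD-0000000002", "customerId": "CUST-99999999"},
]
known = {c["customerId"] for c in customers}
orphans = [o["orderId"] for o in orders if o["customerId"] not in known]
print(orphans)  # → ['ORD-0000000002']
```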

Step 4: Design Time-Based Sequence

Using Tides Editor:

{
  "name": "E-commerce Daily Flow",
  "startTime": "2024-01-01T00:00:00Z",
  "endTime": "2024-01-01T23:59:59Z",
  "intervalMs": 60000,
  "steps": [
    {
      "stepId": "products",
      "modelPath": "./products.json",
      "startOffset": 0,
      "documentsPerInterval": 2,
      "generationProbability": 0.1
    },
    {
      "stepId": "customers",
      "modelPath": "./customers.json",
      "startOffset": 0,
      "documentsPerInterval": 5,
      "generationProbability": 0.3
    },
    {
      "stepId": "orders",
      "modelPath": "./orders.json",
      "startOffset": 3600000,
      "documentsPerInterval": 10,
      "generationProbability": 0.8
    }
  ],
  "transactions": [
    {
      "transactionId": "customer-orders",
      "parentStepId": "customers",
      "childSteps": [
        {
          "stepId": "orders",
          "minCount": 0,
          "maxCount": 5,
          "additionalDelayMs": 300000
        }
      ],
      "triggerProbability": 0.7,
      "linkingStrategy": "InjectParentId",
      "parentIdField": "customerId"
    }
  ]
}
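Before executing a full-day sequence, it helps to estimate output volume. A rough calculation for the steps above (assuming each interval is an independent trial at generationProbability):

```python
from datetime import datetime, timezone

## Window and interval from the sequence definition above
start = datetime(2024, 1, 1, 0, 0, 0, tzinfo=timezone.utc)
end = datetime(2024, 1, 1, 23, 59, 59, tzinfo=timezone.utc)
interval_ms = 60_000
total_intervals = int((end - start).total_seconds() * 1000 // interval_ms)

steps = [
    {"stepId": "products", "startOffset": 0, "docs": 2, "p": 0.1},
    {"stepId": "customers", "startOffset": 0, "docs": 5, "p": 0.3},
    {"stepId": "orders", "startOffset": 3_600_000, "docs": 10, "p": 0.8},
]
for s in steps:
    ## Steps with a startOffset are active for fewer intervals
    active = total_intervals - s["startOffset"] // interval_ms
    print(f"{s['stepId']}: ~{active * s['docs'] * s['p']:.0f} documents")
```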

Step 5: Generate Test Data

Using CLI:

## Generate initial product catalog (output name must differ from the model file)
DataFlood generate products.json --count 1000 --output product-data.json

## Generate customers
DataFlood generate customers.json --count 500 --output customer-data.json

## Execute the sequence for orders
DataFlood sequence ecommerce-sequence.json --output order-data.json

Using API:

## Start FloodGate API server
FloodGate api

## Generate via API
curl -X POST "http://localhost:5000/api/serving/sequence/ecommerce-sequence" \
  -H "Content-Type: application/json" \
  -d '{"maxDocuments": 10000}'

IoT Sensor Network Simulation

Scenario

Simulate a network of IoT sensors generating temperature, humidity, and motion data with realistic patterns and anomalies.

Sensor Schema

{
  "type": "object",
  "properties": {
    "sensorId": {
      "type": "string",
      "pattern": "^SENSOR-[A-Z]{2}-[0-9]{4}$"
    },
    "timestamp": {
      "type": "string",
      "format": "date-time"
    },
    "location": {
      "type": "object",
      "properties": {
        "building": {
          "type": "string",
          "enum": ["A", "B", "C", "D"]
        },
        "floor": {
          "type": "integer",
          "minimum": 1,
          "maximum": 10
        },
        "room": {
          "type": "string",
          "pattern": "^[0-9]{3}$"
        }
      }
    },
    "temperature": {
      "type": "number",
      "minimum": 15.0,
      "maximum": 30.0,
      "histogram": {
        "bins": [
          {"rangeStart": 15, "rangeEnd": 18, "frequency": 0.1},
          {"rangeStart": 18, "rangeEnd": 22, "frequency": 0.6},
          {"rangeStart": 22, "rangeEnd": 25, "frequency": 0.25},
          {"rangeStart": 25, "rangeEnd": 30, "frequency": 0.05}
        ]
      }
    },
    "humidity": {
      "type": "number",
      "minimum": 30,
      "maximum": 70
    },
    "motion": {
      "type": "boolean"
    },
    "batteryLevel": {
      "type": "number",
      "minimum": 0,
      "maximum": 100
    }
  }
}
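A histogram like the temperature one above can be thought of as a two-stage draw: pick a bin by its frequency, then a uniform value within it. A sketch of that interpretation (an assumption, not DataFlood's actual sampler):

```python
import random

## Bins copied from the temperature histogram above
bins = [
    {"rangeStart": 15, "rangeEnd": 18, "frequency": 0.10},
    {"rangeStart": 18, "rangeEnd": 22, "frequency": 0.60},
    {"rangeStart": 22, "rangeEnd": 25, "frequency": 0.25},
    {"rangeStart": 25, "rangeEnd": 30, "frequency": 0.05},
]

def sample_histogram(rng, bins):
    ## Choose a bin weighted by frequency, then draw uniformly inside it
    b = rng.choices(bins, weights=[x["frequency"] for x in bins], k=1)[0]
    return rng.uniform(b["rangeStart"], b["rangeEnd"])

rng = random.Random(7)
readings = [sample_histogram(rng, bins) for _ in range(10_000)]
in_comfort_band = sum(18 <= t < 22 for t in readings) / len(readings)
print(f"share in 18-22 range: {in_comfort_band:.2f}")  # should be near 0.60
```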

Anomaly Detection Test Data

Create a sequence that includes normal and anomalous readings:

{
  "name": "Sensor Network with Anomalies",
  "intervalMs": 5000,
  "steps": [
    {
      "stepId": "normal-readings",
      "modelPath": "./sensor-normal.json",
      "documentsPerInterval": 20,
      "generationProbability": 0.95,
      "weight": 9.0
    },
    {
      "stepId": "anomalies",
      "modelPath": "./sensor-anomaly.json",
      "documentsPerInterval": 1,
      "generationProbability": 0.05,
      "weight": 1.0,
      "customProperties": {
        "anomalyType": "temperature_spike",
        "severity": "high"
      }
    }
  ]
}
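The weights and probabilities above imply a small anomaly share. A quick back-of-envelope check (assuming documentsPerInterval documents are emitted whenever the probability check succeeds):

```python
## Expected documents per interval for each step
normal = 20 * 0.95   # normal-readings: documentsPerInterval x generationProbability
anomaly = 1 * 0.05   # anomalies
share = anomaly / (normal + anomaly)
print(f"expected anomaly share: {share:.4%}")
```

Roughly a quarter of one percent of documents will be anomalies, which is worth knowing before sizing a detection test.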

Generate Streaming Data

## Generate continuous stream
DataFlood sequence sensor-sequence.json \
  --max-docs 100000 \
  --format jsonl \
  --output sensor-stream.jsonl

Banking Transaction System

Scenario

Generate banking transactions with realistic patterns, including accounts, transactions, and fraud detection test cases.

Account Schema

{
  "type": "object",
  "properties": {
    "accountNumber": {
      "type": "string",
      "pattern": "^[0-9]{10}$"
    },
    "accountType": {
      "type": "string",
      "enum": ["checking", "savings", "credit"]
    },
    "balance": {
      "type": "number",
      "minimum": 0,
      "maximum": 1000000,
      "histogram": {
        "bins": [
          {"rangeStart": 0, "rangeEnd": 1000, "frequency": 0.3},
          {"rangeStart": 1000, "rangeEnd": 10000, "frequency": 0.4},
          {"rangeStart": 10000, "rangeEnd": 100000, "frequency": 0.25},
          {"rangeStart": 100000, "rangeEnd": 1000000, "frequency": 0.05}
        ]
      }
    },
    "customerId": {
      "type": "string",
      "pattern": "^[0-9]{8}$"
    },
    "openDate": {
      "type": "string",
      "format": "date"
    }
  }
}

Transaction Schema with Patterns

{
  "type": "object",
  "properties": {
    "transactionId": {
      "type": "string",
      "format": "uuid"
    },
    "accountNumber": {
      "type": "string",
      "pattern": "^[0-9]{10}$"
    },
    "transactionType": {
      "type": "string",
      "enum": ["deposit", "withdrawal", "transfer", "payment", "fee"],
      "enumProbabilities": [0.2, 0.3, 0.25, 0.2, 0.05]
    },
    "amount": {
      "type": "number",
      "histogram": {
        "bins": [
          {"rangeStart": 0.01, "rangeEnd": 100, "frequency": 0.5},
          {"rangeStart": 100, "rangeEnd": 500, "frequency": 0.3},
          {"rangeStart": 500, "rangeEnd": 2000, "frequency": 0.15},
          {"rangeStart": 2000, "rangeEnd": 10000, "frequency": 0.05}
        ]
      }
    },
    "merchantCategory": {
      "type": "string",
      "enum": ["grocery", "gas", "restaurant", "retail", "online", "atm", "other"],
      "enumProbabilities": [0.25, 0.15, 0.15, 0.2, 0.15, 0.05, 0.05]
    },
    "timestamp": {
      "type": "string",
      "format": "date-time"
    },
    "location": {
      "type": "object",
      "properties": {
        "city": {
          "type": "string",
          "stringModel": {
            "valueFrequency": {
              "New York": 20,
              "Los Angeles": 15,
              "Chicago": 10,
              "Houston": 8
            }
          }
        },
        "country": {
          "type": "string",
          "enum": ["USA", "Canada", "Mexico"]
        }
      }
    }
  }
}

Fraud Test Cases

Create specific patterns for fraud testing:

## Normal transactions
DataFlood generate transaction-normal.json \
  --count 10000 \
  --seed 42 \
  --output normal-transactions.json

## Suspicious patterns (rapid transactions)
DataFlood sequence fraud-rapid-sequence.json \
  --output fraud-rapid.json

## Unusual amounts
DataFlood generate transaction-unusual.json \
  --entropy 5.0 \
  --count 100 \
  --output fraud-amounts.json
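Generated fraud data is only useful if the detector catches it. A hypothetical downstream check for the rapid-transaction case (sliding 60-second window; account numbers are made up):

```python
from datetime import datetime, timedelta

## Flag accounts with threshold or more transactions inside the window
def rapid_accounts(txns, window=timedelta(seconds=60), threshold=3):
    flagged = set()
    by_account = {}
    for t in sorted(txns, key=lambda t: t["timestamp"]):
        times = by_account.setdefault(t["accountNumber"], [])
        times.append(t["timestamp"])
        ## Keep only timestamps still inside the sliding window
        recent = [ts for ts in times if t["timestamp"] - ts <= window]
        by_account[t["accountNumber"]] = recent
        if len(recent) >= threshold:
            flagged.add(t["accountNumber"])
    return flagged

base = datetime(2024, 1, 1, 12, 0, 0)
txns = [
    {"accountNumber": "0000000001", "timestamp": base},
    {"accountNumber": "0000000001", "timestamp": base + timedelta(seconds=10)},
    {"accountNumber": "0000000001", "timestamp": base + timedelta(seconds=20)},
    {"accountNumber": "0000000002", "timestamp": base},
    {"accountNumber": "0000000002", "timestamp": base + timedelta(minutes=5)},
]
print(rapid_accounts(txns))  # → {'0000000001'}
```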

Healthcare Records

Scenario

Generate synthetic patient records for healthcare system testing while maintaining privacy.

Patient Schema

{
  "type": "object",
  "properties": {
    "patientId": {
      "type": "string",
      "pattern": "^PAT-[0-9]{10}$"
    },
    "demographics": {
      "type": "object",
      "properties": {
        "age": {
          "type": "integer",
          "minimum": 0,
          "maximum": 120,
          "histogram": {
            "bins": [
              {"rangeStart": 0, "rangeEnd": 18, "frequency": 0.2},
              {"rangeStart": 18, "rangeEnd": 65, "frequency": 0.6},
              {"rangeStart": 65, "rangeEnd": 120, "frequency": 0.2}
            ]
          }
        },
        "gender": {
          "type": "string",
          "enum": ["M", "F", "O"],
          "enumProbabilities": [0.49, 0.49, 0.02]
        }
      }
    },
    "conditions": {
      "type": "array",
      "items": {
        "type": "string",
        "enum": ["diabetes", "hypertension", "asthma", "arthritis", "none"],
        "enumProbabilities": [0.1, 0.15, 0.08, 0.07, 0.6]
      },
      "minItems": 0,
      "maxItems": 3
    },
    "lastVisit": {
      "type": "string",
      "format": "date"
    }
  }
}

Appointment Generation

Generate appointments on a 15-minute interval (900000 ms):

{
  "name": "Healthcare Appointments",
  "intervalMs": 900000,
  "steps": [
    {
      "stepId": "routine-appointments",
      "modelPath": "./appointment-routine.json",
      "documentsPerInterval": 4,
      "generationProbability": 0.8
    },
    {
      "stepId": "emergency-visits",
      "modelPath": "./appointment-emergency.json",
      "documentsPerInterval": 1,
      "generationProbability": 0.1
    }
  ]
}

Log File Generation

Scenario

Generate application logs for testing log analysis and monitoring systems.

Application Log Schema

{
  "type": "object",
  "properties": {
    "timestamp": {
      "type": "string",
      "format": "date-time"
    },
    "level": {
      "type": "string",
      "enum": ["DEBUG", "INFO", "WARN", "ERROR", "FATAL"],
      "enumProbabilities": [0.3, 0.5, 0.15, 0.04, 0.01]
    },
    "service": {
      "type": "string",
      "enum": ["api", "database", "cache", "queue", "worker"],
      "enumProbabilities": [0.4, 0.2, 0.15, 0.15, 0.1]
    },
    "message": {
      "type": "string",
      "stringModel": {
        "patterns": [
          "Request processed successfully",
          "Database connection established",
          "Cache miss for key: XXXX",
          "Queue message processed",
          "Error: Connection timeout"
        ],
        "entropyScore": 2.5
      }
    },
    "requestId": {
      "type": "string",
      "format": "uuid"
    },
    "duration": {
      "type": "integer",
      "minimum": 1,
      "maximum": 5000,
      "histogram": {
        "bins": [
          {"rangeStart": 1, "rangeEnd": 100, "frequency": 0.7},
          {"rangeStart": 100, "rangeEnd": 1000, "frequency": 0.25},
          {"rangeStart": 1000, "rangeEnd": 5000, "frequency": 0.05}
        ]
      }
    }
  }
}

Generate Log Stream

## Generate logs in JSONL format for streaming
DataFlood generate app-log.json \
  --count 100000 \
  --format jsonl \
  --output app-logs.jsonl

## Or use sequence for time-based patterns
DataFlood sequence log-sequence.json \
  --format jsonl \
  --max-docs 1000000 \
  --output logs-stream.jsonl
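The resulting JSONL stream can be consumed line by line. A small sketch using an inline sample in place of app-logs.jsonl:

```python
import io
import json
from collections import Counter

## Inline stand-in for the generated JSONL file
sample = io.StringIO(
    '{"level": "INFO", "service": "api"}\n'
    '{"level": "ERROR", "service": "database"}\n'
    '{"level": "INFO", "service": "api"}\n'
)
## Tally log levels to sanity-check the enumProbabilities mix
levels = Counter(json.loads(line)["level"] for line in sample)
print(levels)  # → Counter({'INFO': 2, 'ERROR': 1})
```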

API Testing Data

Scenario

Generate test data for API endpoints including various HTTP methods and response codes.

API Request Schema

{
  "type": "object",
  "properties": {
    "method": {
      "type": "string",
      "enum": ["GET", "POST", "PUT", "DELETE", "PATCH"],
      "enumProbabilities": [0.5, 0.25, 0.15, 0.05, 0.05]
    },
    "endpoint": {
      "type": "string",
      "stringModel": {
        "patterns": [
          "/api/users/dddd",
          "/api/products/dddd",
          "/api/orders/dddd",
          "/api/search"
        ]
      }
    },
    "headers": {
      "type": "object",
      "properties": {
        "authorization": {
          "type": "string",
          "pattern": "^Bearer [A-Za-z0-9]{32}$"
        },
        "contentType": {
          "type": "string",
          "enum": ["application/json", "application/xml", "text/plain"]
        }
      }
    },
    "body": {
      "type": "object",
      "additionalProperties": true
    },
    "responseCode": {
      "type": "integer",
      "enum": [200, 201, 204, 400, 401, 403, 404, 500],
      "enumProbabilities": [0.6, 0.1, 0.05, 0.08, 0.05, 0.02, 0.05, 0.05]
    }
  }
}
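Generated request documents can be replayed against a test server. A sketch that builds (but does not send) requests with the standard library; the host is a placeholder:

```python
import urllib.request

## Turn a generated API-request document into a urllib Request object
def build_request(doc, host="http://localhost:8080"):
    return urllib.request.Request(
        host + doc["endpoint"],
        method=doc["method"],
        headers={"Content-Type": doc["headers"]["contentType"]},
    )

doc = {
    "method": "GET",
    "endpoint": "/api/users/1234",
    "headers": {"contentType": "application/json"},
}
req = build_request(doc)
print(req.get_method(), req.full_url)
```

Sending would be a call to urllib.request.urlopen(req), gated behind whatever test harness is in use.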

Best Practices for Use Cases

1. Start with Real Data

  • Collect sample data from production (anonymized)
  • Use DataFloodEditor import to analyze patterns
  • Refine statistical models based on analysis

2. Incremental Development

  • Start with simple schemas
  • Add complexity gradually
  • Test each iteration
  • Validate against requirements

3. Realistic Distributions

  • Use histograms for numeric values
  • Set appropriate enum probabilities
  • Configure string patterns from samples
  • Test edge cases

4. Sequence Design

  • Model real-world timing patterns
  • Include peak and off-peak periods
  • Add appropriate delays between related events
  • Test with short time ranges first

5. Performance Testing

## Small batch for validation
DataFlood generate schema.json --count 10

## Medium batch for testing
DataFlood generate schema.json --count 1000

## Large batch for performance
DataFlood generate schema.json --count 100000 --format csv

6. Data Validation

  • Always validate generated data
  • Check constraint satisfaction
  • Verify relationships
  • Test with downstream systems

Integration Examples

Python Integration

import io
import json

import requests
import pandas as pd

## Load the DataFlood model to send with the request
with open("schema.json") as f:
    schema = json.load(f)

## Generate data via API
response = requests.post(
    "http://localhost:5000/api/documentgenerator/generate-simple",
    json={"schema": schema, "count": 1000, "format": "csv"}
)

## Load into pandas
df = pd.read_csv(io.StringIO(response.text))

Database Loading

## Generate CSV
DataFlood generate schema.json --format csv --count 10000 --output data.csv

## Load into PostgreSQL
psql -c "\copy table_name FROM 'data.csv' CSV HEADER"

## Load into MySQL (LOCAL reads the client-side file; skip the header row)
mysql --local-infile=1 -e "LOAD DATA LOCAL INFILE 'data.csv' INTO TABLE table_name FIELDS TERMINATED BY ',' IGNORE 1 LINES"

Streaming Pipeline

## Generate streaming data
DataFlood sequence stream-sequence.json --format jsonl --output - | \
  kafka-console-producer --topic test-data --broker-list localhost:9092

Conclusion

These use cases demonstrate the flexibility of DataFlood Suite for generating realistic test data across various domains. Key takeaways:

  1. Model real patterns: Generate DataFlood models from actual data
  2. Design relationships: Create realistic dependencies between entities
  3. Time-based generation: Model temporal patterns accurately
  4. Test thoroughly: Validate data meets requirements
  5. Scale appropriately: Start small, scale up gradually

For more examples and detailed configurations, refer to the component-specific documentation.