Use Cases and Examples

This guide provides real-world examples and complete workflows for common DataFlood Suite use cases.

E-Commerce Platform Testing

Scenario

Generate realistic test data for an e-commerce platform including products, customers, orders, and transactions.

Step 1: Create Product Schema

Using DataFloodEditor:

  1. Open Model Editor
  2. Create schema structure:
{
  "type": "object",
  "properties": {
    "productId": {
      "type": "string",
      "pattern": "^PROD-[0-9]{6}$"
    },
    "name": {
      "type": "string",
      "stringModel": {
        "patterns": ["Llll Llll", "Llll"],
        "valueFrequency": {
          "Wireless Mouse": 10,
          "Gaming Keyboard": 8,
          "USB Hub": 6,
          "Monitor Stand": 5
        }
      }
    },
    "category": {
      "type": "string",
      "enum": ["Electronics", "Accessories", "Computers", "Gaming"]
    },
    "price": {
      "type": "number",
      "minimum": 9.99,
      "maximum": 999.99,
      "histogram": {
        "bins": [
          {"rangeStart": 9.99, "rangeEnd": 49.99, "frequency": 0.4},
          {"rangeStart": 50, "rangeEnd": 199.99, "frequency": 0.35},
          {"rangeStart": 200, "rangeEnd": 499.99, "frequency": 0.2},
          {"rangeStart": 500, "rangeEnd": 999.99, "frequency": 0.05}
        ]
      }
    },
    "inStock": {
      "type": "boolean"
    },
    "stockQuantity": {
      "type": "integer",
      "minimum": 0,
      "maximum": 500
    }
  },
  "required": ["productId", "name", "category", "price", "inStock"]
}
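Once documents are generated, they can be spot-checked against the schema's hard constraints. A minimal sketch using only the standard library (the sample record is illustrative, not actual DataFlood output):

```python
import re

## Hard constraints taken from the product schema above
def validate_product(doc):
    errors = []
    for field in ["productId", "name", "category", "price", "inStock"]:
        if field not in doc:
            errors.append(f"missing required field: {field}")
    if "productId" in doc and not re.fullmatch(r"PROD-[0-9]{6}", doc["productId"]):
        errors.append("productId does not match ^PROD-[0-9]{6}$")
    if "price" in doc and not (9.99 <= doc["price"] <= 999.99):
        errors.append("price out of range")
    if "category" in doc and doc["category"] not in {"Electronics", "Accessories", "Computers", "Gaming"}:
        errors.append("unknown category")
    return errors

sample = {
    "productId": "PROD-000123",
    "name": "Wireless Mouse",
    "category": "Electronics",
    "price": 24.99,
    "inStock": True,
}
print(validate_product(sample))  # → []
```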

Step 2: Create Customer Schema

{
  "type": "object",
  "properties": {
    "customerId": {
      "type": "string",
      "pattern": "^CUST-[0-9]{8}$"
    },
    "email": {
      "type": "string",
      "format": "email",
      "stringModel": {
        "patterns": ["llll.llll@llll.com"],
        "entropyScore": 3.5
      }
    },
    "firstName": {
      "type": "string",
      "stringModel": {
        "patterns": ["Llll"],
        "valueFrequency": {
          "John": 15, "Jane": 12, "Michael": 10,
          "Sarah": 11, "Robert": 9, "Emily": 10
        }
      }
    },
    "lastName": {
      "type": "string",
      "stringModel": {
        "patterns": ["Llll", "Ll'Llll"],
        "valueFrequency": {
          "Smith": 20, "Johnson": 15, "Williams": 12,
          "Brown": 10, "Jones": 8, "Garcia": 7
        }
      }
    },
    "registrationDate": {
      "type": "string",
      "format": "date"
    },
    "tier": {
      "type": "string",
      "enum": ["Bronze", "Silver", "Gold", "Platinum"],
      "enumProbabilities": [0.5, 0.3, 0.15, 0.05]
    }
  }
}
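The valueFrequency counts above act as relative sampling weights. A sketch of the implied behavior (an assumption about how DataFlood draws weighted values, using Python's `random.choices`):

```python
import random

## Weights copied from the firstName stringModel above
weights = {"John": 15, "Jane": 12, "Michael": 10, "Sarah": 11, "Robert": 9, "Emily": 10}
rng = random.Random(42)  # fixed seed for reproducibility
names = rng.choices(list(weights), weights=list(weights.values()), k=10_000)

## The observed share of each name should approach weight / total
total = sum(weights.values())
observed = names.count("John") / len(names)
print(f"John: observed {observed:.3f}, expected {weights['John'] / total:.3f}")
```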

Step 3: Create Order Schema with Relationships

{
  "type": "object",
  "properties": {
    "orderId": {
      "type": "string",
      "pattern": "^ORD-[0-9]{10}$"
    },
    "customerId": {
      "type": "string",
      "pattern": "^CUST-[0-9]{8}$"
    },
    "orderDate": {
      "type": "string",
      "format": "date-time"
    },
    "items": {
      "type": "array",
      "minItems": 1,
      "maxItems": 10,
      "items": {
        "type": "object",
        "properties": {
          "productId": {
            "type": "string",
            "pattern": "^PROD-[0-9]{6}$"
          },
          "quantity": {
            "type": "integer",
            "minimum": 1,
            "maximum": 5
          },
          "unitPrice": {
            "type": "number",
            "minimum": 9.99,
            "maximum": 999.99
          }
        }
      }
    },
    "totalAmount": {
      "type": "number",
      "minimum": 9.99,
      "maximum": 9999.99
    },
    "status": {
      "type": "string",
      "enum": ["pending", "processing", "shipped", "delivered", "cancelled"],
      "enumProbabilities": [0.1, 0.2, 0.3, 0.35, 0.05]
    }
  }
}
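Note that customerId in orders is generated independently from the customers file, so standalone generation gives no referential integrity; the transaction linking in Step 4 below addresses this. A quick post-generation check (inline sample with hypothetical IDs):

```python
## Verify every order points at a known customer
customers = [{"customerId": "CUST-00000001"}, {"customerId": "CUST-00000002"}]
orders = [
    {"orderId": "ORD-0000000001", "customerId": "CUST-00000001"},
    {"orderId": "ORD-0000000002", "customerId": "CUST-99999999"},
]
known = {c["customerId"] for c in customers}
orphans = [o["orderId"] for o in orders if o["customerId"] not in known]
print(orphans)  # → ['ORD-0000000002']
```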

Step 4: Design Time-Based Sequence

Using Tides Editor:

{
  "name": "E-commerce Daily Flow",
  "startTime": "2024-01-01T00:00:00Z",
  "endTime": "2024-01-01T23:59:59Z",
  "intervalMs": 60000,
  "steps": [
    {
      "stepId": "products",
      "modelPath": "./products.json",
      "startOffset": 0,
      "documentsPerInterval": 2,
      "generationProbability": 0.1
    },
    {
      "stepId": "customers",
      "modelPath": "./customers.json",
      "startOffset": 0,
      "documentsPerInterval": 5,
      "generationProbability": 0.3
    },
    {
      "stepId": "orders",
      "modelPath": "./orders.json",
      "startOffset": 3600000,
      "documentsPerInterval": 10,
      "generationProbability": 0.8
    }
  ],
  "transactions": [
    {
      "transactionId": "customer-orders",
      "parentStepId": "customers",
      "childSteps": [
        {
          "stepId": "orders",
          "minCount": 0,
          "maxCount": 5,
          "additionalDelayMs": 300000
        }
      ],
      "triggerProbability": 0.7,
      "linkingStrategy": "InjectParentId",
      "parentIdField": "customerId"
    }
  ]
}
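Before executing a full-day sequence, it helps to estimate output volume. A rough calculation for the steps above (assuming each interval is an independent trial at generationProbability):

```python
from datetime import datetime, timezone

## Window and interval from the sequence definition above
start = datetime(2024, 1, 1, 0, 0, 0, tzinfo=timezone.utc)
end = datetime(2024, 1, 1, 23, 59, 59, tzinfo=timezone.utc)
interval_ms = 60_000
total_intervals = int((end - start).total_seconds() * 1000 // interval_ms)

steps = [
    {"stepId": "products", "startOffset": 0, "docs": 2, "p": 0.1},
    {"stepId": "customers", "startOffset": 0, "docs": 5, "p": 0.3},
    {"stepId": "orders", "startOffset": 3_600_000, "docs": 10, "p": 0.8},
]
for s in steps:
    ## Steps with a startOffset are active for fewer intervals
    active = total_intervals - s["startOffset"] // interval_ms
    print(f"{s['stepId']}: ~{active * s['docs'] * s['p']:.0f} documents")
```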

Step 5: Generate Test Data

Using CLI:

## Generate initial product catalog (output name must differ from the model file)
DataFlood generate products.json --count 1000 --output product-data.json

## Generate customers
DataFlood generate customers.json --count 500 --output customer-data.json

## Execute the sequence for orders
DataFlood sequence ecommerce-sequence.json --output order-data.json

Using API:

## Start FloodGate API server
FloodGate api

## Generate via API
curl -X POST "http://localhost:5000/api/serving/sequence/ecommerce-sequence" \
  -H "Content-Type: application/json" \
  -d '{"maxDocuments": 10000}'

IoT Sensor Network Simulation

Scenario

Simulate a network of IoT sensors generating temperature, humidity, and motion data with realistic patterns and anomalies.

Sensor Schema

{
  "type": "object",
  "properties": {
    "sensorId": {
      "type": "string",
      "pattern": "^SENSOR-[A-Z]{2}-[0-9]{4}$"
    },
    "timestamp": {
      "type": "string",
      "format": "date-time"
    },
    "location": {
      "type": "object",
      "properties": {
        "building": {
          "type": "string",
          "enum": ["A", "B", "C", "D"]
        },
        "floor": {
          "type": "integer",
          "minimum": 1,
          "maximum": 10
        },
        "room": {
          "type": "string",
          "pattern": "^[0-9]{3}$"
        }
      }
    },
    "temperature": {
      "type": "number",
      "minimum": 15.0,
      "maximum": 30.0,
      "histogram": {
        "bins": [
          {"rangeStart": 15, "rangeEnd": 18, "frequency": 0.1},
          {"rangeStart": 18, "rangeEnd": 22, "frequency": 0.6},
          {"rangeStart": 22, "rangeEnd": 25, "frequency": 0.25},
          {"rangeStart": 25, "rangeEnd": 30, "frequency": 0.05}
        ]
      }
    },
    "humidity": {
      "type": "number",
      "minimum": 30,
      "maximum": 70
    },
    "motion": {
      "type": "boolean"
    },
    "batteryLevel": {
      "type": "number",
      "minimum": 0,
      "maximum": 100
    }
  }
}
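A histogram like the temperature one above can be thought of as a two-stage draw: pick a bin by its frequency, then a uniform value within it. A sketch of that interpretation (an assumption, not DataFlood's actual sampler):

```python
import random

## Bins copied from the temperature histogram above
bins = [
    {"rangeStart": 15, "rangeEnd": 18, "frequency": 0.10},
    {"rangeStart": 18, "rangeEnd": 22, "frequency": 0.60},
    {"rangeStart": 22, "rangeEnd": 25, "frequency": 0.25},
    {"rangeStart": 25, "rangeEnd": 30, "frequency": 0.05},
]

def sample_histogram(rng, bins):
    ## Choose a bin weighted by frequency, then draw uniformly inside it
    b = rng.choices(bins, weights=[x["frequency"] for x in bins], k=1)[0]
    return rng.uniform(b["rangeStart"], b["rangeEnd"])

rng = random.Random(7)
readings = [sample_histogram(rng, bins) for _ in range(10_000)]
in_comfort_band = sum(18 <= t < 22 for t in readings) / len(readings)
print(f"share in 18-22 range: {in_comfort_band:.2f}")  # should be near 0.60
```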

Anomaly Detection Test Data

Create a sequence that includes normal and anomalous readings:

{
  "name": "Sensor Network with Anomalies",
  "intervalMs": 5000,
  "steps": [
    {
      "stepId": "normal-readings",
      "modelPath": "./sensor-normal.json",
      "documentsPerInterval": 20,
      "generationProbability": 0.95,
      "weight": 9.0
    },
    {
      "stepId": "anomalies",
      "modelPath": "./sensor-anomaly.json",
      "documentsPerInterval": 1,
      "generationProbability": 0.05,
      "weight": 1.0,
      "customProperties": {
        "anomalyType": "temperature_spike",
        "severity": "high"
      }
    }
  ]
}
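The weights and probabilities above imply a small anomaly share. A quick back-of-envelope check (assuming documentsPerInterval documents are emitted whenever the probability check succeeds):

```python
## Expected documents per interval for each step
normal = 20 * 0.95   # normal-readings: documentsPerInterval x generationProbability
anomaly = 1 * 0.05   # anomalies
share = anomaly / (normal + anomaly)
print(f"expected anomaly share: {share:.4%}")
```

Roughly a quarter of one percent of documents will be anomalies, which is worth knowing before sizing a detection test.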

Generate Streaming Data

## Generate continuous stream
DataFlood sequence sensor-sequence.json \
  --max-docs 100000 \
  --format jsonl \
  --output sensor-stream.jsonl

Banking Transaction System

Scenario

Generate banking transactions with realistic patterns, including accounts, transactions, and fraud detection test cases.

Account Schema

{
  "type": "object",
  "properties": {
    "accountNumber": {
      "type": "string",
      "pattern": "^[0-9]{10}$"
    },
    "accountType": {
      "type": "string",
      "enum": ["checking", "savings", "credit"]
    },
    "balance": {
      "type": "number",
      "minimum": 0,
      "maximum": 1000000,
      "histogram": {
        "bins": [
          {"rangeStart": 0, "rangeEnd": 1000, "frequency": 0.3},
          {"rangeStart": 1000, "rangeEnd": 10000, "frequency": 0.4},
          {"rangeStart": 10000, "rangeEnd": 100000, "frequency": 0.25},
          {"rangeStart": 100000, "rangeEnd": 1000000, "frequency": 0.05}
        ]
      }
    },
    "customerId": {
      "type": "string",
      "pattern": "^[0-9]{8}$"
    },
    "openDate": {
      "type": "string",
      "format": "date"
    }
  }
}

Transaction Schema with Patterns

{
  "type": "object",
  "properties": {
    "transactionId": {
      "type": "string",
      "format": "uuid"
    },
    "accountNumber": {
      "type": "string",
      "pattern": "^[0-9]{10}$"
    },
    "transactionType": {
      "type": "string",
      "enum": ["deposit", "withdrawal", "transfer", "payment", "fee"],
      "enumProbabilities": [0.2, 0.3, 0.25, 0.2, 0.05]
    },
    "amount": {
      "type": "number",
      "histogram": {
        "bins": [
          {"rangeStart": 0.01, "rangeEnd": 100, "frequency": 0.5},
          {"rangeStart": 100, "rangeEnd": 500, "frequency": 0.3},
          {"rangeStart": 500, "rangeEnd": 2000, "frequency": 0.15},
          {"rangeStart": 2000, "rangeEnd": 10000, "frequency": 0.05}
        ]
      }
    },
    "merchantCategory": {
      "type": "string",
      "enum": ["grocery", "gas", "restaurant", "retail", "online", "atm", "other"],
      "enumProbabilities": [0.25, 0.15, 0.15, 0.2, 0.15, 0.05, 0.05]
    },
    "timestamp": {
      "type": "string",
      "format": "date-time"
    },
    "location": {
      "type": "object",
      "properties": {
        "city": {
          "type": "string",
          "stringModel": {
            "valueFrequency": {
              "New York": 20,
              "Los Angeles": 15,
              "Chicago": 10,
              "Houston": 8
            }
          }
        },
        "country": {
          "type": "string",
          "enum": ["USA", "Canada", "Mexico"]
        }
      }
    }
  }
}

Fraud Test Cases

Create specific patterns for fraud testing:

## Normal transactions
DataFlood generate transaction-normal.json \
  --count 10000 \
  --seed 42 \
  --output normal-transactions.json

## Suspicious patterns (rapid transactions)
DataFlood sequence fraud-rapid-sequence.json \
  --output fraud-rapid.json

## Unusual amounts
DataFlood generate transaction-unusual.json \
  --entropy 5.0 \
  --count 100 \
  --output fraud-amounts.json
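Generated fraud data is only useful if the detector catches it. A hypothetical downstream check for the rapid-transaction case (sliding 60-second window; account numbers are made up):

```python
from datetime import datetime, timedelta

## Flag accounts with threshold or more transactions inside the window
def rapid_accounts(txns, window=timedelta(seconds=60), threshold=3):
    flagged = set()
    by_account = {}
    for t in sorted(txns, key=lambda t: t["timestamp"]):
        times = by_account.setdefault(t["accountNumber"], [])
        times.append(t["timestamp"])
        ## Keep only timestamps still inside the sliding window
        recent = [ts for ts in times if t["timestamp"] - ts <= window]
        by_account[t["accountNumber"]] = recent
        if len(recent) >= threshold:
            flagged.add(t["accountNumber"])
    return flagged

base = datetime(2024, 1, 1, 12, 0, 0)
txns = [
    {"accountNumber": "0000000001", "timestamp": base},
    {"accountNumber": "0000000001", "timestamp": base + timedelta(seconds=10)},
    {"accountNumber": "0000000001", "timestamp": base + timedelta(seconds=20)},
    {"accountNumber": "0000000002", "timestamp": base},
    {"accountNumber": "0000000002", "timestamp": base + timedelta(minutes=5)},
]
print(rapid_accounts(txns))  # → {'0000000001'}
```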

Healthcare Records

Scenario

Generate synthetic patient records for healthcare system testing while maintaining privacy.

Patient Schema

{
  "type": "object",
  "properties": {
    "patientId": {
      "type": "string",
      "pattern": "^PAT-[0-9]{10}$"
    },
    "demographics": {
      "type": "object",
      "properties": {
        "age": {
          "type": "integer",
          "minimum": 0,
          "maximum": 120,
          "histogram": {
            "bins": [
              {"rangeStart": 0, "rangeEnd": 18, "frequency": 0.2},
              {"rangeStart": 18, "rangeEnd": 65, "frequency": 0.6},
              {"rangeStart": 65, "rangeEnd": 120, "frequency": 0.2}
            ]
          }
        },
        "gender": {
          "type": "string",
          "enum": ["M", "F", "O"],
          "enumProbabilities": [0.49, 0.49, 0.02]
        }
      }
    },
    "conditions": {
      "type": "array",
      "items": {
        "type": "string",
        "enum": ["diabetes", "hypertension", "asthma", "arthritis", "none"],
        "enumProbabilities": [0.1, 0.15, 0.08, 0.07, 0.6]
      },
      "minItems": 0,
      "maxItems": 3
    },
    "lastVisit": {
      "type": "string",
      "format": "date"
    }
  }
}

Appointment Generation

Generate appointments on a 15-minute interval (900000 ms):

{
  "name": "Healthcare Appointments",
  "intervalMs": 900000,
  "steps": [
    {
      "stepId": "routine-appointments",
      "modelPath": "./appointment-routine.json",
      "documentsPerInterval": 4,
      "generationProbability": 0.8
    },
    {
      "stepId": "emergency-visits",
      "modelPath": "./appointment-emergency.json",
      "documentsPerInterval": 1,
      "generationProbability": 0.1
    }
  ]
}

Log File Generation

Scenario

Generate application logs for testing log analysis and monitoring systems.

Application Log Schema

{
  "type": "object",
  "properties": {
    "timestamp": {
      "type": "string",
      "format": "date-time"
    },
    "level": {
      "type": "string",
      "enum": ["DEBUG", "INFO", "WARN", "ERROR", "FATAL"],
      "enumProbabilities": [0.3, 0.5, 0.15, 0.04, 0.01]
    },
    "service": {
      "type": "string",
      "enum": ["api", "database", "cache", "queue", "worker"],
      "enumProbabilities": [0.4, 0.2, 0.15, 0.15, 0.1]
    },
    "message": {
      "type": "string",
      "stringModel": {
        "patterns": [
          "Request processed successfully",
          "Database connection established",
          "Cache miss for key: XXXX",
          "Queue message processed",
          "Error: Connection timeout"
        ],
        "entropyScore": 2.5
      }
    },
    "requestId": {
      "type": "string",
      "format": "uuid"
    },
    "duration": {
      "type": "integer",
      "minimum": 1,
      "maximum": 5000,
      "histogram": {
        "bins": [
          {"rangeStart": 1, "rangeEnd": 100, "frequency": 0.7},
          {"rangeStart": 100, "rangeEnd": 1000, "frequency": 0.25},
          {"rangeStart": 1000, "rangeEnd": 5000, "frequency": 0.05}
        ]
      }
    }
  }
}

Generate Log Stream

## Generate logs in JSONL format for streaming
DataFlood generate app-log.json \
  --count 100000 \
  --format jsonl \
  --output app-logs.jsonl

## Or use sequence for time-based patterns
DataFlood sequence log-sequence.json \
  --format jsonl \
  --max-docs 1000000 \
  --output logs-stream.jsonl
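The resulting JSONL stream can be consumed line by line. A small sketch using an inline sample in place of app-logs.jsonl:

```python
import io
import json
from collections import Counter

## Inline stand-in for the generated JSONL file
sample = io.StringIO(
    '{"level": "INFO", "service": "api"}\n'
    '{"level": "ERROR", "service": "database"}\n'
    '{"level": "INFO", "service": "api"}\n'
)
## Tally log levels to sanity-check the enumProbabilities mix
levels = Counter(json.loads(line)["level"] for line in sample)
print(levels)  # → Counter({'INFO': 2, 'ERROR': 1})
```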

API Testing Data

Scenario

Generate test data for API endpoints including various HTTP methods and response codes.

API Request Schema

{
  "type": "object",
  "properties": {
    "method": {
      "type": "string",
      "enum": ["GET", "POST", "PUT", "DELETE", "PATCH"],
      "enumProbabilities": [0.5, 0.25, 0.15, 0.05, 0.05]
    },
    "endpoint": {
      "type": "string",
      "stringModel": {
        "patterns": [
          "/api/users/dddd",
          "/api/products/dddd",
          "/api/orders/dddd",
          "/api/search"
        ]
      }
    },
    "headers": {
      "type": "object",
      "properties": {
        "authorization": {
          "type": "string",
          "pattern": "^Bearer [A-Za-z0-9]{32}$"
        },
        "contentType": {
          "type": "string",
          "enum": ["application/json", "application/xml", "text/plain"]
        }
      }
    },
    "body": {
      "type": "object",
      "additionalProperties": true
    },
    "responseCode": {
      "type": "integer",
      "enum": [200, 201, 204, 400, 401, 403, 404, 500],
      "enumProbabilities": [0.6, 0.1, 0.05, 0.08, 0.05, 0.02, 0.05, 0.05]
    }
  }
}
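Generated request documents can be replayed against a test server. A sketch that builds (but does not send) requests with the standard library; the host is a placeholder:

```python
import urllib.request

## Turn a generated API-request document into a urllib Request object
def build_request(doc, host="http://localhost:8080"):
    return urllib.request.Request(
        host + doc["endpoint"],
        method=doc["method"],
        headers={"Content-Type": doc["headers"]["contentType"]},
    )

doc = {
    "method": "GET",
    "endpoint": "/api/users/1234",
    "headers": {"contentType": "application/json"},
}
req = build_request(doc)
print(req.get_method(), req.full_url)
```

Sending would be a call to urllib.request.urlopen(req), gated behind whatever test harness is in use.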

Best Practices for Use Cases

1. Start with Real Data

  • Collect sample data from production (anonymized)
  • Use DataFloodEditor import to analyze patterns
  • Refine statistical models based on analysis

2. Incremental Development

  • Start with simple schemas
  • Add complexity gradually
  • Test each iteration
  • Validate against requirements

3. Realistic Distributions

  • Use histograms for numeric values
  • Set appropriate enum probabilities
  • Configure string patterns from samples
  • Test edge cases

4. Sequence Design

  • Model real-world timing patterns
  • Include peak and off-peak periods
  • Add appropriate delays between related events
  • Test with short time ranges first

5. Performance Testing

## Small batch for validation
DataFlood generate schema.json --count 10

## Medium batch for testing
DataFlood generate schema.json --count 1000

## Large batch for performance
DataFlood generate schema.json --count 100000 --format csv

6. Data Validation

  • Always validate generated data
  • Check constraint satisfaction
  • Verify relationships
  • Test with downstream systems

Integration Examples

Python Integration

import io
import json

import requests
import pandas as pd

## Load the DataFlood model to send with the request
with open("schema.json") as f:
    schema = json.load(f)

## Generate data via API
response = requests.post(
    "http://localhost:5000/api/documentgenerator/generate-simple",
    json={"schema": schema, "count": 1000, "format": "csv"}
)

## Load into pandas
df = pd.read_csv(io.StringIO(response.text))

Database Loading

## Generate CSV
DataFlood generate schema.json --format csv --count 10000 --output data.csv

## Load into PostgreSQL
psql -c "\copy table_name FROM 'data.csv' CSV HEADER"

## Load into MySQL (LOCAL reads the client-side file; skip the header row)
mysql --local-infile=1 -e "LOAD DATA LOCAL INFILE 'data.csv' INTO TABLE table_name FIELDS TERMINATED BY ',' IGNORE 1 LINES"

Streaming Pipeline

## Generate streaming data
DataFlood sequence stream-sequence.json --format jsonl --output - | \
  kafka-console-producer --topic test-data --broker-list localhost:9092

Conclusion

These use cases demonstrate the flexibility of DataFlood Suite for generating realistic test data across various domains. Key takeaways:

  1. Model real patterns: Generate DataFlood models from actual data
  2. Design relationships: Create realistic dependencies between entities
  3. Time-based generation: Model temporal patterns accurately
  4. Test thoroughly: Validate data meets requirements
  5. Scale appropriately: Start small, scale up gradually

For more examples and detailed configurations, refer to the component-specific documentation.