DataFlood CLI Reference

Commands Overview

DataFlood provides three main commands:

  1. Model Generation - Analyze data to create statistical models
  2. Document Generation - Generate synthetic data from DataFlood models
  3. Sequence Generation (Tides) - Create time-based document sequences

Model Generation Command

Syntax

DataFlood <folder-path> [output-file]

Description

Analyzes the JSON and CSV files in a directory and generates a DataFlood model containing statistical models of the sampled data.

Parameters

Parameter     Required   Description                                   Default
folder-path   Yes        Directory containing sample JSON/CSV files    -
output-file   No         Output model filename                         generated-model.json

Examples

## Basic usage
DataFlood ./sample-data

## Specify output file
DataFlood ./sample-data my-model.json

## Use absolute path
DataFlood /home/user/data /home/user/models/output.json

Output

Creates a JSON file containing:

  • DataFlood model structure
  • Statistical string models
  • Numeric histograms
  • Format specifications
  • Required field analysis
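
A quick way to confirm what a generated model contains is to build one and inspect its top-level keys. The commands below assume the jq utility is installed; any JSON viewer works equally well, and the filenames are illustrative.

## Build a model from sample data, then list its top-level structure
DataFlood ./sample-data my-model.json
jq 'keys' my-model.json

## Pretty-print the whole model for a closer look
jq '.' my-model.json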

Document Generation Command

Syntax

DataFlood generate <model-file> [options]

Description

Generates synthetic documents based on a DataFlood model.

Parameters

Parameter    Required   Description
model-file   Yes        Path to DataFlood model file

Options

Option               Short   Description                       Default                     Range/Values
--count              -c      Number of documents to generate   1                           1-10000
--seed               -s      Random seed for reproducibility   random                      Any integer
--output             -o      Output filename                   generated-documents.json    Any valid path
--format             -f      Output format                     json                        json, csv
--separate           -       Generate individual files         false                       flag
--individual         -       Alias for --separate              false                       flag
--metadata           -       Include generation metadata       false                       flag
--include-metadata   -       Alias for --metadata              false                       flag
--entropy            -e      Override entropy for strings      model default               >= 0.0

Examples

Basic Generation
## Generate single document
DataFlood generate model.json

## Generate 100 documents
DataFlood generate model.json --count 100

## Short form
DataFlood generate model.json -c 100
Reproducible Generation
## Use seed for consistent results
DataFlood generate model.json --seed 42 --count 50

## Short form
DataFlood generate model.json -s 42 -c 50
Output Control
## Specify output file
DataFlood generate model.json --output my-data.json

## Generate CSV format
DataFlood generate model.json --format csv --output data.csv

## Short form
DataFlood generate model.json -f csv -o data.csv
Separate Files
## Generate individual files: doc-001.json, doc-002.json, etc.
DataFlood generate model.json --count 10 --separate --output doc.json

## CSV separate files
DataFlood generate model.json -c 5 --separate -f csv -o record.csv
## Creates: record-001.csv, record-002.csv, etc.
Metadata Inclusion
## Include generation metadata
DataFlood generate model.json --count 10 --metadata

## Metadata with CSV
DataFlood generate model.json -f csv --metadata -c 100
Entropy Control
## Low entropy (vocabulary-based)
DataFlood generate model.json --entropy 1.0 --count 10

## Medium entropy (pattern-based)
DataFlood generate model.json --entropy 3.0 --count 10

## High entropy (character distribution)
DataFlood generate model.json --entropy 5.0 --count 10

## Short form
DataFlood generate model.json -e 2.5 -c 20
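
To see how the entropy tiers differ in practice, you can generate a small batch at each level with the same seed and compare the resulting string values. This is a plain bash loop over the documented --entropy, --seed, --count, and --output options; the output filenames are arbitrary.

## Generate comparable samples at several entropy levels
for e in 1.0 3.0 5.0; do
  DataFlood generate model.json --entropy $e --seed 42 --count 10 --output "sample-entropy-$e.json"
done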

Output Formats

JSON Output Structure

Without metadata:

[
  { "field1": "value1", "field2": 123 },
  { "field1": "value2", "field2": 456 }
]

With metadata:

{
  "generatedAt": "2024-01-15T10:30:00Z",
  "count": 2,
  "seed": 42,
  "modelFile": "model.json",
  "documents": [
    { "field1": "value1", "field2": 123 },
    { "field1": "value2", "field2": 456 }
  ]
}
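
When metadata is included, the generated documents sit under the documents key rather than at the top level, so downstream tooling needs to handle both shapes. A minimal sketch using jq (assumed to be installed):

## Count documents in plain output (top-level array)
jq 'length' generated-documents.json

## Count and extract documents from metadata-wrapped output
jq '.count' generated-documents.json
jq '.documents' generated-documents.json > documents-only.json
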
CSV Output Structure

Without metadata:

field1,field2,nested.field3
value1,123,nestedValue1
value2,456,nestedValue2

With metadata (as comments):

## Generated At: 2024-01-15 10:30:00 UTC
## Document Count: 2
## Seed: 42
## Model File: model.json

field1,field2,nested.field3
value1,123,nestedValue1
value2,456,nestedValue2
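
Many CSV consumers do not expect comment lines, so the metadata header may need to be stripped before loading. A minimal sketch, assuming the comment lines in the generated file are prefixed with "#":

## Remove comment lines before handing the CSV to other tools
grep -v '^#' data.csv > data-clean.csv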

Sequence Generation Command (Tides)

Syntax

DataFlood sequence <config-file> [options]

Description

Generates time-based document sequences, including parent-child document relationships, from a Tides configuration file.

Parameters

Parameter     Required   Description
config-file   Yes        Path to Tides configuration file

Options

Option            Short   Description                     Default
--output          -o      Output filename                 auto-generated
--max-docs        -n      Maximum documents to generate   unlimited
--format          -f      Output format (json, csv)       json
--validate-only   -       Only validate configuration     false
--validate        -       Alias for --validate-only       false
--metadata        -       Include generation metadata     false
--seed            -s      Override seed for execution     config seed

Examples

Basic Tides
## Execute Tides
DataFlood sequence my-tides.json

## With output file
DataFlood sequence my-tides.json --output results.json

## Short form
DataFlood sequence my-tides.json -o results.json
Validation
## Validate configuration only
DataFlood sequence my-tides.json --validate-only

## Shorter alias
DataFlood sequence my-tides.json --validate
Document Limits
## Limit to 1000 documents
DataFlood sequence my-tides.json --max-docs 1000

## Short form
DataFlood sequence my-tides.json -n 1000
Format and Metadata
## CSV output with metadata
DataFlood sequence my-tides.json --format csv --metadata

## Short form
DataFlood sequence my-tides.json -f csv --metadata
Seed Override
## Override configuration seed
DataFlood sequence my-tides.json --seed 12345 --max-docs 500

## Short form
DataFlood sequence my-tides.json -s 12345 -n 500

Tides Configuration File Format

Basic structure:

{
  "name": "My Tides",
  "startTime": "2024-01-01T00:00:00Z",
  "endTime": "2024-01-01T01:00:00Z",
  "intervalMs": 1000,
  "steps": [
    {
      "stepId": "step1",
      "modelPath": "model1.json",
      "documentsPerInterval": 1,
      "generationProbability": 0.8
    }
  ],
  "transactions": [
    {
      "parentStepId": "step1",
      "childSteps": [
        {
          "stepId": "step2",
          "minCount": 1,
          "maxCount": 3
        }
      ]
    }
  ]
}
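
A typical workflow is to validate the configuration first and only then run the full sequence. The sketch below uses the documented --validate, --max-docs, and --output options and assumes, as the exit-code table later in this reference suggests, that a failed validation returns a non-zero exit code; the filenames are illustrative.

## Validate the configuration, then execute with a document limit
DataFlood sequence my-tides.json --validate && \
  DataFlood sequence my-tides.json --max-docs 1000 --output tides-output.json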

Test Command

Syntax

DataFlood

Description

Running DataFlood with no arguments executes the built-in test suite to verify DataFlood functionality.

Output

  • Test results for each component
  • Success/failure indicators
  • Test run folder with logs
  • Performance metrics

Example

## Run tests
DataFlood

## Output:
## === DataFlood and FloodGate Test Results ===
## Test run folder: test-run_20240115_103000
## ...test results...

Help Command

Syntax

DataFlood --help
DataFlood -h

Description

Shows usage information for all commands.

Exit Codes

Code   Description
0      Success
1      Invalid arguments
2      File not found
3      Model validation error
4      Generation error
5      I/O error
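
Scripts can branch on these exit codes to distinguish, for example, a missing file from a generation failure. A minimal bash sketch:

## React to DataFlood exit codes in a script
DataFlood generate model.json --count 100 --output data.json
status=$?
case $status in
  0) echo "Generation succeeded" ;;
  2) echo "File not found" >&2 ;;
  *) echo "DataFlood failed with exit code $status" >&2 ;;
esac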

Environment Variables

Variable                Description                Default
DATAFLOOD_MAX_THREADS   Maximum parallel threads   System default
DATAFLOOD_TEMP_DIR      Temporary file directory   System temp
DATAFLOOD_LOG_LEVEL     Logging verbosity          Info
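
Environment variables are set in the shell before invoking DataFlood. The values below are illustrative: any writable directory works for DATAFLOOD_TEMP_DIR, and the thread count should match the cores you want to use.

## Constrain threads and redirect temporary files for one run
export DATAFLOOD_MAX_THREADS=4
export DATAFLOOD_TEMP_DIR=/tmp/dataflood
DataFlood generate model.json --count 10000 --format csv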

Performance Tips

Large-Scale Generation

## Generate in batches for memory efficiency
DataFlood generate model.json --count 1000 --separate

## Use specific seed for reproducible batches
DataFlood generate model.json --count 1000 --seed 1 --output batch1.json
DataFlood generate model.json --count 1000 --seed 2 --output batch2.json
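
The per-batch pattern above extends naturally to a loop, which keeps memory use bounded while still producing a reproducible set of files (one distinct seed per batch). A plain bash sketch using only the documented options:

## Generate ten reproducible batches of 1000 documents each
for i in $(seq 1 10); do
  DataFlood generate model.json --count 1000 --seed $i --output "batch$i.json"
done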

CSV Performance

## CSV is more memory-efficient for large datasets
DataFlood generate model.json --count 10000 --format csv

Entropy Optimization

## Lower entropy = faster generation
DataFlood generate model.json --entropy 1.0 --count 10000

Common Use Cases

Test Data Generation

## Consistent test data
DataFlood generate model.json --seed 42 --count 100 --output test-data.json

Data Migration Testing

## Generate various formats
DataFlood generate model.json --format json --count 1000 --output data.json
DataFlood generate model.json --format csv --count 1000 --output data.csv

Performance Testing

## Generate large datasets
DataFlood generate model.json --count 10000 --format csv --output perf-test.csv

Development Fixtures

## Individual files for fixtures
DataFlood generate model.json --count 20 --separate --output fixture.json

See Also