DataFlood CLI Reference
Commands Overview
DataFlood provides three main commands:
- Model Generation - Analyze data to create statistical models
- Document Generation - Generate synthetic data from DataFlood models
- Sequence Generation (Tides) - Create time-based document sequences
Model Generation Command
Syntax
DataFlood <folder-path> [output-file]
Description
Analyzes the JSON and CSV files in a directory and generates a DataFlood model containing per-field statistical models.
Parameters
Parameter | Required | Description | Default |
---|---|---|---|
folder-path | Yes | Directory containing sample JSON/CSV files | - |
output-file | No | Output model filename | generated-model.json |
Examples
## Basic usage
DataFlood ./sample-data
## Specify output file
DataFlood ./sample-data my-model.json
## Use absolute path
DataFlood /home/user/data /home/user/models/output.json
Output
Creates a JSON file containing:
- DataFlood model structure
- Statistical string models
- Numeric histograms
- Format specifications
- Required field analysis
Document Generation Command
Syntax
DataFlood generate <model-file> [options]
Description
Generates synthetic documents based on a DataFlood model.
Parameters
Parameter | Required | Description |
---|---|---|
model-file | Yes | Path to DataFlood model file |
Options
Option | Short | Description | Default | Range/Values |
---|---|---|---|---|
--count | -c | Number of documents to generate | 1 | 1-10000 |
--seed | -s | Random seed for reproducibility | random | Any integer |
--output | -o | Output filename | generated-documents.json | Any valid path |
--format | -f | Output format | json | json, csv |
--separate | - | Generate individual files | false | flag |
--individual | - | Alias for --separate | false | flag |
--metadata | - | Include generation metadata | false | flag |
--include-metadata | - | Alias for --metadata | false | flag |
--entropy | -e | Override entropy for strings | model default | >= 0.0 |
Examples
Basic Generation
## Generate single document
DataFlood generate model.json
## Generate 100 documents
DataFlood generate model.json --count 100
## Short form
DataFlood generate model.json -c 100
Reproducible Generation
## Use seed for consistent results
DataFlood generate model.json --seed 42 --count 50
## Short form
DataFlood generate model.json -s 42 -c 50
Output Control
## Specify output file
DataFlood generate model.json --output my-data.json
## Generate CSV format
DataFlood generate model.json --format csv --output data.csv
## Short form
DataFlood generate model.json -f csv -o data.csv
Separate Files
## Generate individual files: doc-001.json, doc-002.json, etc.
DataFlood generate model.json --count 10 --separate --output doc.json
## CSV separate files
DataFlood generate model.json -c 5 --separate -f csv -o record.csv
## Creates: record-001.csv, record-002.csv, etc.
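The numbered-file pattern above (base name plus an index before the extension) can be reproduced in a helper script, for example to pre-compute which files a run will create. This is a sketch based only on the examples shown; the three-digit zero padding is inferred from `doc-001.json` and is an assumption, not a documented guarantee of the CLI.

```python
from pathlib import Path

def numbered_filenames(output: str, count: int) -> list[str]:
    """Mimic the doc-001.json / record-001.csv naming shown above.

    Assumes three-digit zero padding, inferred from the examples;
    the real CLI's padding width is not specified in this reference.
    """
    base = Path(output)
    return [f"{base.stem}-{i:03d}{base.suffix}" for i in range(1, count + 1)]
```

For example, `numbered_filenames("doc.json", 3)` yields `["doc-001.json", "doc-002.json", "doc-003.json"]`.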
Metadata Inclusion
## Include generation metadata
DataFlood generate model.json --count 10 --metadata
## Metadata with CSV
DataFlood generate model.json -f csv --metadata -c 100
Entropy Control
## Low entropy (vocabulary-based)
DataFlood generate model.json --entropy 1.0 --count 10
## Medium entropy (pattern-based)
DataFlood generate model.json --entropy 3.0 --count 10
## High entropy (character distribution)
DataFlood generate model.json --entropy 5.0 --count 10
## Short form
DataFlood generate model.json -e 2.5 -c 20
Output Formats
JSON Output Structure
Without metadata:
[
{ "field1": "value1", "field2": 123 },
{ "field1": "value2", "field2": 456 }
]
With metadata:
{
"generatedAt": "2024-01-15T10:30:00Z",
"count": 2,
"seed": 42,
"modelFile": "model.json",
"documents": [
{ "field1": "value1", "field2": 123 },
{ "field1": "value2", "field2": 456 }
]
}
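Downstream code can accept both output shapes by checking for the metadata wrapper. The sketch below is based only on the two sample structures above (a bare array, or an object with a `documents` key); it assumes no other top-level shapes exist.

```python
import json

def load_documents(text: str) -> list:
    """Return the document list from either JSON output shape shown above:
    a bare array (no metadata), or an object whose 'documents' key
    holds the array (with metadata)."""
    data = json.loads(text)
    if isinstance(data, dict):
        return data["documents"]
    return data
```

Either form then yields the same list of documents, so consumers don't need to know whether `--metadata` was used.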
CSV Output Structure
Without metadata:
field1,field2,nested.field3
value1,123,nestedValue1
value2,456,nestedValue2
With metadata (as comments):
## Generated At: 2024-01-15 10:30:00 UTC
## Document Count: 2
## Seed: 42
## Model File: model.json
field1,field2,nested.field3
value1,123,nestedValue1
value2,456,nestedValue2
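When metadata is emitted as comments, CSV consumers need to skip those lines before parsing. A minimal sketch, assuming metadata comment lines begin with `#` as in the sample above:

```python
import csv
import io

def read_csv_skipping_metadata(text: str) -> list:
    """Parse DataFlood CSV output into row dicts, dropping leading
    metadata comment lines (assumed to start with '#')."""
    lines = [ln for ln in text.splitlines() if not ln.startswith("#")]
    return list(csv.DictReader(io.StringIO("\n".join(lines))))
```

Note that `csv.DictReader` returns all values as strings; numeric fields such as `field2` would need explicit conversion.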
Sequence Generation Command (Tides)
Syntax
DataFlood sequence <config-file> [options]
Description
Generates time-based document sequences with parent-child relationships using a Tides configuration.
Parameters
Parameter | Required | Description |
---|---|---|
config-file | Yes | Path to Tides configuration file |
Options
Option | Short | Description | Default |
---|---|---|---|
--output | -o | Output filename | auto-generated |
--max-docs | -n | Maximum documents to generate | unlimited |
--format | -f | Output format (json, csv) | json |
--validate-only | - | Only validate configuration | false |
--validate | - | Alias for --validate-only | false |
--metadata | - | Include generation metadata | false |
--seed | -s | Override seed for execution | config seed |
Examples
Basic Tides
## Execute Tides
DataFlood sequence my-tides.json
## With output file
DataFlood sequence my-tides.json --output results.json
## Short form
DataFlood sequence my-tides.json -o results.json
Validation
## Validate configuration only
DataFlood sequence my-tides.json --validate-only
## Shorter alias
DataFlood sequence my-tides.json --validate
Document Limits
## Limit to 1000 documents
DataFlood sequence my-tides.json --max-docs 1000
## Short form
DataFlood sequence my-tides.json -n 1000
Format and Metadata
## CSV output with metadata
DataFlood sequence my-tides.json --format csv --metadata
## Short form
DataFlood sequence my-tides.json -f csv --metadata
Seed Override
## Override configuration seed
DataFlood sequence my-tides.json --seed 12345 --max-docs 500
## Short form
DataFlood sequence my-tides.json -s 12345 -n 500
Tides Configuration File Format
Basic structure:
{
"name": "My Tides",
"startTime": "2024-01-01T00:00:00Z",
"endTime": "2024-01-01T01:00:00Z",
"intervalMs": 1000,
"steps": [
{
"stepId": "step1",
"modelPath": "model1.json",
"documentsPerInterval": 1,
"generationProbability": 0.8
}
],
"transactions": [
{
"parentStepId": "step1",
"childSteps": [
{
"stepId": "step2",
"minCount": 1,
"maxCount": 3
}
]
}
]
}
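A configuration like the one above can also be assembled programmatically before being passed to `DataFlood sequence`. The field names below are taken verbatim from the sample structure; the helper itself is a hypothetical convenience, not part of the CLI, and it performs no validation (use `--validate-only` for that).

```python
import json

def make_tides_config(name, start_time, end_time, interval_ms,
                      steps, transactions=None):
    """Assemble a Tides configuration dict using the field names
    from the sample structure above."""
    return {
        "name": name,
        "startTime": start_time,
        "endTime": end_time,
        "intervalMs": interval_ms,
        "steps": steps,
        "transactions": transactions or [],
    }

config = make_tides_config(
    "My Tides", "2024-01-01T00:00:00Z", "2024-01-01T01:00:00Z", 1000,
    steps=[{"stepId": "step1", "modelPath": "model1.json",
            "documentsPerInterval": 1, "generationProbability": 0.8}],
)
```

Writing `json.dumps(config, indent=2)` to a file produces a configuration file in the format shown above.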
Test Command
Syntax
DataFlood
Description
Runs the built-in test suite to verify DataFlood functionality.
Output
- Test results for each component
- Success/failure indicators
- Test run folder with logs
- Performance metrics
Example
## Run tests
DataFlood
## Output:
## === DataFlood and FloodGate Test Results ===
## Test run folder: test-run_20240115_103000
## ...test results...
Help Command
Syntax
DataFlood --help
DataFlood -h
Description
Shows usage information for all commands.
Exit Codes
Code | Description |
---|---|
0 | Success |
1 | Invalid arguments |
2 | File not found |
3 | Model validation error |
4 | Generation error |
5 | I/O error |
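Scripts that wrap the CLI can translate these exit codes into messages for logging or error handling. The mapping below mirrors the table above exactly; only the lookup helper is illustrative.

```python
# Exit codes as documented in the table above.
EXIT_CODES = {
    0: "Success",
    1: "Invalid arguments",
    2: "File not found",
    3: "Model validation error",
    4: "Generation error",
    5: "I/O error",
}

def describe_exit(code: int) -> str:
    """Map a DataFlood exit code to its documented description."""
    return EXIT_CODES.get(code, f"Unknown exit code {code}")
```

A wrapper can then call `describe_exit(process.returncode)` after running the CLI via `subprocess`.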
Environment Variables
Variable | Description | Default |
---|---|---|
DATAFLOOD_MAX_THREADS | Maximum parallel threads | System default |
DATAFLOOD_TEMP_DIR | Temporary file directory | System temp |
DATAFLOOD_LOG_LEVEL | Logging verbosity | Info |
Performance Tips
Large-Scale Generation
## Generate in batches for memory efficiency
DataFlood generate model.json --count 1000 --separate
## Use specific seed for reproducible batches
DataFlood generate model.json --count 1000 --seed 1 --output batch1.json
DataFlood generate model.json --count 1000 --seed 2 --output batch2.json
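The batching pattern above can be scripted rather than typed by hand. The sketch below only builds command strings from the flags documented in this section, varying the seed per batch so each batch remains independently reproducible; how the commands are executed (e.g. via `subprocess`) is left to the caller.

```python
def batch_commands(model: str, batches: int, per_batch: int) -> list:
    """Build one generate command per batch, with a distinct seed and
    output file per batch (the pattern shown above)."""
    return [
        f"DataFlood generate {model} --count {per_batch} "
        f"--seed {i} --output batch{i}.json"
        for i in range(1, batches + 1)
    ]
```

For example, `batch_commands("model.json", 2, 1000)` reproduces the two commands shown above.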
CSV Performance
## CSV is more memory-efficient for large datasets
DataFlood generate model.json --count 10000 --format csv
Entropy Optimization
## Lower entropy = faster generation
DataFlood generate model.json --entropy 1.0 --count 10000
Common Use Cases
Test Data Generation
## Consistent test data
DataFlood generate model.json --seed 42 --count 100 --output test-data.json
Data Migration Testing
## Generate various formats
DataFlood generate model.json --format json --count 1000 --output data.json
DataFlood generate model.json --format csv --count 1000 --output data.csv
Performance Testing
## Generate large datasets
DataFlood generate model.json --count 10000 --format csv --output perf-test.csv
Development Fixtures
## Individual files for fixtures
DataFlood generate model.json --count 20 --separate --output fixture.json
See Also
- Getting Started - Quick start guide
- Core Concepts - Understanding DataFlood
- Examples - Real-world scenarios
- Troubleshooting - Common issues