Skip to content
This repository has been archived by the owner on Apr 14, 2023. It is now read-only.

Commit

Permalink
Merge pull request #1452 from finos/1433-Add-CLI-options-to-docs
Browse files Browse the repository at this point in the history
1433 add cli options to docs
  • Loading branch information
Tom-hayden authored Oct 11, 2019
2 parents 9114432 + cfff586 commit e5f81ac
Show file tree
Hide file tree
Showing 3 changed files with 153 additions and 2 deletions.
121 changes: 119 additions & 2 deletions docs/UserGuide.md
Original file line number Diff line number Diff line change
Expand Up @@ -370,7 +370,7 @@ After loading the set from the file, this constraint behaves identically to the
### `inMap` _(field, file, key)_

```javascript
{
{
"field": "country",
"is": "inMap",
"file": "countries.csv",
Expand All @@ -380,7 +380,7 @@ After loading the set from the file, this constraint behaves identically to the

Is satisfied if `field`'s value is in the map with the key `Country`.

For each field using the same map when one value is picked from the map all fields will use the same row.
For each field using the same map when one value is picked from the map all fields will use the same row.

It populates the map from a new-line delimited file (with suffix `.csv`), where each line represents a value to load. A header is required in the file to identify which column is related to a key.

Expand Down Expand Up @@ -652,3 +652,120 @@ Is satisfied if either:
- The `if` constraint is not satisfied, and the `else` constraint is

While it's not prohibited, wrapping conditional constraints in any other kind of constraint (eg, a `not`) may cause unintuitive results.


#Running a Profile

Profiles can be run against a jar using the command line.

##Command line arguments

Currently the only mode fully supported by the data helix is generate mode. An example command would be something like

`java -jar generator.jar generate --max-rows=100 --replace --profile-file=profile.json --output-path=output.csv`

### Generate command line arguments
Option switches are case-sensitive, arguments are case-insensitive

* `--profile-file=<path>` (or `-p <path>`)
* Path to the input profile file.
* `--output-path=<path>` (or `-o <path>`)
* Path to the output file. If not specified, output will be to standard output.
* `--replace`
* Overwrite/replace existing output files. Defaults to false.
* `-n <rows>` or `--max-rows <rows>`
* Emit at most `<rows>` rows to the output file, if not specified will limit to 10,000,000 rows.
* Mandatory in `RANDOM` mode.
* `--generation-type`
* Determines the type of (data generation)[Link] performed. Supported options are `FULL_SEQUENTIAL` and `RANDOM`(default).
* `--combination-strategy`
* Determines the type of combination strategy used in full sequential mode. Supported options are `MINIMAL`(default), `EXHAUSTIVE` and `PINNING`.
* `--output-format`
* Determines the output format. Supported options are `csv`(default) and `json`.
* `--ndjson`
* When combined with the `--output-format=json` flag sets the output format to [ndjson](http://ndjson.org/). Defaults to true if the `--output-flag` is not set and to false if it is set.

By default the generator will report how much data has been generated over time, the other options are below:
* `--verbose`
* Will report in-depth detail of data generation
* `--quiet`
* Will disable velocity reporting

`--quiet` will be ignored if `--verbose` is supplied.

##Generation strategies
The generation mode can be specified by the `--generation-type` flag.

The generator supports the following data generation types

* Random (_default_)
* Full Sequential

### Random
Generate some random data that abides by the given set of constraints.

Examples:

| Constraint | Emitted valid data |
| ---- | ---- | ---- |
| `Field 1 > 10 AND Field 1 < 20` | _(any values > 10 & < 20)_ |
| `Field 1 in set [A, B, C]` | _(A, B or C in any order, repeated as needed)_ |

Notes:
- Random generation of data is infinite and is limited to 1000 by default, use `--max-rows` to enable generation of more data.

#### Full Sequential
Generate all data that can be generated in order from lowest to highest.

Examples:

| Constraint | Emitted valid data |
| ---- | ---- | ---- |
| `Field 1 > 0 AND Field 1 < 5` | _(null, 1, 2, 3, 4)_ |
| `Field 1 in set [A, B, C]` | _(null, A, B, C)_ |

* Note that null will only be produced depending on the [properties](https://github.com/finos/datahelix/blob/master/docs/UserGuide.md#Fields) of Field 1.


####Combination strategies
There are a few different combination strategies which can be used in **full sequential mode** with minimal being the default. In modes other than full sequential, combination strategy will have no effect.

It is simplest to see how the different combination strategies work by look at the effect on a simple example profile. The following [profile](https://github.com/finos/datahelix/tree/master/examples/multiple-fields/profile.json) contains two fields:
* field1 - has values in set [ "A", "B" ]
* field2 - has values in set [ 1, 2, 3 ]

##### Minimal

The minimal strategy outputs the minimum data required to exemplify each value at least once. Per the example, the output would be:

* "A",1
* "B",2
* "B",3

Note that minimal is the default combination strategy.

##### Exhaustive

The exhaustive strategy outputs all possible combinations. Given the fields as defined above, possible outputs would be:

* "A",1
* "B",1
* "A",2
* "B",2
* "A",3
* "B",3


##### Pinning

The pinning strategy establishes a baseline for each field (generally by picking the first available value for that field) and then creates outputs such that either:

* All values equal the baseline for the respective field
* All values except one equal the baseline for the respective field

To generate these outputs, we first output the first case (all values from baseline) and then iterate through each field, F, fixing all other fields at their baseline and generating the full range of values for F. For the example, the output would be:

* "A",1
* "A",2
* "A",3
* "B",1
3 changes: 3 additions & 0 deletions examples/multiple-fileds/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
Two different fields using in set constraints.

This profile is used as an aide for understanding [combination strategies](https://github.com/finos/datahelix/blob/master/docs/UserGuide.md#combination-strategies).
31 changes: 31 additions & 0 deletions examples/multiple-fileds/profile.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,31 @@
{
"schemaVersion": "0.11",
"fields": [
{
"name": "field1",
"type": "string",
"nullable": false
},
{
"name": "field2",
"type": "integer",
"nullable": false
}
],
"rules": [
{
"constraints": [
{
"field": "field1",
"is": "inSet",
"values": ["A", "B"]
},
{
"field": "field2",
"is": "inSet",
"values": ["1", "2", "3"]
}
]
}
]
}

0 comments on commit e5f81ac

Please sign in to comment.