diff --git a/.circleci/config.yml b/.circleci/config.yml index 42949ebb5..29bed47ed 100644 --- a/.circleci/config.yml +++ b/.circleci/config.yml @@ -21,7 +21,7 @@ jobs: - image: circleci/openjdk:8-jdk-browsers steps: - checkout - # If changing build tools be sure to update GeneratorSetup.md in docs + # If changing build tools be sure to update BuildAndRun.md in docs - run: gradle fatJar :output:test :profile:test :generator:test :common:test :orchestrator:test - run: name: Save test results diff --git a/README.md b/README.md index 971d3c584..fdd8ce758 100644 --- a/README.md +++ b/README.md @@ -1,12 +1,13 @@ # DataHelix Generator [![CircleCI](https://circleci.com/gh/finos/datahelix.svg?style=svg)](https://circleci.com/gh/finos/datahelix) [![FINOS - Incubating](https://cdn.jsdelivr.net/gh/finos/contrib-toolbox@master/images/badge-incubating.svg)](https://finosfoundation.atlassian.net/wiki/display/FINOS/Incubating) -![DataHelix logo](logo.png) +![DataHelix logo](docs/logo.png) The generation of representative test and simulation data is a challenging and time-consuming task. The DataHelix generator allows you to quickly create data, based on a JSON profile that defines fields and the relationships between them, for the purpose of testing and validation. The generator supports a number of generation modes, allowing the creation of data that both conforms to, or violates, the profile. DataHelix is a proud member of the [Fintech Open Source Foundation](https://www.finos.org/) and operates within the [Data Technologies Program](https://www.finos.org/dt). - [Getting Started](#Getting-Started) + - [First Time Setup](docs/user/gettingStarted/BuildAndRun.md) - [Creating your first profile](#Creating-your-first-profile) - [Adding constraints](#Adding-constraints) - [Generating large datasets](#Generating-large-datasets) @@ -17,15 +18,16 @@ DataHelix is a proud member of the [Fintech Open Source Foundation](https://www. 
- [Contributing](#Contributing) - [License](#License) + # Getting Started -_The following guide gives a 10 minute introduction to the generator via various practical examples. For more detailed documentation please refer to the [Profile Developer Guide](docs/ProfileDeveloperGuide.md), and if you are interested in extending / modifying the generator itself, refer to the [DataHelix Generator Developer Guide](docs/GeneratorDeveloperGuide.md)._ +_The following guide gives a 10 minute introduction to the generator via various practical examples. For more detailed documentation please refer to the [User Guide](docs/user/UserGuide.md). If you are interested in extending / modifying the generator itself please refer to the [Developer Guide](docs/developer/DeveloperGuide.md)._ -The generator has been written in Java, allowing it to work on Microsoft Windows, Apple Mac and Linux. You will need Java v1.8 installed to run the generator (you can run `java version` to check whether you meet this requirement), it can be [downloaded here](https://www.java.com/en/download/manual.jsp). +The generator has been written in Java, allowing it to work on Microsoft Windows, Apple Mac and Linux. You will need Java v1.8 installed to run the generator (you can run `java -version` to check whether you meet this requirement), it can be [downloaded here](https://www.java.com/en/download/manual.jsp). The generator is distributed as a JAR file, with the latest release always available from the [GitHub releases page](https://github.com/finos/datahelix/releases/). The project is currently in beta and under active development. You can expect breaking changes in future releases, and new features too! -You are also welcome to download the source code and build the generator yourself. To do so, follow the instructions for [downloading and building it using a Java IDE](generator/docs/GeneratorSetup.md), or for [downloading and building it using Docker](generator/docs/DockerSetup.md). 
+You are also welcome to download the source code and build the generator yourself. To do so, follow the instructions for [downloading and building it using a Java IDE](docs/user/gettingStarted/BuildAndRun.md), or for [downloading and building it using Docker](docs/developer/DockerSetup.md). Your feedback on the beta would be greatly appreciated. If you have any issues, feature requests, or ideas, please share them via the [GitHub issues page](https://github.com/finos/datahelix/issues). @@ -157,8 +159,6 @@ The generator supports four different data types: - **string** - sequences of unicode characters up to a maximum length of 1000 characters - **datetime** - specific moments in time, with values in the range 0001-01-01T00:00:00.000 to 9999-12-31T23:59:59.999, with an optional granularity / precision (from a maximum of one year to a minimum of one millisecond) that can be defined via a `granularTo` constraint. - - We'll expand the example profile to add a new `age` field, a not-null integer in the range 1-99: ```json @@ -296,42 +296,17 @@ firstName,age,nationalInsurance [...] ``` -You can find out more about the various constraints the generator supports in the detailed [Profile Developer Guide](docs/ProfileDeveloperGuide.md). +You can find out more about the various constraints the generator supports in the detailed [User Guide](docs/user/UserGuide.md). ## Generation modes The generator supports a number of different generation modes: -- **random** - generates random data that abides by the given set of constraints, with the number of generated rows limited via the `--max-rows` option. -- **interesting** - generates data that is typically [deemed 'interesting'](https://github.com/finos/datahelix/wiki/Interesting-data-generation) from a test perspective, for example exploring [boundary values](https://en.wikipedia.org/wiki/Boundary-value_analysis). - -The mode is specified via the `--generation-type` option. 
The following example outputs 'interesting' values for the current profile: - -``` -$ java -jar generator.jar generate --generation-type interesting --replace --profile-file=profile.json --output-path=output.csv -``` - -In this case it generates just 14 rows where you can see that it is exploring the boundary values of the constraints: - -``` -firstName,age,nationalInsurance -"Jon",18,"AA000000" -"John",18,"AA000000" -"Jon",18,"AJ000000F" -"John",18,"AJ000000F" -"Jon",19,"AA000000" -"John",19,"AA000000" -"Jon",19,"AJ000000F" -"John",19,"AJ000000F" -"Jon",1, -"John",1, -"Jon",99,"AA000000" -"John",99,"AA000000" -"Jon",99,"AJ000000F" -"John",99,"AJ000000F" -``` +- **random** - _(default)_ generates random data that abides by the given set of constraints, with the number of generated rows limited via the `--max-rows` option. +- **full** - generates all the data that abides by the given set of constraints, with the number of generated rows limited via the `--max-rows` option. +- **interesting** - _(alpha feature)_ generates data that is typically [deemed 'interesting'](docs/user/alphaFeatures/Interesting.md) from a test perspective, for example exploring [boundary values](https://en.wikipedia.org/wiki/Boundary-value_analysis). - +The mode is specified via the `--generation-type` option. ## Generating invalid data @@ -411,8 +386,8 @@ firstName,age,nationalInsurance ## Next steps -That's the end of our getting started guide. Hopefully it has given you a good understanding of what the DataHelix generator is capable of. If you'd like to find out more about the various constraints the tool supports, the [Profile Developer Guide](docs/ProfileDeveloperGuide.md) is a good next step. You might also be interested in the [examples folder](https://github.com/finos/datahelix/tree/master/examples), which illustrates various features of the generator. 
-For more detail about the behaviour of certain profiles, see the [behaviour in detail.](./docs/BehaviourInDetail.md) +That's the end of our getting started guide. Hopefully it has given you a good understanding of what the DataHelix generator is capable of. If you'd like to find out more about the various constraints the tool supports, the [User Guide](docs/user/UserGuide.md) is a good next step. You might also be interested in the [examples folder](https://github.com/finos/datahelix/tree/master/examples), which illustrates various features of the generator. +For more detail about the behaviour of certain profiles, see the [behaviour in detail.](docs/developer/behaviour/BehaviourInDetail.md) ## Contributing diff --git a/docs/GeneratorDeveloperGuide.md b/docs/GeneratorDeveloperGuide.md deleted file mode 100644 index 7d49a5142..000000000 --- a/docs/GeneratorDeveloperGuide.md +++ /dev/null @@ -1,13 +0,0 @@ -## Key Concepts - -1. [Design Decisions](KeyDecisions.md) -1. [Decision Trees](DecisionTrees/DecisionTrees.md) -1. [Profile Syntax](Schema.md) - - -## Development - -1. [Contributing](../.github/CONTRIBUTING.md) -2. [Build and Run the Generator](../generator/docs/GeneratorSetup.md) -3. [Dependency Injection](DependencyInjection.md) -4. [Cucumber Testing](CucumberSyntax.md) diff --git a/docs/CucumberSyntax.md b/docs/developer/CucumberSyntax.md similarity index 95% rename from docs/CucumberSyntax.md rename to docs/developer/CucumberSyntax.md index d09b7de65..685451fd1 100644 --- a/docs/CucumberSyntax.md +++ b/docs/developer/CucumberSyntax.md @@ -22,9 +22,9 @@ More examples can be seen in the [generator cucumber features](https://github.co The framework supports setting configuration settings for the generator, defining the profile and describing the expected outcome. All of these are described below, all variable elements (e.g. `{generationStrategy}` are case insensitive), all fields and values **are case sensitive**. 
### Configuration options -* _the generation strategy is `{generationStrategy}`_ see [generation strategies](https://github.com/finos/datahelix/blob/master/generator/docs/GenerationTypes.md) - default: `random` -* _the combination strategy is `{combinationStrategy}`_ see [combination strategies](https://github.com/finos/datahelix/blob/master/generator/docs/CombinationStrategies.md) - default: `exhaustive` -* _the walker type is `{walkerType}`_ see [walker types](https://github.com/finos/datahelix/blob/master/generator/docs/TreeWalkerTypes.md) - default: `reductive` +* _the generation strategy is `{generationStrategy}`_ see [generation strategies](https://github.com/finos/datahelix/blob/master/docs/user/generationTypes/GenerationTypes.md) - default: `random` +* _the combination strategy is `{combinationStrategy}`_ see [combination strategies](https://github.com/finos/datahelix/blob/master/docs/user/CombinationStrategies.md) - default: `exhaustive` +* _the walker type is `{walkerType}`_ see [walker types](https://github.com/finos/datahelix/blob/master/docs/developer/decisionTreeWalkers/TreeWalkerTypes.md) - default: `reductive` * _the data requested is `{generationMode}`_, either `violating` or `validating` - default: `validating` * _the generator can generate at most `{int}` rows_, ensures that the generator will only emit `int` rows, default: `1000` * _we do not violate constraint `{operator}`_, prevent this operator from being violated (see **Operators** section below), you can specify this step many times if required @@ -49,7 +49,7 @@ Operators are converted to English language equivalents for use in cucumber, so * _untyped fields are allowed_, sets the --allow-untyped-fields flag to false - default: flag is true #### Operators -See [Predicate constraints](ProfileDeveloperGuide.md#Predicate-constraints), [Grammatical Constraints](ProfileDeveloperGuide.md#Grammatical-constraints) and [Presentational Constraints](ProfileDeveloperGuide.md#Presentational-constraints) 
for details of the constraints. +See [Predicate constraints](../user/UserGuide.md#Predicate-constraints), [Grammatical Constraints](../user/UserGuide.md#Grammatical-constraints) and [Presentational Constraints](../user/UserGuide.md#Presentational-constraints) for details of the constraints. #### Operands When specifying the operator/s for a field, ensure to format the value as in the table below: diff --git a/docs/DependencyInjection.md b/docs/developer/DependencyInjection.md similarity index 100% rename from docs/DependencyInjection.md rename to docs/developer/DependencyInjection.md diff --git a/docs/developer/DeveloperGuide.md b/docs/developer/DeveloperGuide.md new file mode 100644 index 000000000..f770d4e4d --- /dev/null +++ b/docs/developer/DeveloperGuide.md @@ -0,0 +1,26 @@ +## Key Concepts + +1. [Design Decisions](KeyDecisions.md) +1. [Decision Trees](decisionTrees/DecisionTrees.md) +1. [Profile Syntax](../user/Schema.md) + + +## Development + +1. [Contributing](../../.github/CONTRIBUTING.md) +2. [Build and Run the Generator](../user/gettingStarted/BuildAndRun.md) +3. [Dependency Injection](DependencyInjection.md) +4. [Cucumber Testing](CucumberSyntax.md) + +## Behavioural Explanations + +1. [Behaviour in Detail](behaviour/BehaviourInDetail.md) +1. [Null Operator](behaviour/NullOperator.md) + +## Key Algorithms and Data Structures + +1. [Decision Trees](decisionTrees/DecisionTrees.md) +1. [Generation Algorithm](algorithmsAndDataStructures/GenerationAlgorithm.md) +1. [Field Fixing Strategy](algorithmsAndDataStructures/FieldFixingStrategy.md) +1. [String Generation](algorithmsAndDataStructures/StringGeneration.md) +1. 
[Tree Walker Types](decisionTreeWalkers/TreeWalkerTypes.md) \ No newline at end of file diff --git a/generator/docs/DockerSetup.md b/docs/developer/DockerSetup.md similarity index 97% rename from generator/docs/DockerSetup.md rename to docs/developer/DockerSetup.md index 868993b1d..3fe49f3b4 100644 --- a/generator/docs/DockerSetup.md +++ b/docs/developer/DockerSetup.md @@ -1,6 +1,6 @@ # Build and run the generator using Docker -The instructions below explain how to download the source code, and then build and run it using Docker. This generates a self-contained executable Docker image which can then run the generator without needing to install a JRE. If you would like to download and build the source code in order to contribute to development, we recommend you [build and run the generator using an IDE](GeneratorSetup.md) instead. +The instructions below explain how to download the source code, and then build and run it using Docker. This generates a self-contained executable Docker image which can then run the generator without needing to install a JRE. If you would like to download and build the source code in order to contribute to development, we recommend you [build and run the generator using an IDE](../user/gettingStarted/BuildAndRun.md) instead. 
## Get Code diff --git a/docs/KeyDecisions.md b/docs/developer/KeyDecisions.md similarity index 100% rename from docs/KeyDecisions.md rename to docs/developer/KeyDecisions.md diff --git a/generator/docs/FieldFixingStrategy.md b/docs/developer/algorithmsAndDataStructures/FieldFixingStrategy.md similarity index 100% rename from generator/docs/FieldFixingStrategy.md rename to docs/developer/algorithmsAndDataStructures/FieldFixingStrategy.md diff --git a/generator/docs/GenerationAlgorithm.md b/docs/developer/algorithmsAndDataStructures/GenerationAlgorithm.md similarity index 85% rename from generator/docs/GenerationAlgorithm.md rename to docs/developer/algorithmsAndDataStructures/GenerationAlgorithm.md index ef769a8ce..8cb2f5503 100644 --- a/generator/docs/GenerationAlgorithm.md +++ b/docs/developer/algorithmsAndDataStructures/GenerationAlgorithm.md @@ -1,12 +1,12 @@ # Decision tree generation -Given a set of rules, generate a [decision tree](../../docs/DecisionTrees/DecisionTrees.md) (or multiple if [partitioning](../../docs/DecisionTrees/Optimisation.md#Partitioning) was successful). +Given a set of rules, generate a [decision tree](../decisionTrees/DecisionTrees.md) (or multiple if [partitioning](../decisionTrees/Optimisation.md#Partitioning) was successful). ## Decision tree interpretation An interpretation of the decision tree is defined by chosing an option for every decision visited in the tree. -![](interpreted-graph.png) +![](../../user/images/interpreted-graph.png) In the above diagram the red lines represent one interpretation of the graph, for every decision an option has been chosen and we end up with the set of constraints that the red lines touch at any point. These constraints are reduced into a fieldspec (see [Constraint Reduction](#constraint-reduction) below). 
@@ -32,7 +32,7 @@ could collapse to *(note: this is a conceptual example and not a reflection of actual object structure)* -See [Set restriction and generation](SetRestrictionAndGeneration.md) for a more indepth explanation of how the constraints are merged and data generated. +See [Set restriction and generation](../../user/SetRestrictionAndGeneration.md) for a more in-depth explanation of how the constraints are merged and data generated. This object has all the information needed to produce the values `[3, 4, 5, 6]`. @@ -50,9 +50,9 @@ Databags can be merged, but merging two databags fails if they have any keys in Fieldspecs are able to produce streams of databags containing valid values for the field they describe. Additional operations can then be applied over these streams, such as: -* A memoizing decorator that records values being output so they can be replayed inexpensively +* A memoization decorator that records values being output so they can be replayed inexpensively * A filtering decorator that prevents repeated values being output -* A merger that takes multiple streams and applies one of the available [combination strategies](CombinationStrategies.md) +* A merger that takes multiple streams and applies one of the available [combination strategies](../../user/CombinationStrategies.md) * A concatenator that takes multiple streams and outputs all the members of each # Output diff --git a/generator/docs/OptimisationProcess.md b/docs/developer/algorithmsAndDataStructures/OptimisationProcess.md similarity index 100% rename from generator/docs/OptimisationProcess.md rename to docs/developer/algorithmsAndDataStructures/OptimisationProcess.md diff --git a/generator/docs/StringGeneration.md b/docs/developer/algorithmsAndDataStructures/StringGeneration.md similarity index 84% rename from generator/docs/StringGeneration.md rename to docs/developer/algorithmsAndDataStructures/StringGeneration.md index d8dda0a31..4919c0505 100644 --- 
a/generator/docs/StringGeneration.md +++ b/docs/developer/algorithmsAndDataStructures/StringGeneration.md @@ -1,8 +1,8 @@ # String Generation -We use a Java library called [dk.brics.automaton](http://www.brics.dk/automaton/) to analyse regexes and generate valid (and invalid for [violation](DeliberateViolation.md)) strings based on them. It works by representing the regex as a finite state machine. It might be worth reading about state machines for those who aren't familiar: [https://en.wikipedia.org/wiki/Finite-state_machine](https://en.wikipedia.org/wiki/Finite-state_machine). Consider the following regex: `ABC[a-z]?(A|B)`. It would be represented by the following state machine: +We use a Java library called [dk.brics.automaton](http://www.brics.dk/automaton/) to analyse regexes and generate valid (and invalid for [violation](../../user/alphaFeatures/DeliberateViolation.md)) strings based on them. It works by representing the regex as a finite state machine. It might be worth reading about state machines for those who aren't familiar: [https://en.wikipedia.org/wiki/Finite-state_machine](https://en.wikipedia.org/wiki/Finite-state_machine). Consider the following regex: `ABC[a-z]?(A|B)`. It would be represented by the following state machine: -![](finite-state-machine.svg) +![](../../user/images/finite-state-machine.svg)