This repository has been archived by the owner on Apr 14, 2023. It is now read-only.

Commit

Merge pull request #1444 from finos/1433-fix-docs
1433 fix docs
Tom-hayden authored Oct 10, 2019
2 parents 70914f2 + 9670671 commit 83edbea
Showing 3 changed files with 16 additions and 21 deletions.
19 changes: 9 additions & 10 deletions docs/DeveloperGuide.md
@@ -34,7 +34,7 @@

# Introduction

This guide outlines how to contribute to the project, as well as the key concepts and structure of DataHelix.

* For information on how to get started with DataHelix see our [Getting Started guide](GettingStarted.md)

@@ -147,7 +147,7 @@ Then change the below (in the new file)...
"const": "0.1"
},
...
```
...to this:
```
...
@@ -160,7 +160,7 @@ Then change the below (in the new file)...

You will need to update the test in _ProfileSchemaImmutabilityTests_ to contain the new schema version generated. Old versions should **not** be modified. This is reflected by the test failing if any existing schemas are modified.

If this test does not update the schema in IntelliJ, invalidate the cache and restart, or delete the _profile/out_ directory and rebuild.

# Algorithms and Data Structures

@@ -194,7 +194,7 @@ One process involved in this is **constraint normalisation**, which transforms a
| `¬OR(X, Y, ...)` | `¬X, ¬Y, ...` |
| `¬AND(X, Y, ...)` | `OR(¬X, ¬Y, ...)` |
| `¬IF(X, Y)` | `X, ¬Y` |
| `¬IFELSE(X, Y, Z)` | `OR(AND(X, ¬Y), AND(¬X, ¬Z))` |

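The negation rows above can be read as a recursive rewrite. The following is a minimal Python sketch of that reading, not DataHelix's Java implementation. The class names (`Atom`, `Not`, `And`, `Or`, `If`, `IfElse`) and the `negate` function are hypothetical, a comma-separated result such as `¬X, ¬Y` is modelled as an `And`, and the double-negation case is an added assumption.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Atom:
    name: str


@dataclass(frozen=True)
class Not:
    inner: "Constraint"


@dataclass(frozen=True)
class And:
    parts: Tuple["Constraint", ...]


@dataclass(frozen=True)
class Or:
    parts: Tuple["Constraint", ...]


@dataclass(frozen=True)
class If:
    cond: "Constraint"
    then: "Constraint"


@dataclass(frozen=True)
class IfElse:
    cond: "Constraint"
    then: "Constraint"
    otherwise: "Constraint"


Constraint = Union[Atom, Not, And, Or, If, IfElse]


def negate(c: Constraint) -> Constraint:
    """Return a normalised form of ¬c, pushing the negation inwards."""
    if isinstance(c, Not):
        # Assumption: double negation cancels out.
        return c.inner
    if isinstance(c, Or):
        # ¬OR(X, Y, ...) -> ¬X, ¬Y, ...
        return And(tuple(negate(p) for p in c.parts))
    if isinstance(c, And):
        # ¬AND(X, Y, ...) -> OR(¬X, ¬Y, ...)
        return Or(tuple(negate(p) for p in c.parts))
    if isinstance(c, If):
        # ¬IF(X, Y) -> X, ¬Y
        return And((c.cond, negate(c.then)))
    if isinstance(c, IfElse):
        # ¬IFELSE(X, Y, Z) -> OR(AND(X, ¬Y), AND(¬X, ¬Z))
        return Or((
            And((c.cond, negate(c.then))),
            And((negate(c.cond), negate(c.otherwise))),
        ))
    # An atomic constraint cannot be simplified further.
    return Not(c)
```

For example, `negate(If(Atom("X"), Atom("Y")))` yields `And((Atom("X"), Not(Atom("Y"))))`, which corresponds to the `¬IF(X, Y)` row.
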
We can convert a set of constraints to a Constraint Node as follows:

@@ -292,7 +292,7 @@ could collapse to
}
```

*(Note: this is a conceptual example and does not reflect the actual object structure.)*

See [Set restriction and generation](user/SetRestrictionAndGeneration.md) for a more in-depth explanation of how the constraints are merged and how data is generated.

@@ -327,7 +327,7 @@ CSV and JSON formats are currently supported.

## String Generation

- We use a Java library called [dk.brics.automaton](http://www.brics.dk/automaton/) to analyse regexes and generate valid (and invalid for [violation](user/alphaFeatures/DeliberateViolation.md)) strings based on them. It works by representing the regex as a finite state machine. It might be worth reading about state machines for those who aren't familiar: [https://en.wikipedia.org/wiki/Finite-state_machine](https://en.wikipedia.org/wiki/Finite-state_machine). Consider the following regex: `ABC[a-z]?(A|B)`. It would be represented by the following state machine:
+ We use a Java library called [dk.brics.automaton](http://www.brics.dk/automaton/) to analyse regexes and generate valid strings based on them. It works by representing the regex as a finite state machine. It might be worth reading about state machines for those who aren't familiar: [https://en.wikipedia.org/wiki/Finite-state_machine](https://en.wikipedia.org/wiki/Finite-state_machine). Consider the following regex: `ABC[a-z]?(A|B)`. It would be represented by the following state machine:

![](user/images/finite-state-machine.svg)

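To make the idea concrete, here is a small Python sketch of one possible state machine for `ABC[a-z]?(A|B)`. It does not use dk.brics.automaton; the state numbering, the `TRANSITIONS` table and the helper names are hypothetical. Each transition is stored as a character range with a minimum and a maximum, plus a destination state.

```python
from typing import Dict, Iterator, List, Tuple

# (min char, max char, destination state)
Transition = Tuple[str, str, int]

# Hand-built machine for ABC[a-z]?(A|B); state 0 is the start state.
TRANSITIONS: Dict[int, List[Transition]] = {
    0: [("A", "A", 1)],
    1: [("B", "B", 2)],
    2: [("C", "C", 3)],
    3: [("a", "z", 4), ("A", "B", 5)],  # optional [a-z], or straight to (A|B)
    4: [("A", "B", 5)],
    5: [],
}
ACCEPTING = {5}


def accepts(text: str) -> bool:
    """Walk the machine one character at a time."""
    state = 0
    for ch in text:
        for lo, hi, dest in TRANSITIONS[state]:
            if lo <= ch <= hi:
                state = dest
                break
        else:
            return False  # no transition covers this character
    return state in ACCEPTING


def generate(state: int = 0, prefix: str = "") -> Iterator[str]:
    """Enumerate every accepted string (the machine has no cycles)."""
    if state in ACCEPTING:
        yield prefix
    for lo, hi, dest in TRANSITIONS[state]:
        for code in range(ord(lo), ord(hi) + 1):
            yield from generate(dest, prefix + chr(code))
```

Because this machine has no loops, `generate()` terminates; a regex containing `*` or `+` would need a length bound or a random walk instead.
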
@@ -339,7 +339,7 @@ Other than the fact that we can use the state machine to generate strings, the m
* Finding the intersection of two regexes, used when there are multiple regex constraints on the same field.
* Finding the complement of a regex, which we use for generating invalid regexes for violation.

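Intersection of two regexes can be pictured as running both state machines side by side. The Python sketch below shows the classic product construction on two tiny hand-written automata. It illustrates the concept only and is not how dk.brics.automaton is implemented; the `Dfa` type and the `intersect` and `accepts` functions are hypothetical names.

```python
from typing import Dict, FrozenSet, Hashable, NamedTuple, Tuple


class Dfa(NamedTuple):
    start: Hashable
    accepting: FrozenSet[Hashable]
    delta: Dict[Tuple[Hashable, str], Hashable]  # (state, char) -> next state


def intersect(a: Dfa, b: Dfa) -> Dfa:
    """Product construction: each state is a pair (state in a, state in b)."""
    start = (a.start, b.start)
    delta: Dict[Tuple[Hashable, str], Hashable] = {}
    accepting = set()
    seen = {start}
    todo = [start]
    while todo:
        pair = todo.pop()
        state_a, state_b = pair
        if state_a in a.accepting and state_b in b.accepting:
            accepting.add(pair)
        for (source, ch), dest_a in a.delta.items():
            if source != state_a:
                continue
            dest_b = b.delta.get((state_b, ch))
            if dest_b is None:
                continue  # b cannot read this character here
            nxt = (dest_a, dest_b)
            delta[(pair, ch)] = nxt
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return Dfa(start, frozenset(accepting), delta)


def accepts(dfa: Dfa, text: str) -> bool:
    state = dfa.start
    for ch in text:
        state = dfa.delta.get((state, ch))
        if state is None:
            return False
    return state in dfa.accepting


# a[ab]* : strings over {a, b} that start with "a"
STARTS_WITH_A = Dfa(0, frozenset({1}), {(0, "a"): 1, (1, "a"): 1, (1, "b"): 1})
# [ab]*b : strings over {a, b} that end with "b"
ENDS_WITH_B = Dfa(
    0, frozenset({1}),
    {(0, "a"): 0, (0, "b"): 1, (1, "a"): 0, (1, "b"): 1},
)
BOTH = intersect(STARTS_WITH_A, ENDS_WITH_B)
```

Here `BOTH` accepts only strings that satisfy both patterns, which mirrors what is needed when a field carries more than one regex constraint.
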
Due to the way that the generator computes textual data internally, the generation of strings is not deterministic and may output valid values in a different order with each generation run.

### Anchors

@@ -358,7 +358,7 @@ A transition holds the following properties and are represented as lines in the

In the above `A` looks like:

| property | initial | \[a-z\] |
| ---- | ---- | ---- |
| min | A | a |
| max | A | z |
@@ -466,7 +466,7 @@ Therefore the null operator can:
When a field is serialised or otherwise written to a medium as the output of the generator, the serialiser may represent the absence of a value using the format's `null` representation, or in some other way, such as omitting the property.

#### For illustration
CSV files do not have any standard for representing the absence of a value differently to an empty string, unless all strings are always wrapped in quotes ([#441](https://github.com/ScottLogic/data-engineering-generator/pull/441)).

JSON files could be presented with `null` as the value for a property or excluding the property from the serialised result. This is the responsibility of the serialiser, and depends on the use cases.

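The difference between the two formats can be seen with a short Python sketch that uses only the standard `csv` and `json` modules. It is an illustration, not the generator's own serialiser; the helper names `to_csv` and `to_json` and the sample values are hypothetical.

```python
import csv
import io
import json
from typing import Dict, List, Optional


def to_csv(rows: List[List[Optional[str]]]) -> str:
    """Write rows with the default CSV dialect (minimal quoting)."""
    buffer = io.StringIO()
    csv.writer(buffer).writerows(rows)
    return buffer.getvalue()


def to_json(record: Dict[str, Optional[str]], omit_nulls: bool = False) -> str:
    """Serialise a record, either keeping nulls or dropping the property."""
    if omit_nulls:
        record = {key: value for key, value in record.items() if value is not None}
    return json.dumps(record)


# An absent value and an empty string are indistinguishable in the CSV output.
csv_text = to_csv([["id1", None], ["id2", ""]])

# JSON can show the absence explicitly, or leave the property out.
with_null = to_json({"name": None})
without_property = to_json({"name": None}, omit_nulls=True)
```

With the default dialect both CSV rows end in an empty cell, whereas the JSON output is either `{"name": null}` or `{}` depending on the choice made by the serialiser.
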
@@ -642,4 +642,3 @@ Field is greater than number X:
|Null ||
|Strings ||
|Date-times ||

7 changes: 2 additions & 5 deletions docs/developer/CucumberCookbook.md
@@ -6,12 +6,9 @@ This document outlines how Cucumber is used within DataHelix.
The framework supports setting configuration settings for the generator, defining the profile and describing the expected outcome. All of these are described below. All variable elements (e.g. `{generationStrategy}`) are case insensitive; all fields and values **are case sensitive**.

### Configuration options
- * _the generation strategy is `{generationStrategy}`_ see [generation strategies](https://github.com/finos/datahelix/blob/master/docs/user/generationTypes/GenerationTypes.md) - default: `random`
- * _the combination strategy is `{combinationStrategy}`_ see [combination strategies](https://github.com/finos/datahelix/blob/master/docs/user/CombinationStrategies.md) - default: `exhaustive`
- * _the walker type is `{walkerType}`_ see [walker types](https://github.com/finos/datahelix/blob/master/docs/developer/decisionTreeWalkers/TreeWalkerTypes.md) - default: `reductive`
- * _the data requested is `{generationMode}`_, either `violating` or `validating` - default: `validating`
+ * _the generation strategy is `{generationStrategy}`_ see [generation strategies](https://github.com/finos/datahelix/blob/master/docs/UserGuide.md/#generation-strategies.md) - default: `random`
+ * _the combination strategy is `{combinationStrategy}`_ see [combination strategies](https://github.com/finos/datahelix/blob/master/docs/UserGuide.md/#Combination-strategies.md) - default: `exhaustive`
* _the generator can generate at most `{int}` rows_, ensures that the generator will only emit `int` rows, default: `1000`
- * _we do not violate constraint `{operator}`_, prevent this operator from being violated (see **Operators** section below), you can specify this step many times if required

### Defining the profile
It is important to remember that constraints are built up of 3 components: a field, an operator and most commonly an operand. In the following example the operator is 'greaterThan' and the operand is 5.
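
The three components can be pictured with a short Python sketch. It is illustrative only and is not part of the Cucumber framework or the generator: the field name `price`, the `OPERATORS` table and the `satisfies` function are hypothetical, and the dictionary keys follow the `field` / `is` / `value` shape used by profile constraints.

```python
import operator
from typing import Any, Callable, Dict

# Hypothetical lookup from operator name to a Python comparison.
OPERATORS: Dict[str, Callable[[Any, Any], bool]] = {
    "greaterThan": operator.gt,
    "greaterThanOrEqualTo": operator.ge,
}


def satisfies(constraint: Dict[str, Any], row: Dict[str, Any]) -> bool:
    """Check one row against a constraint made of field, operator and operand."""
    field = constraint["field"]    # which field the constraint applies to
    name = constraint["is"]        # the operator, e.g. greaterThan
    operand = constraint["value"]  # the operand, e.g. 5
    return OPERATORS[name](row[field], operand)


# Field "price", operator "greaterThan", operand 5.
constraint = {"field": "price", "is": "greaterThan", "value": 5}
```

A row with `price` equal to 6 satisfies this constraint, while a row with `price` equal to 5 does not.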
11 changes: 5 additions & 6 deletions docs/user/Schema.md
@@ -7,7 +7,7 @@
"description": "A dataset about financial products",
"fields":
[
{
"name": "id",
"type": "string",
"nullable": false
@@ -43,7 +43,7 @@
{ "field": "low_price", "is": "greaterThanOrEqualTo", "value": 0 }
]
},
{
"rule": "allowed countries",
"constraints": [
{ "field": "country", "is": "inSet", "values": [ "USA", "GB", "FRANCE" ] }
@@ -100,9 +100,8 @@ A named collection of constraints. Test case generation revolves around rules, i

One of:

- - a [predicate constraint](UserGuide.md#Predicate-constraints)
- - a [grammatical constraint](UserGuide.md#Grammatical-constraints)
- - a [presentational constraint](UserGuide.md#Presentational-constraints)
+ - a [predicate constraint](https://github.com/finos/datahelix/blob/master/docs/UserGuide.md#Predicate-constraints)
+ - a [grammatical constraint](https://github.com/finos/datahelix/blob/master/docs/UserGuide.md#Grammatical-constraints)


- The Profile schema format is formally documented in the [User Guide](UserGuide.md).
+ The Profile schema format is formally documented in the [User Guide](https://github.com/finos/datahelix/blob/master/docs/UserGuide.md).
