This repository has been archived by the owner on Apr 14, 2023. It is now read-only.

Merge pull request #1441 from finos/1340-make-null-behaviour-consistent
1340 make null behaviour consistent
r-stuart authored Nov 6, 2019
2 parents 633cf64 + 89ad8a4 commit 63fa581
Showing 10 changed files with 150 additions and 188 deletions.
213 changes: 41 additions & 172 deletions docs/DeveloperGuide.md
@@ -444,191 +444,60 @@ The algorithm generates row specs by:

# Behaviour in Detail

## Null Operator

In a profile, the `null` operator is expressed as `"is": "null"` or its negated equivalent. It has several meanings, and the behaviour for each is described below:

### Possible scenarios:
## Type System

| Absence / Presence | Field value |
| ---- | ---- |
| (A) _null operator omitted_<br /> **The default**. The field's value may be absent or present | (B) `is null`<br />The field will have _no value_ |
| (C) `not(is null)`<br />The field's value must be present | (D) `not(is null)`<br />The field must have a value |
The primary types (Numeric, String and Datetime) are exclusive - a field can only be one of these types. Other types are reduced to a constrained subset of the values of one of these types. For example, integers are a subset of the numeric type.

Therefore the null operator can:
- (C, D) `not(is null)` express fields that must have a value (otherwise known as a non-nullable field)
- (B) `is null` express fields as having no value (otherwise known as setting the value to `null`)
- (A) _By omitting the constraint_: express fields as permitting absence or presence of a value (otherwise known as a nullable field)
## Nullability

### `null` and interoperability
`null` is a keyword that exists in other technologies and languages. As far as this tool is concerned, it relates to the absence or presence of a value. See [set restriction and generation](user/SetRestrictionAndGeneration.md) for more details.
Nulls are considered orthogonal to the type system. A given field's possible values can be considered as the union of the value `null` and all of the possible values of the given type for the field.

When a field is serialised or otherwise written to a medium as the output of the generator, the serialiser may represent the absence of a value using the format's `null` representation, or in some other form such as omitting the property.
Consider a field which permits the integers 1-3 inclusive which is nullable. This field would permit the following as possible values:

#### For illustration
CSV files have no standard way of distinguishing the absence of a value from an empty string, unless all strings are always wrapped in quotes ([#441](https://github.com/ScottLogic/data-engineering-generator/pull/441)).
```{ null } ∪ { 1, 2, 3}```
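
The union above can be illustrated with a minimal Java sketch. The class and method names (`NullableFieldValues`, `possibleValues`) are hypothetical and are not part of the generator's code base; the sketch only assumes that a field is described by its typed values plus a nullable flag.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only: not the generator's FieldSpec implementation.
public class NullableFieldValues {

    // Builds { null } ∪ typedValues when the field is nullable,
    // otherwise just the typed values.
    static List<Integer> possibleValues(List<Integer> typedValues, boolean nullable) {
        List<Integer> values = new ArrayList<>();
        if (nullable) {
            values.add(null); // the null half of the union
        }
        values.addAll(typedValues); // the typed half of the union
        return values;
    }

    public static void main(String[] args) {
        // Nullable field permitting the integers 1-3 inclusive
        System.out.println(possibleValues(Arrays.asList(1, 2, 3), true));
        // The same field with nulls excluded
        System.out.println(possibleValues(Arrays.asList(1, 2, 3), false));
        // An empty typed set leaves null as the only possible value
        System.out.println(possibleValues(Collections.<Integer>emptyList(), true));
    }
}
```

Keeping the nullable flag separate from the typed values mirrors the idea that nulls are orthogonal to the type system.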

JSON files could either present `null` as the value for a property or exclude the property from the serialised result. This is the responsibility of the serialiser and depends on the use case.
## Null Precedence in Constraints

## Null Operator with If
With `if` constraints, the absence of a value needs to be considered in order to understand how the generator will behave. Remember that every set contains the empty set unless it is excluded by way of the `not(is null)` constraint. For more details, see [set restriction and generation](user/SetRestrictionAndGeneration.md).
All constraints are considered to only operate on the set of typed values, not the null portion. This is due to the orthogonal nature of nulls in the system.

Consider the following if constraint:

```
{
"if": {
{
"field": "field1",
"equalTo": 5
}
},
"then": {
{
"field": "field2",
"equalTo": "a"
}
}
}
```
One way to interpret this is to consider the union of null and typed values separately at all times, where constraints only operate on the respective fields' typed values. If a given field is decided to be `null`, then it is left with no possible typed values - which we call the empty set (`{ }`).

The generator will expand the `if` constraint as follows, to ensure the constraint is fully balanced:
For example, consider a pair of fields, each permitting a single integer and allowing null:

```
{
"if": {
{
"field": "field1",
"equalTo": 5
}
},
"then": {
{
"field": "field2",
"equalTo": "a"
}
},
"else": {
{
"not": {
"field": "field1",
"equalTo": 5
}
}
}
}
```
| field | values | nullable |
| ---- | ---- | ---- |
| `a` | ```{ 1, 2, 3 }``` | `true` |
| `b` | ```{ 1, 2 }``` | `true` |

This expression does not prevent the consequence (the `then` constraints) from being considered when `field1` has no value. Equally it does not say anything about the alternative consequence (the `else` constraints). As such both outcomes are applicable at any time.
with the equality relationship between them (i.e. `a = b`).

The solution is to express the `if` constraint as follows. This is not 'auto completed' for profiles, as doing so would remove functionality that may be intended; it must be explicitly included in the profile.
This would give the following possible values:

```
{
"if": {
"allOf": [
{
"field": "field1",
"equalTo": 5
},
{
"not": {
"field": "field1",
"is": "null"
}
}
]
},
"then": {
{
"field": "field2",
"equalTo": "a"
}
}
}
```
| a | b | Why? |
| ---- | ---- | ---- |
| `1` | `1` | When `a` and `b` are not-null, the equality relation holds |
| `2` | `2` | When `a` and `b` are not-null, the equality relation holds |
| `1` | `null` | When `b` is null, the equality operation doesn't apply |
| `2` | `null` | When `b` is null, the equality operation doesn't apply |
| `3` | `null` | When `b` is null, the equality operation doesn't apply |
| `null` | `1` | When `a` is null, the equality operation doesn't apply |
| `null` | `2` | When `a` is null, the equality operation doesn't apply |
| `null` | `null` | When both are null, the equality operation doesn't apply |

If we break the above down into first the null state, then the chosen value:

| a null | b null | a typed | b typed |
| ---- | ---- | ---- | ---- |
| `{ NOT null }` | `{ NOT null }` | `{ 1 }` | `{ 1 }` |
| `{ NOT null }` | `{ NOT null }` | `{ 2 }` | `{ 2 }` |
| `{ NOT null }` | `{ null }` | `{ 1 }` | `{ }` |
| `{ NOT null }` | `{ null }` | `{ 2 }` | `{ }` |
| `{ NOT null }` | `{ null }` | `{ 3 }` | `{ }` |
| `{ null }` | `{ NOT null }` | `{ }` | `{ 1 }` |
| `{ null }` | `{ NOT null }` | `{ }` | `{ 2 }` |
| `{ null }` | `{ null }` | `{ }` | `{ }` |
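
The rows for the `a = b` example can be reproduced with a short Java sketch. It assumes only the rule stated in this section, namely that a constraint applies to typed values and is skipped when either field is null. The names `NullableEquality`, `satisfiesEquality`, `withNull` and `validRows` are hypothetical and do not come from the generator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only: not how the generator evaluates relations.
public class NullableEquality {

    // The equality relation only constrains rows where both values are typed.
    static boolean satisfiesEquality(Integer a, Integer b) {
        if (a == null || b == null) {
            return true; // the relation does not apply to the null portion
        }
        return a.equals(b);
    }

    // { null } ∪ typed values, for a nullable field.
    static List<Integer> withNull(List<Integer> typed) {
        List<Integer> values = new ArrayList<>(typed);
        values.add(null);
        return values;
    }

    // Enumerates every (a, b) pair permitted under the relation a = b.
    static List<Integer[]> validRows(List<Integer> aTyped, List<Integer> bTyped) {
        List<Integer[]> rows = new ArrayList<>();
        for (Integer a : withNull(aTyped)) {
            for (Integer b : withNull(bTyped)) {
                if (satisfiesEquality(a, b)) {
                    rows.add(new Integer[] { a, b });
                }
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // a permits { 1, 2, 3 }, b permits { 1, 2 }, both nullable
        for (Integer[] row : validRows(Arrays.asList(1, 2, 3), Arrays.asList(1, 2))) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

Under these assumptions the enumeration yields eight rows: the two matching typed pairs, plus every combination in which at least one field is null.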

The generator will expand the `if` constraint as follows, to ensure the constraint is fully balanced:
## Null Operator

```
{
"if": {
"allOf": [
{
"field": "field1",
"equalTo": 5
},
{
"not": {
"field": "field1",
"is": "null"
}
}
]
},
"then": {
{
"field": "field2",
"equalTo": "a"
}
},
"else": {
"anyOf": [
{
"not": {
"field": "field1",
"equalTo": 5
}
},
{
"field": "field1",
"is": "null"
}
]
}
}
```
The `null` operator works by making the typed values set the empty set `{ }`. This produces `{ null } ∪ { }`, leaving null as the only valid value.

In this case the `then` constraints will only be applicable when `field1` has a value. Where `field1` has no value, either of the `else` constraints can be considered applicable. Nevertheless, `field2` will only have the value `"a"` when `field1` has the value `5`, and not when `field1` is absent.

### Examples:
Consider a use case in which you are generating data to be imported into a SQL Server database. Below are some examples of constraints that may help define fields as mandatory or optional.

* A field that is non-nullable<br />
`field1 ofType string and field1 not(is null)`

* A field that is nullable<br />
`field1 ofType string`

* A field that has no value<br />
`field1 is null`

## Nullness
### Behaviour
Nulls can always be produced for a field, except when a field is explicitly not null. How the constraints behave with null is outlined below:

|Field is |Null produced|
|:----------------------|:-----------:|
|Of type X ||
|Not of type X ||
|In set [X, Y, ...] ||
|Not in set [X, Y, ...] ||
|Equal to X ||
|Not equal to X ||
|Greater than X ||
|Null ||
|Not null ||

## Type Implication
### Behaviour
No operators imply type (except ofType ones). By default, all values are allowed.

Field is greater than number X:

|Values |Can be produced|
|----------------------|:-------------:|
|Numbers greater than X||
|Numbers less than X ||
|Null ||
|Strings ||
|Date-times ||
File renamed without changes.
@@ -17,6 +17,7 @@
package com.scottlogic.deg.generator.fieldspecs.relations;

import com.scottlogic.deg.common.profile.Field;
import com.scottlogic.deg.common.profile.FieldType;
import com.scottlogic.deg.common.profile.Granularity;
import com.scottlogic.deg.generator.fieldspecs.*;
import com.scottlogic.deg.generator.fieldspecs.whitelist.DistributedList;
@@ -60,7 +61,11 @@ public FieldSpec createModifierFromOtherFieldSpec(FieldSpec otherFieldSpec) {

@Override
public FieldSpec createModifierFromOtherValue(DataBagValue otherFieldGeneratedValue) {
T offsetValue = offsetGranularity.getNext((T) otherFieldGeneratedValue.getValue(), offset);
T value = (T) otherFieldGeneratedValue.getValue();
if (value == null) {
return FieldSpecFactory.fromType(FieldType.DATETIME);
}
T offsetValue = offsetGranularity.getNext(value, offset);
return FieldSpecFactory.fromList(DistributedList.singleton(offsetValue));
}

@@ -29,8 +29,7 @@ public AtomicConstraint negate() {

@Override
public FieldSpec toFieldSpec() {
return FieldSpecFactory.fromList(DistributedList.singleton(value))
.withNotNull();
return FieldSpecFactory.fromList(DistributedList.singleton(value));
}

@Override
@@ -8,25 +8,29 @@ Feature: The violations mode of the Data Helix app can be run in violations mode
Scenario: Running the generator in violate mode for not equal to is successful (decimal)
Given foo is anything but equal to 8
And foo has type "decimal"
And foo is anything but null
And foo is granular to 1
And we do not violate any granular to constraints
And the generation strategy is full
Then the following data should be generated:
| foo |
| 8 |
| foo |
| 8 |
| null |

Scenario: Running the generator in violate mode where equal to is not violated is successful
Given foo is equal to 8
And foo has type "decimal"
And the generation strategy is full
And we do not violate any equal to constraints
Then the following data should be generated:
| foo |
| 8 |
| foo |
| 8 |
| null |

Scenario: Running the generator in violate mode for multiple constraints with strings is successful
Given the generation strategy is interesting
And foo has type "string"
And foo is anything but null
And foo is anything but equal to "hello"
And the generator can generate at most 10 rows
Then the following data should be included in what is generated: