Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-465: Clarify backward-compatibility rules on LIST type #466

Open
wants to merge 10 commits into
base: master
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
58 changes: 49 additions & 9 deletions LogicalTypes.md
Original file line number Diff line number Diff line change
Expand Up @@ -609,6 +609,17 @@ that is neither contained by a `LIST`- or `MAP`-annotated group nor annotated
by `LIST` or `MAP` should be interpreted as a required list of required
elements where the element type is the type of the field.

```
// List<Integer> (non-null list, non-null elements)
repeated int32 num;

// List<Tuple<Integer, String>> (non-null list, non-null elements)
repeated group my_list {
required int32 num;
optional binary str (STRING);
}
```

Implementations should use either `LIST` and `MAP` annotations _or_ unannotated
repeated fields, but not both. When using the annotations, no unannotated
repeated types are allowed.
Expand Down Expand Up @@ -670,6 +681,8 @@ optional group array_of_arrays (LIST) {

#### Backward-compatibility rules

##### 3-level structure with different field names

It is required that the repeated group of elements is named `list` and that
its element field is named `element`. However, these names may not be used in
existing data and should not be enforced as errors when reading. For example,
Expand All @@ -684,49 +697,76 @@ optional group my_list (LIST) {
}
```

Some existing data does not include the inner element layer. For
backward-compatibility, the type of elements in `LIST`-annotated structures
should always be determined by the following rules:
##### 2-level structure

Some existing data does not include the inner element layer, meaning that `LIST`
annotates a 2-level structure. In contrast to 3-level structure, the repetition
of 2-level structure can be `optional`, `required`, or `repeated`.
Copy link
Contributor

@rdblue rdblue Nov 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this confusing. Why does it need to change?

The problem isn't when there is a 2-level structure like the one in the new example. The problem is when we need to decide whether a structure is a 2-level or 3-level.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just happen to see that this section is all about 2-level list so I have added a title to make it explicit.

The problem is when we need to decide whether a structure is a 2-level or 3-level.

The change aims to exhaust all possible forms of list structures and we can just apply these rules in order to deduce a list type:

  • regular 3-level list
  • 3-level list (w/ different names)
  • 2-level list
  • 1-level list (w/o list annotation)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I decide to keep the subtitles as they are pretty clear. It is the exact order to apply when deducing a list type. Please let me know if you have different opinions and I'm happy to change. @rdblue @gszadovszky @emkornfield


```
<list-repetition> group <name> (LIST) {
repeated <element-type> <element-name>;
}
```

For backward-compatibility, the type of elements in `LIST`-annotated 2-level
structures should always be determined by the following rules:

1. If the repeated field is not a group, then its type is the element type and
elements are required.
wgtmac marked this conversation as resolved.
Show resolved Hide resolved
2. If the repeated field is a group with multiple fields, then its type is the
element type and elements are required.
3. If the repeated field is a group with one field and is named either `array`
3. If the repeated field is a group with a `repeated` field, then the repeated
field is the element type because the type cannot be a 3-level list.
4. If the repeated field is a group with one field and is named either `array`
or uses the `LIST`-annotated group's name with `_tuple` appended then the
repeated type is the element type and elements are required.
4. Otherwise, the repeated field's type is the element type with the repeated
5. Otherwise, the repeated field's type is the element type with the repeated
field's repetition.

Examples that can be interpreted using these rules:

```
// List<Integer> (nullable list, non-null elements)
// Rule 1: List<Integer> (nullable list, non-null elements)
optional group my_list (LIST) {
repeated int32 element;
}

// List<Tuple<String, Integer>> (nullable list, non-null elements)
// Rule 2: List<Tuple<String, Integer>> (nullable list, non-null elements)
optional group my_list (LIST) {
repeated group element {
required binary str (STRING);
required int32 num;
};
}

// List<OneTuple<String>> (nullable list, non-null elements)
// Rule 3: List<List<Integer>> (nullable outer list, non-null elements)
optional group my_list (LIST) {
repeated group array (LIST) {
repeated int32 array;
};
}

// Rule 4: List<OneTuple<String>> (nullable list, non-null elements)
optional group my_list (LIST) {
repeated group array {
required binary str (STRING);
};
}

// List<OneTuple<String>> (nullable list, non-null elements)
// Rule 4: List<OneTuple<String>> (nullable list, non-null elements)
optional group my_list (LIST) {
repeated group my_list_tuple {
required binary str (STRING);
};
}

wgtmac marked this conversation as resolved.
Show resolved Hide resolved
// Rule 5: List<OneTuple<List<Integer>>> (nullable outer list, non-null elements)
optional group my_list (LIST) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this is the case where annotated 2-level list is mixed with unannotated repeated field (1-level list). Is this allowed? If not, I can remove the example. Then, is there any concrete example for the last rule (a.k.a. rule 5 above)? @gszadovszky @rdblue

repeated group foo {
repeated int32 bar;
};
}
```

### Maps
Expand Down