Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DynamoDB: Support list with items of different types. #28

Closed
surister opened this issue Aug 21, 2024 · 10 comments
Closed

DynamoDB: Support list with items of different types. #28

surister opened this issue Aug 21, 2024 · 10 comments

Comments

@surister
Copy link
Contributor

Example:

[
  {'a': 1}, 2, "Three"
]
@surister surister changed the title DynamoDB: Support list with objects of different types. DynamoDB: Support list with items of different types. Aug 21, 2024
@surister
Copy link
Contributor Author

That's actually ilegal in CrateDB, but ok in DynamoDB, we should probably detect it and handle it.

@amotl
Copy link
Member

amotl commented Aug 23, 2024

Yeah, CrateDB needs to specify the inner type, like ::ARRAY(TEXT). So, it is probably not possible to store arrays/lists of varied types.

@amotl
Copy link
Member

amotl commented Aug 28, 2024

we should probably detect it and handle it

Good idea. While storing it 1:1 is probably not possible, because CrateDB can't handle it, we need a trick/workaround. Do you have any suggestions? Maybe one of you already applied any tricks in similar situations, or have another idea?

/cc @hlcianfagna, @wierdvanderhaar, @hammerhead, @proddata, @matriv

@surister
Copy link
Contributor Author

How about we detect it, map it to a OBJECT DYNAMIC and insert it to the object with keys being the types?

l = [
  {'a': 1}, 2, "Three"
]

r = {}

for el in l:
    if (_type := type(el).__name__) not in r:
        r[_type] = []
    r[_type].append(el)

print(r)
{
    'dict': [{'a': 1}],
    'int': [2],
    'str': ['Three']
}

@proddata
Copy link
Member

You can indirectly store it

cr> CREATE TABLE t01 (o OBJECT(IGNORED));
cr> INSERT INTO t01 (o) VALUES ($${"oo":[{"a": 1}, 2, "Three"]}$$);

cr> SELECT * FROM t01;
+--------------------------------+
| o                              |
+--------------------------------+
| {"oo": [{"a": 1}, 2, "Three"]} |
+--------------------------------+

cr> SELECT o['oo'][1] FROM t01;
+------------+
| o['oo'][1] |
+------------+
| {"a": 1}   |
+------------+

cr> SELECT o['oo'][2] FROM t01;
+------------+
| o['oo'][2] |
+------------+
|          2 |
+------------+

@matriv
Copy link

matriv commented Aug 29, 2024

I think the OBJECT(IGNORED) is the only way, as @proddata suggested, but then nothing is indexed, so maybe this functionality can be enabled/disabled with a flag, and it should be clearly documented how such rows are indexed, and that searching on them is not possible.

@proddata
Copy link
Member

proddata commented Aug 29, 2024

I think the option is to store what is possible in a typed field and what is not in a separate column.
Ideally CrateDB could handle this e.g. like DuckDB and map types where possible and otherwise use generic json type

SELECT json_structure('[{"x":1,"y":1},{"x":2,"y":[1,2,3]}]') a;
┌──────────────────────────────┐
│              a               │
│             json             │
├──────────────────────────────┤
│ [{"x":"UBIGINT","y":"JSON"}] │
└──────────────────────────────┘

@amotl
Copy link
Member

amotl commented Aug 30, 2024

Thank you for your excellent suggestions 💯. I followed @proddata's advise:

I think the option is to store what is possible in a typed field and what is not in a separate column.

Please add your voice about general idea, naming things, or anything else which comes to mind. Thanks!

@amotl
Copy link
Member

amotl commented Sep 2, 2024

The patch referenced above, GH-39, will be included into the upcoming release, in order to improve the situation. The code is now also validated on behalf of integration tests with CrateDB, after GH-42 has been added.

To recap and summarize, the gist of the implementation is to use two distinct columns to store typed vs. untyped data. Currently, column names are data vs. aux. Please advise about better naming things, or if those names should even be made configurable on a subsequent iteration.

# Define name of the column where typed DynamoDB fields will get materialized into.
# This column uses the `OBJECT(DYNAMIC)` data type.
TYPED_COLUMN = "data"
# Define name of the column where untyped DynamoDB fields will get materialized into.
# This column uses the `OBJECT(IGNORED)` data type.
UNTYPED_COLUMN = "aux"

@amotl
Copy link
Member

amotl commented Sep 2, 2024

The improvement has been included into release v0.0.14.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants