Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DynamoDB: Add special decoding for varied lists #39

Merged
merged 1 commit into from
Aug 31, 2024
Merged

Conversation

amotl
Copy link
Member

@amotl amotl commented Aug 30, 2024

About

Because CrateDB can't store varied lists into OBJECT(DYNAMIC) columns, let's break them apart and store them into a separate OBJECT(IGNORED) column.

References

/cc @matriv, @proddata

@amotl amotl requested review from proddata and surister August 30, 2024 07:33
Comment on lines +74 to +81
# Define name of the column where typed DynamoDB fields will get materialized into.
# This column uses the `OBJECT(DYNAMIC)` data type.
TYPED_COLUMN = "data"

# Define name of the column where untyped DynamoDB fields will get materialized into.
# This column uses the `OBJECT(IGNORED)` data type.
UNTYPED_COLUMN = "aux"
Copy link
Member Author

@amotl amotl Aug 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Running out of "naming things", please advise if you can come up with better labels. I think data is fine, but maybe we can find a better name for the aux column together.

The implementation idea is from @proddata, so explicitly tagging him about it hereby.

Comment on lines +20 to +23
def test_decode_typed_untyped():
assert DynamoDBCDCTranslator(table_name="foo").decode_record(
{"foo": {"N": "84.84"}, "bar": {"L": [{"N": "1"}, {"S": "foo"}]}}
) == DualRecord(typed={"foo": 84.84}, untyped={"bar": [1.0, "foo"]})
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The gist, in outcome, is best viewed here.

@amotl amotl requested review from seut and removed request for proddata August 30, 2024 09:22
@amotl amotl marked this pull request as ready for review August 30, 2024 09:22
Copy link
Member

@seut seut left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm otherwise

@@ -71,6 +72,7 @@ def test_to_sql_list_of_objects():
"utm_source": "google",
}
]
}
},
"untyped": {},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we add a test including untyped columns?

Copy link
Member Author

@amotl amotl Aug 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right on the spot. Another test case for that is yet missing. Thanks a stack!

A little test case I used to drive my coding procedures is here, but it doesn't do end-to-end including .to_sql() yet, it just exercises .decode_record().

Store them into a separate `OBJECT(IGNORED)` column in CrateDB.
@amotl amotl force-pushed the dynamodb-varied-lists branch from d7a1dd3 to 3a31a06 Compare August 31, 2024 17:29
@amotl amotl merged commit 4ca7b87 into main Aug 31, 2024
15 checks passed
@amotl amotl deleted the dynamodb-varied-lists branch August 31, 2024 18:33
Comment on lines 44 to 65
def test_to_sql_all_types():
assert DynamoDBFullLoadTranslator(table_name="foo").to_sql(RECORD_ALL_TYPES) == SQLOperation(
statement="INSERT INTO foo (data) VALUES (:record);",
statement="INSERT INTO foo (data, aux) VALUES (:typed, :untyped);",
parameters={
"record": {
"typed": {
"id": "5F9E-Fsadd41C-4C92-A8C1-70BF3FFB9266",
"data": {"temperature": 42.42, "humidity": 84.84},
"meta": {"timestamp": "2024-07-12T01:17:42", "device": "foo"},
"string_set": ["location_1"],
"number_set": [0.34, 1.0, 2.0, 3.0],
"binary_set": ["U3Vubnk="],
"somemap": {"test": 1.0, "test2": 2.0},
}
},
"untyped": {
"list_varied": [
{"a": 1.0},
2.0,
"Three",
],
},
},
)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing test case fragment has been added. /cc @seut

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants