Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DynamoDB: Fix object (Map) representation to CrateDB #31

Merged
merged 1 commit into from
Aug 22, 2024

Conversation

surister
Copy link
Contributor

@surister surister commented Aug 22, 2024

Summary of the changes / Why this is an improvement

Fixes #32

Checklist

  • Link to issue this PR refers to (if applicable):
  • CLA is signed

" data['string_set'] = ['location_1'], data['number_set'] = [0.34, 1.0, 2.0, 3.0],"
" data['binary_set'] = ['U3Vubnk='] WHERE data['device'] = 'foo' AND data['timestamp'] = '2024-07-12T01:17:42';"
" data['binary_set'] = ['U3Vubnk='], data['somemap'] = '{\"test\": 1.0, \"test2\": 2.0}'"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check this pls

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi there. It will produce this outcome when applied:

create table foo (data OBJECT(DYNAMIC));
insert into foo (data) values ('{"items": ["foo", "bar"]}');
update foo set data['somemap'] = '{\"test\": 1.0, \"test2\": 2.0}';
cr> select * from foo;
+---------------------------------------------------------------------------------+
| data                                                                            |
+---------------------------------------------------------------------------------+
| {"items": ["foo", "bar"], "somemap": "{\\\"test\\\": 1.0, \\\"test2\\\": 2.0}"} |
+---------------------------------------------------------------------------------+

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The relevant routines certainly need better software tests, probably including integration tests, so that we get a better handle on those details.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, the scaping \" are only in the test string so it can be written in Python, the actual produced query is
image

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked this by just printing

>>> print(DynamoCDCTranslatorCrateDB(table_name="foo").to_sql(MSG_MODIFY_NESTED))
UPDATE "foo" SET data['tags'] = ['foo', 'bar'], data['empty_map'] = '{}', data['empty_list'] = [], data['string_set'] = ['location_1'], data['number_set'] = [0.34, 1.0, 2.0, 3.0], data['binary_set'] = ['U3Vubnk='], data['somemap'] = '{"test": 1.0, "test2": 2.0}' WHERE data['device'] = 'foo' AND data['timestamp'] = '2024-07-12T01:17:42';

Copy link
Member

@amotl amotl Aug 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! I just exercised it on the real thing using crash, and the outcome is that, when the UPDATE statement is responsible for introducing a new ex. top-level field, in this case containing an object/map, for example, the procedure goes south because it will get interpreted as TEXT when not explicitly using the ::OBJECT cast.

Not using casts 💥

cr> create table foo (data OBJECT(DYNAMIC));
CREATE OK, 1 row affected (0.163 sec)
cr> insert into foo (data) values ('{"items": ["foo", "bar"]}');
INSERT OK, 1 row affected (0.043 sec)
cr> update foo set data['a_map'] = '{"test": 43.0, "test2": 2.0}';
UPDATE OK, 1 row affected (0.047 sec)
cr> select * from foo;
+------------------------------------------------------------------------+
| data                                                                   |
+------------------------------------------------------------------------+
| {"a_map": "{\"test\": 43.0, \"test2\": 2.0}", "items": ["foo", "bar"]} |
+------------------------------------------------------------------------+

Using casts 💯

cr> create table foo (data OBJECT(DYNAMIC));
CREATE OK, 1 row affected (0.155 sec)
cr> insert into foo (data) values ('{"items": ["foo", "bar"]}');
INSERT OK, 1 row affected (0.039 sec)
cr> update foo set data['a_map'] = '{"test": 43.0, "test2": 2.0}'::OBJECT;
UPDATE OK, 1 row affected (0.042 sec)
cr> select * from foo;
+------------------------------------------------------------------+
| data                                                             |
+------------------------------------------------------------------+
| {"a_map": {"test": 43.0, "test2": 2.0}, "items": ["foo", "bar"]} |
+------------------------------------------------------------------+

Coming from this evaluation, I guess we need to explicitly use casts for relevant special data types, and can't use the auto-detection capabilities of CrateDB, because it doesn't have any data to apply heuristics to, in order to come to the right conclusion.

Please educate me if you think I am on the wrong track to improve the situation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it looks alright, wrapping everything into a data object is less of a silver bullet than I thought, maybe we should just wrap every map with a ::OBJECT and call it a day? wdt @hlcianfagna ?

Copy link
Member

@amotl amotl Aug 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, thank you for confirming. Yet, it will be more complicated / not possible with ARRAY types, yet. I was able to poke @seut about it, and it looks like CrateDB needs a power-up for those.

@surister
Copy link
Contributor Author

Review after the merge please

@surister surister merged commit 5f1f85b into main Aug 22, 2024
15 checks passed
@surister surister deleted the dynamodb-fix-map branch August 22, 2024 17:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DynamoDB: Handle maps correctly
2 participants