-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DynamoDB: Fix object (Map) representation to CrateDB #31
Conversation
" data['string_set'] = ['location_1'], data['number_set'] = [0.34, 1.0, 2.0, 3.0]," | ||
" data['binary_set'] = ['U3Vubnk='] WHERE data['device'] = 'foo' AND data['timestamp'] = '2024-07-12T01:17:42';" | ||
" data['binary_set'] = ['U3Vubnk='], data['somemap'] = '{\"test\": 1.0, \"test2\": 2.0}'" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Check this pls
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi there. It will produce this outcome when applied:
create table foo (data OBJECT(DYNAMIC));
insert into foo (data) values ('{"items": ["foo", "bar"]}');
update foo set data['somemap'] = '{\"test\": 1.0, \"test2\": 2.0}';
cr> select * from foo;
+---------------------------------------------------------------------------------+
| data |
+---------------------------------------------------------------------------------+
| {"items": ["foo", "bar"], "somemap": "{\\\"test\\\": 1.0, \\\"test2\\\": 2.0}"} |
+---------------------------------------------------------------------------------+
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The relevant routines certainly need better software tests, probably including integration tests, so that we get a better handle on those details.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I checked this by just printing
>>> print(DynamoCDCTranslatorCrateDB(table_name="foo").to_sql(MSG_MODIFY_NESTED))
UPDATE "foo" SET data['tags'] = ['foo', 'bar'], data['empty_map'] = '{}', data['empty_list'] = [], data['string_set'] = ['location_1'], data['number_set'] = [0.34, 1.0, 2.0, 3.0], data['binary_set'] = ['U3Vubnk='], data['somemap'] = '{"test": 1.0, "test2": 2.0}' WHERE data['device'] = 'foo' AND data['timestamp'] = '2024-07-12T01:17:42';
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you! I just exercised it on the real thing using crash
, and the outcome is that, when the UPDATE statement is responsible for introducing a new ex. top-level field, in this case containing an object/map, for example, the procedure goes south because it will get interpreted as TEXT when not explicitly using the ::OBJECT
cast.
Not using casts 💥
cr> create table foo (data OBJECT(DYNAMIC));
CREATE OK, 1 row affected (0.163 sec)
cr> insert into foo (data) values ('{"items": ["foo", "bar"]}');
INSERT OK, 1 row affected (0.043 sec)
cr> update foo set data['a_map'] = '{"test": 43.0, "test2": 2.0}';
UPDATE OK, 1 row affected (0.047 sec)
cr> select * from foo;
+------------------------------------------------------------------------+
| data |
+------------------------------------------------------------------------+
| {"a_map": "{\"test\": 43.0, \"test2\": 2.0}", "items": ["foo", "bar"]} |
+------------------------------------------------------------------------+
Using casts 💯
cr> create table foo (data OBJECT(DYNAMIC));
CREATE OK, 1 row affected (0.155 sec)
cr> insert into foo (data) values ('{"items": ["foo", "bar"]}');
INSERT OK, 1 row affected (0.039 sec)
cr> update foo set data['a_map'] = '{"test": 43.0, "test2": 2.0}'::OBJECT;
UPDATE OK, 1 row affected (0.042 sec)
cr> select * from foo;
+------------------------------------------------------------------+
| data |
+------------------------------------------------------------------+
| {"a_map": {"test": 43.0, "test2": 2.0}, "items": ["foo", "bar"]} |
+------------------------------------------------------------------+
Coming from this evaluation, I guess we need to explicitly use casts for relevant special data types, and can't use the auto-detection capabilities of CrateDB, because it doesn't have any data to apply heuristics to, in order to come to the right conclusion.
Please educate me if you think I am on the wrong track to improve the situation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes it looks alright, wrapping everything into a data object is less of a silver bullet than I thought, maybe we should just wrap every map
with a ::OBJECT
and call it a day? wdt @hlcianfagna ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea, thank you for confirming. Yet, it will be more complicated / not possible with ARRAY
types, yet. I was able to poke @seut about it, and it looks like CrateDB needs a power-up for those.
Review after the merge please |
Summary of the changes / Why this is an improvement
Fixes #32
Checklist