Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DynamoDB: Fix serializing OBJECT and ARRAY representations to CrateDB #34

Merged
merged 2 commits into from
Aug 23, 2024

Conversation

amotl
Copy link
Member

@amotl amotl commented Aug 23, 2024

About

Another attempt to get relaying container type values right, this time for both OBJECTs and ARRAYs, at least a basic variant thereof, in order to unblock people needing it.

Details

The problem with ARRAYs is that they can't be relayed using JSON syntax, supported by a corresponding CAST. Instead, we need to use the native notation of CrateDB, which works without much ado on primitive inner types.

When serializing ARRAYs including OBJECT inner types, we need to actively convert to CrateDB's native syntax, which is close to Python's dict(foo='bar') syntax notation.

References

@amotl amotl requested review from seut and surister August 23, 2024 12:31
@amotl amotl force-pushed the dynamodb-fix-arrays branch from 89f000e to 17e03d1 Compare August 23, 2024 12:33
@amotl amotl marked this pull request as ready for review August 23, 2024 12:35
Comment on lines 180 to 193
# TODO: ARRAY types do not support JSON syntax yet.
# Let's relay them 1:1, which works for primitive inner types,
# but complex ones need special treatment, like representing
# OBJECTs in CrateDB-native syntax.
# FIXME: Find a way to use a custom JSONEncoder for that, in order to
# fully support also nested elements of those.
# https://github.com/crate/commons-codec/issues/33
elif isinstance(key_value, list):
if key_value:
if isinstance(key_value[0], dict):
items = []
for item in key_value:
items.append("{" + ", ".join(f"{key}='{value}'" for key, value in item.items()) + "}")
key_value = "[" + ",".join(items) + "]"
Copy link
Member Author

@amotl amotl Aug 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Handling it specially, [...] close to Python's dict(foo='bar') syntax notation.

This does not apply any special treatments to the OBJECT keys yet, and I did not perform any further orientation flights yet. For example, what happens if the key is a reserved keyword, or contains special characters like dashes -?

Copy link
Member Author

@amotl amotl Aug 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Depending on your spirits, we may want to defer further special handling to a subsequent iteration, and first check if the use case at hand already gets satisfied well.

if isinstance(key_value, dict):
key_value = repr(json.dumps(key_value))
elif isinstance(key_value, dict):
# TODO: Does it also need escaping of inner TEXT values, like the above?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Single quotes must always be escaped as they are used for any text literals at CrateDB.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. So that serialization will need further improvements to treat OBJECT values the same like above.

Comment on lines 189 to 193
if isinstance(key_value[0], dict):
items = []
for item in key_value:
items.append("{" + ", ".join(f"{key}='{value}'" for key, value in item.items()) + "}")
key_value = "[" + ",".join(items) + "]"
Copy link
Member

@seut seut Aug 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is about the other cases where the array element isn't a dict?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They absolutely need to be taken care of, right. Currently, as far as I can see through knowing about data payloads at hand, it might not be of current concerns yet.

@seut
Copy link
Member

seut commented Aug 23, 2024

Looks like the special array(object) case is already support by casting the json string to object[]. See crate/crate#14666 and crate/crate#14840.

@amotl
Copy link
Member Author

amotl commented Aug 23, 2024

Looks like the special array(object) case is already support by casting the json string to object[].

This is absolutely right on the spot, and seems to work well. Thank you so much!

data['list_of_objects'] = '[{"foo": "bar"}, {"baz": "qux"}]'::OBJECT[]
cr> UPDATE "foo" SET data['tags'] = ['foo', 'bar'], data['empty_map'] = '{}'::OBJECT, data['empty_list'] = [], data['string_set'] = ['location_1'], data['number_set'] = [0.34, 1.0, 2.0, 3.0],
     data['binary_set'] = ['U3Vubnk='], data['somemap'] = '{"test": 1.0, "test2": 2.0}'::OBJECT, data['list_of_objects'] = '[{"foo": "bar"}, {"baz": "qux"}]'::OBJECT[] WHERE data['device'] =
    'foo';
UPDATE OK, 1 row affected (0.039 sec)
cr> select * from foo;
| {"binary_set": ["U3Vubnk="], "device": "foo", "empty_list": [], "empty_map": {}, "items": ["foo", "bar"], "list_of_objects": [{"foo": "bar"}, {"baz": "qux"}], "number_set": [0.34, 1.0, 2.0, 3.0], "somemap": {"test": 1.0, "test2": 2.0}, "string_set": ["location_1"], "tags": ["foo", "bar"]} |

@amotl
Copy link
Member Author

amotl commented Aug 23, 2024

I've amended the patch correspondingly.

@amotl amotl requested a review from seut August 23, 2024 13:12
@seut
Copy link
Member

seut commented Aug 23, 2024

Btw. any reason not to use parameter binding instead of serializing values into the SQL? This would avoid the need of any special escaping I think and would be much safer..

@amotl
Copy link
Member Author

amotl commented Aug 23, 2024

Btw. any reason not to use parameter binding instead of serializing values into the SQL? This would avoid the need of any special escaping I think and would be much safer.

Absolutely not. It is just my fault that I didn't use it from the very beginning. A subsequent iteration certainly will bring it to the table, because it is absolutely the right choice not needing to care about those details manually. Thanks!

@amotl amotl merged commit f382455 into main Aug 23, 2024
15 checks passed
@amotl amotl deleted the dynamodb-fix-arrays branch August 23, 2024 13:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants