Counterintuitive Behaviors of `contains` Filter Operator over Arrays in PostgreSQL Backend #6618
Comments
When implementing this operator on the SQLite backend, I gained a clearer understanding of its semantics, especially in the context of nested arrays. It's overly simplistic to treat this operation as a purely mathematical "is-subset" or "contains" comparator. Instead, if we conceptualize JSON objects as trees, this For instance, consider the JSON arrays
It becomes evident that In contrast, if you examine
You’ll notice it lacks an intermediate "ARRAY" node, so This becomes even more apparent in cases like This tree-based perspective also applies to JSON objects (dictionaries in Python). For instance: Here, Additionally, this tree-based interpretation clarifies why the order of elements within an array doesn’t matter: the structure is what’s being compared, not the sequence. Another noteworthy property of this containment operator is that duplicate elements are ignored. By combining these two characteristics, the behavior of the operator becomes more intuitive and easier to understand.
cc @GeigerJ2
Thanks a lot for this deep-dive and taking the time to write down the explanation here, @rabbull! That actually makes a lot of sense. Feel free to add a pointer to this issue to the discourse thread you linked.
Describe the bug
The `contains` operator on the PostgreSQL backend lacks comprehensive documentation. To better understand its exact behavior, I conducted additional tests in #6617. However, some of the observed behaviors are counterintuitive; they are detailed below:
1. Non-Existent `attr_key`
In a test where the `attributes` column does not contain a key named `oops`, the query executes successfully. However, the results are confusing: neither the affirmation nor the negation of the `contains` operation matches the entry. This behavior is unexpected and counterintuitive. What would best fit my expectation is a loud failure, e.g. raising an exception.
2. Nested Array
This is a known issue discussed in a previous thread on discourse.
When testing with nested arrays, the `contains` operation unexpectedly matches entries even when the contained elements do not strictly align with the expected structure. For example, a query attempting to match `[2]` against an array `[[1, 2], [3]]` may return a match, even though `[2]` is not directly an element of the array.
Note that this actually complies with PostgreSQL's native JSONB containment semantics:
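As a rough illustration, the following Python sketch emulates the core rules of the `@>` containment operator as described in the PostgreSQL documentation. The function names `jsonb_contains` and `_scalar_eq` are hypothetical and are not part of AiiDA, SQLAlchemy, or PostgreSQL. The sketch also leaves out the top-level special case in which an array may contain a bare primitive value.

```python
def _scalar_eq(a, b):
    """Compare two JSON scalars without letting Python treat True as 1."""
    if isinstance(a, bool) or isinstance(b, bool):
        return a is b
    return a == b


def jsonb_contains(lhs, rhs):
    """Approximate `lhs @> rhs` for JSON-like Python values (assumed semantics)."""
    if isinstance(rhs, dict):
        # Every key of rhs must exist in lhs, and its value must be contained.
        return isinstance(lhs, dict) and all(
            key in lhs and jsonb_contains(lhs[key], value)
            for key, value in rhs.items()
        )
    if isinstance(rhs, list):
        # Every element of rhs must be contained in SOME element of lhs,
        # so order and duplicates on the right-hand side do not matter.
        if not isinstance(lhs, list):
            return False
        return all(
            any(jsonb_contains(elem, item) for elem in lhs) for item in rhs
        )
    # Scalars only contain an identical scalar.
    return not isinstance(lhs, (dict, list)) and _scalar_eq(lhs, rhs)


stored = [[1, 2], [3]]
print(jsonb_contains(stored, [[2]]))  # True: [2] is contained in the element [1, 2]
print(jsonb_contains(stored, [2]))    # False: 2 is not a top-level scalar element
```

Under the same assumptions, `jsonb_contains([1, 2, 3], [3, 1])` and `jsonb_contains([1, 2, 3], [1, 2, 2])` both evaluate to `True`, which mirrors the order-insensitive, duplicate-insensitive behavior of JSONB containment.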
However, this behavior conflicts with intuition quite a lot, and consequently makes the abstraction hard to understand.
In addition, it would also be nice to mention in the documentation that `contains` does not care about the order of array elements. See example below:
Steps to reproduce
See above.
Expected behavior
See above.
Your environment
Other relevant software versions, e.g. Postgres & RabbitMQ
Additional context
This might be an issue in SQLAlchemy and needs further investigation.