Incorrect LIKE and ILIKE result for NULL input, '%', and '%%' pattern #12637
Comments
I think if the input is null, the string view should also return null.
Thanks! I updated the expected behavior 👍
Concise repro. This should return NULL
@goldmedal can you please change the issue title to something like "… `%` pattern"?
Thanks for the suggestion 👍
LIKE -- apache/arrow-rs#6662
take
I plan to take care of ILIKE as well, once the above is accepted.
`expr LIKE '%'` was previously simplified to `true`, but the expression returns `NULL` when `expr` is null. The conversion was guarded by `!is_null(expr)`, which only checks that `expr` is not a NULL literal; it does not guarantee that `expr` never evaluates to NULL. This commit adds correct simplification logic. It additionally expands the rule coverage to include string view (Utf8View) and large string (LargeUtf8). This allows writing shared test cases despite `utf8_view LIKE '%'` returning incorrect results at execution time (tracked by apache#12637). In other words, the simplification masks the bug for cases where the pattern is statically known.
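To illustrate the rule described in this commit message, here is a minimal Rust sketch over a hypothetical toy expression type (`Expr`, `nullable`, and `simplify` are illustrative names, not DataFusion's API). `expr LIKE '%'` may fold to a constant only when `expr` is known to be non-nullable; otherwise one null-preserving rewrite is `CASE WHEN expr IS NOT NULL THEN true END`, which still yields NULL for a NULL input. The rewrite only fires for a literal pattern, which is why it fixes just the statically known cases.

```rust
/// Toy expression tree (hypothetical; not DataFusion's `Expr`).
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    /// A string column with known nullability.
    Column { name: String, nullable: bool },
    /// A boolean literal; `None` is the NULL literal.
    BoolLit(Option<bool>),
    /// `expr IS NOT NULL`
    IsNotNull(Box<Expr>),
    /// `CASE WHEN cond THEN then END`; with no ELSE, unmatched rows yield NULL.
    CaseWhen { cond: Box<Expr>, then: Box<Expr> },
    /// `expr [NOT] LIKE 'pattern'` with a literal pattern.
    Like { expr: Box<Expr>, pattern: String, negated: bool },
}

/// Conservative nullability: may this expression evaluate to NULL?
fn nullable(e: &Expr) -> bool {
    match e {
        Expr::Column { nullable, .. } => *nullable,
        Expr::BoolLit(v) => v.is_none(),
        Expr::IsNotNull(_) => false,
        Expr::CaseWhen { .. } => true,
        Expr::Like { expr, .. } => nullable(expr),
    }
}

/// Rewrites `expr [NOT] LIKE '%'` without losing NULL semantics.
fn simplify(e: Expr) -> Expr {
    match e {
        Expr::Like { expr, pattern, negated } if pattern == "%" => {
            // For any non-NULL string, LIKE '%' is true and NOT LIKE '%' is false.
            let non_null_result = Expr::BoolLit(Some(!negated));
            if !nullable(&expr) {
                non_null_result
            } else {
                // A NULL input must still produce NULL, so guard on IS NOT NULL.
                Expr::CaseWhen {
                    cond: Box::new(Expr::IsNotNull(expr)),
                    then: Box::new(non_null_result),
                }
            }
        }
        other => other,
    }
}

fn main() {
    let col = Expr::Column { name: "s".to_string(), nullable: true };
    let like = Expr::Like { expr: Box::new(col), pattern: "%".to_string(), negated: false };
    println!("{:?}", simplify(like));
}
```

The old behavior corresponds to always taking the constant branch whenever `expr` is not a NULL literal, which is wrong for nullable columns.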
For cases where the pattern is static (the majority), we can partially fix this issue by fixing a bug in the simplifier and thus avoiding the buggy execution path: #13259
@goldmedal can you maybe edit the issue title once again to cover the … `%` pattern? (I will see whether I can fix the repeated percent there too.)
Interesting. I haven't checked the source code, but I tried different numbers of the …
It's a nice find 👍
It looks like a missed optimization opportunity.
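Since `'%%'` matches exactly the strings `'%'` matches, one way to pick up this optimization is to collapse runs of wildcards before the `'%'` rule runs. The following is a minimal Rust sketch; `collapse_wildcard_runs` and `is_match_all_pattern` are hypothetical helper names, and the sketch assumes a single-character escape (a backslash, as in PostgreSQL) that is not `%` itself. Escaped characters are copied verbatim, so `'\%%%'` (a literal percent followed by wildcards) keeps its literal percent.

```rust
/// Collapses each run of unescaped `%` wildcards into a single `%`.
/// The escape character and the character after it are copied as-is.
fn collapse_wildcard_runs(pattern: &str, escape: char) -> String {
    let mut out = String::with_capacity(pattern.len());
    let mut chars = pattern.chars();
    let mut prev_was_wildcard = false;
    while let Some(c) = chars.next() {
        if c == escape {
            // Copy the escape and the escaped character verbatim.
            out.push(c);
            if let Some(escaped) = chars.next() {
                out.push(escaped);
            }
            prev_was_wildcard = false;
        } else if c == '%' {
            // Emit only the first `%` of a run.
            if !prev_was_wildcard {
                out.push(c);
            }
            prev_was_wildcard = true;
        } else {
            out.push(c);
            prev_was_wildcard = false;
        }
    }
    out
}

/// True if the pattern matches every non-NULL string ('%', '%%', '%%%', ...).
fn is_match_all_pattern(pattern: &str, escape: char) -> bool {
    collapse_wildcard_runs(pattern, escape) == "%"
}

fn main() {
    for p in ["%", "%%", "a%%b", "\\%%%"] {
        println!("{p:?} -> {:?}", collapse_wildcard_runs(p, '\\'));
    }
}
```

A match-all pattern still has to return NULL for NULL input, so a simplifier would route it through the same null-preserving rewrite as `'%'` rather than folding straight to `true`.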
Done in apache/arrow-rs#6705. Should we close this issue when we update the arrow-rs dependency?
Marked as waiting on upstream.
Describe the bug
While working on #12415, I found that the `LIKE` and `ILIKE` behavior differs between `StringView` and other string types. Given the following data and SQL:
When the input value is NULL, the string types return `NULL`, but string view returns `false`. (Something interesting: the `ILIKE` operation behaves differently for ASCII-only and Unicode `StringView` 🤔)
Some testing for StringView ScalarValue:
When the matching pattern contains `%`, it returns `false` instead of `null`.
.Some testing for String ScalarValue (Same as LargeString and DictionaryString)
To Reproduce
Run the SQLs mentioned above.
Expected behavior
I'm not really sure whether the behavior of StringView is expected 🤔, but I think the behavior should be consistent. If the input is null, `LIKE` and `ILIKE` should return null.
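The expected behavior can be written down as a small reference model: NULL in either operand gives NULL; otherwise the pattern is matched with `%` (any sequence, including empty) and `_` (exactly one character). This is a sketch under stated assumptions, not DataFusion or arrow-rs code: `like_match`, `sql_like`, and `sql_ilike` are hypothetical names, escape characters are ignored, and `sql_ilike` approximates case-insensitive matching by lowercasing both sides with `str::to_lowercase`, which may not match every engine's Unicode case folding.

```rust
/// Returns true if `s` matches the LIKE `pattern` (`%` and `_` only; no escapes).
fn like_match(s: &str, pattern: &str) -> bool {
    let s: Vec<char> = s.chars().collect();
    let p: Vec<char> = pattern.chars().collect();
    // Rolling DP row: dp[j] is true iff the first i chars of `s`
    // match the first j chars of `p`.
    let mut dp = vec![false; p.len() + 1];
    dp[0] = true;
    for j in 1..=p.len() {
        // The empty input matches a pattern prefix only if it is all `%`.
        dp[j] = dp[j - 1] && p[j - 1] == '%';
    }
    for i in 1..=s.len() {
        let mut diag = dp[0]; // value for (i - 1, j - 1)
        dp[0] = false;
        for j in 1..=p.len() {
            let above = dp[j]; // value for (i - 1, j)
            dp[j] = match p[j - 1] {
                '%' => dp[j - 1] || above, // match empty, or consume s[i - 1]
                '_' => diag,
                c => diag && c == s[i - 1],
            };
            diag = above;
        }
    }
    dp[p.len()]
}

/// Three-valued `input LIKE pattern`: NULL in, NULL out.
fn sql_like(input: Option<&str>, pattern: Option<&str>) -> Option<bool> {
    match (input, pattern) {
        (Some(s), Some(p)) => Some(like_match(s, p)),
        _ => None,
    }
}

/// Three-valued `input ILIKE pattern` (approximate case folding via lowercase).
fn sql_ilike(input: Option<&str>, pattern: Option<&str>) -> Option<bool> {
    match (input, pattern) {
        (Some(s), Some(p)) => Some(like_match(&s.to_lowercase(), &p.to_lowercase())),
        _ => None,
    }
}

fn main() {
    println!("{:?}", sql_like(None, Some("%")));
    println!("{:?}", sql_ilike(Some("Ärger"), Some("är%")));
}
```

Under this model, `'%'` and `'%%'` both return `Some(true)` for any non-NULL string and `None` for NULL, which is the consistency this issue asks for across string types.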
Additional context
No response