REGEX_LIKE and REGEX_MATCH don't support LargeUtf8 type #12664

Closed
goldmedal opened this issue Sep 28, 2024 · 3 comments · Fixed by #12690
Labels: bug, good first issue

Comments

@goldmedal
Contributor

Describe the bug

While working on #12415, I found that regexp_like and regexp_match don't support the LargeUtf8 type.

To Reproduce

It can be reproduced with the following SQL:

> select regexp_like(arrow_cast('abcdef', 'LargeUtf8'), 'bc');
Internal error: could not cast value to arrow_array::array::byte_array::GenericByteArray<arrow_array::types::GenericStringType<i64>>.
This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker

> select regexp_match(arrow_cast('abcdef', 'LargeUtf8'), 'bc');
Internal error: could not cast value to arrow_array::array::byte_array::GenericByteArray<arrow_array::types::GenericStringType<i64>>.
This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker

Expected behavior

They should work with LargeUtf8 just as they do with Utf8:

> select regexp_like(arrow_cast('abcdef', 'Utf8'), 'bc');
+-----------------------------------------------------------------+
| regexp_like(arrow_cast(Utf8("abcdef"),Utf8("Utf8")),Utf8("bc")) |
+-----------------------------------------------------------------+
| true                                                            |
+-----------------------------------------------------------------+
1 row(s) fetched. 
Elapsed 0.010 seconds.

> select regexp_match(arrow_cast('abcdef', 'Utf8'), 'bc');
+------------------------------------------------------------------+
| regexp_match(arrow_cast(Utf8("abcdef"),Utf8("Utf8")),Utf8("bc")) |
+------------------------------------------------------------------+
| [bc]                                                             |
+------------------------------------------------------------------+
1 row(s) fetched. 
Elapsed 0.016 seconds.


@goldmedal added the bug label Sep 28, 2024
@Omega359
Contributor

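The issue is not the first argument; it's the second. The code expects every element of the args array to have the same type. In the queries above, the value is LargeUtf8 but the pattern 'bc' is Utf8, so downcasting the pattern to the LargeUtf8 array type (GenericStringType<i64> in the error message) fails. For example, in the regexp_like fn: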

let values = as_generic_string_array::<T>(&args[0])?;
let regex = as_generic_string_array::<T>(&args[1])?;

This should be a fairly simple fix with a few additions to the appropriate .slt file to prove the fix.
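
To make the failure mode concrete, here is a minimal std-only sketch. It does not use the arrow crates: `Utf8Array`, `LargeUtf8Array`, `StrArray`, `downcast`, `contains_same_type`, and `contains_mixed` are all hypothetical stand-ins invented for illustration. It only models the shape of the snippet above, where both arguments are downcast to a single `T`, and shows one possible direction for a fix (separate type parameters for value and pattern). This is an assumption, not necessarily what the eventual fix does.

```rust
use std::any::Any;

// Hypothetical stand-ins for Arrow's Utf8 (i32 offsets) and
// LargeUtf8 (i64 offsets) string arrays; NOT the real arrow types.
struct Utf8Array(Vec<String>);
struct LargeUtf8Array(Vec<String>);

// Minimal common interface so the logic can be generic over both.
trait StrArray: Any {
    fn value(&self, i: usize) -> &str;
}
impl StrArray for Utf8Array {
    fn value(&self, i: usize) -> &str { &self.0[i] }
}
impl StrArray for LargeUtf8Array {
    fn value(&self, i: usize) -> &str { &self.0[i] }
}

// Downcast a type-erased argument to a concrete array type,
// loosely mirroring the role of as_generic_string_array::<T>.
fn downcast<T: StrArray>(arg: &dyn Any) -> Result<&T, String> {
    arg.downcast_ref::<T>()
        .ok_or_else(|| format!("could not cast value to {}", std::any::type_name::<T>()))
}

// Same shape as the snippet above: both arguments must be the same T,
// so a LargeUtf8 value with a Utf8 pattern fails on the second cast.
fn contains_same_type<T: StrArray>(args: &[Box<dyn Any>]) -> Result<bool, String> {
    let values = downcast::<T>(&*args[0])?;
    let pattern = downcast::<T>(&*args[1])?;
    Ok(values.value(0).contains(pattern.value(0)))
}

// One possible direction (an assumption): let the value and pattern
// types vary independently. Plain substring search stands in for regex.
fn contains_mixed<V: StrArray, P: StrArray>(args: &[Box<dyn Any>]) -> Result<bool, String> {
    let values = downcast::<V>(&*args[0])?;
    let pattern = downcast::<P>(&*args[1])?;
    Ok(values.value(0).contains(pattern.value(0)))
}

fn main() {
    let args: Vec<Box<dyn Any>> = vec![
        Box::new(LargeUtf8Array(vec!["abcdef".to_string()])),
        Box::new(Utf8Array(vec!["bc".to_string()])),
    ];
    // Fails: the pattern is not a LargeUtf8Array.
    println!("{:?}", contains_same_type::<LargeUtf8Array>(&args));
    // Succeeds: each argument is downcast to its own type.
    println!("{:?}", contains_mixed::<LargeUtf8Array, Utf8Array>(&args));
}
```

An alternative with the same effect would be to cast the pattern argument to the value's type before calling the existing single-`T` code path; which approach fits better depends on how the function's signature is declared.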

@Omega359
Contributor

This would be a good first issue.

@Weijun-H added the good first issue label Sep 30, 2024
@blaginin
Contributor

take
