Describe the bug
Some regular expression patterns are invalid in Java and throw an exception on the CPU, but run without an exception on the GPU. This is due to inconsistencies between the regexp parsers of the two systems (in many cases Java being excessively strict).
Steps/Code to reproduce bug
PySpark reproduce:
df = spark.createDataFrame(spark.sparkContext.parallelize([["aaaa"]]), "a string")
df.selectExpr("regexp_replace(a, 'a{', 'bb') as result").show()
When running on the CPU, you get a java.util.regex.PatternSyntaxException. On the GPU, the same query runs without an exception.
Suggested fix
I think we should run Pattern.compile(...) on any regular expressions before they hit the transpiler (and even before they hit optimized versions as well) to have consistent behavior between CPU and GPU. That way the same exception will be thrown when the SQL is evaluated.
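As a rough illustration of the idea, the sketch below runs a pattern through the JDK parser before anything else looks at it. The class and method names (`RegexPrecheck`, `validateWithJdk`, `isValidJavaRegex`) are hypothetical and not part of the plugin; only `Pattern.compile` and `PatternSyntaxException` are real JDK APIs. Where such a check would actually be wired into the transpiler or the optimized paths is left open here.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical helper, for illustration only: not part of the plugin.
class RegexPrecheck {

    // Compiles the pattern with the JDK parser purely for validation.
    // Throws PatternSyntaxException for patterns Java rejects, so the
    // GPU path would fail in the same way as the CPU path.
    static String validateWithJdk(String pattern) {
        Pattern.compile(pattern);
        return pattern; // unchanged; would be handed on to the transpiler
    }

    // Convenience wrapper that reports validity instead of throwing.
    static boolean isValidJavaRegex(String pattern) {
        try {
            Pattern.compile(pattern);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValidJavaRegex("a+")); // a well-formed pattern
        System.out.println(isValidJavaRegex("a{")); // the pattern from the reproduce above
    }
}
```

Compiling the pattern only to discard the result adds a small one-time cost per expression, but it means the error comes from the same parser that Spark uses on the CPU.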
NVnavkumar changed the title from "[BUG] Parser regular expressions using JDK to make error behavior more consistent between CPU and GPU" to "[BUG] Parse regular expressions using JDK to make error behavior more consistent between CPU and GPU" on Oct 28, 2024