[FEA] CSV option to strip trailing white space after a quoted field. #13892
Labels: 0 - Backlog, cuIO, feature request, libcudf, Spark
Is your feature request related to a problem? Please describe.
Spark by default will strip out trailing white space that appears after a quoted value.
So if I have a CSV file with something like
Spark will strip out the trailing white space and treat it just like it was
CUDF does not do this. Instead, when it sees white space after the closing quote, it treats the entry as if it were not quoted at all.
CUDF produces "A" as the value returned, but Spark produces just A.
Describe the solution you'd like
I would love a config flag that would let us do this automatically.
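To make the requested behavior concrete, here is a minimal Python sketch of the per-field semantics I have in mind. It is not libcudf code: `parse_field` and its `strip_trailing` parameter are hypothetical names used only for illustration, and the flag-off branch is just a stand-in for the "treated as not quoted" behavior described above.

```python
def parse_field(raw: str, quote: str = '"', strip_trailing: bool = True) -> str:
    """Hypothetical helper: return the value of a single raw CSV field.

    With strip_trailing=True, white space after the closing quote is
    ignored and the field is treated as quoted (the Spark-like behavior).
    With strip_trailing=False, such a field is returned as-is, i.e. it is
    treated as if it were not quoted at all.
    """
    if not raw.startswith(quote):
        return raw  # unquoted field: returned unchanged

    out = []
    i = 1
    while i < len(raw):
        ch = raw[i]
        if ch == quote:
            # A doubled quote inside a quoted field is an escaped quote.
            if i + 1 < len(raw) and raw[i + 1] == quote:
                out.append(quote)
                i += 2
                continue
            # Otherwise this is the closing quote; look at what follows it.
            rest = raw[i + 1:]
            if rest == "" or (strip_trailing and rest.strip(" \t") == ""):
                return "".join(out)
            return raw  # something else follows: treat as unquoted
        out.append(ch)
        i += 1
    return raw  # no closing quote found: treat as unquoted


print(repr(parse_field('"A"  ')))                        # flag on
print(repr(parse_field('"A"  ', strip_trailing=False)))  # flag off
```

Only spaces and tabs are treated as trailing white space here. Which characters Spark actually strips is something the real option would need to match.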
Describe alternatives you've considered
We could also do something similar to what is happening with JSON, where we can ask for the string data to be returned with quotes intact so that we could handle cleaning it up ourselves. But I am not sure how that might interact with escaping, so ideally we would just go with the first option.