Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix bug in recovering invalid lines in JSONL inputs #17098
Fix bug in recovering invalid lines in JSONL inputs #17098
Changes from 6 commits
9ff3129
624743b
bcecb25
f9b7e08
55c13a0
9d2a2f0
3d0a51d
911e065
ab7659b
0ef5108
1dffbf0
293521f
ca8ee32
ebc5275
679833b
b192fd2
31d5cab
7c3e0f0
35b7177
9370dc5
6d87031
b9005ae
4382ef8
f75d8ee
bb9584e
6ad06ca
424f90f
8b48297
f651087
dfba4cd
eb82450
d3193e3
18f1a6e
96dce9d
f8c5de3
c0d0b3e
234c19d
77b2f99
2e37ed4
3784be9
f351242
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is hardcoded to
\n
, should it be delimiter mentioned in json reader options instead?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Qn: Does this force reader to do extra copy?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, I've changed the hardcoded
\n
to the reader options delimiter. Also fixed it in a few other places.Yes, we now have two extra copies between host and device each of size 1 byte. We also perform a stream sync between the copies. I'll run the JSON benchmarks to see what the impact of this change is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once we merge #17028, we need to check the last character in the buffer only when both
nullify_empty_rows
andrecover_with_null
are enabled. Otherwise, I think we can always add a delimiter since empty rows are anyway ignored.