Long string optimization for string column parsing in JSON reader #13803

Merged on Sep 20, 2023 (63 commits)
Commits
91742af warp per string parsing of string columns (unicode) (karthikeyann, Aug 2, 2023)
51e70e2 remove dependency of data_casting.cuh in write_json.cu (karthikeyann, Aug 2, 2023)
318b4a3 cleanup (karthikeyann, Aug 2, 2023)
363d5ab try load balancing with global counter for string index (karthikeyann, Aug 7, 2023)
58e0d6c fix intra-warp divergence issue with cub::WarpScan stuck (karthikeyann, Aug 7, 2023)
c0edf8f remove unnecessary WarpReduce, reduce shmem usage (karthikeyann, Aug 7, 2023)
086dfa9 cleanup comments, unused code (karthikeyann, Aug 7, 2023)
0aa2c0e add block per string algorithm (karthikeyann, Aug 11, 2023)
57ea056 cleanup, kernel name (karthikeyann, Aug 11, 2023)
7e4cfd2 add BLOCK_SIZE to block kernel (karthikeyann, Aug 17, 2023)
6622460 clean up, add constants (karthikeyann, Aug 17, 2023)
efe7898 add long string json test (karthikeyann, Aug 17, 2023)
589e0a3 remove debug prints (karthikeyann, Aug 17, 2023)
4f8e413 Merge branch 'branch-23.10' of github.com:rapidsai/cudf into enh-json… (karthikeyann, Aug 17, 2023)
d3dc8cf comment (karthikeyann, Aug 17, 2023)
e17589e style fix, add constants (karthikeyann, Aug 17, 2023)
21fe6c3 Merge branch 'branch-23.10' of github.com:rapidsai/cudf into enh-json… (karthikeyann, Aug 19, 2023)
631528a unified kernel for warp and block (karthikeyann, Aug 25, 2023)
6fe5afa remove duplicate block kernel, cleanup names (karthikeyann, Aug 25, 2023)
3f613c8 Merge branch 'branch-23.10' of github.com:rapidsai/cudf into enh-json… (karthikeyann, Aug 25, 2023)
d3a35b1 address review comments (karthikeyann, Aug 27, 2023)
658d8ba cleanup infer_data_type signature (karthikeyann, Aug 28, 2023)
3e9a88c Rename type_inference.cuh to .cu file (karthikeyann, Aug 28, 2023)
1838341 Cleanup parse_data signature (karthikeyann, Aug 28, 2023)
efe9712 rename data_casting.cuh to .cu (karthikeyann, Aug 28, 2023)
86c4519 Merge branch 'branch-23.10' of github.com:rapidsai/cudf into enh-json… (karthikeyann, Aug 28, 2023)
1d86996 move get_escaped_char to parsing_utils.cuh (karthikeyann, Aug 29, 2023)
e11452f last backslash errored bug fix (karthikeyann, Aug 29, 2023)
21e56e5 Merge branch 'branch-23.10' into enh-json_string_perf (vuule, Aug 30, 2023)
9426dda address review comments, update docs (karthikeyann, Aug 30, 2023)
b481d7c add complex test cases (karthikeyann, Aug 31, 2023)
a5acda6 add is escaping backslash lookback (karthikeyann, Aug 31, 2023)
b661d95 Merge branch 'branch-23.10' of github.com:rapidsai/cudf into enh-json… (karthikeyann, Aug 31, 2023)
4a3941f call thread kernel for small size, adjust sizes (karthikeyann, Aug 31, 2023)
d227dad address review comments (karthikeyann, Sep 1, 2023)
c9802c0 Merge branch 'branch-23.10' of github.com:rapidsai/cudf into enh-json… (karthikeyann, Sep 1, 2023)
bcade0f address review comments, fix 2 data hazards (karthikeyann, Sep 4, 2023)
431e1ec Merge branch 'branch-23.10' of github.com:rapidsai/cudf into enh-json… (karthikeyann, Sep 4, 2023)
4d1e048 update comments (karthikeyann, Sep 4, 2023)
896141d using bitfields for state_table, no local mem (karthikeyann, Sep 5, 2023)
38f8253 Merge branch 'branch-23.10' into enh-json_string_perf (vuule, Sep 8, 2023)
7095aab review comments syncthreads() (karthikeyann, Sep 11, 2023)
0d723e0 address review comments syncthreads() (karthikeyann, Sep 11, 2023)
3dce8d9 fix consts, zero size column, roundup (karthikeyann, Sep 11, 2023)
8de7d76 optimize single character write case, also fixes direct unicode bug (karthikeyann, Sep 11, 2023)
cb0e0ba add unit test JsonReaderTest.ErrorStrings (karthikeyann, Sep 11, 2023)
7f3a534 Merge branch 'enh-json_string_perf' of github.com:karthikeyann/cudf i… (karthikeyann, Sep 11, 2023)
7ee241a Merge branch 'branch-23.10' into enh-json_string_perf (karthikeyann, Sep 11, 2023)
7460997 Merge branches 'enh-json_string_perf' and 'enh-json_string_perf' of g… (karthikeyann, Sep 11, 2023)
4b46027 add comment (karthikeyann, Sep 11, 2023)
51390fd update comments (karthikeyann, Sep 11, 2023)
d7bb5ac adjust kernel string limits (karthikeyann, Sep 11, 2023)
8007ec3 Merge branch 'branch-23.10' into enh-json_string_perf (karthikeyann, Sep 11, 2023)
d0c8612 reorg json type test code (karthikeyann, Sep 13, 2023)
56d7fb6 add error cases for parse_data (karthikeyann, Sep 13, 2023)
d088e8e address review comments (vuule) (karthikeyann, Sep 13, 2023)
eabb7a8 Merge branch 'branch-23.10' of github.com:rapidsai/cudf into enh-json… (karthikeyann, Sep 13, 2023)
79b4f38 fix review comments, remove nvtx ranges (karthikeyann, Sep 13, 2023)
403a374 fix unit test cases nullability (karthikeyann, Sep 13, 2023)
af334fc Merge branch 'branch-23.10' into enh-json_string_perf (vuule, Sep 16, 2023)
72d23fb address review comments, split code for string type (karthikeyann, Sep 19, 2023)
c8e1f69 Merge branch 'branch-23.10' of github.com:rapidsai/cudf into enh-json… (karthikeyann, Sep 19, 2023)
d0a5e23 add comments, style fix (karthikeyann, Sep 19, 2023)
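Several commits (1d86996, e11452f, a5acda6) deal with backslash escapes: moving `get_escaped_char`, fixing an error on a last backslash, and adding an "is escaping backslash" lookback. The C++ sketch below illustrates the general lookback idea under the standard JSON rule that a character is escaped exactly when it is preceded by an odd number of consecutive backslashes. It is not the code from this PR: `is_escaped` is a hypothetical helper name, and it runs on the host rather than inside a CUDA kernel.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical helper (not the cuDF API): returns true if the character at
// position `pos` in `s` is escaped, i.e. preceded by an odd number of
// consecutive backslashes. `pos == s.size()` is allowed and then reports
// whether the string ends with a dangling (unpaired) backslash.
bool is_escaped(std::string_view s, std::size_t pos)
{
  assert(pos <= s.size());
  std::size_t count = 0;
  // Walk backwards over the run of backslashes immediately before `pos`.
  while (pos > 0 && s[pos - 1] == '\\') {
    ++count;
    --pos;
  }
  // An even run pairs up into literal backslashes; an odd run leaves one
  // backslash that escapes the character at the original position.
  return (count % 2) == 1;
}
```

Calling `is_escaped(s, s.size())` flags a string that ends in an unpaired backslash, the kind of input a last-backslash fix would have to handle. In a warp- or block-per-string scheme, a thread that starts in the middle of a string presumably needs a lookback of this sort to learn whether its first character is escaped, since it cannot see the parsing state of the preceding chunk.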
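Commits such as 0aa2c0e (block per string), 631528a (unified kernel for warp and block), 4a3941f (thread kernel for small sizes) and d7bb5ac (kernel string limits) suggest that strings are processed at thread, warp or block granularity depending on their length. A minimal host-side sketch of that kind of length-based bucketing follows. The thresholds `kWarpThreshold` and `kBlockThreshold`, the `Buckets` struct and the `bucket_by_length` function are all hypothetical and are not the limits or API used in cuDF.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical limits for illustration only; the PR tunes its own values.
constexpr std::size_t kWarpThreshold  = 32;    // shorter strings: one thread each
constexpr std::size_t kBlockThreshold = 1024;  // shorter strings: one warp each
static_assert(kWarpThreshold < kBlockThreshold, "thresholds must be ordered");

// Indices of strings grouped by the granularity that would process them.
struct Buckets {
  std::vector<std::size_t> thread_strings;  // short strings
  std::vector<std::size_t> warp_strings;    // medium strings
  std::vector<std::size_t> block_strings;   // long strings
};

// Assigns each string index to exactly one bucket based on its byte length.
Buckets bucket_by_length(std::vector<std::size_t> const& lengths)
{
  Buckets b;
  for (std::size_t i = 0; i < lengths.size(); ++i) {
    if (lengths[i] < kWarpThreshold) {
      b.thread_strings.push_back(i);
    } else if (lengths[i] < kBlockThreshold) {
      b.warp_strings.push_back(i);
    } else {
      b.block_strings.push_back(i);
    }
  }
  // Sanity check: every input string landed in exactly one bucket.
  assert(b.thread_strings.size() + b.warp_strings.size() + b.block_strings.size() ==
         lengths.size());
  return b;
}
```

Bucketing like this keeps short strings from tying up a whole warp while letting very long strings use a full block of threads. The merged PR may instead make this choice inside a single unified kernel, so treat the sketch only as an illustration of the idea.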
cpp/CMakeLists.txt (2 additions, 0 deletions)
@@ -413,11 +413,13 @@ add_library(
 src/io/utilities/arrow_io_source.cpp
 src/io/utilities/column_buffer.cpp
 src/io/utilities/config_utils.cpp
+src/io/utilities/data_casting.cu
 src/io/utilities/data_sink.cpp
 src/io/utilities/datasource.cpp
 src/io/utilities/file_io_utilities.cpp
 src/io/utilities/parsing_utils.cu
 src/io/utilities/row_selection.cpp
+src/io/utilities/type_inference.cu
 src/io/utilities/trie.cu
 src/jit/cache.cpp
 src/jit/parser.cpp