JSON spark reader plan for 24.12 #17138

karthikeyann · 2024-10-21T20:32:56Z

These are the planned optimizations and bug fixes for the JSON Spark reader in the 24.12 release.

- Memory optimization: PR "JSON tokenizer memory optimizations" #16978
- Runtime mitigation: multi-stage FST implementation (Elias, Shruti), issue "[FEA] Faster path for calculating total output symbols in FST" #17114
- Input schema: issue/PR (new feature)
- Performance, preprocessing: nullify empty lines, PR "add option to nullify empty lines" #17028
- Bugfix: the last invalid JSON is not treated as an error, "[BUG] cudf::read_json incorrectly parses invalid JSON string" #16999
- Bugfix: disable array of arrays for Spark, "disable array of arrays for recovery with null" #17030
- Performance: mega kernel, "[FEA] Implement merged 'mega' kernel to parse leaf-level columns in JSON reader" #16965
- "[FEA] Improve GpuJsonToStructs performance" NVIDIA/spark-rapids#11560 (input schema, and post-processing that moves columns without copying)
  - "Convert strings columns output from cudf::read_json to other types" NVIDIA/spark-rapids-jni#2510
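
For readers unfamiliar with the "nullify empty lines" item, here is a rough plain-Python sketch of the behavior I understand it to mean. It assumes an empty line should produce a null row instead of being skipped, so the output row count matches the input line count. This is not the cudf implementation or its API; `parse_json_lines` and its `nullify_empty_lines` parameter are hypothetical names used only for illustration.

```python
import json


def parse_json_lines(text, nullify_empty_lines=False):
    """Parse newline-delimited JSON, one row per line (illustrative only).

    Hypothetical helper, not part of cudf. With nullify_empty_lines=True,
    an empty line becomes a None row; otherwise it is dropped.
    """
    rows = []
    for line in text.split("\n"):
        if line.strip() == "":
            if nullify_empty_lines:
                rows.append(None)  # keep the row position, but as null
            continue  # nothing to parse on an empty line
        rows.append(json.loads(line))
    return rows


text = '{"a": 1}\n\n{"a": 2}'
skipped = parse_json_lines(text)
nullified = parse_json_lines(text, nullify_empty_lines=True)
```

With the three-line input above, `skipped` has two rows and `nullified` has three, the middle one being `None`. Keeping the row count aligned with the input is the part that matters when each input line corresponds to one row of a Spark string column.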

karthikeyann added the cuIO, improvement, and Spark labels on Oct 21, 2024
