Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Implement ORC chunked reader (#15094)
This implements ORC chunked reader, to support reading ORC such that: * The output is multiple tables instead of once, each of them is issue when calling to `read_chunk()`, and has limited size which stays within a given `output_limit` parameter. * The temporary device memory usage can be limited by a soft limit `data_read_limit` parameter, allowing to read very large ORC files without OOM. * ORC files containing many billions of rows can be properly read chunk-by-chunk without seeing the size overflow issue when the number of rows exceeds cudf size limit (`2^31` rows). Depends on: * #14911 * #15008 * #15169 * #15252 Partially contribute to #12228. --- ## Benchmarks Due to some small optimizations in ORC reader, reading ORC files all-at-once (reading the entire file into just one output table) can be a little bit faster. For example, with the benchmark `orc_read_io_compression`: ``` ## [0] Quadro RTX 6000 | io | compression | cardinality | run_length | Ref Time | Ref Noise | Cmp Time | Cmp Noise | Diff | %Diff | Status | |---------------|---------------|---------------|--------------|------------|-------------|------------|-------------|---------------|---------|----------| | FILEPATH | SNAPPY | 0 | 1 | 183.027 ms | 7.45% | 157.293 ms | 4.72% | -25733.837 us | -14.06% | FAIL | | FILEPATH | SNAPPY | 1000 | 1 | 198.228 ms | 6.43% | 164.395 ms | 4.14% | -33833.020 us | -17.07% | FAIL | | FILEPATH | SNAPPY | 0 | 32 | 96.676 ms | 6.19% | 82.522 ms | 1.36% | -14153.945 us | -14.64% | FAIL | | FILEPATH | SNAPPY | 1000 | 32 | 94.508 ms | 4.80% | 81.078 ms | 0.48% | -13429.672 us | -14.21% | FAIL | | FILEPATH | NONE | 0 | 1 | 161.868 ms | 5.40% | 139.849 ms | 2.44% | -22018.910 us | -13.60% | FAIL | | FILEPATH | NONE | 1000 | 1 | 164.902 ms | 5.80% | 142.041 ms | 3.43% | -22861.258 us | -13.86% | FAIL | | FILEPATH | NONE | 0 | 32 | 88.298 ms | 5.15% | 74.924 ms | 1.97% | -13374.607 us | -15.15% | FAIL | | FILEPATH | NONE | 1000 | 32 | 87.147 ms | 5.61% | 72.502 ms | 0.50% | -14645.122 us | -16.81% | FAIL | | HOST_BUFFER | SNAPPY | 0 | 1 | 124.990 ms | 0.39% | 111.670 ms | 2.13% | -13320.483 us | -10.66% | FAIL | | HOST_BUFFER | SNAPPY | 1000 | 1 | 149.858 ms | 4.10% | 126.266 ms | 0.48% | -23591.543 us | -15.74% | FAIL | | HOST_BUFFER | SNAPPY | 0 | 32 | 92.499 ms | 4.46% | 77.653 ms | 1.58% | -14846.471 us | -16.05% | FAIL | | HOST_BUFFER | SNAPPY | 1000 | 32 | 93.373 ms | 4.14% | 80.033 ms | 3.19% | -13340.002 us | -14.29% | FAIL | | HOST_BUFFER | NONE | 0 | 1 | 111.792 ms | 0.50% | 97.083 ms | 0.50% | -14709.530 us | -13.16% | FAIL | | HOST_BUFFER | NONE | 1000 | 1 | 117.646 ms | 5.60% | 97.634 ms | 0.44% | -20012.301 us | -17.01% | FAIL | | HOST_BUFFER | NONE | 0 | 32 | 84.983 ms | 4.96% | 66.975 ms | 0.50% | -18007.403 us | -21.19% | FAIL | | HOST_BUFFER | NONE | 1000 | 32 | 82.648 ms | 4.42% | 65.510 ms | 0.91% | -17137.910 us | -20.74% | FAIL | | DEVICE_BUFFER | SNAPPY | 0 | 1 | 65.538 ms | 4.02% | 59.399 ms | 2.54% | -6138.560 us | -9.37% | FAIL | | DEVICE_BUFFER | SNAPPY | 1000 | 1 | 101.427 ms | 4.10% | 92.276 ms | 3.30% | -9150.278 us | -9.02% | FAIL | | DEVICE_BUFFER | SNAPPY | 0 | 32 | 80.133 ms | 4.64% | 73.959 ms | 3.50% | -6173.818 us | -7.70% | FAIL | | DEVICE_BUFFER | SNAPPY | 1000 | 32 | 86.232 ms | 4.71% | 77.446 ms | 3.32% | -8786.606 us | -10.19% | FAIL | | DEVICE_BUFFER | NONE | 0 | 1 | 52.189 ms | 6.62% | 45.018 ms | 4.11% | -7171.043 us | -13.74% | FAIL | | DEVICE_BUFFER | NONE | 1000 | 1 | 54.664 ms | 6.76% | 46.855 ms | 3.35% | -7809.803 us | -14.29% | FAIL | | DEVICE_BUFFER | NONE | 0 | 32 | 67.975 ms | 5.12% | 60.553 ms | 4.22% | -7422.279 us | -10.92% | FAIL | | DEVICE_BUFFER | NONE | 1000 | 32 | 68.485 ms | 4.86% | 62.253 ms | 6.23% | -6232.340 us | -9.10% | FAIL | ``` When memory is limited, chunked read can help avoiding OOM but with some sort of performance trade-off. For example, for reading a table of size 500MB from file using 64MB output limits and 640 MB data read limit: ``` | io | compression | cardinality | run_length | Ref Time | Ref Noise | Cmp Time | Cmp Noise | Diff | %Diff | Status | |---------------|---------------|---------------|--------------|------------|-------------|------------|-------------|------------|---------|----------| | FILEPATH | SNAPPY | 0 | 1 | 183.027 ms | 7.45% | 350.824 ms | 2.74% | 167.796 ms | 91.68% | FAIL | | FILEPATH | SNAPPY | 1000 | 1 | 198.228 ms | 6.43% | 322.414 ms | 3.46% | 124.186 ms | 62.65% | FAIL | | FILEPATH | SNAPPY | 0 | 32 | 96.676 ms | 6.19% | 133.363 ms | 4.78% | 36.686 ms | 37.95% | FAIL | | FILEPATH | SNAPPY | 1000 | 32 | 94.508 ms | 4.80% | 128.897 ms | 0.37% | 34.389 ms | 36.39% | FAIL | | FILEPATH | NONE | 0 | 1 | 161.868 ms | 5.40% | 316.637 ms | 4.21% | 154.769 ms | 95.61% | FAIL | | FILEPATH | NONE | 1000 | 1 | 164.902 ms | 5.80% | 326.043 ms | 3.06% | 161.141 ms | 97.72% | FAIL | | FILEPATH | NONE | 0 | 32 | 88.298 ms | 5.15% | 124.819 ms | 5.17% | 36.520 ms | 41.36% | FAIL | | FILEPATH | NONE | 1000 | 32 | 87.147 ms | 5.61% | 123.047 ms | 5.82% | 35.900 ms | 41.19% | FAIL | | HOST_BUFFER | SNAPPY | 0 | 1 | 124.990 ms | 0.39% | 285.718 ms | 0.78% | 160.728 ms | 128.59% | FAIL | | HOST_BUFFER | SNAPPY | 1000 | 1 | 149.858 ms | 4.10% | 263.491 ms | 2.89% | 113.633 ms | 75.83% | FAIL | | HOST_BUFFER | SNAPPY | 0 | 32 | 92.499 ms | 4.46% | 127.881 ms | 0.86% | 35.382 ms | 38.25% | FAIL | | HOST_BUFFER | SNAPPY | 1000 | 32 | 93.373 ms | 4.14% | 128.022 ms | 0.98% | 34.650 ms | 37.11% | FAIL | | HOST_BUFFER | NONE | 0 | 1 | 111.792 ms | 0.50% | 241.064 ms | 1.89% | 129.271 ms | 115.64% | FAIL | | HOST_BUFFER | NONE | 1000 | 1 | 117.646 ms | 5.60% | 248.134 ms | 3.08% | 130.488 ms | 110.92% | FAIL | | HOST_BUFFER | NONE | 0 | 32 | 84.983 ms | 4.96% | 118.049 ms | 5.99% | 33.066 ms | 38.91% | FAIL | | HOST_BUFFER | NONE | 1000 | 32 | 82.648 ms | 4.42% | 114.577 ms | 2.34% | 31.929 ms | 38.63% | FAIL | | DEVICE_BUFFER | SNAPPY | 0 | 1 | 65.538 ms | 4.02% | 232.466 ms | 3.28% | 166.928 ms | 254.71% | FAIL | | DEVICE_BUFFER | SNAPPY | 1000 | 1 | 101.427 ms | 4.10% | 221.578 ms | 1.43% | 120.152 ms | 118.46% | FAIL | | DEVICE_BUFFER | SNAPPY | 0 | 32 | 80.133 ms | 4.64% | 120.604 ms | 0.35% | 40.471 ms | 50.50% | FAIL | | DEVICE_BUFFER | SNAPPY | 1000 | 32 | 86.232 ms | 4.71% | 125.521 ms | 3.93% | 39.289 ms | 45.56% | FAIL | | DEVICE_BUFFER | NONE | 0 | 1 | 52.189 ms | 6.62% | 182.943 ms | 0.29% | 130.754 ms | 250.54% | FAIL | | DEVICE_BUFFER | NONE | 1000 | 1 | 54.664 ms | 6.76% | 190.501 ms | 0.49% | 135.836 ms | 248.49% | FAIL | | DEVICE_BUFFER | NONE | 0 | 32 | 67.975 ms | 5.12% | 107.172 ms | 3.56% | 39.197 ms | 57.66% | FAIL | | DEVICE_BUFFER | NONE | 1000 | 32 | 68.485 ms | 4.86% | 108.097 ms | 2.92% | 39.611 ms | 57.84% | FAIL | ``` And if memory is too limited, chunked read with 8MB output limit/80MB data read limit: ``` | io | compression | cardinality | run_length | Ref Time | Ref Noise | Cmp Time | Cmp Noise | Diff | %Diff | Status | | io | compression | cardinality | run_length | Ref Time | Ref Noise | Cmp Time | Cmp Noise | Diff | %Diff | Status | |---------------|---------------|---------------|--------------|------------|-------------|------------|-------------|------------|---------|----------| | FILEPATH | SNAPPY | 0 | 1 | 183.027 ms | 7.45% | 732.926 ms | 1.98% | 549.899 ms | 300.45% | FAIL | | FILEPATH | SNAPPY | 1000 | 1 | 198.228 ms | 6.43% | 834.309 ms | 4.21% | 636.081 ms | 320.88% | FAIL | | FILEPATH | SNAPPY | 0 | 32 | 96.676 ms | 6.19% | 363.033 ms | 1.66% | 266.356 ms | 275.51% | FAIL | | FILEPATH | SNAPPY | 1000 | 32 | 94.508 ms | 4.80% | 313.813 ms | 1.28% | 219.305 ms | 232.05% | FAIL | | FILEPATH | NONE | 0 | 1 | 161.868 ms | 5.40% | 607.700 ms | 2.90% | 445.832 ms | 275.43% | FAIL | | FILEPATH | NONE | 1000 | 1 | 164.902 ms | 5.80% | 616.101 ms | 3.46% | 451.199 ms | 273.62% | FAIL | | FILEPATH | NONE | 0 | 32 | 88.298 ms | 5.15% | 267.703 ms | 0.46% | 179.405 ms | 203.18% | FAIL | | FILEPATH | NONE | 1000 | 32 | 87.147 ms | 5.61% | 250.528 ms | 0.43% | 163.381 ms | 187.48% | FAIL | | HOST_BUFFER | SNAPPY | 0 | 1 | 124.990 ms | 0.39% | 636.270 ms | 0.44% | 511.280 ms | 409.06% | FAIL | | HOST_BUFFER | SNAPPY | 1000 | 1 | 149.858 ms | 4.10% | 747.264 ms | 0.50% | 597.406 ms | 398.65% | FAIL | | HOST_BUFFER | SNAPPY | 0 | 32 | 92.499 ms | 4.46% | 359.660 ms | 0.19% | 267.161 ms | 288.82% | FAIL | | HOST_BUFFER | SNAPPY | 1000 | 32 | 93.373 ms | 4.14% | 311.608 ms | 0.43% | 218.235 ms | 233.73% | FAIL | | HOST_BUFFER | NONE | 0 | 1 | 111.792 ms | 0.50% | 493.797 ms | 0.13% | 382.005 ms | 341.71% | FAIL | | HOST_BUFFER | NONE | 1000 | 1 | 117.646 ms | 5.60% | 516.706 ms | 0.12% | 399.060 ms | 339.20% | FAIL | | HOST_BUFFER | NONE | 0 | 32 | 84.983 ms | 4.96% | 258.477 ms | 0.46% | 173.495 ms | 204.15% | FAIL | | HOST_BUFFER | NONE | 1000 | 32 | 82.648 ms | 4.42% | 248.028 ms | 5.30% | 165.380 ms | 200.10% | FAIL | | DEVICE_BUFFER | SNAPPY | 0 | 1 | 65.538 ms | 4.02% | 606.010 ms | 3.76% | 540.472 ms | 824.68% | FAIL | | DEVICE_BUFFER | SNAPPY | 1000 | 1 | 101.427 ms | 4.10% | 742.774 ms | 4.64% | 641.347 ms | 632.33% | FAIL | | DEVICE_BUFFER | SNAPPY | 0 | 32 | 80.133 ms | 4.64% | 364.701 ms | 2.70% | 284.568 ms | 355.12% | FAIL | | DEVICE_BUFFER | SNAPPY | 1000 | 32 | 86.232 ms | 4.71% | 320.387 ms | 2.80% | 234.155 ms | 271.54% | FAIL | | DEVICE_BUFFER | NONE | 0 | 1 | 52.189 ms | 6.62% | 458.100 ms | 2.15% | 405.912 ms | 777.78% | FAIL | | DEVICE_BUFFER | NONE | 1000 | 1 | 54.664 ms | 6.76% | 478.527 ms | 1.41% | 423.862 ms | 775.39% | FAIL | | DEVICE_BUFFER | NONE | 0 | 32 | 67.975 ms | 5.12% | 260.009 ms | 3.71% | 192.034 ms | 282.51% | FAIL | | DEVICE_BUFFER | NONE | 1000 | 32 | 68.485 ms | 4.86% | 243.705 ms | 2.09% | 175.220 ms | 255.85% | FAIL | ``` Authors: - Nghia Truong (https://github.com/ttnghia) Approvers: - Bradley Dice (https://github.com/bdice) - https://github.com/nvdbaranec - Vukasin Milovanovic (https://github.com/vuule) URL: #15094
- Loading branch information