Spark 3.2: Update task stats for split files (#4446)
Co-authored-by: Prashant Singh <[email protected]>
singhpk234 and Prashant Singh authored Apr 3, 2022
1 parent 6f4ba69 commit 4ab4b91
Showing 1 changed file with 3 additions and 1 deletion.
@@ -148,7 +148,9 @@ protected Statistics estimateStatistics(Snapshot snapshot) {

     for (CombinedScanTask task : tasks()) {
       for (FileScanTask file : task.files()) {
-        numRows += file.file().recordCount();
+        // TODO: if possible, take deletes also into consideration.
+        double fractionOfFileScanned = ((double) file.length()) / file.file().fileSizeInBytes();
+        numRows += (fractionOfFileScanned * file.file().recordCount());
       }
     }
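
The change above scales each file's record count by the fraction of the file that a task actually reads, so a file split across several tasks is no longer counted in full once per split. Below is a minimal standalone sketch of that arithmetic. It does not use the Iceberg API: the `Split` class, its fields, and `estimateRows` are hypothetical stand-ins for `FileScanTask.length()`, `DataFile.fileSizeInBytes()`, and `DataFile.recordCount()`. The zero-size guard and the single rounding step at the end are assumptions of this sketch, not part of the commit.

```java
import java.util.List;

public class SplitRowEstimate {

    /** Hypothetical stand-in for one split of a data file. */
    public static final class Split {
        final long length;           // bytes of the file covered by this split
        final long fileSizeInBytes;  // total size of the underlying file
        final long recordCount;      // total records in the underlying file

        public Split(long length, long fileSizeInBytes, long recordCount) {
            this.length = length;
            this.fileSizeInBytes = fileSizeInBytes;
            this.recordCount = recordCount;
        }
    }

    /** Sums record counts weighted by the fraction of each file that is scanned. */
    public static long estimateRows(List<Split> splits) {
        double rows = 0.0;
        for (Split split : splits) {
            if (split.fileSizeInBytes <= 0) {
                continue; // assumption: skip files with no reported size
            }
            double fractionOfFileScanned = (double) split.length / split.fileSizeInBytes;
            rows += fractionOfFileScanned * split.recordCount;
        }
        // Round once at the end rather than truncating per split.
        return Math.round(rows);
    }

    public static void main(String[] args) {
        // One 100-byte file holding 1000 records, read as two 50-byte splits.
        List<Split> splits = List.of(
            new Split(50, 100, 1000),
            new Split(50, 100, 1000));
        System.out.println(estimateRows(splits)); // prints 1000
    }
}
```

With the pre-change logic, the two splits in `main` would each contribute the full 1000 records, for an estimate of 2000. The sketch accumulates in a `double` because, in Java, a compound assignment such as `numRows += someDouble` on a `long` accumulator narrows the result back to `long` on every iteration; whether that applies to the original method depends on the declared type of `numRows`, which this diff does not show.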
