forked from datafusion-contrib/datafusion-dft
Add `--analyze` param to CLI (datafusion-contrib#216)
TL;DR: The execution plan output from EXPLAIN ANALYZE tells the story of how a query was executed, and the goal of this command is to output a summary of that story. I am adding this feature because I often look at EXPLAIN ANALYZE output and have to connect several dots myself to see what is important. That is quite manual, and doing it across different queries or different source data layouts means redoing a lot of work.

The new analyze parameter generates output like the below, which is designed to make it easier to analyze queries and find hot spots (IO vs compute, with a breakdown of the core parts of each) and/or optimization opportunities. It is still very much a WIP (organization, stats to show, wording, etc.), but it is starting to take shape into something close to what I envision v1 will look like.

IO stats will have relevant metrics for the different IO execs (parquet, csv, ndjson, arrow, etc.), starting with just parquet for now. For compute stats I think I am going to have sections for projection, filter, sort, join, and aggregate, plus an "other" section that includes everything else (input from others on which execs are worth extracting would be interesting). One open point is how I aggregate stats for the same nodes / execs. I am probably going to just roll up details per exec for v1, but that loses some interesting information, so ideally I would like to improve on it.

One example use is benchmarking and analyzing parquet files with different layouts (for example file sizes, row group sizes, bloom filters, page sizes, etc.). Ideally, running this command and comparing the results should make it easier to choose the optimal layout.
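The per-exec rollup mentioned above could look roughly like the following. This is a minimal sketch, not the code in this commit: the `Category` enum, `categorize`, and `rollup` are hypothetical names, and classifying operators by substring of their name is an assumption made purely for illustration.

```rust
use std::collections::BTreeMap;

/// Hypothetical compute categories mirroring the sections named above.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Category {
    Projection,
    Filter,
    Sort,
    Join,
    Aggregate,
    Other,
}

/// Classify an operator by its name. Substring matching is an assumption
/// of this sketch. "Join" is checked before "Sort" so that a name such as
/// "SortMergeJoinExec" lands in the join bucket.
pub fn categorize(exec_name: &str) -> Category {
    if exec_name.contains("Projection") {
        Category::Projection
    } else if exec_name.contains("Filter") {
        Category::Filter
    } else if exec_name.contains("Join") {
        Category::Join
    } else if exec_name.contains("Sort") {
        Category::Sort
    } else if exec_name.contains("Aggregate") {
        Category::Aggregate
    } else {
        Category::Other
    }
}

/// Roll up (operator name, elapsed compute in nanoseconds) pairs into one
/// total per category. Per-node detail is discarded, which is the
/// information loss the description refers to.
pub fn rollup(nodes: &[(&str, u64)]) -> BTreeMap<Category, u64> {
    let mut totals = BTreeMap::new();
    for (name, elapsed_ns) in nodes {
        *totals.entry(categorize(name)).or_insert(0) += *elapsed_ns;
    }
    totals
}
```

Keeping the per-node values in a `Vec` per category instead of a single sum would be one way to retain the detail that a plain rollup drops.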
Todo for v1

- [x] Selectivity based on rows
- [x] Selectivity based on bytes
- [x] Add "Other" compute stats
- [x] Add "Join" compute stats
- [x] Add "Aggregate" compute stats
- [x] Cleanup display (alignment, same header widths, fixed decimals, etc)
- [x] ~For IO add % of total for scanning and opening~ Going to revisit this
- [x] ~Add throughput to summary based on output bytes from IO layer. Would be cool to do a pass that pulled actual file sizes that could be used.~ Going to revisit this
- [x] Ratio of selectivity to pruning for parquet
- [x] Avg/median time to scan single row group (total scanning time / row groups)
- [x] Update README

Future

- identify nodes where some partitions are "hot" (i.e doing much more compute than others)

<img width="805" alt="image" src="https://github.com/user-attachments/assets/3a853ed6-8db6-4d23-91dd-96aa959d7936">
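Two of the stats in the list above reduce to simple ratios. The sketch below assumes selectivity means output divided by input (rows or bytes) and takes the average row group scan time from the formula given in the list (total scanning time / row groups). The function names are hypothetical and are not taken from this commit.

```rust
/// Selectivity as the fraction of input that survives: output / input.
/// Works for rows or bytes. Returns None when the input is zero, so the
/// caller never divides by zero.
pub fn selectivity(output: u64, input: u64) -> Option<f64> {
    if input == 0 {
        None
    } else {
        Some(output as f64 / input as f64)
    }
}

/// Average time to scan a single row group, in nanoseconds:
/// total scanning time / number of row groups scanned.
/// Returns None when no row groups were scanned.
pub fn avg_row_group_scan_ns(total_scan_ns: u64, row_groups: u64) -> Option<f64> {
    if row_groups == 0 {
        None
    } else {
        Some(total_scan_ns as f64 / row_groups as f64)
    }
}
```

Returning `Option` rather than `0.0` or `NaN` lets a display layer decide how to render a stat that has no meaningful value, for example when a query prunes every row group.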
1 parent 38d5c25
commit 0060fb2
Showing 9 changed files with 875 additions and 55 deletions.
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | @@ -95,3 +95,6 @@ Cargo.lock |
| 95 | 95 | |
| 96 | 96 | # For local development of benchmarking |
| 97 | 97 | benchmark_results/ |
| | 98 | + |
| | 99 | + # For local query testing |
| | 100 | + queries/** |