[Enhancement] Reduce the number of fds held by hash join spilling (backport #52020) #52083
Why I'm doing:
When a hash join triggers spilling, the data is partitioned and then flushed. The same partition may be flushed multiple times, generating multiple blocks. When a partition grows large enough, it triggers a split, which generates two new partitions and deletes the existing one.
We want blocks from different partitions to never be placed in the same container, so that when a partition is no longer used its data can be deleted promptly. Currently this is achieved with the concept of an exclusive block: if a block is marked as exclusive, its container is not returned to the block manager when the block is released, which guarantees exclusivity.
However, this has a drawback: with LogBlockManager, every block occupies its own container, resulting in a large number of small files.
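The current behavior can be pictured with the minimal C++ sketch below. The identifiers (`BlockManager`, `Container`, `acquire_block`, `release_block`) are illustrative assumptions, not the actual StarRocks interface; the point is only that an exclusive block never gives its container back, so each spilled block ends up owning one container (one file, one fd).

```cpp
// Hypothetical sketch of the "exclusive block" behavior (illustrative names,
// not the real StarRocks code). An exclusive block keeps its container when
// released instead of returning it to the shared pool, so every spilled block
// of a partition owns its own container, i.e. its own small file and fd.
#include <memory>
#include <vector>

struct Container { /* wraps one spill file / fd */ };

struct Block {
    std::shared_ptr<Container> container;
    bool exclusive = false;
};

class BlockManager {
public:
    Block acquire_block(bool exclusive) {
        Block block;
        block.exclusive = exclusive;
        if (!exclusive && !_pool.empty()) {   // non-exclusive blocks may reuse a pooled container
            block.container = _pool.back();
            _pool.pop_back();
        } else {                              // exclusive: always open a fresh container
            block.container = std::make_shared<Container>();
        }
        return block;
    }

    void release_block(Block& block) {
        if (!block.exclusive) {
            _pool.push_back(block.container); // shared containers go back to the pool
        }
        // Exclusive containers are kept until the partition itself is deleted,
        // which is why hash join spilling can end up holding thousands of fds.
    }

private:
    std::vector<std::shared_ptr<Container>> _pool;
};
```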
What I'm doing:
In fact, we do not need to allocate a container for each block; we only need to ensure that the sets of containers occupied by blocks from different partitions do not overlap.
To solve the small-file problem, this PR introduces the concept of an affinity_group for blocks, sketched below. Blocks belonging to the same affinity_group share the same batch of containers, while blocks belonging to different affinity_groups never share a container.
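A minimal sketch of the affinity_group idea follows. Again, the names (`AffinityGroup`, `acquire_container`, `release_affinity_group`) are hypothetical, not the real interface: containers are pooled per group, blocks in the same group reuse the group's containers, and dropping a group (when its partition is deleted) releases all of its files at once.

```cpp
// Hypothetical sketch of per-affinity_group container pooling (illustrative
// names, not the actual StarRocks API). Containers are only reused within
// the same group, so groups never overlap, and a whole group can be dropped
// when its partition is no longer needed.
#include <cstdint>
#include <map>
#include <memory>
#include <vector>

struct Container { /* wraps one spill file / fd */ };

class BlockManager {
public:
    using AffinityGroup = std::uint64_t;

    std::shared_ptr<Container> acquire_container(AffinityGroup group) {
        auto& pool = _pools[group];
        if (!pool.empty()) {                       // reuse a container of the same group
            auto container = pool.back();
            pool.pop_back();
            return container;
        }
        return std::make_shared<Container>();      // first container of this group
    }

    void release_container(AffinityGroup group, std::shared_ptr<Container> container) {
        _pools[group].push_back(std::move(container)); // returned to this group's pool only
    }

    void release_affinity_group(AffinityGroup group) {
        _pools.erase(group);                       // partition dropped: its files can be deleted now
    }

private:
    std::map<AffinityGroup, std::vector<std::shared_ptr<Container>>> _pools;
};
```

Because many blocks of one partition now share a small batch of containers instead of each opening its own, the fd count tracks the number of live affinity groups rather than the number of spilled blocks.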
I tested the following query on the ssb_100g dataset. Before this optimization, spilling held more than 2,000 fds; after it, the number of fds did not exceed 200.
What type of PR is this:
Does this PR entail a change in behavior?
If yes, please specify the type of change:
Checklist:
Bugfix cherry-pick branch check:
This is an automatic backport of pull request #52020 done by [Mergify](https://mergify.com).