feat: add RightMark Join #13252

Draft
wants to merge 6 commits into
base: main

Conversation

jonathanc-n
Contributor

Which issue does this PR close?

Closes #13138 .

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added sql SQL Planner logical-expr Logical plan and expressions physical-expr Physical Expressions optimizer Optimizer rules core Core DataFusion crate substrait common Related to common crate proto Related to proto crate labels Nov 4, 2024
@jonathanc-n jonathanc-n marked this pull request as draft November 4, 2024 17:54
@jonathanc-n
Contributor Author

@eejbyfeldt Just leaving it as a draft for now, if you have any pointers feel free to add on.

@jonathanc-n jonathanc-n changed the title feat: RightMark Join feat: add RightMark Join Nov 4, 2024
@@ -136,6 +136,9 @@ fn swap_join_type(join_type: JoinType) -> JoinType {
JoinType::LeftMark => {
unreachable!("LeftMark join type does not support swapping")
}
JoinType::RightMark => {
unreachable!("RightMark join type does not support swapping")
Contributor

I think we can support it now; this should be relatively easy.

Contributor

Yeah, supporting swap is partly why we are adding RightMark. So would be good to have that fixed in this PR.
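
A minimal sketch of what swap support could look like, assuming LeftMark and RightMark are treated as mirror images of each other. The `JoinType` enum below is a local stand-in rather than the DataFusion type, and the mapping chosen for the two mark variants is an assumption about the intended behavior, not the merged implementation:

```rust
/// Local stand-in for the join type enum (hypothetical, not the DataFusion definition).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum JoinType {
    Inner,
    Left,
    Right,
    Full,
    LeftSemi,
    RightSemi,
    LeftAnti,
    RightAnti,
    LeftMark,
    RightMark,
}

/// Returns the join type to use once the left and right inputs have been swapped.
pub fn swap_join_type(join_type: JoinType) -> JoinType {
    match join_type {
        // Symmetric join types are unchanged by a swap.
        JoinType::Inner => JoinType::Inner,
        JoinType::Full => JoinType::Full,
        // Sided join types map to their mirror image.
        JoinType::Left => JoinType::Right,
        JoinType::Right => JoinType::Left,
        JoinType::LeftSemi => JoinType::RightSemi,
        JoinType::RightSemi => JoinType::LeftSemi,
        JoinType::LeftAnti => JoinType::RightAnti,
        JoinType::RightAnti => JoinType::LeftAnti,
        // Assumption: with RightMark available, the mark variants swap like the others.
        JoinType::LeftMark => JoinType::RightMark,
        JoinType::RightMark => JoinType::LeftMark,
    }
}
```

Under this mapping, swapping twice returns the original join type, which is a cheap property to check in a unit test.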

Contributor

@eejbyfeldt eejbyfeldt left a comment

Left some minor comments, otherwise it looks good so far to me.

Comment on lines 735 to 748
let probe_indices = (0..prune_length)
.map(R::Native::from_usize)
.collect::<PrimitiveArray<R>>();
let build_indices = (0..prune_length)
.map(|idx| {
// For mark join we output a dummy index 0 to indicate the row had a match
if visited_rows.contains(&(idx + deleted_offset)) {
Some(L::Native::from_usize(0).unwrap())
} else {
None
}
})
.collect();
(build_indices, probe_indices)
Contributor

This looks very similar to what we do for LeftMark. Could we move this code into a function and call it for both?
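
One possible shape for such a shared helper, as a simplified sketch: it uses plain `Vec`s and a `HashSet` in place of the Arrow `PrimitiveArray` types in the snippet above, and the name `mark_join_indices` is hypothetical:

```rust
use std::collections::HashSet;

/// Builds `(build_indices, probe_indices)` for a mark join over `prune_length` rows.
///
/// Simplified stand-in: the real code collects into Arrow arrays, this sketch
/// uses `Vec`s so that it is self-contained.
pub fn mark_join_indices(
    prune_length: usize,
    deleted_offset: usize,
    visited_rows: &HashSet<usize>,
) -> (Vec<Option<u64>>, Vec<u64>) {
    // Every row in the pruned range is emitted exactly once.
    let probe_indices: Vec<u64> = (0..prune_length).map(|idx| idx as u64).collect();
    let build_indices: Vec<Option<u64>> = (0..prune_length)
        .map(|idx| {
            // For a mark join, a dummy index 0 indicates the row had a match;
            // `None` indicates it did not.
            if visited_rows.contains(&(idx + deleted_offset)) {
                Some(0)
            } else {
                None
            }
        })
        .collect();
    (build_indices, probe_indices)
}
```

Both the LeftMark and RightMark branches could then call the same function and differ only in which side they treat as the marked one.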

datafusion/sql/src/unparser/plan.rs (outdated, resolved)
datafusion/substrait/src/logical_plan/producer.rs (outdated, resolved)
Comment on lines +694 to +700
let left_field = once((
Field::new("mark", arrow_schema::DataType::Boolean, false),
ColumnIndex {
index: 0, // 'mark' is not associated with either side
side: JoinSide::None,
},
));
Contributor

Maybe this could also be moved to a function that is used for both LeftMark and RightMark.

@@ -3911,6 +3913,7 @@ impl<'de> serde::Deserialize<'de> for JoinType {
"RIGHTSEMI" => Ok(JoinType::Rightsemi),
"RIGHTANTI" => Ok(JoinType::Rightanti),
"LEFTMARK" => Ok(JoinType::Leftmark),
"RIGHTMARK" => Ok(JoinTYpe::Rightmark),
Contributor

This is causing CI failures. I believe this file should be generated by running: https://github.com/apache/datafusion/blob/main/datafusion/proto-common/regen.sh

Suggested change
"RIGHTMARK" => Ok(JoinTYpe::Rightmark),
"RIGHTMARK" => Ok(JoinType::Rightmark),

Contributor Author

Is this included in the docs?

@jonathanc-n
Contributor Author

jonathanc-n commented Nov 5, 2024

@eejbyfeldt I implemented the swapping, would be nice to see if I did that correctly.

I changed adjust_indices_by_join_type to combine the index logic for the Right and RightMark joins:

JoinType::Right | JoinType::RightMark => {
    // combine the matched and unmatched right result together
    append_right_indices(
        left_indices,
        right_indices,
        adjust_range,
        preserve_order_for_right,
    )
}

The join_right_mark test previously returned no results and now returns a table with values. However, it marks every row as true, and I cannot track the issue down. Would you be able to look into this?

Instead of this:

"+----+----+----+-------+",
"| a2 | b1 | c2 | mark  |",
"+----+----+----+-------+",
"| 10 | 4  | 70 | true  |",
"| 20 | 5  | 80 | true  |",
"| 30 | 6  | 90 | false |",
"+----+----+----+-------+",

it is giving this:
"+----+----+----+-------+",
"| a2 | b1 | c2 | mark  |",
"+----+----+----+-------+",
"| 10 | 4  | 70 | true  |",
"| 20 | 5  | 80 | true  |",
"| 30 | 6  | 90 | true  |",
"+----+----+----+-------+",
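
For reference while debugging, the expected mark column can be reproduced with a small standalone model of right mark semantics: every right row is emitted exactly once, and `mark` is true only if at least one left row has a matching key. This sketches the semantics only, not the hash join code path, and the function name `right_mark` is hypothetical:

```rust
use std::collections::HashSet;

/// Computes the `mark` column of a right mark join on a single integer key.
/// Each right row yields one output value: true if any left row shares its key.
pub fn right_mark(left_keys: &[i32], right_keys: &[i32]) -> Vec<bool> {
    // Collect the left keys once so each right row is a single set lookup.
    let left: HashSet<i32> = left_keys.iter().copied().collect();
    right_keys.iter().map(|key| left.contains(key)).collect()
}
```

With hypothetical left keys 4, 5, 7 (the left table is not shown in this thread) and the right b1 values 4, 5, 6 from the tables above, this yields true, true, false, which matches the expected output. An all-true mark column therefore means every right row is being treated as matched somewhere in the index handling.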

Development

Successfully merging this pull request may close these issues.

Implement RightMark join
3 participants