feat(df-repr): add back join order enumeration #204

Merged (5 commits) on Oct 30, 2024

skyzh (Member) commented Oct 30, 2024

ref #194

After the memo table refactor, this adds back a more efficient join order enumeration implementation.

skyzh requested a review from jurplel October 30, 2024 02:22
skyzh changed the title from "feat(core): add back join order enumeration" to "feat(df-repr): add back join order enumeration" Oct 30, 2024
Signed-off-by: Alex Chi <[email protected]>
skyzh (Member Author) commented Oct 30, 2024

(Physical joins are not supported for now, and I don't think it's necessary to support them.)

@@ -369,6 +369,10 @@ impl<T: RelNodeTyp> CascadesOptimizer<T> {
            .map(|x| x.cost.0[0])
            .unwrap_or(0.0)
    }

    pub fn memo(&self) -> &Memo<T> {

Member

Exposing the memo table publicly strikes me as a bit scary. I am worried users of the library might try to manipulate the memo table manually.

I'm hoping access is read-only (looks like it is?)

Member Author

yes it's read only 🤪
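
To illustrate why an accessor shaped like `memo(&self) -> &Memo<T>` is read-only, here is a minimal sketch. The `Memo` and `Optimizer` types and their methods below are hypothetical stand-ins, not the project's real definitions; the only thing taken from the diff is the accessor's shape. The guarantee assumes the memo has no interior mutability (`RefCell`, `Mutex`, and so on).

```rust
// Hypothetical stand-ins: the real `Memo<T>` and `CascadesOptimizer<T>` differ.
#[derive(Debug, Default)]
pub struct Memo {
    groups: Vec<String>,
}

impl Memo {
    /// Read-only query: callable through a shared `&Memo`.
    pub fn group_count(&self) -> usize {
        self.groups.len()
    }

    /// Mutation: needs `&mut Memo`, so it cannot be reached via `memo()`.
    pub fn add_group(&mut self, name: &str) {
        self.groups.push(name.to_string());
    }
}

#[derive(Debug, Default)]
pub struct Optimizer {
    memo: Memo,
}

impl Optimizer {
    /// Same shape as the accessor in the diff: shared borrow in, shared borrow out.
    pub fn memo(&self) -> &Memo {
        &self.memo
    }

    /// Only the optimizer itself mutates the memo.
    pub fn register_group(&mut self, name: &str) {
        self.memo.add_group(name);
    }
}

fn main() {
    let mut optimizer = Optimizer::default();
    optimizer.register_group("scan t1");
    optimizer.register_group("scan t2");

    let memo = optimizer.memo();
    println!("groups in memo: {}", memo.group_count());
    // memo.add_group("x"); // would not compile: `memo` is a `&Memo`
}
```

Under these assumptions the borrow checker rejects any attempt by library users to call a `&mut self` method through the returned reference, which is the property the reviewer was asking about.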

        // logical_join_orders.iter().map(|x| x.to_string()).join("\n"),
        // ));
        // }
        let join_orders = optimizer

Member

Is this optional or always calculated?

Member Author

Always calculated, as before, unless we can find a better way of passing options through DataFusion SQL.

Member Author

That's why I didn't close the issue: this is mentioned there.

@@ -0,0 +1,186 @@
//! Memo table extensions

Member

Definitely much cleaner than in the bridge.

  1. I think we'll need to be careful to document it so that it isn't misused by users
  2. I think the persistent memo table stuff might actually make this extension system unnecessary, since we can read the database to see which join orders were enumerated.

For now though, definitely an improvement!

Member Author

The join information is not directly persisted in the database, is it? We would still need to go through this depth-first search algorithm to reconstruct it.

Member

Right, but then we can do it after the query runs instead of checking the memo table "in vivo"
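
For readers unfamiliar with the depth-first search discussed in this thread, the following is a rough sketch of how logical join orders could be reconstructed from a memo table. It is not the code from this PR: `ToyMemo`, `Expr`, `JoinOrder`, `enumerate`, and `sample_memo` are all hypothetical names, only logical joins are modeled, and the memo is assumed to be acyclic.

```rust
use std::collections::HashMap;
use std::fmt;

type GroupId = usize;

// Hypothetical toy memo: each group holds logically equivalent expressions.
#[derive(Debug, Clone)]
enum Expr {
    Scan(String),
    Join(GroupId, GroupId),
}

#[derive(Debug, Clone, PartialEq, Eq)]
enum JoinOrder {
    Table(String),
    Join(Box<JoinOrder>, Box<JoinOrder>),
}

impl fmt::Display for JoinOrder {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            JoinOrder::Table(name) => write!(f, "{name}"),
            JoinOrder::Join(l, r) => write!(f, "(join {l} {r})"),
        }
    }
}

struct ToyMemo {
    groups: HashMap<GroupId, Vec<Expr>>,
}

/// Depth-first walk over the memo, caching results per group so that
/// shared subgroups are expanded only once. Assumes no cycles.
fn enumerate(
    memo: &ToyMemo,
    group: GroupId,
    cache: &mut HashMap<GroupId, Vec<JoinOrder>>,
) -> Vec<JoinOrder> {
    if let Some(hit) = cache.get(&group) {
        return hit.clone();
    }
    let mut orders = Vec::new();
    for expr in memo.groups.get(&group).into_iter().flatten() {
        match expr {
            Expr::Scan(table) => orders.push(JoinOrder::Table(table.clone())),
            Expr::Join(left, right) => {
                let lefts = enumerate(memo, *left, cache);
                let rights = enumerate(memo, *right, cache);
                // Every left order pairs with every right order.
                for l in &lefts {
                    for r in &rights {
                        orders.push(JoinOrder::Join(Box::new(l.clone()), Box::new(r.clone())));
                    }
                }
            }
        }
    }
    cache.insert(group, orders.clone());
    orders
}

/// Three tables; group 3 joins t1 and t2 both ways, group 4 joins that with t3.
fn sample_memo() -> ToyMemo {
    let mut groups = HashMap::new();
    groups.insert(0, vec![Expr::Scan("t1".into())]);
    groups.insert(1, vec![Expr::Scan("t2".into())]);
    groups.insert(2, vec![Expr::Scan("t3".into())]);
    groups.insert(3, vec![Expr::Join(0, 1), Expr::Join(1, 0)]);
    groups.insert(4, vec![Expr::Join(3, 2), Expr::Join(2, 3)]);
    ToyMemo { groups }
}

fn main() {
    let memo = sample_memo();
    let mut cache = HashMap::new();
    for order in enumerate(&memo, 4, &mut cache) {
        println!("{order}");
    }
}
```

The walk only needs shared access to the memo, so in principle it could run against a live memo or against one reloaded after the query finishes, which is the trade-off raised above.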

skyzh (Member Author) commented Oct 30, 2024

enable auto-merge...

skyzh enabled auto-merge (squash) October 30, 2024 03:05
skyzh merged commit 4096073 into main Oct 30, 2024
1 check passed
skyzh deleted the skyzh/join-orders branch October 30, 2024 03:57