Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cartesian product detection in join tool #6899

Open
simonaubertbd opened this issue Sep 21, 2024 · 0 comments
Open

Cartesian product detection in join tool #6899

simonaubertbd opened this issue Sep 21, 2024 · 0 comments
Assignees

Comments

@simonaubertbd
Copy link

What's your use case?
Cartesian product is a common issue when joining dataset with a bad key. It's not always easy to detect and users can even forget to check for it because they think they know their data.

What's your proposed solution?
What I suggest is an option to check if there will be a cartesian product on the join tool.

-there is a label "Cartesian product (non join key uniqueness) detection"
-under it a drop down menu with three choices
-do nothing
-fail
-warning

Algo :
if do nothing==> well... do nothing more than actual behaviour.
if "fail" or "warning" : count distinct of join key (potentially a combination of fields) versus count row on each side of the join. If no count distinct is equal to simple count, display a warning or an error message.

<style> </style>
order_id customer_id amount
1 1 100
2 1 150
3 2 142
4 2 145
5 1 145
<style> </style>
customer_id customer_name
1 toto
2 tata
2 titi

on customer_id

=> count distinct left : 2; count left : 5
=> count distinct right : 2; count right : 3
2<>5 and 2<>3
=> there will be a cartesian product

=> for the count, it must be a count(*) or a sum(1) but not a count of the field itself because of the null.

Are there any alternative solutions?
N/A

Best regards,

Simon

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants