You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
What's your use case?
Cartesian product is a common issue when joining dataset with a bad key. It's not always easy to detect and users can even forget to check for it because they think they know their data.
What's your proposed solution?
What I suggest is an option to check if there will be a cartesian product on the join tool.
-there is a label "Cartesian product (non join key uniqueness) detection"
-under it a drop down menu with three choices
-do nothing
-fail
-warning
Algo :
if do nothing==> well... do nothing more than actual behaviour.
if "fail" or "warning" : count distinct of join key (potentially a combination of fields) versus count row on each side of the join. If no count distinct is equal to simple count, display a warning or an error message.
<style>
</style>
order_id
customer_id
amount
1
1
100
2
1
150
3
2
142
4
2
145
5
1
145
<style>
</style>
customer_id
customer_name
1
toto
2
tata
2
titi
on customer_id
=> count distinct left : 2; count left : 5
=> count distinct right : 2; count right : 3
2<>5 and 2<>3
=> there will be a cartesian product
=> for the count, it must be a count(*) or a sum(1) but not a count of the field itself because of the null.
Are there any alternative solutions?
N/A
Best regards,
Simon
The text was updated successfully, but these errors were encountered:
What's your use case?
Cartesian product is a common issue when joining dataset with a bad key. It's not always easy to detect and users can even forget to check for it because they think they know their data.
What's your proposed solution?
What I suggest is an option to check if there will be a cartesian product on the join tool.
-there is a label "Cartesian product (non join key uniqueness) detection"
-under it a drop down menu with three choices
-do nothing
-fail
-warning
Algo :
<style> </style>if do nothing==> well... do nothing more than actual behaviour.
if "fail" or "warning" : count distinct of join key (potentially a combination of fields) versus count row on each side of the join. If no count distinct is equal to simple count, display a warning or an error message.
on customer_id
=> count distinct left : 2; count left : 5
=> count distinct right : 2; count right : 3
2<>5 and 2<>3
=> there will be a cartesian product
=> for the count, it must be a count(*) or a sum(1) but not a count of the field itself because of the null.
Are there any alternative solutions?
N/A
Best regards,
Simon
The text was updated successfully, but these errors were encountered: