Support comparing datasets column-agnostic, using names only #55

Open
chadlwilson opened this issue Nov 9, 2021 · 0 comments
Labels
enhancement New feature or request question Further information is requested

chadlwilson commented Nov 9, 2021

Context / Goal

Currently, as documented in the README.md, the names of columns are irrelevant when producing hashes of data. This is convenient, as you don't have to alias columns everywhere to ensure they are unique.

However, it does make column ordering important. While it might be more convenient to compare queries when the columns are all ordered identically, it's potentially less flexible. Arguably, as long as there are no duplicate names, Recce could offer a mode where you tell a dataset to compare by matching column names.

Expected Outcome

  • Introduce an optional configuration element under dataset:, perhaps called columnMatchMode
  • columnMatchMode should be positional by default (current behaviour)
  • columnMatchMode should also allow a new setting nameBased
  • When columnMatchMode is set to nameBased, Recce would do the following during hashing:
    • add elements to the hash in a deterministic order, based on the column names
    • fail with an appropriate error message if the column metadata implies there are two columns with the same name
    • (ideally) fast-fail if there are mismatched names of columns between the two queries (if one query has a column name that the other does not, the hash results cannot possibly match)
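
To make the proposal concrete, here is a minimal sketch of the two modes in Python. It is not Recce's implementation: `hash_row`, `ordered_indices` and `check_same_columns` are hypothetical names, `repr` stands in for whatever type-aware hashing Recce really does, and names are assumed to match exactly (case-sensitively).

```python
import hashlib
from collections import Counter
from typing import Any, List, Sequence

POSITIONAL = "positional"   # proposed default: current behaviour
NAME_BASED = "nameBased"    # proposed new setting


def ordered_indices(names: Sequence[str], mode: str) -> List[int]:
    """Return the order in which column values are fed to the hash."""
    if mode == POSITIONAL:
        return list(range(len(names)))
    if mode != NAME_BASED:
        raise ValueError(f"unknown columnMatchMode: {mode}")
    # Name-based matching is ambiguous if two columns share a name.
    dupes = sorted(n for n, count in Counter(names).items() if count > 1)
    if dupes:
        raise ValueError(f"duplicate column names: {dupes}")
    # Deterministic order derived from the names, not the positions.
    return sorted(range(len(names)), key=lambda i: names[i])


def hash_row(names: Sequence[str], values: Sequence[Any],
             mode: str = POSITIONAL) -> str:
    """Hash one row; only values are hashed, names only decide the order."""
    digest = hashlib.sha256()
    for i in ordered_indices(names, mode):
        digest.update(repr(values[i]).encode("utf-8"))
        digest.update(b"\x1f")  # separator so adjacent values cannot merge
    return digest.hexdigest()


def check_same_columns(source: Sequence[str], target: Sequence[str]) -> None:
    """Fast-fail if either query has a column name the other lacks."""
    only_source = sorted(set(source) - set(target))
    only_target = sorted(set(target) - set(source))
    if only_source or only_target:
        raise ValueError(
            f"column names differ: only in source {only_source}, "
            f"only in target {only_target}"
        )
```

With this sketch, `hash_row(["id", "name"], [1, "x"], NAME_BASED)` and `hash_row(["name", "id"], ["x", 1], NAME_BASED)` produce the same hash, whereas the same two calls in the default positional mode do not.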

Out of Scope

Additional context / implementation notes

@chadlwilson chadlwilson added the enhancement New feature or request label Nov 9, 2021
@chadlwilson chadlwilson added the question Further information is requested label Nov 9, 2021