365 default imputation links #47
Left a couple of suggestions but this seems to cover the edge cases we are worried about as far as I understand them :)
The spec says that the matched pairs should be either missing or zero to indicate what type of default link case has happened - is this covered in the tests?
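One way to check this in a test is sketched below. It relies on `pandas.testing.assert_frame_equal`, which treats NaN values in matching positions as equal but does not treat NaN as equal to zero. The values are hypothetical and only illustrate the idea; which default link case maps to missing and which to zero is defined by the spec, not by this sketch.

```python
import numpy as np
import pandas as pd
from pandas.testing import assert_frame_equal

# Hypothetical expected data: one missing pair count, one zero, one real count.
expected = pd.DataFrame({"flag_match_pair_count": [np.nan, 0.0, 3.0]})

# An output that preserves the missing value matches the expected frame.
actual_ok = expected.copy()
assert_frame_equal(actual_ok, expected)

# An output that silently replaces the missing value with zero should fail.
actual_zero_filled = expected.fillna(0.0)
try:
    assert_frame_equal(actual_zero_filled, expected)
    distinguishes_missing_from_zero = False
except AssertionError:
    distinguishes_missing_from_zero = True
```

If the expected test data contains both a missing and a zero pair count, a comparison of this kind would fail whenever the two cases are confused.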
actual_output = actual_output.rename(
    columns={
        "default_link_b_match": "default_backward",
        "default_link_f_match": "default_forward",
        "default_link_flag_construction_matches": "default_construction",
        "flag_construction_matches_pair_count": "flag_match_pair_count",
    }
)
Why do these need to be renamed?
The unit tests for ratio of means are taken from the PySpark versions, which use different column names. It might be worth bulk-changing these at some point, but about 70 files would need to be updated. Happy to add this to the tech backlog.
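Until the test files are updated, the mapping could live in one place so each test does not repeat it. This is only a sketch: the constant `PYSPARK_COLUMN_NAMES`, the helper `to_pyspark_names`, and the example frame are hypothetical, while the column names themselves come from the diff under review.

```python
import pandas as pd

# Mapping from the column names produced by this method to the names
# used in the PySpark-derived test data (taken from the diff).
PYSPARK_COLUMN_NAMES = {
    "default_link_b_match": "default_backward",
    "default_link_f_match": "default_forward",
    "default_link_flag_construction_matches": "default_construction",
    "flag_construction_matches_pair_count": "flag_match_pair_count",
}


def to_pyspark_names(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with columns renamed to the PySpark test names."""
    # rename ignores mapping keys that are not present in df by default.
    return df.rename(columns=PYSPARK_COLUMN_NAMES)


# Hypothetical output frame, for illustration only.
actual_output = pd.DataFrame(
    {
        "default_link_b_match": [True, False],
        "default_link_f_match": [False, False],
        "other": [1, 2],
    }
)
renamed = to_pyspark_names(actual_output)
```

Columns that are not in the mapping pass through unchanged, so the helper can be applied to any output frame before comparing it with the expected data.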
That makes sense. I agree that it's worth changing them but it's definitely not the top priority
Unit test data is taken from the PySpark method and, from my understanding, has been built to account for zeros and nulls present in the data. Updated the code in line with previous comments.
Thanks for responding to my comments, happy to approve :)
Summary
Implemented a method for calculating default imputation link values, based on the specification.
Checklists
This pull request meets the following requirements:
If you feel some of these conditions do not apply for this pull request, please
add a comment to explain why.