-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
331 apply imputation link to target #19
Conversation
@pytest.fixture(scope="class") | ||
def fir_bir_c_fic_test_data(): | ||
return load_and_format( | ||
Path("tests") / "data" / "apply_imputation_link" / "FIR_BIR_C_FIC.csv" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Path("tests") / "data" / "apply_imputation_link" / "FIR_BIR_C_FIC.csv"
Not a big fan of this aproach, we combine windows path object with some strings
We can define the full path like this:
Path("tests","data","apply_imputation_link","FIR_BIR_C_FIC.csv")
@@ -0,0 +1,161 @@ | |||
def create_and_merge_imputation_values( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this function imputes the values by creating new columns and merging them, so create and merge should go in the docstrings.
I understand this might crossover with the name of higher level function perhaps rename it to add_imputed_values?
df, | ||
imputation_class, | ||
reference, | ||
period, | ||
marker, | ||
combined_imputation, | ||
target, | ||
cumulative_forward_link, | ||
cumulative_backward_link, | ||
auxiliary, | ||
construction_link, | ||
imputation_types=("c", "fir", "bir", "fic"), | ||
): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
type hints needed
return df.drop(columns=intermediate_columns) | ||
|
||
|
||
def create_impute(df, group, imputation_spec): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2 verbs in function implies doing 2 things, perhaps rename to create_imputation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@robertswh all looking good, i have left some minor optional comments. The most important thing is that the function is nicely broken down into smaller functions and returns the right results from the complicated test scenarios ( for FI and BI imputation , construction ones not ready yet). I am happy to be merged, it's up to you about applying the comments or not.
The function has many arguments and most of them are intermediate columns generated by the low level functions, but this applies to all the low level functions so we need to consider how to pass them in a reproducible way, perhaps via a config file ? I will create a separate ticket for this.
Summary
With the correct imputation link, the correct "value to impute off" and correct impute marker in place we can calculate the relevant imputed value. There are two low level functions here, one to create an imputed series, e.g. forward impute from a response, and a second to apply the relevant rows to a
combined_imputation
column. The higher order function runs both these function for construction, forward from result, backward from result and forward from construction.To be able to use a generalised function for all imputation types there is an imputation config to handle the differences in implementation.
There is only one test to cover four cases in this pull request. More tests are recommended in future, however, refactoring of the function was required to get it to work for a single test case, so I just applied it to all cases.
Checklists
This pull request meets the following requirements:
If you feel some of these conditions do not apply for this pull request, please
add a comment to explain why.