Cumulative imputation links are incorrect for FIC #22

Closed
robertswh opened this issue Jun 6, 2024 · 3 comments · Fixed by #29
Assignees
Labels
bug Something isn't working

Comments

@robertswh
Collaborator

robertswh commented Jun 6, 2024

The imputation_group column is created by looking for changes in three columns:

  • missing_value, which is True or False depending on whether the record is a return or a non-return
  • group
  • reference

This works for most cases, but not for FIC.

The imputation_group column should look at imputation_marker for changes instead of missing_value. Some pre-processing is needed because we look for changes in numerical columns, not categorical ones. One solution could be to create a new column (e.g. mapped_marker) that maps markers to numbers, e.g. mapping = {"r": 1, "c": 2, "fir": 3, "bir": 4, "fic": 5} (the exact numbers don't matter as long as they're unique). The condition would then be (dataframe["mapped_marker"].diff(time_difference) != 0).

This condition would replace the following condition:

(dataframe["missing_value"].diff(time_difference) != 0)
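A minimal sketch of the mapped_marker idea described above, on made-up data. The toy rows, time_difference = 1, and the use of cumsum() to turn change flags into group numbers are assumptions for illustration only; the real function also checks group and reference, which is left out here.

```python
import pandas as pd

# Hypothetical toy data: one reference in one group, five consecutive periods.
dataframe = pd.DataFrame(
    {
        "period": [1, 2, 3, 4, 5],
        "imputation_marker": ["c", "fic", "fic", "r", "fir"],
    }
)

# Any unique integers work; only a change between rows matters.
mapping = {"r": 1, "c": 2, "fir": 3, "bir": 4, "fic": 5}
dataframe["mapped_marker"] = dataframe["imputation_marker"].map(mapping)

time_difference = 1  # assumed: compare each row with the previous one

# diff() is NaN on the first row, and NaN != 0 evaluates to True,
# so the first row always starts a new group.
marker_changed = dataframe["mapped_marker"].diff(time_difference) != 0

# Assumed grouping step: number the runs of identical markers.
dataframe["imputation_group"] = marker_changed.cumsum()

print(dataframe[["imputation_marker", "mapped_marker", "imputation_group"]])
```

In this toy example the C row and the two FIC rows land in different groups, which is the separation that missing_value alone cannot express, since both are non-returns.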

@robertswh robertswh added the bug Something isn't working label Jun 6, 2024
@AntonZogk
Collaborator

AntonZogk commented Jun 10, 2024

Using this condition

(dataframe['imputation_marker'].ne(dataframe['imputation_marker'].shift().bfill()).astype(int) != 0)

instead of

(dataframe["missing_value"].diff(time_difference) != 0)

fixes the issue for cumulative links: it now returns the correct cumulative link. However, the imputed value calculated afterwards is still wrong.
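For reference, a small sketch of what the shift-based condition evaluates to on a made-up marker sequence. The data is hypothetical, and the condition is shown in isolation, without the group and reference checks it would be combined with.

```python
import pandas as pd

# Hypothetical marker sequence for a single reference.
markers = pd.Series(["c", "fic", "fic", "r", "fir"], name="imputation_marker")

# shift() moves every value down one row and leaves a gap in the first row;
# bfill() fills that gap with the next value, so row 0 is compared to itself.
previous = markers.shift().bfill()

# Same expression as in the comment above, applied to the toy series.
marker_changed = markers.ne(previous).astype(int) != 0

print(pd.DataFrame({"marker": markers, "previous": previous, "changed": marker_changed}))
```

Two things stand out in this sketch. The comparison works directly on the string column, so no numeric mapping is needed. The first row comes out as False here, whereas a diff-based condition gives True on the first row, and shift() always looks back exactly one row rather than time_difference rows. Also, .astype(int) != 0 gives the same result as the boolean from .ne() alone.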

@AntonZogk
Collaborator

AntonZogk commented Jun 10, 2024

When calculating FIC, the create_and_merge_imputation_values function should point at Imputed_value instead of constructed. The constructed column already holds all the constructed values, so the required forward fill can't be applied to it.
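A rough illustration of the forward-fill problem, assuming the comment means that constructed is populated in every row. The values and the layout of both columns are invented for this sketch and are not taken from the pipeline.

```python
import numpy as np
import pandas as pd

# Hypothetical values for one reference over four periods.
df = pd.DataFrame(
    {
        # Assumed: a constructed value exists in every row.
        "constructed": [10.0, 12.0, 14.0, 16.0],
        # Assumed: only the first period holds a value so far.
        "Imputed_value": [10.0, np.nan, np.nan, np.nan],
    }
)

# ffill() only fills gaps, so a fully populated column comes back unchanged.
filled_constructed = df["constructed"].ffill()

# With gaps after the first period, the first value is carried forward.
filled_imputed = df["Imputed_value"].ffill()

print(pd.DataFrame({"from_constructed": filled_constructed, "from_imputed": filled_imputed}))
```

Under these assumptions, forward filling constructed never carries the first-period value into later periods, while forward filling a column with gaps does.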

@AntonZogk
Collaborator

imputation_config = {
    "c": {
        "intermediate_column": "constructed",
        "marker": "C",
        # doesn't actually apply a fill so can be forward or back
        "fill_column": auxiliary,
        "fill_method": "ffill",
        "link_column": construction_link,
    },
    "fir": {
        "intermediate_column": "fir",
        "marker": "FIR",
        "fill_column": target,
        "fill_method": "ffill",
        "link_column": cumulative_forward_link,
    },
    "bir": {
        "intermediate_column": "bir",
        "marker": "BIR",
        "fill_column": target,
        "fill_method": "bfill",
        "link_column": cumulative_backward_link,
    },
    "fic": {
        # FIC only works if the C is in the first period of the business being
        # sampled. This is fine for automatic imputation, but care is needed
        # if manual construction imputation is done
        "intermediate_column": "fic",
        "marker": "FIC",
        # this has to have the same name as the intermediate column for constructed
        "fill_column": "constructed",
        "fill_method": "ffill",
        "link_column": cumulative_forward_link,
    },
}

The marker values of the above must match (case sensitive) these:

"r": df["r_flag"],
"fir": ~df["r_flag"] & df["fir_flag"],
"bir": ~df["r_flag"] & ~df["fir_flag"] & df["bir_flag"],
"fic": ~df["r_flag"] & ~df["fir_flag"] & ~df["bir_flag"] & df["fic_flag"],
"c": ~df["r_flag"]
& ~df["fir_flag"]

@AntonZogk AntonZogk changed the title cumulative imputation links are incorrect for FIC Cumulative imputation links are incorrect for FIC Jun 11, 2024
@AntonZogk AntonZogk linked a pull request Jun 11, 2024 that will close this issue
10 tasks