Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

402 incorrect flags #32

Merged
merged 6 commits into from
Jun 25, 2024
Merged

402 incorrect flags #32

merged 6 commits into from
Jun 25, 2024

Conversation

AntonZogk
Copy link
Collaborator

@AntonZogk AntonZogk commented Jun 13, 2024

Summary

The aim is fixing a bug when a target value was not supplied

Checklists

This pull request meets the following requirements:

  • installable with all dependencies recorded
  • runs without error
  • follows PEP8 and project specific conventions
  • appropriate use of comments, for example no descriptive comments
  • functions documented using Numpy style docstings
  • assumptions and decisions log considered and updated if appropriate
  • unit tests have been updated to cover essential functionality for a reasonable range of inputs and conditions
  • other forms of testing such as end-to-end and user-interface testing have been considered and updated as required
  • tests suite passes (locally as a minimum)
  • peer reviewed with review recorded

If you feel some of these conditions do not apply for this pull request, please
add a comment to explain why.

Closes #28

* Add fill group for filling
* Fill group check if there is a missing date too
* Add period argument to function, to check for missing dates
Add scenario with missing date
Adjust test function to use period column
In order to check if strata, reference difference is not 0 we need to
sort the values.
* Fill group works only with diff(1) since we are seperating the values
by strata and if missing date exists.
* astype(int) is redundant, when summing True/False same as 1/0
* Add kwargs and update docstrings
@AntonZogk AntonZogk requested a review from shilohd June 20, 2024 10:26
@AntonZogk AntonZogk mentioned this pull request Jun 24, 2024
10 tasks
@Jday7879 Jday7879 self-requested a review June 25, 2024 12:51
Copy link
Collaborator

@Jday7879 Jday7879 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work catching and fixing this issue Anton.
Left some minor reviews with possible suggestions.


# TODO : similar conditions at cum imputation links
df["fill_group"] = (
(df[period] - pd.DateOffset(months=1) != df.shift(1)[period])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hard coding the month offset? variable frequency of data, if supplied every 4 months this would need to be updated.

(df[period] - pd.DateOffset(months=1) != df.shift(1)[period])
| (df[strata].diff(1) != 0)
| (df[reference].diff(1) != 0)
).cumsum()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ToDo: cumsum() does not work when date offset is -1.
Might be nice to have but maybe not essential.
Cumsum works top to bottom of a column? Might need to see if that can be reversed.

Copy link
Collaborator Author

@AntonZogk AntonZogk Jun 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What this code is trying to do is assign a unique id for the values which meets these criteria (values were sorted):

  1. Date difference between 2 rows is the same if you offset them
  2. no difference in strata
  3. no difference in refence

So hard coded 1 in pd.DateOffset refers to the data frequency and this approach should fail if not monthly data are supplied. The hard coded 1 in the shift and diff refers to the previous row though ( this is why we sorted the data) consequently no reason changing these hard coded values. But if we do we need to change the cumsum to do a reverse order for -1, bellow is the result without cumsum() for values 1 and -1. So for -1 we need to apply cumsum() in reverse order (we can use [::-1].cumsum()), but both 1 and -1 will give the same result ( values will have same integer if same strata, reference and date diffenence is ok) . There is a TODO for now and we can revisit this in the future.

reference strata period target_variable fill_group_1 fill_group_-1
0 1 100 2020-01-01 00:00:00 8444 True False
1 1 100 2020-02-01 00:00:00 nan False False
2 1 100 2020-03-01 00:00:00 2003 False False
3 1 100 2020-04-01 00:00:00 1003 False True
4 2 100 2020-01-01 00:00:00 nan True False
5 2 100 2020-02-01 00:00:00 nan False False
6 2 100 2020-03-01 00:00:00 nan False False
7 2 100 2020-04-01 00:00:00 3251 False True
8 3 100 2020-01-01 00:00:00 nan True False
9 3 100 2020-02-01 00:00:00 7511 False False
10 3 100 2020-03-01 00:00:00 1234 False False
11 3 100 2020-04-01 00:00:00 1214 False True
12 4 100 2020-01-01 00:00:00 64 True False
13 4 100 2020-02-01 00:00:00 nan False False
14 4 100 2020-03-01 00:00:00 nan False False
15 4 100 2020-04-01 00:00:00 254 False True
16 5 100 2020-01-01 00:00:00 65 True False
17 5 100 2020-02-01 00:00:00 342 False False
18 5 100 2020-03-01 00:00:00 634 False False
19 5 100 2020-04-01 00:00:00 254 False True
20 6 100 2020-01-01 00:00:00 64 True False
21 6 100 2020-02-01 00:00:00 nan False False
22 6 100 2020-03-01 00:00:00 654 False False
23 6 100 2020-04-01 00:00:00 nan False True
24 7 100 2020-01-01 00:00:00 nan True False
25 7 100 2020-02-01 00:00:00 nan False False
26 7 100 2020-03-01 00:00:00 nan False True

@AntonZogk AntonZogk removed the request for review from shilohd June 25, 2024 13:40
@AntonZogk AntonZogk merged commit 2a6c867 into main Jun 25, 2024
3 checks passed
@AntonZogk AntonZogk deleted the 402-incorrect-flags branch June 25, 2024 16:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Incorrect flags for imputation
2 participants