Postprocessing step np.nan_to_num #39
Comments
-1 implies copy=True (can test this with The same postprocess ( Because the values of ACS variables are largely categorical, I think there should be more discussion of what to do with NaN values rather than just replacing them with -1. Perhaps entries with NaN values for some variable should be thrown out? If the answer is to let the user decide, then a discussion of this question, and of alternative postprocessing functions with their implications, is definitely warranted in the README.
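To make the two options raised above concrete, here is a minimal sketch of what alternative postprocessing functions could look like. The names `fill_nan_with` and `drop_nan_rows` are hypothetical and are not part of the package; the toy arrays are made up for illustration.

```python
import numpy as np

def fill_nan_with(features, value=-1):
    """Hypothetical postprocess: replace NaN with an explicit sentinel value."""
    return np.nan_to_num(features, nan=value)

def drop_nan_rows(features, labels):
    """Hypothetical postprocess: discard every row that contains a NaN."""
    keep = ~np.isnan(features).any(axis=1)
    return features[keep], labels[keep]

# Toy data: the second row has a missing value in its first column.
features = np.array([[1.0, 2.0], [np.nan, 4.0], [5.0, 6.0]])
labels = np.array([0, 1, 1])

filled = fill_nan_with(features)
kept_features, kept_labels = drop_nan_rows(features, labels)
```

Filling keeps the sample size but adds an extra category to a categorical column, while dropping rows changes which individuals are in the data, so either choice can shift downstream results.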
Right, I just stumbled over this while reviewing #41. If I had to guess what happened, I'd say that the way we called I'll have to think more about what's the best solution for dealing with NaNs. Given that this package has been used for quite some time, my inclination might be to continue to replace NaNs with 0 to ensure replicability of various results that might depend on it.
Personally I think it'd make sense to keep the previous unintended behavior (replace with One potential issue with replacing I looked into it and only See the last cell of this notebook for a more detailed investigation of which tasks and columns are affected. Since only one task seems to be affected by this I suggest keeping the previous behavior and adding a warning in the package repository. Also, the implicit NaN-to-0 mapping should perhaps be made explicit with
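One way the NaN-to-0 mapping and the suggested warning might be combined is sketched below. It assumes the `nan` keyword of `np.nan_to_num` is the intended mechanism; the helper name `nan_to_zero_with_warning` is hypothetical and not taken from the package.

```python
import warnings
import numpy as np

def nan_to_zero_with_warning(features):
    """Hypothetical helper: map NaN to 0 explicitly and report how many were replaced."""
    n_missing = int(np.isnan(features).sum())
    if n_missing:
        warnings.warn(f"Replacing {n_missing} NaN value(s) with 0.", stacklevel=2)
    # Naming the fill value keeps the behavior the same while making it visible.
    return np.nan_to_num(features, nan=0.0)

cleaned = nan_to_zero_with_warning(np.array([[1.0, np.nan], [3.0, 4.0]]))
```

This preserves the results that depend on the old behavior and tells users when their task is affected.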
FWIW the teachability of this is awesome. If we could find some results that changed due to the treatment of the NaNs, that would be so great for a teaching lesson, e.g. on data processing decisions affecting results.
In my previous comment I provide an example of such a case! @ericvd-ucb
I just checked for the Texas example in the README too, and it also changes: from the reported 0.0397 to 0.03419.
Hello, thank you for providing us with this package. In the postprocessing step of acs.ACSIncome, np.nan_to_num is applied with the second argument -1. The second argument of np.nan_to_num is [copy](https://numpy.org/doc/stable/reference/generated/numpy.nan_to_num.html), which is a boolean. Does -1 imply copy=False, or was the goal to replace missing values with -1?
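The question can be checked directly with a small experiment. The sketch below uses a made-up array, not ACS data, and compares passing -1 positionally with passing it through the `nan` keyword.

```python
import numpy as np

x = np.array([1.0, np.nan, 3.0])

# The second positional parameter of np.nan_to_num is `copy`, so -1 is
# read as a truthy flag (equivalent to copy=True), not as the fill value.
positional = np.nan_to_num(x, -1)

# Passing the fill value by keyword is what replaces NaN with -1.
keyword = np.nan_to_num(x, nan=-1)

print(positional)          # the NaN entry becomes 0.0 (the default fill)
print(keyword)             # the NaN entry becomes -1.0
print(np.isnan(x).any())   # True: the input was copied, not modified in place
```

So the positional call neither sets copy=False nor fills with -1: it copies the input and falls back to the default replacement of 0.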