Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH test isotonic calibration on simulation data & FIX correct joblib dependency error #4

Open
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

jzheng17
Copy link
Collaborator

@jzheng17 jzheng17 commented Jun 1, 2022

Simulation test on #2

I did the overlapping gaussian tests for the Iso-Honest Forest in a Jupyter Notebook and had to adjust the original honest forest classes a bit for importing them into a Jupyter Notebook.

Accommodate running HF in Jupyter Notebook Environment
Added calibrated HF to overlapping gaussian
Tests done for Isotonic Calibrated HF
@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@jzheng17
Copy link
Collaborator Author

jzheng17 commented Jun 1, 2022

It seems like the failing is happening in the original test_tree and test_forest files with their import modules (possibly caused by some dependency issues?). I have not touched those.

try to pass pytest checks for committing (these files exist in the original repo already, I did not touch them)
try to pass pytest checks for committing (these files exist in the original repo already, I did not touch them)
try to pass pytest checks for committing (these files exist in the original repo already, I did not touch them)
fix dependency issues with the new version of joblib (no longer uses **_joblib_parallel_args)
@PSSF23 PSSF23 changed the title Uploading the Overlapping Gaussian Tests ENH test isotonic calibration on simulation data & FIX correct joblib dependency error Jun 1, 2022
@jzheng17
Copy link
Collaborator Author

jzheng17 commented Jun 1, 2022

The HonestForest class implemented by Ronan used joblib's backend for parallelization, which involved an import from sklearn's util files. However, sklearn's util file changed between the time Ronan wrote his HF class and the time I was making this commit, in which the util file abandoned its outdated usage of the previous version of joblib ( _joblib_parallel_args deprecated and is no longer in use). This caused dependency issues because Ronan's implementation still used the outdated version. I fixed the above issue.

@PSSF23 PSSF23 requested review from PSSF23 and rflperry June 2, 2022 04:08
Copy link
Member

@rflperry rflperry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good. It seems like there is one extra file or something incorporated? Normally it would be better to make 2 PRs. One for the joblib change and one for the notebook. But this is fine.

Comment on lines +44 to +55
(
"Iso-HonestRF",
CalibratedClassifierCV(
base_estimator=HonestForestClassifier(
n_estimators=n_estimators // clf_cv,
max_features=max_features,
n_jobs=n_jobs,
),
method="isotonic",
cv=clf_cv,
),
),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say don't change this file? Leave the honest + IRF for just the notebook. This way the main figure in the repo reflects the paper and isn't as confusing to first time viewers.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, will revert the changes.

Comment on lines +1 to +2
"""Module for forest-based estimators"""
"""I'm just using this version to facilitate future changes -Audrey"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this file? Is it the same as the estimators/forest.py file? It seems like this was maybe accidentally left in?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file is created because it seems like if I import HF directly from the forest.py in a Jupyter Notebook will cause dependency errors. I've experimented a bunch of ways and it seems the only fix that works (a pretty dumb way, I have to admit) is to combine your forest.py and tree.py into one file. I personally suspect it's because the compiler for Jupyter Notebook is doing weird things if the file you are importing requires another import from another file you wrote.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jzheng17 you should install the package & then use the honest_forests library.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might be able to do that too. I went for a quick fix at that time.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh wait. I remembered that installing dependency for Jupyter Notebooks and for .py scripts might be a little different. I guess I will still try after I recover from bronchitis.

@@ -446,7 +447,9 @@ def _predict_proba(self, X, indices=None, impute_missing=None):
Parallel(
n_jobs=n_jobs,
verbose=self.verbose,
**_joblib_parallel_args(require="sharedmem")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazing, thanks for this fix. An alternative fix would be to require a lower version of sklearn but this is better. I assume I had a lower version. I assume this requires sklearn > 1.0? Can you add this to the requirments.txt file?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe it does require sklearn > 1.0. Note that this is because sklearn ppl changed their util.py to bump joblib version dependency to 1.0.0.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay. We should make sure the requirements.txt file states the proper requirements. We don't want errors because this code isn't compatible with old versions of joblib or sklearn permitted by our requirements.txt file

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants