
[MRG+1] Threshold for pairs learners #168

Merged 43 commits on Apr 15, 2019

Changes from 1 commit

Commits
676ab86  add some tests for testing that different scores work using the scori… (Feb 4, 2019)
cc1c3e6  ENH: Add tests and basic threshold implementation (Feb 5, 2019)
f95c456  Add support for LSML and more generally quadruplets (Feb 6, 2019)
9ffe8f7  Make CalibratedClassifierCV work (for preprocessor case) thanks to cl… (Feb 6, 2019)
3354fb1  Fix some tests and PEP8 errors (Feb 7, 2019)
12cb5f1  change the sign in decision function (Feb 19, 2019)
dd8113e  Add docstring for threshold_ and classes_ in the base _PairsClassifie… (Feb 19, 2019)
1c8cd29  remove quadruplets from the test with scikit learn custom scorings (Feb 19, 2019)
d12729a  Remove argument y in quadruplets learners and lsml (Feb 20, 2019)
dc9e21d  FIX fix docstrings of decision functions (Feb 20, 2019)
402729f  FIX the threshold by taking the opposite (to be adapted to the decisi… (Feb 20, 2019)
aaac3de  Fix tests to have no y for quadruplets' estimator fit (Feb 21, 2019)
e5b1e47  Remove isin to be compatible with old numpy versions (Feb 21, 2019)
a0cb3ca  Fix threshold so that it has a positive value and add small test (Feb 21, 2019)
8d5fc50  Fix threshold for itml (Feb 21, 2019)
0f14b25  FEAT: Add calibrate_threshold and tests (Mar 4, 2019)
a6458a2  MAINT: remove starred syntax for compatibility with older versions of… (Mar 5, 2019)
fada5cc  Remove debugging prints and make tests for ITML pass, while waiting f… (Mar 5, 2019)
32a4889  FIX: from __future__ import division to pass tests for python 2.7 (Mar 5, 2019)
5cf71b9  Add some documentation for calibration (Mar 11, 2019)
c2bc693  DOC: fix style (Mar 11, 2019)
e96ee00  Merge branch 'master' into feat/add_threshold (Mar 21, 2019)
3ed3430  Address most comments from aurelien's reviews (Mar 21, 2019)
69c6945  Remove classes_ attribute and test for CalibratedClassifierCV (Mar 21, 2019)
bc39392  Rename make_args_inc_quadruplets into remove_y_quadruplets (Mar 21, 2019)
facc546  TST: Fix remaining threshold into min_rate (Mar 21, 2019)
f0ca65e  Remove default_threshold and put calibrate_threshold instead (Mar 21, 2019)
a6ec283  Use calibrate_threshold for ITML, and remove description (Mar 21, 2019)
49fbbd7  ENH: use calibrate_threshold by default and display its parameters fr… (Mar 21, 2019)
960b174  Add a small test to test automatic calibration (Mar 21, 2019)
c91acf7  Update documentation of the default threshold (Mar 21, 2019)
a742186  Inverse sense for threshold comparison to be more intuitive (Mar 21, 2019)
9ec1ead  Address remaining review comments (Mar 21, 2019)
986fed3  MAINT: Rename threshold_params into calibration_params (Mar 26, 2019)
3f5d6d1  TST: Add test for extreme cases (Mar 26, 2019)
7b5e4dd  MAINT: rename threshold_params into calibration_params (Mar 26, 2019)
a3ec02c  MAINT: rename threshold_params into calibration_params (Mar 26, 2019)
ccc66eb  FIX: Make tests work, and add the right threshold (mean between lowes… (Mar 27, 2019)
6dff15b  Merge branch 'master' into feat/add_threshold (Mar 27, 2019)
719d018  Go back to previous version of finding the threshold (Apr 2, 2019)
551d161  Extract method for validating calibration parameters (Apr 2, 2019)
594c485  Validate calibration params before fit (Apr 2, 2019)
14713c6  Address https://github.com/metric-learn/metric-learn/pull/168#discuss… (Apr 2, 2019)
17 changes: 12 additions & 5 deletions metric_learn/base_metric.py
@@ -374,8 +374,10 @@ def score(self, pairs, y):
   def set_default_threshold(self, pairs, y):

Member: to remove if indeed we choose the accuracy-calibrated threshold as default

     """Returns a threshold that is the mean between the similar metrics
     mean, and the dissimilar metrics mean"""
-    similar_threshold = np.mean(self.decision_function(pairs[y==1]))
-    dissimilar_threshold = np.mean(self.decision_function(pairs[y==1]))
+    similar_threshold = np.mean(self.decision_function(
+        pairs[(y == 1).ravel()]))
+    dissimilar_threshold = np.mean(self.decision_function(
+        pairs[(y == -1).ravel()]))
     self.threshold_ = np.mean([similar_threshold, dissimilar_threshold])
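The accuracy-calibrated alternative mentioned in the comment above amounts to picking the cutoff on the training pairs that maximizes accuracy, rather than taking the mean of the two class means. A minimal sketch of that idea, assuming decision_function scores where larger means more similar and labels in {-1, 1} (illustration only, not the calibrate_threshold implementation the PR later adds):

```python
import numpy as np

def accuracy_calibrated_threshold(scores, y):
  """Pick the threshold that maximizes accuracy on (scores, y).

  scores: decision_function values for training pairs (larger = more similar).
  y: labels in {-1, 1}. Pairs with score >= threshold are predicted +1.
  Illustrative sketch of accuracy-based calibration only.
  """
  candidates = np.unique(scores)
  accuracies = [np.mean(np.where(scores >= t, 1, -1) == y) for t in candidates]
  return candidates[int(np.argmax(accuracies))]
```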


@@ -458,9 +460,14 @@ def score(self, quadruplets, y=None):
     score : float
       The quadruplets score.
     """
-    quadruplets = check_input(quadruplets, y, type_of_inputs='tuples',
-                              preprocessor=self.preprocessor_,
-                              estimator=self, tuple_size=self._tuple_size)
+    checked_input = check_input(quadruplets, y, type_of_inputs='tuples',
+                                preprocessor=self.preprocessor_,
+                                estimator=self, tuple_size=self._tuple_size)
+    # checked_input will be of the form `(checked_quadruplets, checked_y)` if
+    # `y` is not None, or just `checked_quadruplets` if `y` is None
+    quadruplets = checked_input if y is None else checked_input[0]

Member Author: I find this a bit ugly, but I couldn't find another way to do it. Maybe refactor check_input with two functions like check_X_y and check_array? Or we can just keep it as it is for now.

     if y is None:
       y = np.ones(quadruplets.shape[0])
+    else:
+      y = checked_input[1]
     return accuracy_score(y, self.predict(quadruplets))
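On the check_X_y / check_array suggestion in the comment above, one possible shape for such a split is sketched below; the helper names are hypothetical (they are not part of metric-learn), the import path is assumed, and check_input is assumed to behave as described in the diff comment:

```python
# Hypothetical sketch of the suggested split; these helpers do not exist in
# metric-learn, and the import path is an assumption.
from metric_learn._util import check_input


def check_tuples_and_y(tuples, y, **kwargs):
  """check_X_y-style wrapper: always returns (checked_tuples, checked_y)."""
  return check_input(tuples, y, type_of_inputs='tuples', **kwargs)


def check_tuples(tuples, **kwargs):
  """check_array-style wrapper: returns only the checked tuples."""
  return check_input(tuples, None, type_of_inputs='tuples', **kwargs)
```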
26 changes: 1 addition & 25 deletions test/test_quadruplets_classifiers.py
@@ -25,35 +25,11 @@ def test_predict_only_one_or_minus_one(estimator, build_dataset,
   assert np.isin(predictions, [-1, 1]).all()


-@pytest.mark.parametrize('with_preprocessor', [True, False])

Member Author: I removed this test because it makes no sense for quadruplets, since the threshold should always be 0.
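To spell out why a fixed threshold of 0 is natural for quadruplets: the decision value of a quadruplet (a, b, c, d) is a signed difference of two learned distances, so "first pair closer than second pair" corresponds exactly to a positive score. A simplified sketch of that logic, assuming metric is a callable distance and quadruplets has shape (n, 4, n_features) (not the exact metric-learn implementation):

```python
import numpy as np

def quadruplets_decision_function(metric, quadruplets):
  """Signed score: positive when the first pair (a, b) is closer than the
  second pair (c, d) under the learned metric. Illustrative sketch only."""
  d_first = np.array([metric(q[0], q[1]) for q in quadruplets])
  d_second = np.array([metric(q[2], q[3]) for q in quadruplets])
  return d_second - d_first

def quadruplets_predict(metric, quadruplets):
  scores = quadruplets_decision_function(metric, quadruplets)
  # The threshold is fixed at 0 by construction: predict +1 exactly when the
  # first pair is the closer one, -1 otherwise.
  return np.where(scores >= 0, 1, -1)
```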

-@pytest.mark.parametrize('estimator, build_dataset', quadruplets_learners,
-                         ids=ids_quadruplets_learners)
-def test_predict_monotonous(estimator, build_dataset,
-                            with_preprocessor):
-  """Test that there is a threshold distance separating points labeled as
-  similar and points labeled as dissimilar """
-  input_data, labels, preprocessor, _ = build_dataset(with_preprocessor)
-  estimator = clone(estimator)
-  estimator.set_params(preprocessor=preprocessor)
-  set_random_state(estimator)
-  (quadruplets_train,
-   quadruplets_test, y_train, y_test) = train_test_split(input_data, labels)
-  estimator.fit(quadruplets_train, y_train)
-  distances = estimator.score_quadruplets(quadruplets_test)
-  predictions = estimator.predict(quadruplets_test)
-  min_dissimilar = np.min(distances[predictions == -1])
-  max_similar = np.max(distances[predictions == 1])
-  assert max_similar <= min_dissimilar
-  separator = np.mean([min_dissimilar, max_similar])
-  assert (predictions[distances > separator] == -1).all()
-  assert (predictions[distances < separator] == 1).all()
-
-
 @pytest.mark.parametrize('with_preprocessor', [True, False])
 @pytest.mark.parametrize('estimator, build_dataset', quadruplets_learners,
                          ids=ids_quadruplets_learners)
 def test_raise_not_fitted_error_if_not_fitted(estimator, build_dataset,
-                                               with_preprocessor):
+                                              with_preprocessor):
   """Test that a NotFittedError is raised if someone tries to predict and
   the metric learner has not been fitted."""
   input_data, labels, preprocessor, _ = build_dataset(with_preprocessor)