[MRG+1] Threshold for pairs learners (#168)
* add some tests for testing that different scores work using the scoring function

* ENH: Add tests and basic threshold implementation

* Add support for LSML and more generally quadruplets

* Make CalibratedClassifierCV work (for preprocessor case) thanks to classes_

* Fix some tests and PEP8 errors

* change the sign in decision function

* Add docstring for threshold_ and classes_ in the base _PairsClassifier class

* remove quadruplets from the test with scikit learn custom scorings

* Remove argument y in quadruplets learners and lsml

* FIX fix docstrings of decision functions

* FIX the threshold by taking the opposite (to be adapted to the decision function)

* Fix tests to have no y for quadruplets' estimator fit

* Remove isin to be compatible with old numpy versions

* Fix threshold so that it has a positive value and add small test

* Fix threshold for itml

* FEAT: Add calibrate_threshold and tests

* MAINT: remove starred syntax for compatibility with older versions of python

* Remove debugging prints and make tests for ITML pass, while waiting for #175 to be solved

* FIX: from __future__ import division to pass tests for python 2.7

* Add some documentation for calibration

* DOC: fix style

* Address most comments from aurelien's reviews

* Remove classes_ attribute and test for CalibratedClassifierCV

* Rename make_args_inc_quadruplets into remove_y_quadruplets

* TST: Fix remaining threshold into min_rate

* Remove default_threshold and put calibrate_threshold instead

* Use calibrate_threshold for ITML, and remove description

* ENH: use calibrate_threshold by default and display its parameters from the fit method

* Add a small test to test automatic calibration

* Update documentation of the default threshold

* Inverse sense for threshold comparison to be more intuitive

* Address remaining review comments

* MAINT: Rename threshold_params into calibration_params

* TST: Add test for extreme cases

* MAINT: rename threshold_params into calibration_params

* FIX: Make tests work, and add the right threshold (mean between lowest accepted value and highest rejected value), and max + 1 or min - 1 for extreme points

* Go back to previous version of finding the threshold

* Extract method for validating calibration parameters

* Validate calibration params before fit

* Address #168 (comment)
wdevazelhes authored and bellet committed Apr 15, 2019
1 parent b28933c commit edad55d
Showing 11 changed files with 1,066 additions and 148 deletions.
117 changes: 83 additions & 34 deletions doc/weakly_supervised.rst
@@ -148,8 +148,47 @@ tuples you're working with (pairs, triplets...). See the docstring of the
`score` method of the estimator you use.


Learning on pairs
=================

Some metric learning algorithms learn on pairs of samples. In this case, one
should provide the algorithm with ``n_samples`` pairs of points, with a
corresponding target containing ``n_samples`` values being either +1 or -1.
These values indicate whether the given pairs are similar points or
dissimilar points.
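
As a minimal sketch of the expected layout (the point values and labels here are made up for illustration and do not come from any dataset), the pairs and their targets could be written as plain Python lists:

```python
# Four illustrative pairs of 2D points (values are made up for this sketch).
pairs = [[[1.2, 7.5], [1.3, 1.5]],
         [[6.4, 2.6], [6.2, 9.7]],
         [[1.3, 4.5], [3.2, 4.6]],
         [[6.2, 5.5], [5.4, 5.4]]]

# One label per pair: +1 for a similar pair, -1 for a dissimilar pair.
y = [1, -1, -1, 1]

# Basic sanity checks on the layout described above.
assert len(pairs) == len(y)                    # one target value per pair
assert all(len(pair) == 2 for pair in pairs)   # each pair holds two points
assert set(y) <= {1, -1}                       # targets are +1 or -1 only
```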


.. _calibration:

Thresholding
------------
In order to predict whether a new pair represents similar or dissimilar
samples, we need to set a distance threshold, so that points closer (in the
learned space) than this threshold are predicted as similar, and points further
away are predicted as dissimilar. Several methods are possible for this
thresholding.

- **At fit time**: The threshold is set with `calibrate_threshold` (see
  below) on the training set. You can specify the calibration parameters
  directly in the `fit` method with the `calibration_params` parameter (see
  the documentation of the `fit` method of any metric learner that learns on
  pairs of points for more information). Calibrating on the training set can
  cause some overfitting. If you want to avoid that, calibrate the threshold
  after fitting, on a validation set.

- **Manual**: calling `set_threshold` will set the threshold to a
  particular value.

- **Calibration**: calling `calibrate_threshold` will calibrate the
  threshold to achieve a particular score on a validation set, the score
  being one of the classical classification scores (accuracy, F1 score...).
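
The thresholding rule and an accuracy-based calibration can be sketched in plain Python. This is only an illustration of the idea, not the library's implementation: ``predict_pairs`` and ``calibrate_for_accuracy`` are hypothetical helper names, the Euclidean distance stands in for the learned metric, and treating a distance equal to the threshold as "similar" is an assumption of this sketch.

```python
import math


def predict_pairs(pairs, threshold):
    """Return +1 for pairs no further apart than `threshold`, -1 otherwise.

    Euclidean distance is used here as a stand-in for the learned metric.
    """
    return [1 if math.dist(a, b) <= threshold else -1 for a, b in pairs]


def calibrate_for_accuracy(pairs, labels):
    """Pick, among simple candidates, the threshold with the best accuracy."""
    distances = sorted({math.dist(a, b) for a, b in pairs})
    # Candidates: every observed distance, plus one value below all of them
    # (which predicts every pair as dissimilar).
    candidates = [distances[0] - 1.0] + distances

    def accuracy(threshold):
        predictions = predict_pairs(pairs, threshold)
        return sum(p == t for p, t in zip(predictions, labels)) / len(labels)

    # max() returns the first candidate reaching the best accuracy.
    return max(candidates, key=accuracy)


# Hypothetical validation set: two close (similar) and two distant pairs.
pairs_val = [([0, 0], [0, 1]), ([0, 0], [0, 2]),
             ([0, 0], [3, 0]), ([0, 0], [0, 4])]
y_val = [1, 1, -1, -1]

threshold = calibrate_for_accuracy(pairs_val, y_val)
predictions = predict_pairs(pairs_val, threshold)
```

Calibrating on held-out pairs rather than on the training pairs is what limits the overfitting mentioned in the first bullet.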


See also: `sklearn.calibration`.


Algorithms
==========

ITML
----
@@ -192,39 +231,6 @@ programming.
.. [2] Adapted from Matlab code at http://www.cs.utexas.edu/users/pjain/
itml/
SDML
----

@@ -343,3 +349,46 @@ method. However, it is one of the earliest and a still often cited technique.
-with-side-information.pdf>`_ Xing, Jordan, Russell, Ng.
.. [2] Adapted from Matlab code `here <http://www.cs.cmu
.edu/%7Eepxing/papers/Old_papers/code_Metric_online.tar.gz>`_.
Learning on quadruplets
=======================

A type of information even weaker than pairs is information about relative
comparisons between pairs. The user should provide the algorithm with
quadruplets of points, where in each quadruplet the first two points are
closer to each other than the last two points. No target vector (``y``) is
needed, since the supervision is already encoded in the order in which the
points are given in each quadruplet.
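
To make the ordering convention concrete, the following sketch counts how many quadruplets already satisfy "first pair closer than last pair" under a given distance. It reuses the quadruplets from the LSML example in this document; ``fraction_satisfied`` is a hypothetical helper written for this illustration, and the plain Euclidean distance is only a baseline, not a learned metric.

```python
import math

# Quadruplets (a, b, c, d): a and b should be closer than c and d.
quadruplets = [[[1.2, 7.5], [1.3, 1.5], [6.4, 2.6], [6.2, 9.7]],
               [[1.3, 4.5], [3.2, 4.6], [6.2, 5.5], [5.4, 5.4]],
               [[3.2, 7.5], [3.3, 1.5], [8.4, 2.6], [8.2, 9.7]],
               [[3.3, 4.5], [5.2, 4.6], [8.2, 5.5], [7.4, 5.4]]]


def fraction_satisfied(quadruplets, dist=math.dist):
    """Fraction of quadruplets (a, b, c, d) with dist(a, b) < dist(c, d)."""
    satisfied = [dist(a, b) < dist(c, d) for a, b, c, d in quadruplets]
    return sum(satisfied) / len(satisfied)


# Under the plain Euclidean distance only half of these constraints hold.
baseline = fraction_satisfied(quadruplets)
print(baseline)  # 0.5
```

A quadruplets learner is expected to find a metric under which more of these constraints hold than under the baseline distance.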

Algorithms
==========

LSML
----

`LSML`: Metric Learning from Relative Comparisons by Minimizing Squared
Residual

.. topic:: Example Code:

::

    from metric_learn import LSML

    # each quadruplet is (a, b, c, d): a and b should end up closer to each
    # other than c and d
    quadruplets = [[[1.2, 7.5], [1.3, 1.5], [6.4, 2.6], [6.2, 9.7]],
                   [[1.3, 4.5], [3.2, 4.6], [6.2, 5.5], [5.4, 5.4]],
                   [[3.2, 7.5], [3.3, 1.5], [8.4, 2.6], [8.2, 9.7]],
                   [[3.3, 4.5], [5.2, 4.6], [8.2, 5.5], [7.4, 5.4]]]

    # we want points whose first feature is close to be brought closer, and
    # points whose second feature is close to be pushed further apart

    # no target vector y is passed: the order inside each quadruplet is the
    # supervision
    lsml = LSML()
    lsml.fit(quadruplets)

.. topic:: References:

    .. [1] Liu et al.
       "Metric Learning from Relative Comparisons by Minimizing Squared
       Residual". ICDM 2012. http://www.cs.ucla.edu/~weiwang/paper/ICDM12.pdf
    .. [2] Adapted from https://gist.github.com/kcarnold/5439917