[MRG] Bilinear similarity #329

Open · wants to merge 65 commits into base: master

Changes from 62 commits

Commits (65)
4b7cdec
Remove 3.9 from compatibility
mvargas33 Sep 15, 2021
0147c0c
FIrst draft of bilinear mixin
mvargas33 Sep 17, 2021
ec09f59
Fix score_pairs
mvargas33 Sep 17, 2021
ec49397
Two implementations for score_pairs
mvargas33 Sep 17, 2021
2f3c3e1
Generalized toy tests
mvargas33 Sep 17, 2021
c21d283
Handmade tests incorporated
mvargas33 Sep 17, 2021
dbe2a7a
Fix identation for bilinear
Sep 21, 2021
ee5c5ee
Add performance test to choose between two methods for bilinear calc
Sep 21, 2021
9a10e06
Found an efficient way to compute Bilinear Sim for n pairs
Sep 22, 2021
b1edc46
Update method's descriptions
Sep 22, 2021
ae562e6
Following the correct testing structure
Sep 22, 2021
7ebc026
Fix identation
Sep 22, 2021
1d752f7
Add more tests. Fix 4 to 2 identation
Sep 23, 2021
45c9b97
Minor flake8 fix
Sep 23, 2021
407f910
Commented each test
Sep 23, 2021
80c9085
All tests have been generalized
Sep 23, 2021
90ac550
Fix flake8 identation
Sep 23, 2021
68eeda9
Minor details
Sep 23, 2021
c47797c
Remove 3.9 from compatibility
mvargas33 Sep 15, 2021
e07b11a
First draft of refactoring BaseMetricLearner and Mahalanobis Learner
Oct 1, 2021
8210acd
Avoid warning related to score_pairs deprecation in tests of pair_cal…
Oct 6, 2021
11b5df6
Minor fix
Oct 6, 2021
06b7131
Replaced score_pairs with pair_distance in tests
Oct 6, 2021
d5cb8b4
Replace score_pairs with pair_distance inb docs.
Oct 6, 2021
2f61e7b
Fix weird commit
Oct 8, 2021
9dd38aa
Fix weird commit
Oct 8, 2021
5f68ed2
Update classifiers to use pair_similarity
Oct 8, 2021
3d6450b
Updated rst docs
Oct 8, 2021
7bce493
Fix identation
Oct 8, 2021
7e6584a
Update docs of score_pairs, get_metric
Oct 11, 2021
0b58f45
Add deprecation Test. Fix identation
Oct 11, 2021
d4d3a9c
Merge branch 'master' into score-deprecation
Oct 11, 2021
d27bdf5
Merge branch 'score-deprecation' into feat-bilinear
Oct 11, 2021
78a205c
Refactor to use pair_similarity instead of score_pairs
Oct 11, 2021
dde3576
Add more testing. Test refactor TBD
Oct 13, 2021
3020110
Tests are now parametrized
Oct 13, 2021
2746668
Add bilinear in introduction
Oct 15, 2021
920e504
Minor comment on use case
Oct 15, 2021
7a24319
More changes to sueprvised
Oct 15, 2021
2f8ee76
Changes in weakly Supervised
Oct 15, 2021
60c88a6
Merge remote-tracking branch 'upstream/master' into score-deprecation
Oct 19, 2021
8c55970
Fixed changes requested 1
Oct 19, 2021
787a8d1
Fixed changes requested 2
Oct 19, 2021
e14f956
Add equivalence test, p_dist == p_score
Oct 19, 2021
0941a32
Fix tests and identation.
Oct 19, 2021
b019d85
Fixed changes requested 3
Oct 20, 2021
74df897
Fix identation
Oct 20, 2021
c62a4e7
Last requested changes
Oct 21, 2021
526e4ba
Merge branch 'score-deprecation' into feat-bilinear
Oct 21, 2021
2199724
Replaced pair_similarity for paiir_score
Oct 21, 2021
249e0fe
Last small detail
Oct 21, 2021
80f31ba
Merge branch 'score-deprecation' into feat-bilinear
Oct 21, 2021
8df44a4
Merge remote-tracking branch 'upstream/master' into feat-bilinear
Oct 26, 2021
eef13bb
Classifiers only test classifiers methods now. + Standard doctrings now.
Oct 26, 2021
b952af0
Work in tests. More comments. Some refactors
Oct 26, 2021
7cc0d5e
Learner lists for M and B learners. Separated test by kind. Mock clas…
Oct 27, 2021
5f6bdc2
Moved mocks to test_utils.py, then refactor test_bilinear_mixin.py
Oct 27, 2021
100a05d
Merge branch 'master' into feat-bilinear
Nov 3, 2021
3bf5eae
Resolved observations in interoduction.rst
Nov 8, 2021
acfd54b
Resolved all observations for supervised.rst and weakly_s.rst
Nov 8, 2021
69bd9fe
Spellcheck
Nov 9, 2021
ade34cc
Moved common test to test_base_metric.py . Refactor preprocessor test…
Nov 9, 2021
7cfd432
Fix docs annotations
Nov 18, 2021
0ac9e7a
Second chunks of annotations.
Nov 18, 2021
98340b0
Merge branch 'master' into feat-bilinear
perimosocordiae Jun 21, 2022
56 changes: 42 additions & 14 deletions doc/introduction.rst
@@ -4,17 +4,16 @@
What is Metric Learning?
========================

Many approaches in machine learning require a measure of distance between data
points. Traditionally, practitioners would choose a standard distance metric
Many approaches in machine learning require a measure of distance (or similarity)
between data points. Traditionally, practitioners would choose a standard metric
(Euclidean, City-Block, Cosine, etc.) using a priori knowledge of the
domain. However, it is often difficult to design metrics that are well-suited
to the particular data and task of interest.

Distance metric learning (or simply, metric learning) aims at
automatically constructing task-specific distance metrics from (weakly)
supervised data, in a machine learning manner. The learned distance metric can
then be used to perform various tasks (e.g., k-NN classification, clustering,
information retrieval).
Metric learning aims at automatically constructing task-specific metrics
from (weakly) supervised data, in a machine learning manner.
The learned metric can then be used to perform various tasks (e.g.,
k-NN classification, clustering, information retrieval).

Problem Setting
===============
@@ -25,27 +24,27 @@ of supervision available about the training data:
- :doc:`Supervised learning <supervised>`: the algorithm has access to
a set of data points, each of them belonging to a class (label) as in a
standard classification problem.
Broadly speaking, the goal in this setting is to learn a distance metric
Broadly speaking, the goal in this setting is to learn a metric
that puts points with the same label close together while pushing away
points with different labels.
- :doc:`Weakly supervised learning <weakly_supervised>`: the
algorithm has access to a set of data points with supervision only
at the tuple level (typically pairs, triplets, or quadruplets of
data points). A classic example of such weaker supervision is a set of
positive and negative pairs: in this case, the goal is to learn a distance
positive and negative pairs: in this case, the goal is to learn a
metric that puts positive pairs close together and negative pairs far away.

Based on the above (weakly) supervised data, the metric learning problem is
generally formulated as an optimization problem where one seeks to find the
parameters of a distance function that optimize some objective function
parameters of a function that optimize some objective function
measuring the agreement with the training data.
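
To make the weakly supervised setting concrete, here is a minimal sketch (the
data below are made up, and ``MMC`` is just one arbitrary pair learner from the
package) of fitting a learner on labeled pairs::

    import numpy as np
    from metric_learn import MMC

    # Three positive (+1) and three negative (-1) pairs of 2D points
    pairs = np.array([[[1.2, 3.2], [1.1, 3.0]],
                      [[4.5, 0.1], [4.4, 0.3]],
                      [[0.2, 0.3], [0.1, 0.5]],
                      [[1.2, 3.2], [9.0, 7.5]],
                      [[4.5, 0.1], [8.8, 8.1]],
                      [[0.2, 0.3], [7.6, 9.2]]])
    y_pairs = np.array([1, 1, 1, -1, -1, -1])

    mmc = MMC()
    mmc.fit(pairs, y_pairs)  # learns a metric pulling positive pairs together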

.. _mahalanobis_distances:

Mahalanobis Distances
=====================

In the metric-learn package, all algorithms currently implemented learn
In the metric-learn package, most algorithms currently implemented learn
so-called Mahalanobis distances. Given a real-valued parameter matrix
:math:`L` of shape ``(num_dims, n_features)`` where ``n_features`` is the
number of features describing the data, the Mahalanobis distance associated with
@@ -79,6 +78,35 @@ necessarily the identity of indiscernibles.
parameterizations are equivalent. In practice, an algorithm may thus solve
the metric learning problem with respect to either :math:`M` or :math:`L`.
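
For instance, a small NumPy check of this equivalence (illustrative only, not
metric-learn code) could look like::

    import numpy as np

    rng = np.random.RandomState(42)
    L = rng.randn(2, 3)                      # shape (num_dims, n_features)
    M = L.T @ L                              # the corresponding PSD matrix
    x, x_prime = rng.randn(3), rng.randn(3)

    d_L = np.linalg.norm(L @ x - L @ x_prime)         # distance computed from L
    d_M = np.sqrt((x - x_prime) @ M @ (x - x_prime))  # distance computed from M
    assert np.isclose(d_L, d_M)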

.. _bilinear_similarity:

Bilinear Similarities
=====================

Some algorithms in the package learn bilinear similarity functions. These
similarity functions are not pseudo-distances: they simply output real values
such that the larger the similarity value, the more similar the two examples.
Given a real-valued parameter matrix :math:`W` of shape
``(n_features, n_features)`` where ``n_features`` is the number of features
describing the data, the bilinear similarity associated with :math:`W` is
defined as follows:

.. math:: S_W(x, x') = x^\top W x'

The matrix :math:`W` is not required to be positive semi-definite (PSD) or
even symmetric, so the distance properties (nonnegativity, identity of
indiscernibles, symmetry and triangle inequality) do not hold in general.
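
As a rough NumPy illustration (not part of the package API), the bilinear
similarity of a batch of pairs under some matrix :math:`W` can be computed as::

    import numpy as np

    rng = np.random.RandomState(0)
    n_pairs, n_features = 5, 3
    W = rng.randn(n_features, n_features)   # not necessarily symmetric or PSD
    u = rng.randn(n_pairs, n_features)      # first element of each pair
    v = rng.randn(n_pairs, n_features)      # second element of each pair

    # scores[i] = u[i]^T W v[i], computed for all pairs at once
    scores = np.einsum('ij,jk,ik->i', u, W, v)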

This allows some algorithms to optimize :math:`S_W` in an online manner using a
simple and efficient procedure. Such algorithms can thus be applied to problems
with millions of training instances, and achieve state-of-the-art performance
on an image search task using :math:`k`-NN.

The absence of a PSD constraint can enable the design of more efficient
algorithms. It is also relevant in applications where the underlying notion
of similarity does not satisfy the triangle inequality, as is known to be the
case for visual judgments.

.. _use_cases:

Use-cases
@@ -99,9 +127,9 @@ examples (for code illustrating some of these use-cases, see the
elements of a database that are semantically closest to a query element.
- Dimensionality reduction: metric learning may be seen as a way to reduce the
data dimension in a (weakly) supervised setting.
- More generally, the learned transformation :math:`L` can be used to project
the data into a new embedding space before feeding it into another machine
learning algorithm.
- More generally with Mahalanobis distances, the learned transformation :math:`L`
can be used to project the data into a new embedding space before feeding it
into another machine learning algorithm.
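
For instance, a minimal sketch of this last use case (the dataset and
downstream classifier are arbitrary choices)::

    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier
    from metric_learn import NCA

    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

    nca = NCA(random_state=42).fit(X_train, y_train)

    # Project into the learned embedding space, then feed another estimator
    knn = KNeighborsClassifier().fit(nca.transform(X_train), y_train)
    accuracy = knn.score(nca.transform(X_test), y_test)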

The API of metric-learn is compatible with `scikit-learn
<https://scikit-learn.org/>`_, the leading library for machine
93 changes: 58 additions & 35 deletions doc/supervised.rst
@@ -41,70 +41,93 @@ two numbers.

Fit, transform, and so on
-------------------------
The goal of supervised metric-learning algorithms is to transform
points in a new space, in which the distance between two points from the
same class will be small, and the distance between two points from different
classes will be large. To do so, we fit the metric learner (example:
`NCA`).
The goal of supervised metric learning algorithms is to learn a (distance or
similarity) metric such that two points from the same class will be similar
(e.g., have small distance) and points from different classes will be dissimilar
(e.g., have large distance).

To do so, we first need to fit the supervised metric learner on a labeled dataset,
as in the example below with ``NCA``.

>>> from metric_learn import NCA
>>> nca = NCA(random_state=42)
>>> nca.fit(X, y)
NCA(init='auto', max_iter=100, n_components=None,
preprocessor=None, random_state=42, tol=None, verbose=False)


Now that the estimator is fitted, you can use it on new data for several
purposes.

First, you can transform the data in the learned space, using `transform`:
Here we transform two points in the new embedding space.
We can now use the learned metric to **score** new pairs of points with ``pair_score``
(the larger the score, the more similar the pair). For Mahalanobis learners,
it is equal to the opposite of the distance.

>>> X_new = np.array([[9.4, 4.1], [2.1, 4.4]])
>>> nca.transform(X_new)
array([[ 5.91884732, 10.25406973],
[ 3.1545886 , 6.80350083]])
>>> score = nca.pair_score([[[3.5, 3.6], [5.6, 2.4]], [[1.2, 4.2], [2.1, 6.4]], [[3.3, 7.8], [10.9, 0.1]]])
>>> score
array([-0.49627072, -3.65287282, -6.06079877])

Also, as explained before, our metric learners have learned a distance between
points. You can use this distance in two main ways:
This is useful because ``pair_score`` matches the **score** semantic of
scikit-learn's `Classification metrics
<https://scikit-learn.org/stable/modules/model_evaluation.html#classification-metrics>`_.
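
For instance, here is a short sketch (the ground-truth pair labels below are
made up for illustration) of feeding these scores to a scikit-learn ranking
metric::

    import numpy as np
    from sklearn.metrics import roc_auc_score

    pairs = [[[3.5, 3.6], [5.6, 2.4]],
             [[1.2, 4.2], [2.1, 6.4]],
             [[3.3, 7.8], [10.9, 0.1]]]
    y_pairs = np.array([1, 1, -1])          # hypothetical similar/dissimilar labels

    scores = nca.pair_score(pairs)          # larger score means more similar
    auc = roc_auc_score(y_pairs, scores)    # positive pairs should get larger scores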

- You can either return the distance between pairs of points using the
`pair_distance` function:
For metric learners that learn a distance metric, there is also the ``pair_distance``
method.

>>> nca.pair_distance([[[3.5, 3.6], [5.6, 2.4]], [[1.2, 4.2], [2.1, 6.4]], [[3.3, 7.8], [10.9, 0.1]]])
array([0.49627072, 3.65287282, 6.06079877])

- Or you can return a function that will return the distance (in the new
space) between two 1D arrays (the coordinates of the points in the original
space), similarly to distance functions in `scipy.spatial.distance`.
.. warning::

If you try to use ``pair_distance`` with a bilinear similarity learner, an error
will be thrown, as it does not learn a distance.

You can also get a function that returns the learned metric. It can
compute the metric between two 1D arrays, similarly to distance functions in
`scipy.spatial.distance`. To do that, use the ``get_metric`` method.

>>> metric_fun = nca.get_metric()
>>> metric_fun([3.5, 3.6], [5.6, 2.4])
0.4962707194621285

- Alternatively, you can use `pair_score` to return the **score** between
pairs of points (the larger the score, the more similar the pair).
For Mahalanobis learners, it is equal to the opposite of the distance.
You can also call ``get_metric`` with bilinear similarity learners, and you will get
a function that will return the similarity between 1D arrays.

>>> score = nca.pair_score([[[3.5, 3.6], [5.6, 2.4]], [[1.2, 4.2], [2.1, 6.4]], [[3.3, 7.8], [10.9, 0.1]]])
>>> score
array([-0.49627072, -3.65287282, -6.06079877])
>>> similarity_fun = algorithm.get_metric()
>>> similarity_fun([3.5, 3.6], [5.6, 2.4])
-0.04752

This is useful because `pair_score` matches the **score** semantic of
scikit-learn's `Classification metrics
<https://scikit-learn.org/stable/modules/model_evaluation.html#classification-metrics>`_.
Finally, as explained in :ref:`mahalanobis_distances`, these are equivalent to the Euclidean
distance in a transformed space, and can thus be used to transform data points
into a new embedding space. You can use ``transform`` to do so.

>>> X_new = np.array([[9.4, 4.1], [2.1, 4.4]])
>>> nca.transform(X_new)
array([[ 5.91884732, 10.25406973],
[ 3.1545886 , 6.80350083]])

.. warning::

If you try to use ``transform`` with a bilinear similarity learner, an error will
be thrown, as such learners cannot be used to transform the data.

.. note::

If the metric learner that you use learns a :ref:`Mahalanobis distance
<mahalanobis_distances>` (like it is the case for all algorithms
currently in metric-learn), you can get the plain learned Mahalanobis
matrix using `get_mahalanobis_matrix`.
<mahalanobis_distances>`, you can get the learned Mahalanobis
matrix :math:`M` using `get_mahalanobis_matrix`.

>>> nca.get_mahalanobis_matrix()
array([[0.43680409, 0.89169412],
[0.89169412, 1.9542479 ]])

If the metric learner that you use learns a :ref:`bilinear similarity
<bilinear_similarity>`, you can get the learned bilinear
matrix :math:`W` using `get_bilinear_matrix`.

>>> algorithm.get_bilinear_matrix()
array([[-0.72680409, -0.153213],
[1.45542269, 7.8135546 ]])
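
If ``pair_score`` follows the bilinear form defined in :ref:`bilinear_similarity`,
its output can be checked against the returned matrix by hand (still using the
hypothetical ``algorithm`` above)::

    import numpy as np

    W = algorithm.get_bilinear_matrix()
    u, v = np.array([3.5, 3.6]), np.array([2.1, 6.4])

    manual_score = u @ W @ v   # u^T W v
    assert np.isclose(manual_score, algorithm.pair_score([[u, v]])[0])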


Scikit-learn compatibility
--------------------------
@@ -116,7 +139,7 @@ All supervised algorithms are scikit-learn estimators
scikit-learn model selection routines
(`sklearn.model_selection.cross_val_score`,
`sklearn.model_selection.GridSearchCV`, etc).
You can also use some of the scoring functions from `sklearn.metrics`.
You can also use some scoring functions from `sklearn.metrics`.
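
For example, a minimal sketch of plugging a metric learner into these routines
(dataset chosen arbitrarily)::

    from sklearn.datasets import load_iris
    from sklearn.model_selection import cross_val_score
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.pipeline import Pipeline
    from metric_learn import NCA

    X, y = load_iris(return_X_y=True)

    # Use the metric learner as one step of a scikit-learn pipeline
    pipe = Pipeline([('nca', NCA(random_state=42)),
                     ('knn', KNeighborsClassifier())])
    scores = cross_val_score(pipe, X, y, cv=3)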

Algorithms
==========
@@ -248,12 +271,12 @@ the sum of probability of being correctly classified:
Local Fisher Discriminant Analysis (:py:class:`LFDA <metric_learn.LFDA>`)

`LFDA` is a linear supervised dimensionality reduction method which effectively combines the ideas of `Linear Discriminant Analysis <https://en.wikipedia.org/wiki/Linear_discriminant_analysis>` and Locality-Preserving Projection. It is
particularly useful when dealing with multi-modality, where one ore more classes
particularly useful when dealing with multi-modality, where one or more classes
consist of separate clusters in input space. The core optimization problem of
LFDA is solved as a generalized eigenvalue problem.


The algorithm define the Fisher local within-/between-class scatter matrix
The algorithm defines the Fisher local within-/between-class scatter matrix
:math:`\mathbf{S}^{(w)}/ \mathbf{S}^{(b)}` in a pairwise fashion:

.. math::
@@ -408,7 +431,7 @@ method will look at all the samples from a different class and sample randomly
a pair among them. The method will try to build `num_constraints` positive
pairs and `num_constraints` negative pairs, but sometimes it cannot find enough
of one of those, so forcing `same_length=True` will return both times the
minimum of the two lenghts.
minimum of the two lengths.

For using quadruplets learners (see :ref:`learning_on_quadruplets`) in a
supervised way, positive and negative pairs are sampled as above and