This repository has been archived by the owner on Dec 13, 2024. It is now read-only.

DOC: misc doc updates, add TODO
elcorto committed Oct 27, 2017
1 parent a04365c commit 438c60f
Showing 3 changed files with 32 additions and 16 deletions.
41 changes: 26 additions & 15 deletions README.rst
@@ -8,13 +8,15 @@ then used to cluster similar images.
Install
=======

::
.. code:: sh
$ git clone https://github.com/elcorto/imagecluster.git
$ cd imagecluster
$ pip3 install -e .
or::
or

.. code:: sh
$ python3 setup.py develop --prefix=...
@@ -28,9 +30,11 @@ See ``imagecluster.main.main()`` for a usage example.

If there is no fingerprints database, it will first run all images through the
NN model and calculate fingerprints. Then it will cluster the images based on
the fingerprints and a similarity index (more details below).
the fingerprints and a similarity index ``sim=0...1`` (more details below).

Example session:

Example session::
.. code:: python
>>> from imagecluster import main
>>> main.main('/path/to/testpics/', sim=0.5)
@@ -49,7 +53,9 @@ Example session::
5 : 1
10 : 1
Have a look at the clusters (as dirs with symlinks to the relevant files)::
Have a look at the clusters (as dirs with symlinks to the relevant files):

.. code:: sh
$ tree /path/to/testpics
/path/to/testpics/clusters
@@ -80,6 +86,20 @@ Have a look at the clusters (as dirs with symlinks to the relevant files)::
If you run this again on the same directory, only the clustering will be
repeated.

Similarity index
----------------

The index (0...1) defines the minimum required similarity that images must have
in order to be clustered together. A high index means to put only very similar
images in one cluster. The extreme case of similarity index 1.0 means to require
100% similarity and thus to put each image in a cluster of size 1 (unless there
are completely equal images). In contrast, low values imply low required
similarity. This results in less strict clustering which will put more but less
similar images in a cluster. A value of 0.0 (zero required similarity) is equal
to putting all images in one single cluster since all images are treated as
equal.
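
The effect of the index can be illustrated with a small toy example. This is
only a sketch, not the code in ``imagecluster.calc.cluster()``: the helper
``cluster_by_similarity()`` is hypothetical, and it assumes that the index is
mapped to a distance threshold ``(1 - sim) * max_pairwise_distance``. Points
closer than the threshold are linked, and the connected groups form the
clusters (this is what single-linkage hierarchical clustering yields when cut
at that distance).

```python
import itertools
import math


def cluster_by_similarity(fingerprints, sim=0.5):
    """Group fingerprints by similarity index (hypothetical helper).

    Assumption: two fingerprints are linked when their Euclidean distance
    is at most (1 - sim) * max_pairwise_distance. Clusters are the
    connected groups of linked fingerprints.
    """
    if not 0.0 <= sim <= 1.0:
        raise ValueError("sim must be in 0...1")
    n = len(fingerprints)
    dist = {
        (i, j): math.dist(fingerprints[i], fingerprints[j])
        for i, j in itertools.combinations(range(n), 2)
    }
    threshold = (1.0 - sim) * max(dist.values(), default=0.0)

    # Union-find: parent[i] leads to the representative of i's cluster.
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Merge the clusters of every pair that is close enough.
    for (i, j), d in dist.items():
        if d <= threshold:
            parent[find(i)] = find(j)

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Toy 2-dim "fingerprints"; the real ones are 4096-dim vectors.
fps = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (20.0, 20.0)]
for sim in (0.0, 0.5, 0.9, 1.0):
    print(sim, cluster_by_similarity(fps, sim))
```

With these toy points, ``sim=0.0`` puts everything into one cluster,
``sim=0.9`` keeps only the two tight pairs together, and ``sim=1.0`` leaves
every point in its own cluster because no two points are identical.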


Methods
=======

@@ -139,17 +159,8 @@ Clustering
We use hierarchical clustering, see ``imagecluster.calc.cluster()``. The image
fingerprints (4096-dim vectors) are compared using a distance metric and
similar images are put together in a cluster. The threshold for what counts as
similar is defined by a similar index (again, see ``calc.cluster()``).
similar is defined by the similarity index (again, see ``calc.cluster()``).

The index (0...1) defines the minimum required similarity that images must have
in order to be clustered together. A high index means to put only very similar
images in one cluster. The extreme case of similarity index 1 means to require
100% similarity and thus to put each image in a cluster of size 1 (unless there
are completely equal images). In contrast, low values imply low required
similarity. This results in less strict clustering which will put more but less
similar images in a cluster. A value of 0 (zero required similarity) is equal
to putting all images in one single cluster since all images are treated as
equal.

Tests
=====
5 changes: 5 additions & 0 deletions TODO
@@ -0,0 +1,5 @@
* use logging instead of print

* Split imagecluster.make_links() into grouping clusters of same size (e.g.
group_clusters()) and link creation (make_links()). Add test for
group_clusters().
2 changes: 1 addition & 1 deletion imagecluster/imagecluster.py
@@ -15,7 +15,7 @@

def get_model():
"""Keras Model of the VGG16 network, with the output layer set to the
pre-last fully connected layer 'fc2' of shape (4096,)."""
second-to-last fully connected layer 'fc2' of shape (4096,)."""
# base_model.summary():
# ....
# block5_conv4 (Conv2D) (None, 15, 15, 512) 2359808