diff --git a/README.rst b/README.rst
index ee8c9f2..c29b01c 100644
--- a/README.rst
+++ b/README.rst
@@ -2,83 +2,23 @@ About
 =====
 
 Package for clustering images by content. We use a pre-trained deep
-convolutional neural network to calculate image fingerprints, which are then
-used to cluster similar images.
+convolutional neural network to calculate image fingerprints which represent
+content. Those are used to cluster similar images. In addition to pure image
+content, it is possible to mix in timestamp information, which helps to
+separate visually similar images taken at different times.
 
 Usage
 =====
-The package is designed as a library. Here is what you can do:
-
-.. code:: python
-
-    from imagecluster import calc as ic
-    from imagecluster import postproc as pp
-
-    # Create image database in memory. This helps to feed images to the NN model
-    # quickly.
-    ias = ic.image_arrays('pics/', size=(224,224))
-
-    # Create Keras NN model.
-    model = ic.get_model()
-
-    # Feed images through the model and extract fingerprints (feature vectors).
-    fps = ic.fingerprints(ias, model)
-
-    # Optionally run a PCA on the fingerprints to compress the dimensions. Use a
-    # cumulative explained variance ratio of 0.95.
-    fps = ic.pca(fps, n_components=0.95)
-
-    # Run clustering on the fingerprints. Select clusters with similarity index
-    # sim=0.5
-    clusters = ic.cluster(fps, sim=0.5)
-
-    # Create dirs with links to images. Dirs represent the clusters the images
-    # belong to.
-    pp.make_links(clusters, 'pics/imagecluster/clusters')
-
-    # Plot images arranged in clusters.
-    pp.visualize(clusters, ias)
-
-See also ``imagecluster.main.main()``. It does the same as the code above, but
-also saves/loads the image database and the fingerprints to/from disk, such
-that you can re-run the clustering and post-processing again without
-re-calculating fingerprints.
-
-Example session:
-
-.. code:: python
-
-    >>> from imagecluster import main
-    >>> main.main('pics/', sim=0.5, vis=True)
-    no fingerprints database pics/imagecluster/fingerprints.pk found
-    create image array database pics/imagecluster/images.pk
-    pics/140301.jpg
-    pics/140601.jpg
-    pics/140101.jpg
-    pics/140400.jpg
-    pics/140801.jpg
-    [...]
-    running all images through NN model ...
-    pics/140301.jpg
-    pics/140503.jpg
-    pics/140601.jpg
-    pics/140901.jpg
-    pics/140101.jpg
-    [...]
-    clustering ...
-    #images : #clusters
-    2 : 7
-    3 : 1
-    #images in clusters total: 17
-    cluster dir: pics/imagecluster/clusters
-
-If you run this again on the same directory, only the clustering (which is very
-fast) and the post-processing (links, visualization) will be repeated.
+The package is designed as a library. See ``examples/example_api.py``.
 
-For this example, we use a very small subset of the `Holiday image dataset
-`_ (25 images (all named 140*.jpg) of 1491 total images in the
-dataset).
+.. Here is what you can do:
+
+.. .. code:: python
+.. example_api.py
+
+The bottleneck is ``imagecluster.calc.fingerprints``; all other
+operations have negligible cost by comparison.
 
 Have a look
 at the clusters (as dirs with symlinks to the relevant files):
@@ -119,7 +59,16 @@ at the clusters:
 
 .. image:: doc/clusters.png
 
-Here is the result of using a larger subset of 292 images from the same dataset.
+For this example, we use a very small subset of the `Holiday image dataset
+`_ (25 images, all named ``140*.jpg``, of 1491 total images in the
+dataset). See ``examples/inria_holiday.sh`` for how to select such a subset:
+
+.. code:: sh
+
+   $ /path/to/imagecluster/examples/inria_holiday.sh jpg/140*
+
+Here is the result of using a larger subset of 292 images from the same dataset
+(``/inria_holiday.sh jpg/14*``):
 
 .. image:: doc/clusters_many.png
@@ -136,8 +85,6 @@ can be grouped together depending on their similarity (y-axis).
 
 .. image:: doc/dendrogram.png
 
-
-
 One can now cut through the dendrogram tree at a certain height (``sim``
 parameter 0...1, y-axis) to create clusters of images with that level of
 similarity. ``sim=0`` is the root of the dendrogram (top in the plot) where
@@ -164,6 +111,28 @@
 use (`thanks for the hint! `_) the activations of the second to last
 fully connected layer ('fc2', 4096 nodes) as image fingerprints (numpy 1d
 array of shape ``(4096,)``) by default.
 
+Content and time distance
+-------------------------
+
+Image fingerprints represent content. Clustering based on content ignores time
+correlations. Say we have two images of some object that look similar and will
+thus be put into the same cluster. However, they might in fact be pictures of
+different objects, taken at different times -- which is our original holiday
+image use case (e.g. two images of a church from different cities, taken on
+separate trips). In this case, we want the images to end up in different
+clusters. We have a feature to mix content distance (``d_c``) and time
+distance (``d_t``) such that
+
+::
+
+    d = (1 - alpha) * d_c + alpha * d_t
+
+One can thus do pure content-based clustering (``alpha=0``) or pure time-based
+clustering (``alpha=1``). The effect of the mixing is that fingerprint points
+representing content get pushed further apart when the corresponding images'
+time distance is large. That way, we achieve a transparent addition of time
+information w/o changing the clustering method.
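+
+For illustration, here is a minimal self-contained sketch of this mixing, as
+done internally by ``calc.cluster()`` on normalized condensed distance
+matrices (toy data; the variable names are placeholders, not package API):
+
+.. code:: python
+
+    import numpy as np
+    from scipy.spatial import distance
+
+    # toy data: 4 images, 3 fingerprint dimensions, timestamps in seconds
+    X = np.random.rand(4,3)
+    stamps = np.array([0., 60., 3600., 7200.])
+    alpha = 0.3
+
+    d_c = distance.pdist(X, metric='euclidean')
+    d_t = distance.pdist(stamps[:,None], metric='euclidean')
+    # normalize each to 0..1, then mix
+    d = (1 - alpha) * d_c/d_c.max() + alpha * d_t/d_t.max()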
 
 Quality of clustering & parameters to tune
 ------------------------------------------
@@ -211,13 +180,7 @@ Install
 
    $ pip3 install -e .
 
-or if you have the ``requirements.txt`` already installed (e.g. by your system's
-package manager)
-
-.. code:: sh
-
-   $ pip3 install -e . --no-deps
-
+See also samplepkg_.
 
 Contributions
 =============
@@ -246,3 +209,4 @@ Related projects
 .. _curse: https://en.wikipedia.org/wiki/Curse_of_dimensionality
 .. _gh_beleidy: https://github.com/beleidy/unsupervised-image-clustering
 .. _commit_pfx: https://github.com/elcorto/libstuff/blob/master/commit_prefixes
+.. _samplepkg: https://github.com/elcorto/samplepkg
diff --git a/examples/example_api.py b/examples/example_api.py
old mode 100644
new mode 100755
index babd9da..41a8792
--- a/examples/example_api.py
+++ b/examples/example_api.py
@@ -1,27 +1,42 @@
+#!/usr/bin/python3
+
 from imagecluster import calc as ic
+from imagecluster import io as icio
 from imagecluster import postproc as pp
 
-# Create image database in memory. This helps to feed images to the NN model
-# quickly.
-ias = ic.image_arrays('pics/', size=(224,224))
-
-# Create Keras NN model.
-model = ic.get_model()
-
-# Feed images through the model and extract fingerprints (feature vectors).
-fps = ic.fingerprints(ias, model)
+# # Create image database in memory. This helps to feed images to the NN model
+# # quickly.
+# image_arrays = icio.read_image_arrays('pics/', size=(224,224))
+#
+# # Create Keras NN model.
+# model = ic.get_model()
+#
+# # Feed images through the model and extract fingerprints (feature vectors).
+# fingerprints = ic.fingerprints(image_arrays, model)
+#
+# # Optionally run a PCA on the fingerprints to compress the dimensions. Use a
+# # cumulative explained variance ratio of 0.95.
+# fingerprints = ic.pca(fingerprints, n_components=0.95)
+#
+# # Read image timestamps, needed to calculate the time distance, which can be
+# # used in clustering.
+# timestamps = icio.read_timestamps('pics/')
 
-# Optionally run a PCA on the fingerprints to compress the dimensions. Use a
-# cumulative explained variance ratio of 0.95.
-fps = ic.pca(fps, n_components=0.95)
+# XXX where on disk? add to README
+# Convenience function to perform the steps above. Check for existing
+# `image_arrays` and `fingerprints` database files on disk (stored in
+# 'pics/imagecluster/') and load them if present. Running this again only
+# loads data from disk, which is fast.
+image_arrays,fingerprints,timestamps = icio.get_image_data(
+    'pics/',
+    pca_kwds=dict(n_components=0.95))
 
-# Run clustering on the fingerprints. Select clusters with similarity index
-# sim=0.5
-clusters = ic.cluster(fps, sim=0.5)
+# Run clustering on the fingerprints. Select clusters with similarity index
+# sim=0.5. Mix 80% content distance with 20% timestamp distance (alpha=0.2).
+clusters = ic.cluster(fingerprints, sim=0.5, timestamps=timestamps, alpha=0.2)
 
 # Create dirs with links to images. Dirs represent the clusters the images
 # belong to.
 pp.make_links(clusters, 'pics/imagecluster/clusters')
 
 # Plot images arranged in clusters.
-pp.visualize(clusters, ias)
+pp.visualize(clusters, image_arrays)
diff --git a/examples/example_main.py b/examples/example_main.py
deleted file mode 100644
index 0f1ec6d..0000000
--- a/examples/example_main.py
+++ /dev/null
@@ -1,3 +0,0 @@
-from imagecluster import main
-
-main.main('pics/', sim=0.65, vis=True, max_csize=10, pca=True)
diff --git a/examples/inria_holiday.sh b/examples/inria_holiday.sh
new file mode 100755
index 0000000..e369d00
--- /dev/null
+++ b/examples/inria_holiday.sh
@@ -0,0 +1,23 @@
+#!/bin/sh
+
+# select 25 images
+#   ./this.sh jpg/100*
+#
+# select 274 images
+#   ./this.sh jpg/10*
+
+if ! [ -d jpg ]; then
+    for name in jpg1 jpg2; do
+        wget ftp://ftp.inrialpes.fr/pub/lear/douze/data/${name}.tar.gz
+        tar -xzf ${name}.tar.gz
+    done
+fi
+
+mkdir -p pics
+rm -rf pics/*
+for x in "$@"; do
+    f=$(echo "$x" | sed -re 's|jpg/||')
+    ln -s $(readlink -f jpg/$f) pics/$f
+done
+
+echo "#images: $(ls pics | wc -l)"
diff --git a/examples/plot_dendrogram.py b/examples/plot_dendrogram.py
old mode 100644
new mode 100755
index 220367f..256d224
--- a/examples/plot_dendrogram.py
+++ b/examples/plot_dendrogram.py
@@ -1,19 +1,20 @@
+#!/usr/bin/python3
+
 from matplotlib import pyplot as plt
 import numpy as np
 from scipy.cluster.hierarchy import dendrogram
 
 from imagecluster import calc as ic
+from imagecluster import io as icio
 
-ias = ic.image_arrays('pics/', size=(224,224))
+image_arrays = icio.read_image_arrays('pics/', size=(224,224))
 model = ic.get_model()
-fps = ic.fingerprints(ias, model)
-clusters,extra = ic.cluster(fps, sim=0.5, extra_out=True)
+fingerprints = ic.fingerprints(image_arrays, model)
+clusters,extra = ic.cluster(fingerprints, sim=0.5, extra_out=True)
 
 # linkage matrix Z
-Z = extra['Z']
-
 fig,ax = plt.subplots()
-dendrogram(Z, ax=ax)
+dendrogram(extra['Z'], ax=ax)
 
 # Adjust yaxis labels (values from Z[:,2]) to our definition of the `sim`
 # parameter.
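diff --git a/examples/example_time_distance.py b/examples/example_time_distance.py
new file mode 100755
--- /dev/null
+++ b/examples/example_time_distance.py
@@ -0,0 +1,18 @@
+#!/usr/bin/python3
+
+# Hypothetical editor's sketch (this file is not part of the original patch):
+# compare pure content-based clustering (alpha=0), the mix used in
+# example_api.py (alpha=0.2) and pure time-based clustering (alpha=1).
+# Only API shown elsewhere in this patch is used.
+
+from imagecluster import calc as ic
+from imagecluster import io as icio
+
+# loads cached image arrays/fingerprints from pics/imagecluster/ if present
+image_arrays, fingerprints, timestamps = icio.get_image_data('pics/')
+
+# d = (1 - alpha) * d_c + alpha * d_t
+for alpha in (0, 0.2, 1):
+    print(f"alpha={alpha}")
+    clusters = ic.cluster(fingerprints, sim=0.5, timestamps=timestamps,
+                          alpha=alpha)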
diff --git a/imagecluster/calc.py b/imagecluster/calc.py
index c7c90a6..dabd9f2 100644
--- a/imagecluster/calc.py
+++ b/imagecluster/calc.py
@@ -1,26 +1,18 @@
 import os
-
-import multiprocessing as mp
-import functools
 from collections import OrderedDict
 
-from PIL import Image
+import numpy as np
 from scipy.spatial import distance
 from scipy.cluster import hierarchy
-import numpy as np
 from sklearn.decomposition import PCA
 
 from keras.applications.vgg16 import VGG16, preprocess_input
-from keras.preprocessing import image
 from keras.models import Model
 
-from . import common
-
 pj = os.path.join
 
-
 def get_model(layer='fc2'):
     """Keras Model of the VGG16 network, with the output layer set to `layer`.
@@ -53,64 +45,6 @@ def get_model(layer='fc2'):
     return model
 
 
-def load_img_rgb(fn):
-    return Image.open(fn).convert('RGB')
-
-
-# keras.preprocessing.image.load_img() uses img.rezize(shape) with the default
-# interpolation of Image.resize() which is pretty bad (see
-# imagecluster/play/pil_resample_methods.py). Given that we are restricted to
-# small inputs of 224x224 by the VGG network, we should do our best to keep as
-# much information from the original image as possible. This is a gut feeling,
-# untested. But given that model.predict() is 10x slower than PIL image loading
-# and resizing .. who cares.
-#
-# (224, 224, 3)
-##img = image.load_img(fn, target_size=size)
-##... = image.img_to_array(img)
-def _img_worker(fn, size):
-    # Handle PIL error "OSError: broken data stream when reading image file".
-    # See https://github.com/python-pillow/Pillow/issues/1510 . We have this
-    # issue with smartphone panorama JPG files. But instead of bluntly setting
-    # ImageFile.LOAD_TRUNCATED_IMAGES = True and hoping for the best (is the
-    # image read, and till the end?), we catch the OSError thrown by PIL and
-    # ignore the file completely. This is better than reading potentially
-    # undefined data and process it. A more specialized exception from PILs
-    # side would be good, but let's hope that an OSError doesn't cover too much
-    # ground when reading data from disk :-)
-    try:
-        print(fn)
-        return fn, image.img_to_array(load_img_rgb(fn).resize(size, 3),
-                                      dtype=int)
-    except OSError as ex:
-        print(f"skipping {fn}: {ex}")
-        return fn, None
-
-
-def image_arrays(imagedir, size, ncores=mp.cpu_count()):
-    """Load images from `imagedir` and resize to `size`.
-
-    Parameters
-    ----------
-    imagedir : str
-    size : sequence length 2
-        (width, height), used in ``Image.open(filename).resize(size)``
-    ncores : int
-        run that many parallel processes
-
-    Returns
-    -------
-    dict
-        {filename: 3d array (height, width, 3),
-         ...
-        }
-    """
-    _f = functools.partial(_img_worker, size=size)
-    with mp.Pool(ncores) as pool:
-        ret = pool.map(_f, common.get_files(imagedir))
-    return {k: v for k,v in ret if v is not None}
-
-
 def fingerprint(img_arr, model):
     """Run image array (3d array) through `model` (keras.models.Model).
@@ -165,18 +99,18 @@
 ## return fn, fingerprint(img_arr, model)
 ##
 ##
-##def fingerprints(ias, model):
+##def fingerprints(image_arrays, model):
 ##    _f = functools.partial(_worker, model=model)
 ##    with mp.Pool(int(mp.cpu_count()/2)) as pool:
-##        ret = pool.map(_f, ias.items())
+##        ret = pool.map(_f, image_arrays.items())
 ##    return dict(ret)
 
 
-def fingerprints(ias, model):
-    """Calculate fingerprints for all image arrays in `ias`.
+def fingerprints(image_arrays, model):
+    """Calculate fingerprints for all image arrays in `image_arrays`.
 
     Parameters
     ----------
-    ias : see :func:`image_arrays`
+    image_arrays : see :func:`~imagecluster.io.read_image_arrays`
     model : see :func:`fingerprint`
 
     Returns
     -------
     dict
         {filename1: array([...]),
         ...
         }
     """
-    fps = {}
-    for fn,img_arr in ias.items():
+    fingerprints = {}
+    for fn,img_arr in image_arrays.items():
         print(fn)
-        fps[fn] = fingerprint(img_arr, model)
-    return fps
+        fingerprints[fn] = fingerprint(img_arr, model)
+    return fingerprints
 
 
-def pca(fps, n_components=0.9, **kwds):
+def pca(fingerprints, n_components=0.9, **kwds):
     if 'n_components' not in kwds.keys():
         kwds['n_components'] = n_components
     # Yes in recent Pythons, dicts are ordered in CPython, but still.
-    _fps = OrderedDict(fps)
-    X = np.array(list(_fps.values()))
+    _fingerprints = OrderedDict(fingerprints)
+    X = np.array(list(_fingerprints.values()))
     Xp = PCA(**kwds).fit(X).transform(X)
-    return {k:v for k,v in zip(_fps.keys(), Xp)}
+    return {k:v for k,v in zip(_fingerprints.keys(), Xp)}
 
 
-def cluster(fps, sim=0.5, method='average', metric='euclidean',
-            extra_out=False, print_stats=True, min_csize=2):
+def cluster(fingerprints, sim=0.5, timestamps=None, alpha=0.3, method='average',
+            metric='euclidean', extra_out=False, print_stats=True, min_csize=2):
     """Hierarchical clustering of images based on image fingerprints.
 
     Parameters
     ----------
-    fps : dict
+    fingerprints : dict
         output of :func:`fingerprints`
     sim : float 0..1
         similarity index
+    timestamps : dict
+        output of :func:`~imagecluster.io.read_timestamps`
+    alpha : float
+        mixing parameter of image content distance and time distance, ignored
+        if `timestamps` is None
     method : see scipy.hierarchy.linkage(), all except 'centroid' produce
         pretty much the same result
     metric : see scipy.hierarchy.linkage(), make sure to use 'euclidean' in
@@ -240,14 +179,30 @@ def cluster(fps, sim=0.5, method='average', metric='euclidean',
         if `extra_out` is True
     """
     assert 0 <= sim <= 1, "sim not 0..1"
+    assert 0 <= alpha <= 1, "alpha not 0..1"
     assert min_csize >= 1, "min_csize must be >= 1"
-    files = list(fps.keys())
+    files = list(fingerprints.keys())
     # array(list(...)): 2d array
     # [[... fingerprint of image1 (4096,) ...],
     #  [... fingerprint of image2 (4096,) ...],
     #  ...
     # ]
-    dfps = distance.pdist(np.array(list(fps.values())), metric)
+    dfps = distance.pdist(np.array(list(fingerprints.values())), metric)
+    if timestamps is not None:
+        # Sanity check, as long as we don't have a single data struct to
+        # keep fingerprints and timestamps, as well as image data. This is not
+        # pretty, but at least a safety hook.
+        set_files = set(files)
+        set_tsfiles = set(timestamps.keys())
+        set_diff = set_files.symmetric_difference(set_tsfiles)
+        assert len(set_diff) == 0, (f"files in fingerprints and timestamps do "
+                                    f"not match: diff={set_diff}")
+        # use 'files' to make sure we have the same order as in 'fingerprints'
+        tsarr = np.array([timestamps[k] for k in files])[:,None]
+        dts = distance.pdist(tsarr, metric)
+        dts = dts / dts.max()
+        dfps = dfps / dfps.max()
+        # d = (1 - alpha) * d_c + alpha * d_t
+        dfps = dfps * (1 - alpha) + dts * alpha
     # hierarchical/agglomerative clustering (Z = linkage matrix, construct
     # dendrogram), plot: scipy.cluster.hierarchy.dendrogram(Z)
     Z = hierarchy.linkage(dfps, method=method, metric=metric)
diff --git a/imagecluster/common.py b/imagecluster/common.py
deleted file mode 100644
index ea9585c..0000000
--- a/imagecluster/common.py
+++ /dev/null
@@ -1,19 +0,0 @@
-import re
-import pickle
-import os
-
-
-def read_pk(fn):
-    with open(fn, 'rb') as fd:
-        ret = pickle.load(fd)
-    return ret
-
-
-def write_pk(obj, fn):
-    with open(fn, 'wb') as fd:
-        pickle.dump(obj, fd)
-
-
-def get_files(dr, ext='jpg|jpeg|bmp|png'):
-    rex = re.compile(r'^.*\.({})$'.format(ext), re.I)
-    return [os.path.join(dr,base) for base in os.listdir(dr) if rex.match(base)]
diff --git a/imagecluster/exceptions.py b/imagecluster/exceptions.py
new file mode 100644
index 0000000..a484ddf
--- /dev/null
+++ b/imagecluster/exceptions.py
@@ -0,0 +1,6 @@
+class ICError(Exception):
+    pass
+
+
+class ICExifReadError(ICError):
+    pass
diff --git a/imagecluster/io.py b/imagecluster/io.py
new file mode 100644
index 0000000..7e754d5
--- /dev/null
+++ b/imagecluster/io.py
@@ -0,0 +1,178 @@
+import datetime
+import functools
+import multiprocessing as mp
+import os
+import pickle
+import re
+
+from keras.preprocessing import image
+import PIL.Image
+
+from . import exceptions
+from . import calc as ic
+
+pj = os.path.join
+
+ic_base_dir = 'imagecluster'
+
+
+def read_pk(filename):
+    with open(filename, 'rb') as fd:
+        ret = pickle.load(fd)
+    return ret
+
+
+def write_pk(obj, filename):
+    os.makedirs(os.path.dirname(filename), exist_ok=True)
+    with open(filename, 'wb') as fd:
+        pickle.dump(obj, fd)
+
+
+def get_files(dr, ext='jpg|jpeg|bmp|png'):
+    rex = re.compile(r'^.*\.({})$'.format(ext), re.I)
+    return [os.path.join(dr,base) for base in os.listdir(dr) if rex.match(base)]
+
+
+def exif_timestamp(filename):
+    # PIL lazy-loads the image data, so this open and _getexif() is fast.
+    img = PIL.Image.open(filename)
+    if ('exif' not in img.info.keys()) or (not hasattr(img, '_getexif')):
+        raise exceptions.ICExifReadError(f"no EXIF data found in {filename}")
+    # Avoid constructing the whole EXIF dict just to extract the DateTime field.
+    # DateTime -> key 306 is in the EXIF standard, so let's use that directly.
+    ## date_time = {TAGS[k] : v for k,v in exif.items()}['DateTime']
+    exif = img._getexif()
+    key = 306
+    if key not in exif.keys():
+        raise exceptions.ICExifReadError(f"key 306 (DateTime) not found in "
+                                         f"EXIF data of file {filename}")
+    # '2019:03:10 22:42:42'
+    date_time = exif[key]
+    if date_time.count(':') != 4:
+        msg = f"unsupported EXIF DateTime format in '{date_time}' of {filename}"
+        raise exceptions.ICExifReadError(msg)
+    # '2019:03:10 22:42:42' -> ['2019', '03', '10', '22', '42', '42']
+    date_time_str = date_time.replace(':', ' ').split()
+    names = ('year', 'month', 'day', 'hour', 'minute', 'second')
+    stamp = datetime.datetime(**{nn:int(vv) for nn,vv in zip(names,date_time_str)},
+                              tzinfo=datetime.timezone.utc).timestamp()
+    return stamp
+
+
+def stat_timestamp(filename):
+    return os.stat(filename).st_mtime
+
+
+def timestamp(filename, source='auto'):
+    if source == 'auto':
+        try:
+            return exif_timestamp(filename)
+        except exceptions.ICExifReadError:
+            return stat_timestamp(filename)
+    elif source == 'stat':
+        return stat_timestamp(filename)
+    elif source == 'exif':
+        return exif_timestamp(filename)
+    else:
+        raise ValueError("source not in ['stat', 'exif', 'auto']")
+
+
+# TODO some code dups below, fix later by fancy factory functions
+
+# keras.preprocessing.image.load_img() uses img.resize(shape) with the default
+# interpolation of Image.resize() which is pretty bad (see
+# imagecluster/play/pil_resample_methods.py). Given that we are restricted to
+# small inputs of 224x224 by the VGG network, we should do our best to keep as
+# much information from the original image as possible. This is a gut feeling,
+# untested. But given that model.predict() is 10x slower than PIL image loading
+# and resizing .. who cares.
+#
+# (224, 224, 3)
+##img = image.load_img(filename, target_size=size)
+##... = image.img_to_array(img)
+def _img_arr_worker(filename, size):
+    # Handle PIL error "OSError: broken data stream when reading image file".
+    # See https://github.com/python-pillow/Pillow/issues/1510 . We have this
+    # issue with smartphone panorama JPG files. But instead of bluntly setting
+    # ImageFile.LOAD_TRUNCATED_IMAGES = True and hoping for the best (is the
+    # image read, and till the end?), we catch the OSError thrown by PIL and
+    # ignore the file completely. This is better than reading potentially
+    # undefined data and processing it. A more specialized exception from PIL's
+    # side would be good, but let's hope that an OSError doesn't cover too much
+    # ground when reading data from disk :-)
+    try:
+        print(filename)
+        # resample=3 is PIL.Image.BICUBIC
+        img = PIL.Image.open(filename).convert('RGB').resize(size, resample=3)
+        arr = image.img_to_array(img, dtype=int)
+        return filename, arr
+    except OSError as ex:
+        print(f"skipping {filename}: {ex}")
+        return filename, None
+
+
+def _timestamp_worker(filename, source):
+    try:
+        return filename, timestamp(filename, source)
+    except OSError as ex:
+        print(f"skipping {filename}: {ex}")
+        return filename, None
+
+
+def read_image_arrays(imagedir, size, ncores=mp.cpu_count()):
+    """Load images from `imagedir` and resize to `size`.
+
+    Parameters
+    ----------
+    imagedir : str
+    size : sequence length 2
+        (width, height), used in ``Image.open(filename).resize(size)``
+    ncores : int
+        run that many parallel processes
+
+    Returns
+    -------
+    dict
+        {filename: 3d array (height, width, 3), ...}
+    """
+    _f = functools.partial(_img_arr_worker, size=size)
+    with mp.Pool(ncores) as pool:
+        ret = pool.map(_f, get_files(imagedir))
+    return {k: v for k,v in ret if v is not None}
+
+
+def read_timestamps(imagedir, source='auto', ncores=mp.cpu_count()):
+    _f = functools.partial(_timestamp_worker, source=source)
+    with mp.Pool(ncores) as pool:
+        ret = pool.map(_f, get_files(imagedir))
+    return {k: v for k,v in ret if v is not None}
+
+
+# TODO fingerprints and timestamps may have different images which have been
+# skipped -> we need a data struct to hold all image data and mask out the
+# skipped ones. For now we have a check in calc.cluster()
+def get_image_data(imagedir, model_kwds=dict(layer='fc2'),
+                   img_kwds=dict(size=(224,224)),
+                   timestamps_kwds=dict(source='auto'), pca_kwds=None):
+    """Return all image data needed for clustering."""
+    fingerprints_fn = pj(imagedir, ic_base_dir, 'fingerprints.pk')
+    image_arrays_fn = pj(imagedir, ic_base_dir, 'images.pk')
+    if os.path.exists(image_arrays_fn):
+        print(f"reading image arrays {image_arrays_fn} ...")
+        image_arrays = read_pk(image_arrays_fn)
+    else:
+        print(f"create image arrays {image_arrays_fn}")
+        image_arrays = read_image_arrays(imagedir, **img_kwds)
+        write_pk(image_arrays, image_arrays_fn)
+    if os.path.exists(fingerprints_fn):
+        print(f"reading fingerprints {fingerprints_fn} ...")
+        fingerprints = read_pk(fingerprints_fn)
+    else:
+        print(f"create fingerprints {fingerprints_fn}")
+        fingerprints = ic.fingerprints(image_arrays, ic.get_model(**model_kwds))
+        if pca_kwds is not None:
+            fingerprints = ic.pca(fingerprints, **pca_kwds)
+        write_pk(fingerprints, fingerprints_fn)
+    if timestamps_kwds is not None:
+        print("reading timestamps ...")
+        timestamps = read_timestamps(imagedir, **timestamps_kwds)
+    else:
+        # no timestamps requested, clustering will use content only
+        timestamps = None
+    return image_arrays, fingerprints, timestamps
diff --git a/imagecluster/main.py b/imagecluster/main.py
deleted file mode 100644
index 85c0818..0000000
--- a/imagecluster/main.py
+++ /dev/null
@@ -1,84 +0,0 @@
-import os
-
-from imagecluster import calc as ic
-from imagecluster import common as co
-from imagecluster import postproc as pp
-
-pj = os.path.join
-
-
-ic_base_dir = 'imagecluster'
-
-
-def main(imagedir, sim=0.5, layer='fc2', size=(224,224), links=True, vis=False,
-         max_csize=None, pca=False, pca_params=dict(n_components=0.9)):
-    """Example main app using this library.
-
-    Upon first invocation, the image and fingerprint databases are built and
-    written to disk. Each new invocation loads those and only repeats
-        * clustering
-        * creation of links to files in clusters
-        * visualization (if `vis=True`)
-
-    This is good for playing around with the `sim` parameter, for
-    instance, which only influences clustering.
-
-    Parameters
-    ----------
-    imagedir : str
-        path to directory with images
-    sim : float (0..1)
-        similarity index (see :func:`calc.cluster`)
-    layer : str
-        which layer to use as feature vector (see
-        :func:`calc.get_model`)
-    size : tuple
-        input image size (width, height), must match `model`, e.g. (224,224)
-    links : bool
-        create dirs with links
-    vis : bool
-        plot images in clusters
-    max_csize : max number of images per cluster for visualization (see
-        :mod:`~postproc`)
-    pca : bool
-        Perform PCA on fingerprints before clustering, using `pca_params`.
- pca_params : dict - kwargs to sklearn's PCA - - Notes - ----- - imagedir : To select only a subset of the images, create an `imagedir` and - symlink your selected images there. In the future, we may add support - for passing a list of files, should the need arise. But then again, - this function is only an example front-end. - """ - fps_fn = pj(imagedir, ic_base_dir, 'fingerprints.pk') - ias_fn = pj(imagedir, ic_base_dir, 'images.pk') - ias = None - if not os.path.exists(fps_fn): - print(f"no fingerprints database {fps_fn} found") - os.makedirs(os.path.dirname(fps_fn), exist_ok=True) - model = ic.get_model(layer=layer) - if not os.path.exists(ias_fn): - print(f"create image array database {ias_fn}") - ias = ic.image_arrays(imagedir, size=size) - co.write_pk(ias, ias_fn) - else: - ias = co.read_pk(ias_fn) - print("running all images through NN model ...") - fps = ic.fingerprints(ias, model) - co.write_pk(fps, fps_fn) - else: - print(f"loading fingerprints database {fps_fn} ...") - fps = co.read_pk(fps_fn) - if pca: - fps = ic.pca(fps, **pca_params) - print("pca dims:", list(fps.values())[0].shape[0]) - print("clustering ...") - clusters = ic.cluster(fps, sim) - if links: - pp.make_links(clusters, pj(imagedir, ic_base_dir, 'clusters')) - if vis: - if ias is None: - ias = co.read_pk(ias_fn) - pp.visualize(clusters, ias, max_csize=max_csize) diff --git a/imagecluster/postproc.py b/imagecluster/postproc.py index d3a0ce8..f54e754 100644 --- a/imagecluster/postproc.py +++ b/imagecluster/postproc.py @@ -1,5 +1,6 @@ import os import shutil +import functools from matplotlib import pyplot as plt import numpy as np @@ -9,15 +10,15 @@ pj = os.path.join -def plot_clusters(clusters, ias, max_csize=None, mem_limit=1024**3): - """Plot `clusters` of images in `ias`. +def plot_clusters(clusters, image_arrays, max_csize=None, mem_limit=1024**3): + """Plot `clusters` of images in `image_arrays`. For interactive work, use :func:`visualize` instead. 
 
     Parameters
     ----------
     clusters : see :func:`calc.cluster`
-    ias : see :func:`calc.image_arrays`
+    image_arrays : see :func:`~imagecluster.io.read_image_arrays`
     max_csize : int
         plot clusters with at most this many images
     mem_limit : float or int, bytes
@@ -32,7 +33,7 @@
     ncols = stats[:,1].sum()  # csize (number of images per cluster)
     nrows = stats[:,0].max()
-    shape = ias[list(ias.keys())[0]].shape[:2]
+    shape = image_arrays[list(image_arrays.keys())[0]].shape[:2]
     mem = nrows * shape[0] * ncols * shape[1] * 3
     if mem > mem_limit:
         raise Exception(f"size of plot array ({mem/1024**2} MiB) > mem_limit "
@@ -45,7 +46,7 @@
         for cluster in clusters[csize]:
             icol += 1
             for irow, filename in enumerate(cluster):
-                img_arr = ias[filename]
+                img_arr = image_arrays[filename]
                 arr[irow*shape[0]:(irow+1)*shape[0],
                     icol*shape[1]:(icol+1)*shape[1], :] = img_arr
     print(f"plot array ({arr.dtype}) size: {arr.nbytes/1024**2} MiB")
@@ -56,6 +57,7 @@
     return fig,ax
 
 
+@functools.wraps(plot_clusters)
 def visualize(*args, **kwds):
     plot_clusters(*args, **kwds)
     plt.show()
diff --git a/imagecluster/tests/tests.py b/imagecluster/tests/tests.py
index 5df42e0..709dff2 100644
--- a/imagecluster/tests/tests.py
+++ b/imagecluster/tests/tests.py
@@ -1,15 +1,17 @@
 import logging
 import os
-import pickle
 import shutil
 import tempfile
+import copy
+import datetime
 
 import numpy as np
 from matplotlib.pyplot import imsave
 import PIL.Image
+import piexif
 
-from imagecluster import main
 from imagecluster import calc as ic
+from imagecluster import io as icio
 
 
 # https://stackoverflow.com/a/39708493
@@ -17,11 +19,19 @@
 pj = os.path.join
 
 
+# TODO re-use ImagedirCtx where possible, we write files in each context,
+# re-use ctxs which don't alter the files
 class ImagedirCtx:
-    def __init__(self):
+    def __init__(self, fmt='png'):
+        assert fmt in ['jpg', 'png']
+        date_time_base_dct = dict(year=2019,
+                                  month=12,
+                                  day=31,
+                                  hour=23,
+                                  minute=42)
         imagedir = tempfile.mkdtemp(prefix='imagecluster_')
-        dbfn = pj(imagedir, main.ic_base_dir, 'fingerprints.pk')
+        dbfn = pj(imagedir, icio.ic_base_dir, 'fingerprints.pk')
         arr = np.ones((500,600,3), dtype=np.uint8)
         white = np.ones_like(arr) * 255
         black = np.zeros_like(arr)
@@ -32,19 +42,33 @@
                       black=[black]*4)
         image_fns = []
         clusters = {}
+        second = 0
         for color, arrs in images.items():
             nimg = len(arrs)
             clus = clusters.get(nimg, [])
             for idx, arr in enumerate(arrs):
-                fn = pj(imagedir, f'image_{color}_{idx}.png')
-                imsave(fn, arr)
+                if fmt == 'png':
+                    fn = pj(imagedir, f'image_{color}_{idx}.png')
+                    imsave(fn, arr)
+                elif fmt == 'jpg':
+                    fn = pj(imagedir, f'image_{color}_{idx}.jpg')
+                    img = PIL.Image.fromarray(arr, mode='RGB')
+                    # just the DateTime field
+                    date_time_dct = copy.deepcopy(date_time_base_dct)
+                    date_time_dct.update(second=second)
+                    exif_date_time_fmt = '{year}:{month}:{day} {hour}:{minute}:{second}'
+                    exif_date_time_str = exif_date_time_fmt.format(**date_time_dct)
+                    piexif_exif_dct = {'0th': {306: exif_date_time_str}}
+                    img.save(fn, exif=piexif.dump(piexif_exif_dct))
                 image_fns.append(fn)
                 clus.append(fn)
+                second += 1
             clusters[nimg] = [clus]
         self.imagedir = imagedir
        self.dbfn = dbfn
         self.image_fns = image_fns
         self.clusters = clusters
+        self.date_time_base_dct = date_time_base_dct
         print(clusters)
 
     def __enter__(self):
@@ -54,31 +78,37 @@ def __exit__(self, *args):
         shutil.rmtree(self.imagedir)
 
 
-def test_main_basic():
+def test_api_get_image_data():
     with ImagedirCtx() as ctx:
         # run 1: create fingerprints database, run clustering
-        main.main(ctx.imagedir)
-        # run 2: only run clustering, should be much faster, this time also use PCA
-        main.main(ctx.imagedir, pca=True)
-        with open(ctx.dbfn, 'rb') as fd:
-            fps = pickle.load(fd)
-        assert len(fps.keys()) == len(ctx.image_fns)
-        assert set(fps.keys()) == set(ctx.image_fns)
-        for kk,vv in fps.items():
-            assert isinstance(vv, np.ndarray)
-            assert len(vv) == 4096
+        image_arrays,fingerprints,timestamps = icio.get_image_data(ctx.imagedir)
+        # run 2: only loads data from disk, should be much faster; this time
+        # use all kwds (test API)
+        image_arrays,fingerprints,timestamps = icio.get_image_data(
+            ctx.imagedir,
+            pca_kwds=dict(n_components=0.95),
+            model_kwds=dict(layer='fc2'),
+            img_kwds=dict(size=(224,224)),
+            timestamps_kwds=dict(source='auto'))
+        assert len(fingerprints.keys()) == len(ctx.image_fns)
+        assert set(fingerprints.keys()) == set(ctx.image_fns)
 
 
-def test_cluster():
-    # use API
+def test_low_level_api_and_clustering():
+    # use low level API (same as get_image_data) but call all funcs
     # test clustering
     with ImagedirCtx() as ctx:
-        ias = ic.image_arrays(ctx.imagedir, size=(224,224))
+        image_arrays = icio.read_image_arrays(ctx.imagedir, size=(224,224))
         model = ic.get_model()
-        fps = ic.fingerprints(ias, model)
-        fps = ic.pca(fps, n_components=0.95)
-        clusters = ic.cluster(fps, sim=0.5)
+        fingerprints = ic.fingerprints(image_arrays, model)
+        for kk,vv in fingerprints.items():
+            assert isinstance(vv, np.ndarray)
+            assert len(vv) == 4096, len(vv)
+        fingerprints = ic.pca(fingerprints, n_components=0.95)
+        clusters = ic.cluster(fingerprints, sim=0.5)
         assert set(clusters.keys()) == set(ctx.clusters.keys())
+        assert len(fingerprints.keys()) == len(ctx.image_fns)
+        assert set(fingerprints.keys()) == set(ctx.image_fns)
         for nimg in ctx.clusters.keys():
             for val_clus, ref_clus in zip(clusters[nimg], ctx.clusters[nimg]):
                 msg = f"ref_clus: {ref_clus}, val_clus: {val_clus}"
@@ -91,9 +121,9 @@ def test_png_rgba_io():
     shape2d = (123,456)
     shape = shape2d + (3,)
     rgb = (np.random.rand(*shape) * 255).astype(np.uint8)
-    alpha1 = np.ones(shape2d, dtype=np.uint8) * 255 # white
-    alpha2 = np.zeros(shape2d, dtype=np.uint8) # black
-    alpha3 = (np.random.rand(*shape2d) * 255).astype(np.uint8) # noise
+    alpha1 = np.ones(shape2d, dtype=np.uint8) * 255            # white
+    alpha2 = np.zeros(shape2d, dtype=np.uint8)                 # black
+    alpha3 = (np.random.rand(*shape2d) * 255).astype(np.uint8) # noise
     for alpha in [alpha1, alpha2, alpha3]:
         rgba = np.empty(shape2d + (4,), dtype=np.uint8)
         rgba[..., :3] = rgb
@@ -103,9 +133,29 @@
         img = PIL.Image.open(fn)
         assert img.mode == 'RGBA', img.mode
         assert img.format == 'PNG', img.format
-        rgb2 = np.array(ic.load_img_rgb(fn))
+        rgb2 = np.array(PIL.Image.open(fn).convert('RGB'))
         assert (rgb == rgb2).all()
         assert rgb.dtype == rgb2.dtype
     finally:
         if os.path.exists(fn):
             os.remove(fn)
+
+
+def test_img_timestamp():
+    with ImagedirCtx(fmt='jpg') as ctx:
+        for second, fn in enumerate(ctx.image_fns):
+            stamp = icio.exif_timestamp(fn)
+            dct = copy.deepcopy(ctx.date_time_base_dct)
+            dct.update(second=second)
+            ref = datetime.datetime(**dct, tzinfo=datetime.timezone.utc).timestamp()
+            assert stamp is not None
+            assert stamp == ref, f"stamp={stamp} ref={ref}"
+            # try EXIF first
+            assert stamp == icio.timestamp(fn, source='auto')
+            assert stamp == icio.timestamp(fn, source='exif')
+
+    with ImagedirCtx(fmt='png') as ctx:
+        fn = ctx.image_fns[0]
+        assert icio.stat_timestamp(fn) is not None
+        assert icio.timestamp(fn, source='auto') is not None
+        assert icio.timestamp(fn, source='auto') == icio.stat_timestamp(fn)
diff --git a/requirements.txt b/requirements.txt
index 7f88ea7..6fc2c64 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -4,3 +4,4 @@ keras
 Pillow
 scikit-learn
 matplotlib
+piexif