
PerformanceTips

Michael Ekstrand edited this page Apr 30, 2019 · 2 revisions

LensKit has good performance, but a few configuration choices make it run even better.

Conda

Use Conda. LensKit is optimized for Anaconda-based Python installations and works best in that environment. It runs in vanilla Python installations, but it takes advantage of Anaconda (and in particular Intel's Math Kernel Library, MKL) to obtain exceptionally good performance.

Use TBB

LensKit works best when both LensKit and the MKL use Intel's Threading Building Blocks (TBB). If you have TBB installed, LensKit will use it automatically:

conda install tbb

MKL, however, defaults to OpenMP, even when TBB is installed. Therefore, you also need to set MKL_THREADING_LAYER=tbb for good performance:

export MKL_THREADING_LAYER=tbb

Put that in whatever shell scripts you use to launch LensKit. You can also set it in Python; put this at the top of your script, before you import any LensKit or PyData packages:

import os
os.environ['MKL_THREADING_LAYER'] = 'tbb'
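MKL generally selects its threading layer when it is first loaded, so setting the variable after NumPy (or anything that imports it) is already loaded may have no effect. The sketch below wraps the assignment above with a check for that case. The function name set_mkl_threading_layer and the list of modules it checks are illustrative assumptions, not part of LensKit.

```python
import os
import sys
import warnings

# Modules whose import is likely to load MKL in an Anaconda environment.
# Illustrative assumption only: this list is not exhaustive.
MKL_LOADING_MODULES = ("numpy", "scipy", "pandas", "numba", "lenskit")


def set_mkl_threading_layer(layer="tbb"):
    """Set MKL_THREADING_LAYER, warning if MKL may already have been loaded.

    Returns the names of already-imported modules that triggered the warning.
    """
    already = [name for name in MKL_LOADING_MODULES if name in sys.modules]
    if already:
        warnings.warn(
            "MKL_THREADING_LAYER set after importing "
            + ", ".join(already)
            + "; MKL may already be using its default threading layer",
            stacklevel=2,
        )
    os.environ["MKL_THREADING_LAYER"] = layer
    return already


# Call this first, then import LensKit and other PyData packages.
set_mkl_threading_layer("tbb")
```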

Control LensKit's thread count

LensKit uses Numba for multithreaded training, so the NUMBA_NUM_THREADS environment variable controls its parallelism. Unfortunately, JobLib does not set this variable for you, so you need to set it yourself.

Other multiprocessing is done by JobLib and is controlled by the n_jobs parameter of the batch evaluation methods.

LensKit also uses MKL, so MKL_NUM_THREADS controls the internal MKL parallelism.
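If you launch experiments from a Python driver instead of a shell script, one way to guarantee these variables are in place before anything is imported is to start each run in a child process with a prepared environment. This is a standard-library sketch; lenskit_env, run_experiment and the train.py script name are hypothetical, and the thread counts are whatever you choose for your machine.

```python
import os
import subprocess
import sys


def lenskit_env(numba_threads, mkl_threads, threading_layer="tbb", base=None):
    """Return a copy of the environment with the threading variables set."""
    env = dict(os.environ if base is None else base)
    env["MKL_THREADING_LAYER"] = threading_layer
    env["NUMBA_NUM_THREADS"] = str(numba_threads)
    env["MKL_NUM_THREADS"] = str(mkl_threads)
    return env


def run_experiment(script_args, numba_threads, mkl_threads, python=sys.executable):
    """Run `python *script_args` in a fresh process with the variables set.

    Because the child starts from scratch, Numba and MKL read the variables
    when they are first imported there. Raises CalledProcessError on failure.
    """
    cmd = [python, *script_args]
    env = lenskit_env(numba_threads, mkl_threads)
    return subprocess.run(cmd, env=env, check=True)
```

For example, run_experiment(["train.py"], numba_threads=4, mkl_threads=4) would run a (hypothetical) train.py with all three variables fixed before LensKit, NumPy or Numba is imported in that process.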

When both LensKit and MKL are using TBB, you don't have to worry about LensKit's threads and MKL's threads competing with each other: they run in the same thread pool. You do need to worry about JobLib-level parallelism combining with internal parallelism to oversubscribe the CPU. If you are using JobLib to train algorithms in parallel, make sure that NUMBA_NUM_THREADS times the JobLib job count is no more than your CPU count.
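The rule of thumb above can be turned into a small helper that picks NUMBA_NUM_THREADS from the JobLib job count. This is a sketch: the helper names are hypothetical, it assumes the worker processes start fresh and inherit the parent's environment (so the variable must be set before they are launched), and it follows JobLib's convention that a negative n_jobs counts back from the number of CPUs (-1 is all CPUs, -2 is all but one).

```python
import os


def effective_jobs(n_jobs, cpu_count):
    """Translate a JobLib-style n_jobs value into a worker count."""
    if n_jobs == 0:
        raise ValueError("n_jobs == 0 is not a valid job count")
    if n_jobs < 0:
        # JobLib convention: cpu_count + 1 + n_jobs workers for negative values.
        return max(1, cpu_count + 1 + n_jobs)
    return n_jobs


def numba_threads_per_job(n_jobs, cpu_count=None):
    """Largest thread count with threads * jobs <= cpu_count (never below 1)."""
    if cpu_count is None:
        cpu_count = os.cpu_count() or 1
    jobs = effective_jobs(n_jobs, cpu_count)
    return max(1, cpu_count // jobs)


def configure_thread_budget(n_jobs, cpu_count=None):
    """Set NUMBA_NUM_THREADS before starting JobLib workers; returns the value."""
    threads = numba_threads_per_job(n_jobs, cpu_count)
    os.environ["NUMBA_NUM_THREADS"] = str(threads)
    return threads
```

For example, on a 16-core machine with n_jobs=4, configure_thread_budget(4, cpu_count=16) sets NUMBA_NUM_THREADS to 4. If the job count alone exceeds the CPU count, the helper still returns 1 and the product will exceed the CPU count; in that case, reduce n_jobs instead.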
