PerformanceTips
LensKit performs well out of the box, but a few configuration choices make it faster.
Use Conda. LensKit is optimized for Anaconda-based Python installations and works best in that environment. It runs in vanilla Python installations, but it takes advantage of Anaconda (and in particular the MKL) to obtain exceptionally good performance.
LensKit works best when both LensKit and the MKL use Intel's Threading Building Blocks (TBB). If you have TBB installed, LensKit will use it automatically:
conda install tbb
MKL, however, defaults to OpenMP, even when TBB is installed. Therefore, you also need to set MKL_THREADING_LAYER=tbb for good performance:
export MKL_THREADING_LAYER=tbb
Put that in whatever shell scripts you use to launch LensKit. You can also set it in Python; put this at the top of your script, before you import any LensKit or PyData packages:
import os
os.environ['MKL_THREADING_LAYER'] = 'tbb'
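Because the variable only helps if it is set before those imports, a small guard can catch ordering mistakes. The following is a minimal sketch, not part of LensKit: the helper name `set_mkl_threading_layer` is hypothetical, the check assumes that NumPy appearing in `sys.modules` is a reasonable sign that MKL may already be loaded, and it uses `setdefault` so that a value already exported in the shell is left alone.

```python
import os
import sys
import warnings


def set_mkl_threading_layer(layer='tbb'):
    """Set MKL_THREADING_LAYER unless it is already set; return the value in effect.

    Hypothetical helper for illustration only.
    """
    # Assumption: if NumPy is already imported, MKL may have been loaded,
    # in which case changing the variable now may have no effect.
    if 'numpy' in sys.modules:
        warnings.warn('numpy is already imported; '
                      'MKL_THREADING_LAYER may be ignored')
    # setdefault keeps any value already exported by the launching shell.
    os.environ.setdefault('MKL_THREADING_LAYER', layer)
    return os.environ['MKL_THREADING_LAYER']


# Call this before importing LensKit or any PyData package.
set_mkl_threading_layer()
```

If you would rather have the script always override the shell, assign to `os.environ` directly as in the two-line version above.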
LensKit uses Numba for multithreaded training, so the NUMBA_NUM_THREADS environment variable controls its parallelism. Unfortunately, JobLib does not set this variable.
Other multiprocessing is done by JobLib, and is controlled by n_jobs parameters to batch evaluation methods.
LensKit also uses MKL, so MKL_NUM_THREADS controls the internal MKL parallelism.
When both LensKit and MKL are using TBB, you don't have to worry about LensKit's threads and MKL's threads competing with each other, because they run in the same thread pool. You do need to worry about JobLib-level parallelism combining with internal parallelism to oversubscribe the CPU. If you are using JobLib to train algorithms in parallel, make sure that NUMBA_NUM_THREADS times the JobLib job count is no more than your CPU count.
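The arithmetic behind that rule can be sketched as follows. This is an illustration under stated assumptions rather than a LensKit API: `threads_per_job` and `N_JOBS` are hypothetical names, `os.cpu_count()` is taken as the CPU count (it reports logical CPUs, which may differ from the cores actually available to your process), and the variable is set before Numba is imported so that Numba sees it when it starts.

```python
import os


def threads_per_job(n_jobs, cpu_count=None):
    """Largest per-job thread count with n_jobs * threads <= cpu_count.

    Hypothetical helper; always returns at least 1.
    """
    if n_jobs < 1:
        raise ValueError('n_jobs must be at least 1')
    if cpu_count is None:
        # os.cpu_count() can return None; fall back to a single CPU.
        cpu_count = os.cpu_count() or 1
    return max(1, cpu_count // n_jobs)


N_JOBS = 4  # hypothetical job count, passed as n_jobs elsewhere

# Set this before importing Numba or LensKit.
os.environ['NUMBA_NUM_THREADS'] = str(threads_per_job(N_JOBS))
```

With 16 CPUs and 4 jobs this gives 4 threads per job. When there are more jobs than CPUs the helper falls back to 1 thread per job, which still oversubscribes the machine, so lowering the job count is the better fix in that case.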