cl-mpi provides convenient CFFI bindings for the Message Passing Interface (MPI). MPI is typically used in High Performance Computing to utilize big parallel computers with thousands of cores. It features minimal communication overhead, with latencies in the range of microseconds. In comparison to the C or FORTRAN interface of MPI, cl-mpi frees the programmer from working with raw memory pointers and a plethora of mandatory function arguments.
If you have questions or suggestions, feel free to contact me ([email protected]).
cl-mpi has been tested with MPICH, MPICH2, Intel MPI and Open MPI.
An MPI program must be launched with mpirun or mpiexec. These commands spawn multiple processes depending on your system and command-line parameters. Each process is identical, except that it has a unique rank that can be queried with (MPI-COMM-RANK). The ranks are assigned from 0 to (- (MPI-COMM-SIZE) 1). A wide range of communication functions is available to transmit messages between different ranks. To become familiar with cl-mpi, see the examples directory.
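As a first taste, here is a minimal hello-world sketch. It assumes that the cl-mpi package can be referenced under the nickname MPI and that MPI-INIT and MPI-FINALIZE behave like their C counterparts MPI_Init and MPI_Finalize; the package name :my-mpi-app is just a choice that matches the deployment example below.

(defpackage :my-mpi-app
  (:use :cl :mpi)
  (:export #:main))
(in-package :my-mpi-app)

(defun main ()
  ;; Initialize the MPI environment before any other MPI call.
  (mpi-init)
  ;; Every process prints its own rank and the total number of ranks.
  (format t "Hello from rank ~D of ~D!~%"
          (mpi-comm-rank)
          (mpi-comm-size))
  ;; Shut MPI down cleanly before the process exits.
  (mpi-finalize))

When launched with, for example, mpiexec -n 4, each of the four processes prints a line with its own rank. A function like this MAIN can also serve as the :entry-point of the deployment recipe shown below.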
The easiest way to deploy and run cl-mpi applications is by creating a statically linked binary. To do so, create a separate ASDF system like this:
(defsystem :my-mpi-app
  :depends-on (:cl-mpi)
  :defsystem-depends-on (:cl-mpi-asdf-integration)
  :class :mpi-program
  :build-operation :static-program-op
  :build-pathname "my-mpi-app"
  :entry-point "my-mpi-app:main"
  :serial t
  :components
  ((:file "foo") (:file "bar")))
and simply run
(asdf:make :my-mpi-app)
at the REPL. Note that not all Lisp implementations support the creation of statically linked binaries (so far, only SBCL has been tested). Alternatively, you can try to use uiop:dump-image to create binaries.
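For the uiop:dump-image route, a minimal sketch could look as follows; the :executable keyword is part of UIOP's documented interface, but the exact invocation is an assumption and must be adapted to your application.

;; Sketch: dump the running Lisp image as an executable file.
;; Load :my-mpi-app and its dependencies first; DUMP-IMAGE does not return.
(uiop:dump-image "my-mpi-app" :executable t)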
Further remark: If the creation of statically linked binaries with SBCL fails with something like “undefined reference to main”, your SBCL is probably not built with the :sb-linkable-runtime feature. You are affected by this when (find :sb-linkable-runtime *features*) returns NIL. In that case, you have to compile SBCL yourself, which is as simple as executing the following commands, where SOMEWHERE is the desired installation folder:
git clone git://git.code.sf.net/p/sbcl/sbcl
cd sbcl
sh make.sh --prefix=SOMEWHERE --fancy --with-sb-linkable-runtime --with-sb-dynamic-core
cd tests && sh run-tests.sh
sh install.sh
To run the test suite:
./scripts/run-test-suite.sh all
or
./scripts/run-test-suite.sh YOUR-FAVOURITE-LISP
cl-mpi makes no additional copies of transmitted data and therefore achieves the same bandwidth as MPI code written in any other language (C, FORTRAN). However, the convenience of error handling, automatic inference of message types, and safe computation of memory locations adds a small overhead to each message. The exact overhead varies with the Lisp implementation and platform, but is somewhere around 1000 machine cycles; a hedged benchmark sketch follows the summary below.
Summary:
- latency increase per message: 400 nanoseconds (SBCL on a 2.4 GHz Intel i7-5500U)
- bandwidth unchanged
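The following ping-pong benchmark is one way to measure the per-message latency yourself. It is a sketch only: the buffer handling with the static-vectors library follows the style of the examples directory, and the (MPI-SEND buffer rank) / (MPI-RECV buffer rank) calling convention is an assumption that should be checked against the actual cl-mpi API.

;; Sketch of a ping-pong latency benchmark between ranks 0 and 1.
;; Assumes MPI has been initialized and at least two ranks are running;
;; ranks other than 0 and 1 simply idle through the loop.
(defun ping-pong (&optional (iterations 10000))
  (let ((rank (mpi-comm-rank))
        ;; Use a buffer that the garbage collector will not move.
        (buffer (static-vectors:make-static-vector
                 1 :element-type '(unsigned-byte 8))))
    (unwind-protect
         (let ((start (get-internal-real-time)))
           (loop repeat iterations
                 do (case rank
                      (0 (mpi-send buffer 1)
                         (mpi-recv buffer 1))
                      (1 (mpi-recv buffer 0)
                         (mpi-send buffer 0))))
           (when (zerop rank)
             (format t "~,9F seconds per round trip~%"
                     (/ (- (get-internal-real-time) start)
                        internal-time-units-per-second
                        iterations))))
      (static-vectors:free-static-vector buffer))))

Running this under mpiexec -n 2 gives a rough per-message figure to compare against the numbers above.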
Authors:
- Alex Fukunaga
- Marco Heisig
This project was funded by KONWIHR (The Bavarian Competence Network for Technical and Scientific High Performance Computing) and the Chair for Applied Mathematics 3 of Prof. Dr. Bänsch at the FAU Erlangen-Nürnberg.
Big thanks to Nicolas Neuss for all the useful suggestions.