Implementation of the Meta-Iterative Map-Reduce algorithm for distributed, scalable training of a machine learning model on a GPU+CPU cluster using CUDA-aware MPI.
Download the Project Report here.
Let's explain this using a bottom-up approach:
- Map-Reduce is a programming model that parallelizes a large task by solving its sub-tasks concurrently. The sub-tasks are mapped to multiple 'workers' that solve their parts in parallel, and the outputs of the sub-tasks are reduced back into a solution to the primary task.
- For tasks that benefit from repeated rounds of computation, such as machine learning model training, the Map-Reduce operations are performed iteratively until a satisfactory solution is obtained. This programming model is therefore termed Iterative MapReduce.
- Now imagine that, instead of dividing the task into one level of sub-tasks, we continue to divide the sub-tasks further into their own sub-(sub-)tasks that themselves follow the Map-Reduce paradigm. Each 'worker' that was originally computing a sub-task now delegates work to a secondary level of workers. Performed iteratively, we term this composite of Map-Reduce operations Meta Iterative MapReduce (sketched below).
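As a minimal sketch of this two-level structure, the plain C program below simulates both levels with loops over array slices (the worker counts and data are made up for the example); in the actual project, the outer level corresponds to parallel MPI processes and the inner level to CUDA kernel threads:

```c
#include <stdio.h>

#define N 16          /* total elements                       */
#define WORKERS 4     /* first-level "map" workers            */
#define SUBWORKERS 2  /* second-level workers per worker      */

/* Second level: each sub-worker reduces its slice (here: a sum). */
static int sub_reduce(const int *slice, int len) {
    int acc = 0;
    for (int i = 0; i < len; ++i) acc += slice[i];
    return acc;
}

/* First level: a worker maps its chunk onto sub-workers, then
   reduces their partial results.                               */
static int worker_map_reduce(const int *chunk, int len) {
    int per = len / SUBWORKERS, acc = 0;
    for (int s = 0; s < SUBWORKERS; ++s)         /* inner "map"    */
        acc += sub_reduce(chunk + s * per, per); /* inner "reduce" */
    return acc;
}

int main(void) {
    int data[N];
    for (int i = 0; i < N; ++i) data[i] = i + 1;

    int per = N / WORKERS, total = 0;
    for (int w = 0; w < WORKERS; ++w)                    /* outer "map"    */
        total += worker_map_reduce(data + w * per, per); /* outer "reduce" */

    printf("sum = %d\n", total);  /* 136 = 16*17/2 */
    return 0;
}
```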
- Effective speed-up of *No. of parallel MPI processes × No. of CUDA kernel threads*. For example, 4 MPI processes each launching a 256-thread CUDA kernel give up to 4 × 256 = 1024 concurrent workers.
- Using the Meta model, we can effectively leverage CUDA-aware MPI (see the sketch after this list). Thus,
  - All operations that are required to carry out a message transfer, i.e. a send operation, can be pipelined.
  - Acceleration technologies like GPUDirect can be utilized by the MPI library transparently to the user.
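To make this concrete, here is a minimal sketch of a CUDA-aware MPI transfer (illustrative, not taken from the repository; it assumes an MPI build with CUDA support and at least two ranks, each with access to a CUDA device). The buffer handed to MPI_Send/MPI_Recv lives in GPU memory, with no explicit cudaMemcpy staging in user code:

```c
#include <mpi.h>
#include <cuda_runtime.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (size < 2) {
        if (rank == 0) fprintf(stderr, "run with at least 2 MPI ranks\n");
        MPI_Finalize();
        return 1;
    }

    const int n = 1024;
    float *d_buf;                          /* buffer in GPU device memory */
    cudaMalloc((void **)&d_buf, n * sizeof(float));

    if (rank == 0) {
        /* The device pointer goes straight into MPI_Send: a CUDA-aware
           MPI library pipelines the required staging internally, or
           bypasses it entirely via GPUDirect.                          */
        MPI_Send(d_buf, n, MPI_FLOAT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(d_buf, n, MPI_FLOAT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
    }

    cudaFree(d_buf);
    MPI_Finalize();
    return 0;
}
```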
Iterative MapReduce has significant applications for massively parallel, complex computations that are performed iteratively, such as modern Deep Learning applications, wherein strict requirements exist on both data storage and floating-point operations.
- CUDA Toolkit (ver >= 7.0)
- Microsoft MPI or Open MPI (tested on Microsoft MPI ver 8.0)
- Nvidia graphics card [CUDA-supported GPU]
- Clone the repository:
  ```sh
  git clone https://github.com/soilad/Meta-Iterative-MapReduce.git
  ```
- Ensure that the `cuda.h` header file is added to the compilation path in your IDE or mpicc compiler.
- Compile the kernel.cu file using the MPI compiler. For Microsoft MPI:
  ```sh
  mpicc kernel.cu -o metamap
  ```
- For Open MPI, refer to https://www.open-mpi.org/faq/?category=runcuda and https://www.open-mpi.org/faq/?category=buildcuda
- Execute the compiled kernel code:
  ```sh
  $ ./metamap
  ```
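To run with multiple MPI processes (the process count below is illustrative; `mpiexec` ships with both Microsoft MPI and Open MPI):

```sh
mpiexec -n 4 ./metamap
```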
- CUDA-aware MPI: Accelerate MPI by leveraging GPU compute through CUDA. https://devblogs.nvidia.com/introduction-cuda-aware-mpi/
- Iterative MapReduce: The Map-Reduce paradigm adapted for iterative operations, for example in machine learning model training. https://deeplearning4j.org/iterativereduce
- Meta Iterative MapReduce: We (the authors) propose a model that performs two "levels" of iterative map-reduce operations. The gist is that each map operation in the first level of map-reduce is itself a composite of another level of map-reduce operations, which yields better efficiency bounds.
- [Linear] Regression: To showcase the improvement in model training speed, we perform distributed training of a linear regression model using the Meta-Iterative Map-Reduce programming model. (A simplified illustration follows below.)
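For illustration only, here is a minimal single-level Iterative MapReduce training loop for linear regression in plain MPI. This is not the repository's kernel.cu; the synthetic data, learning rate, and iteration count are assumptions made up for the sketch. Each iteration, every rank computes a partial gradient on its own data shard (map), and MPI_Allreduce sums the partials so all ranks apply the same update (reduce):

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Synthetic shard per rank: y = 2x + 1, inputs normalized to [0, 1). */
    const int n = 4;
    double x[4], y[4];
    for (int i = 0; i < n; ++i) {
        x[i] = (double)(rank * n + i) / (double)(n * size);
        y[i] = 2.0 * x[i] + 1.0;
    }

    double w = 0.0, b = 0.0;
    const double lr = 0.5;
    for (int iter = 0; iter < 2000; ++iter) {
        /* "Map": each rank computes the gradient on its own shard. */
        double g[2] = {0.0, 0.0};   /* dL/dw, dL/db */
        for (int i = 0; i < n; ++i) {
            double err = (w * x[i] + b) - y[i];
            g[0] += err * x[i];
            g[1] += err;
        }
        /* "Reduce": sum the partial gradients across all ranks. */
        double G[2];
        MPI_Allreduce(g, G, 2, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
        const int total = n * size;
        w -= lr * G[0] / total;     /* identical update on every rank */
        b -= lr * G[1] / total;
    }

    if (rank == 0)
        printf("learned w = %.3f, b = %.3f (target: 2, 1)\n", w, b);
    MPI_Finalize();
    return 0;
}
```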