An example demonstrating matrix multiplication using Apache Spark in Python, created for the purposes of an oral presentation in CAB401 High Performance and Parallel Computing at the Queensland University of Technology (QUT), Semester 2, 2021. This project is a simple Python script that compares the performance of Apache Spark (via pyspark) against a standard matrix multiplication implementation.
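As a rough illustration of the non-Spark side of that comparison, the sketch below times a plain triple-loop matrix multiplication. The function names matmul and time_matmul, the random test data, and the 50 x 50 size are assumptions made for illustration; they are not taken from the project script.

```python
import random
import time


def matmul(a, b):
    """Multiply two matrices given as lists of rows (naive triple loop)."""
    rows, inner, cols = len(a), len(b), len(b[0])
    if any(len(row) != inner for row in a):
        raise ValueError("columns of a must match rows of b")
    result = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for k in range(inner):
            a_ik = a[i][k]
            for j in range(cols):
                result[i][j] += a_ik * b[k][j]
    return result


def time_matmul(n, seed=0):
    """Return the seconds taken to multiply two random n x n matrices."""
    rng = random.Random(seed)  # seeded so repeated runs use the same data
    a = [[rng.random() for _ in range(n)] for _ in range(n)]
    b = [[rng.random() for _ in range(n)] for _ in range(n)]
    start = time.perf_counter()
    matmul(a, b)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Hypothetical size; larger matrices make the timing more meaningful.
    print(f"50 x 50 baseline: {time_matmul(50):.4f} s")
```

Timing the same inputs through pyspark would give the other half of the comparison; for small matrices, Spark's start-up and scheduling overhead will likely dominate the measurement.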
This script relies on the following dependencies.
This project has been built and run targeting a Unix/Linux environment. The steps used to set up the environment for this script may vary depending on the operating system. For the purposes of this guide, it is assumed that you have homebrew and python3 installed.
- Install Java to the local machine,
brew install java
- Install Scala to the local machine,
brew install scala
- Install Apache Spark,
brew install spark
- Install scipy, pyspark, and findspark in Python, e.g.
pip3 install pyspark
- Clone the repo
git clone https://github.com/jamestkelly/spark-matrix-multiplication.git
- Run the script from the command-line with
python3 path/to/file.py
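Before running the script, it may help to confirm that the prerequisites from the steps above are visible to Python. The following is a minimal sketch using only the standard library; the function name check_environment is hypothetical, and it assumes the java and scala executables are expected on the PATH.

```python
import importlib.util
import shutil

# Names taken from the installation steps; adjust if your setup differs.
PYTHON_PACKAGES = ("scipy", "pyspark", "findspark")
COMMANDS = ("java", "scala")


def check_environment():
    """Map each prerequisite name to True if it appears to be installed."""
    status = {}
    for name in PYTHON_PACKAGES:
        # find_spec returns None when a top-level package cannot be located.
        status[name] = importlib.util.find_spec(name) is not None
    for name in COMMANDS:
        # shutil.which returns None when the executable is not on the PATH.
        status[name] = shutil.which(name) is not None
    return status


if __name__ == "__main__":
    for name, found in check_environment().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

This only checks that each item can be located, not that the installed versions are compatible with one another.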
Distributed under the MIT License. See LICENSE for more information.
Jim Tran Kelly
- Student Number: N9763686
- General Email: [email protected]
- University Contact: [email protected]
Project Link: https://github.com/jamestkelly/titanicSpark