
An example demonstrating matrix multiplication using Apache Spark in Python, created for CAB401 High Performance and Parallel Computing at the Queensland University of Technology, Semester 2, 2021.


jamestkelly/spark-matrix-multiply


Spark Matrix Multiplication: A Short Dive into Parallelism in Apache Spark

Table of Contents
  1. About The Project
  2. Getting Started
  3. License
  4. Contact
  5. Acknowledgements

About The Project

An example demonstrating matrix multiplication using Apache Spark in Python, created for the purposes of an oral presentation in CAB401 High Performance and Parallel Computing at the Queensland University of Technology (QUT), Semester 2, 2021. The project is a simple Python script that compares the performance of matrix multiplication in Apache Spark (via pyspark) against a standard matrix multiplication implementation.
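To illustrate the two approaches being compared, the following is a minimal sketch and not the repository's actual script. It is written in plain Python so that it runs without a Spark installation. The function names (multiply_dense, to_entries, multiply_entries) are hypothetical. The second function mimics the key/join/reduce pattern commonly used for distributed matrix multiplication; in PySpark, the corresponding steps would roughly map to the RDD operations map, join, and reduceByKey.

```python
from collections import defaultdict


def multiply_dense(a, b):
    """Standard row-by-column multiplication of two list-of-lists matrices."""
    inner = len(b)
    cols = len(b[0])
    return [
        [sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for row in a
    ]


def to_entries(matrix):
    """Flatten a dense matrix into (row, col, value) tuples."""
    return [(i, j, v) for i, row in enumerate(matrix) for j, v in enumerate(row)]


def multiply_entries(a_entries, b_entries):
    """Multiply two matrices given as (row, col, value) entries.

    Assumption: this imitates, on local lists, what a Spark job would do
    with RDDs. It is an illustration, not the project's implementation.
    """
    # "map": key the entries of B by their row index k.
    b_by_row = defaultdict(list)
    for k, j, value in b_entries:
        b_by_row[k].append((j, value))

    # "join" on k: pair each A[i][k] with every B[k][j] and emit the
    # partial product keyed by the output cell (i, j).
    partial = [
        ((i, j), a_value * b_value)
        for i, k, a_value in a_entries
        for j, b_value in b_by_row.get(k, ())
    ]

    # "reduceByKey": sum the partial products for each output cell.
    totals = defaultdict(int)
    for cell, product in partial:
        totals[cell] += product
    return dict(totals)


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]

dense_result = multiply_dense(A, B)
entry_result = multiply_entries(to_entries(A), to_entries(B))
```

Both results describe the same product: dense_result is a nested list, while entry_result maps each (row, col) cell to its value. The entry-based form is the one that parallelises naturally, because every partial product can be computed independently before the final sum.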

Built With

This script relies on the dependencies listed in the Installation steps under Getting Started.

Getting Started

Prerequisites

This project has been built and run targeting a Unix/Linux environment. The steps used to set up the environment for this script may vary depending on the operating system. For the purposes of this guide, it is assumed that you have Homebrew and python3 installed.

Installation

  1. Install Java on the local machine: brew install java
  2. Install Scala on the local machine: brew install scala
  3. Install Apache Spark: brew install apache-spark
  4. Install scipy, pyspark, and findspark for Python: pip3 install scipy pyspark findspark
  5. Clone the repo:
    git clone https://github.com/jamestkelly/spark-matrix-multiplication.git
  6. Run the script from the command line: python3 path/to/file.py
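After working through the steps above, it can help to confirm that the pieces are actually visible to Python before running the script. The following is a small sketch for that purpose; it is not part of the repository. The function name check_environment and the two constant tuples are hypothetical, and the choice of which commands to look for (java and scala) is an assumption based on the installation steps.

```python
import importlib.util
import shutil

# Python packages named in the installation steps.
REQUIRED_MODULES = ("scipy", "pyspark", "findspark")
# Command-line tools named in the installation steps (assumed to be on PATH).
REQUIRED_COMMANDS = ("java", "scala")


def check_environment():
    """Return a dict mapping each dependency name to True if it was found."""
    status = {}
    for name in REQUIRED_MODULES:
        # find_spec returns None when a top-level module cannot be located.
        status[name] = importlib.util.find_spec(name) is not None
    for command in REQUIRED_COMMANDS:
        # shutil.which returns None when the executable is not on PATH.
        status[command] = shutil.which(command) is not None
    return status


if __name__ == "__main__":
    for name, found in check_environment().items():
        print(f"{name}: {'found' if found else 'missing'}")
```

Any entry reported as missing points back to the corresponding installation step.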

License

Distributed under the MIT License. See LICENSE for more information.

Contact

Jim Tran Kelly

Project Link: https://github.com/jamestkelly/titanicSpark

Acknowledgements
