Distributed Data Analysis using Spark, Map Reduce and HDFS

Srihari123456/Page-Rank-with-Spark

Page Rank: Distributed Data Analysis using Spark

This work was done as part of the Advanced Big Data Systems course (CS 744) at the University of Wisconsin-Madison.

Parameters Considered for Performance Analysis:

  1. Impact of Worker Failure
  2. Effect of Data Persistence
  3. Correlation between the number of partitions and job completion time

Datasets used for Page Rank:

  1. Berkeley-Stanford web graph
  2. Wiki Articles
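
For orientation, the PageRank iteration that these experiments exercise can be sketched in plain Python. This is a minimal single-machine illustration, not the repository's Spark code: the function names `parse_edges` and `page_rank`, the damping factor of 0.85, the iteration count, and the assumed input format (one whitespace-separated `source destination` pair per line, with `#` comment lines) are all assumptions made for the example.

```python
from collections import defaultdict


def parse_edges(lines):
    """Parse 'src dst' lines into (src, dst) tuples.

    Assumed format: whitespace-separated node pairs; blank lines and
    lines starting with '#' are skipped.
    """
    edges = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        src, dst = line.split()[:2]
        edges.append((src, dst))
    return edges


def page_rank(edges, iterations=10, damping=0.85):
    """Iterative PageRank over an edge list (hypothetical helper)."""
    # Group outgoing links per page, analogous to a group-by-key step.
    links = defaultdict(list)
    for src, dst in edges:
        links[src].append(dst)

    # Every page that appears as a source or destination starts at rank 1.0.
    pages = set(links) | {dst for _, dst in edges}
    ranks = {page: 1.0 for page in pages}

    for _ in range(iterations):
        # Each page splits its current rank evenly among its out-links.
        contribs = defaultdict(float)
        for src, dsts in links.items():
            share = ranks[src] / len(dsts)
            for dst in dsts:
                contribs[dst] += share
        # Combine the summed contributions with the damping term.
        ranks = {
            page: (1 - damping) + damping * contribs[page] for page in pages
        }
    return ranks
```

A call such as `page_rank(parse_edges(open(path)))`, where `path` is a hypothetical local edge-list file, returns a dictionary mapping each page to its rank. In a distributed version, the per-page link lists are the data that gets reused on every iteration, which is why persistence and partitioning are natural parameters to study.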

Setup

  • Configure Hadoop and Spark as detailed in [Link to Assignment] and place them in /mnt/data/

  • Execute each task using

./run.sh
