Srihari123456 / Page-Rank-with-Spark Public


Distributed Data Analysis using Spark, MapReduce and HDFS


PageRank: Distributed Data Analysis using Spark

This work was done as part of the Advanced Big Data Systems course (CS 744) at the University of Wisconsin-Madison.

Parameters Considered for Performance Analysis:

Impact of Worker Failure
Effect of Data Persistence
Correlation between the number of partitions and job completion time
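
To make the partitioning parameter more concrete, here is a minimal pure-Python sketch (no Spark required) of hash-partitioning an edge list by source node, similar in spirit to how a hash partitioner assigns keys to partitions. The function name `hash_partition` and the toy edges are hypothetical and are not taken from this repository; the actual jobs may partition their data differently.

```python
from collections import defaultdict


def hash_partition(edges, num_partitions):
    """Group (src, dst) edges into buckets by hashing the source node."""
    partitions = defaultdict(list)
    for src, dst in edges:
        # Edges sharing a source land in the same bucket, so a per-source
        # operation (such as collecting out-links) needs no data movement.
        partitions[hash(src) % num_partitions].append((src, dst))
    return dict(partitions)


# Hypothetical toy graph with integer node ids (illustration only).
edges = [(0, 1), (0, 2), (1, 2), (2, 0), (3, 2)]

for n in (1, 2, 4):
    parts = hash_partition(edges, n)
    sizes = {pid: len(bucket) for pid, bucket in sorted(parts.items())}
    print(f"{n} partition(s): edges per partition = {sizes}")
```

In general, more partitions give smaller tasks and more parallelism but also more scheduling and shuffle overhead. Which effect dominates for a given dataset and cluster is what the partition experiment is meant to measure.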

Datasets used for Page Rank:

Berkeley-Stanford web graph
Wiki Articles
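
For reference, the PageRank iteration that is run over such edge lists can be sketched in plain Python. This is a minimal single-machine sketch, assuming the common formulation in which every rank starts at 1.0, each node sends rank / out-degree to its neighbours, and the new rank is 0.15 + 0.85 * (sum of received contributions). The damping factor, the iteration count, and the function name `pagerank` are assumptions, not values taken from this repository.

```python
from collections import defaultdict


def pagerank(edges, iterations=10, damping=0.85):
    """Iterate the PageRank update on a list of (src, dst) edges."""
    # Build the "links" table: src -> list of destinations.
    links = defaultdict(list)
    for src, dst in edges:
        links[src].append(dst)

    # Every node with out-links starts at rank 1.0.
    ranks = {node: 1.0 for node in links}

    for _ in range(iterations):
        contribs = defaultdict(float)
        for src, dsts in links.items():
            # Nodes that received no rank in the previous round are skipped,
            # which mirrors an inner join between links and ranks.
            if src not in ranks:
                continue
            # "Map" step: split the node's rank evenly over its out-links.
            share = ranks[src] / len(dsts)
            for dst in dsts:
                # "Reduce" step: sum contributions per destination.
                contribs[dst] += share
        # Apply damping to the summed contributions.
        ranks = {node: (1 - damping) + damping * total
                 for node, total in contribs.items()}
    return ranks


# Hypothetical toy graph: a three-node cycle (illustration only).
toy_edges = [("a", "b"), ("b", "c"), ("c", "a")]
toy_ranks = pagerank(toy_edges)
for node, rank in sorted(toy_ranks.items()):
    print(node, round(rank, 3))
```

On this symmetric cycle every node keeps a rank of approximately 1.0, which is a quick sanity check. In a distributed setting the links table is reused in every iteration, which is why persisting it and choosing its partitioning are natural parameters to study.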

Setup

Configure Hadoop and Spark as detailed in [Link to Assignment] and place them in /mnt/data/
Execute each task using:

./run.sh
