gohilankit/PageRankCluster

PageRankCluster

This application is designed to run on an HDFS cluster and output a list of web pages ranked by their computed PageRank scores. It is intended to use crawl data from the Common Crawl repository on AWS. The project uses Apache Spark's GraphX API. The sample input files are taken from the hyperlink graph provided by Web Data Commons at http://webdatacommons.org/hyperlinkgraph/
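
To illustrate what the ranking step computes, here is a minimal sketch of PageRank by power iteration in plain Python. It is not the project's GraphX code. The function names `parse_edges` and `pagerank`, the damping factor of 0.85, the fixed iteration count, and the input format (one whitespace-separated `source target` pair of integer node IDs per line) are all assumptions made for illustration. Scores here are normalized to sum to 1, so they may be scaled differently from the values GraphX reports.

```python
def parse_edges(lines):
    """Parse lines of the assumed form 'source<whitespace>target' into int pairs."""
    edges = []
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        edges.append((int(parts[0]), int(parts[1])))
    return edges


def pagerank(edges, damping=0.85, iterations=50):
    """Return (node, score) pairs sorted by descending PageRank score.

    `edges` is an iterable of (source, target) pairs describing a directed
    link graph. `damping` and `iterations` are illustrative defaults.
    """
    edges = list(edges)
    nodes = sorted({node for edge in edges for node in edge})
    if not nodes:
        return []

    # Outgoing links for every node, including nodes that link nowhere.
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    n = len(nodes)
    ranks = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # Rank held by pages without outgoing links is spread over all pages.
        dangling = sum(ranks[node] for node in nodes if not out_links[node])
        base = (1.0 - damping) / n + damping * dangling / n
        new_ranks = {node: base for node in nodes}

        # Each page passes an equal share of its damped rank to its targets.
        for src, targets in out_links.items():
            if not targets:
                continue
            share = damping * ranks[src] / len(targets)
            for dst in targets:
                new_ranks[dst] += share

        ranks = new_ranks

    return sorted(ranks.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    sample = ["0\t1", "2\t1", "1\t0"]
    for node, score in pagerank(parse_edges(sample)):
        print(node, round(score, 4))
```

On a cluster, the same computation would be distributed over the edge list rather than held in in-memory dictionaries; this sketch only shows the per-iteration logic on a small graph.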
