Reddit-Sentiment-Analysis

A two-part MapReduce program that performs textual analysis of Reddit data (approx. 300 GB of JSON) preprocessed by another team member. It runs sentiment analysis on Reddit comments that preprocessing identified as discussing Donald Trump, Hillary Clinton, or both, and then summarizes the results.

The preprocessing is assumed to have already screened comments by date and topic (Trump and Clinton). Per the project specifications, we limited our scope to comments made from July 19, 2016 through November 8, 2016.
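For readers who want to sanity-check input against that date window, the following is a minimal sketch and not the preprocessing code itself, which lives outside this repository. It assumes each comment carries a timestamp in epoch seconds (Reddit's JSON dumps usually expose this as `created_utc`), that both endpoints are inclusive, and that dates are interpreted in UTC. The class name `DateWindow` is hypothetical.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;

// Hypothetical helper: checks whether a comment timestamp falls inside the project window.
final class DateWindow {
    // Window from the project scope; inclusive endpoints are an assumption.
    static final LocalDate START = LocalDate.of(2016, 7, 19);
    static final LocalDate END = LocalDate.of(2016, 11, 8);

    // createdUtc is assumed to be seconds since the Unix epoch, in UTC.
    static boolean inWindow(long createdUtc) {
        LocalDate day = Instant.ofEpochSecond(createdUtc)
                .atZone(ZoneOffset.UTC)
                .toLocalDate();
        return !day.isBefore(START) && !day.isAfter(END);
    }

    public static void main(String[] args) {
        long firstSecond = START.atStartOfDay(ZoneOffset.UTC).toEpochSecond();
        System.out.println(inWindow(firstSecond));      // first second of July 19, 2016
        System.out.println(inWindow(firstSecond - 1));  // last second of July 18, 2016
    }
}
```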

PART 1:

TO COMPILE PROGRAM:

$ mkdir build

$ $HADOOP_HOME/bin/hadoop com.sun.tools.javac.Main *.java -d build -Xlint

$ jar -cvf SentimentAnalysis.jar -C build/ .

$ rm -r build

TO RUN:

This assumes all text files (ExampleInput.txt, negate-words.txt, pos-words.txt, and neg-words.txt) are in the /sentimentAnalysis directory in HDFS. Modify the paths to reflect any differences.

$ $HADOOP_HOME/bin/hadoop jar SentimentAnalysis.jar org.SentimentAnalysis.Driver /sentimentAnalysis/ExampleInput.txt /sentimentAnalysis/out -negation /sentimentAnalysis/negate-words.txt -pos /sentimentAnalysis/pos-words.txt -neg /sentimentAnalysis/neg-words.txt

As-is, this takes /sentimentAnalysis/ExampleInput.txt as input, runs the program, and stores the results in /sentimentAnalysis/out. To process a directory of input files instead, replace /sentimentAnalysis/ExampleInput.txt with /your-HDFS-Directory/
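The -negation, -pos, and -neg flags suggest a dictionary-based scorer. The following is a minimal sketch of how such a scorer might work, not the code in Part_1. It assumes that each positive word counts +1, each negative word counts -1, and that a negation word flips the sign of the next token only. The class name `LexiconScorer` and the sample word lists are hypothetical; the real dictionaries are the files passed on the command line.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical lexicon scorer; the scoring rule is an assumption, not the project's actual logic.
final class LexiconScorer {
    private final Set<String> positive;
    private final Set<String> negative;
    private final Set<String> negators;

    LexiconScorer(Set<String> positive, Set<String> negative, Set<String> negators) {
        this.positive = positive;
        this.negative = negative;
        this.negators = negators;
    }

    // Sums +1 per positive word and -1 per negative word,
    // flipping the sign of a word that directly follows a negation word.
    int score(String comment) {
        String[] tokens = comment.toLowerCase(Locale.ROOT).split("[^a-z']+");
        int total = 0;
        boolean negated = false;
        for (String token : tokens) {
            if (token.isEmpty()) {
                continue; // split() can yield a leading empty token
            }
            if (negators.contains(token)) {
                negated = true;
                continue;
            }
            int polarity = positive.contains(token) ? 1 : negative.contains(token) ? -1 : 0;
            total += negated ? -polarity : polarity;
            negated = false; // negation applies to the next token only
        }
        return total;
    }

    public static void main(String[] args) {
        LexiconScorer scorer = new LexiconScorer(
                new HashSet<>(Arrays.asList("good", "great")),
                new HashSet<>(Arrays.asList("bad", "terrible")),
                new HashSet<>(Arrays.asList("not", "never")));
        // "not good" contributes -1 and "terrible" contributes -1.
        System.out.println(scorer.score("This is not good, it is terrible")); // prints -2
    }
}
```

In a MapReduce setting, logic like this would typically sit inside the mapper, with the three word sets loaded once during setup rather than per record.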

PART 2:

Part 2 takes the output from Part 1 and summarizes the data. The partitions defined in details.md are hardcoded, but the code could easily be altered to read partition data from a file instead.
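As an illustration of reading partitions from a file rather than hardcoding them, here is a minimal sketch. The partition format in details.md is not reproduced in this README, so everything about the layout below is an assumption: one partition per line, written as `label,start,end` with ISO dates and inclusive endpoints. The class name `PartitionTable` and the sample labels and dates are hypothetical.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical partition table; the "label,start,end" line format is an assumption.
final class PartitionTable {
    private static final class Partition {
        final String label;
        final LocalDate start;
        final LocalDate end;

        Partition(String label, LocalDate start, LocalDate end) {
            this.label = label;
            this.start = start;
            this.end = end;
        }

        boolean contains(LocalDate date) {
            return !date.isBefore(start) && !date.isAfter(end);
        }
    }

    private final List<Partition> partitions = new ArrayList<>();

    // Parses lines such as "early,2016-07-19,2016-09-13"; blank lines and '#' comments are skipped.
    static PartitionTable parse(List<String> lines) {
        PartitionTable table = new PartitionTable();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue;
            }
            String[] fields = trimmed.split(",");
            if (fields.length != 3) {
                throw new IllegalArgumentException("Expected label,start,end but got: " + line);
            }
            table.partitions.add(new Partition(
                    fields[0].trim(),
                    LocalDate.parse(fields[1].trim()),
                    LocalDate.parse(fields[2].trim())));
        }
        return table;
    }

    // Returns the label of the first partition containing the date, or null if none matches.
    String labelFor(LocalDate date) {
        for (Partition p : partitions) {
            if (p.contains(date)) {
                return p.label;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Sample data only; the real partitions are defined in details.md.
        PartitionTable table = PartitionTable.parse(Arrays.asList(
                "# label,start,end",
                "early,2016-07-19,2016-09-13",
                "late,2016-09-14,2016-11-08"));
        System.out.println(table.labelFor(LocalDate.of(2016, 10, 1)));
    }
}
```

For a local file, the lines could come from `java.nio.file.Files.readAllLines`; inside a Hadoop job they would more likely be read from HDFS during task setup.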

TO COMPILE PROGRAM:

$ mkdir build

$ $HADOOP_HOME/bin/hadoop com.sun.tools.javac.Main *.java -d build -Xlint

$ jar -cvf Summary.jar -C build/ .

$ rm -r build

TO RUN:

$ $HADOOP_HOME/bin/hadoop jar Summary.jar org.Summary.Driver /sentimentAnalysis/out /sentimentAnalysis/summary

dboston1/Reddit-Sentiment-Analysis
