# Top Tweets, Hashtags, and Sentiments using Twitter Streaming API and Spark

## Main Projects

There are two main programs, TopTweetAndHashtagCollector and TwitterSentimentAnalyzer. Using TwitterSentimentAnalyzer, I also made a prototype of a web app, Tweet Stats.

### TopTweetAndHashtagCollector

From a stream of tweets, gets the top hashtags and tweets.
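
To illustrate the idea of ranking hashtags, here is a minimal sketch in plain Python. It is not the Java/Spark code used by TopTweetAndHashtagCollector: it works on an in-memory list of tweet texts rather than a stream, the function name `top_hashtags` is hypothetical, and counting hashtags case-insensitively is an assumption. How the top tweets are ranked is not covered here.

```python
import re
from collections import Counter

# A hashtag is taken to be '#' followed by word characters (an assumption).
HASHTAG_PATTERN = re.compile(r"#(\w+)")


def top_hashtags(tweets, n=10):
    """Return the n most frequent hashtags (lowercased) in the given tweet texts."""
    counts = Counter()
    for text in tweets:
        counts.update(tag.lower() for tag in HASHTAG_PATTERN.findall(text))
    return counts.most_common(n)


sample = [
    "Learning #Spark with the #Twitter API",
    "More #spark streaming today",
    "No tags here",
]
print(top_hashtags(sample, n=2))  # [('spark', 2), ('twitter', 1)]
```
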
### TwitterSentimentAnalyzer

From a stream of tweets, calculates the sentiment score of each tweet and outputs that score together with the geo-location of the author.
The sentiment score and geo-location are then processed on the client side: the latitude and longitude are used to look up the address and get the ISO state code.
This is done by get_sentiment_by_state.py, written in Python 3.
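
The client-side step can be pictured roughly as follows. This is a sketch under stated assumptions, not the actual contents of get_sentiment_by_state.py: the reverse-geocoding call is replaced by a made-up stub (`stub_lookup`), the state codes are assumed to look like `US-CA`, and the function names are hypothetical.

```python
from collections import defaultdict


def average_sentiment_by_state(records, lookup_state):
    """Average the scores per state.

    records: iterable of (score, latitude, longitude) tuples.
    lookup_state: callable (lat, lon) -> state code, or None if unknown.
    """
    totals = defaultdict(lambda: [0.0, 0])  # state -> [sum of scores, count]
    for score, lat, lon in records:
        state = lookup_state(lat, lon)
        if state is None:
            continue  # skip points that cannot be mapped to a state
        totals[state][0] += score
        totals[state][1] += 1
    return {state: total / count for state, (total, count) in totals.items()}


def stub_lookup(lat, lon):
    # Stand-in for a real reverse-geocoding service; this rule is made up.
    if lat == 0 and lon == 0:
        return None
    return "US-CA" if lon < -100 else "US-NY"


records = [(2.0, 34.05, -118.24), (4.0, 37.77, -122.42), (1.0, 40.71, -74.01)]
print(average_sentiment_by_state(records, stub_lookup))
# {'US-CA': 3.0, 'US-NY': 1.0}
```

Passing the lookup in as a function keeps the averaging logic independent of whichever geocoding service is used.
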
### Tweet Stats

Tweet Stats is a web app hosted on a Python Django server.
It currently has a geo chart of the average sentiment score of people in different states. I explain the project in detail on the website.
## Running the Programs

### TopTweetAndHashtagCollector.java and TwitterSentimentAnalyzer.java

Each program takes two command-line arguments:

- Path to Twitter credential file: full path to a text file that contains a Twitter login. See twitter4j.properties.template. If you need these keys, please refer to How to get API Keys and Tokens for Twitter.
- Output file path: full path to a text file to write the sentiment analysis output to.

Dependencies are listed in pom.xml.

Some sample output files from TopTweetAndHashtagCollector: Top Tweet and Hashtags 1, Top Tweet and Hashtags 2, Top Tweet and Hashtags 3.

Some sample output files from TwitterSentimentAnalyzer: 200_lines, 3000_lines.

### get_sentiment_by_state.py

Takes one command-line argument:

- Path to input file: a file with lines of comma-separated scores, latitudes, and longitudes. This file is produced by TwitterSentimentAnalyzer. You can use 200_lines or 3000_lines.

Output with these sample inputs: output_200_lines or output_3000_lines.
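
As a rough illustration of the input format, the sketch below parses one line into numbers. It assumes the fields appear in the order score, latitude, longitude, and that all three can be read as floats; neither the function names nor the handling of malformed lines are taken from get_sentiment_by_state.py.

```python
def parse_line(line):
    """Parse 'score,latitude,longitude' into a tuple of floats.

    Returns None when the line does not have exactly three numeric fields.
    """
    parts = line.strip().split(",")
    if len(parts) != 3:
        return None
    try:
        score, lat, lon = (float(p) for p in parts)
    except ValueError:
        return None
    return score, lat, lon


def read_records(path):
    """Yield (score, lat, lon) for each well-formed line in the file at path."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            record = parse_line(line)
            if record is not None:
                yield record


print(parse_line("2,40.71,-74.01"))  # (2.0, 40.71, -74.01)
print(parse_line("not a record"))    # None
```
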