hyunwoona/twitter-data-analysis

#Top Tweets, Hashtags, and Sentiments using Twitter Streaming API and Spark

##Main Projects

This repository contains two main programs, TopTweetAndHashtagCollector and TwitterSentimentAnalyzer. Using TwitterSentimentAnalyzer, I also built a prototype web app, Tweet Stats.

###TopTweetAndHashtagCollector

Reads a stream of tweets and extracts the top hashtags and top tweets.
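The hashtag-counting idea can be sketched in plain Python. This is not the repository's Spark streaming code; it is a minimal batch illustration, and the names `top_hashtags` and `HASHTAG_PATTERN`, the case-insensitive matching, and the sample tweets are all assumptions made for the example.

```python
import re
from collections import Counter

# Matches a '#' followed by word characters (letters, digits, underscore).
HASHTAG_PATTERN = re.compile(r"#(\w+)")


def top_hashtags(tweets, n=3):
    """Return the n most frequent hashtags (case-insensitive) with counts."""
    counts = Counter()
    for text in tweets:
        # Lower-case so that '#Spark' and '#spark' count as the same tag.
        counts.update(tag.lower() for tag in HASHTAG_PATTERN.findall(text))
    return counts.most_common(n)


sample_tweets = [
    "Learning #Spark streaming today",
    "#spark and #Java work well together",
    "Nothing tagged here",
    "More #java, more #spark",
]
print(top_hashtags(sample_tweets, n=2))  # [('spark', 3), ('java', 2)]
```

A streaming version would typically keep such counts per time window rather than over a fixed list.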

###TwitterSentimentAnalyzer

Reads a stream of tweets, calculates the sentiment score of each tweet, and outputs that score together with the geo-location of the author.

The sentiment score and geo-location are then processed on the client side: the latitude and longitude are used to look up the address and obtain the ISO state code.

This is done by get_sentiment_by_state.py, written in Python 3.
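The per-state aggregation can be sketched as follows. This is a minimal sketch, not the contents of get_sentiment_by_state.py: the function names are hypothetical, the real script looks up an address from the coordinates, and `toy_lookup` below is only a crude bounding-box stand-in for that lookup. The scores are illustrative values, and the `US-CA` style codes assume ISO 3166-2.

```python
from collections import defaultdict


def average_sentiment_by_state(records, lookup_state):
    """Average sentiment scores per state.

    records: iterable of (score, latitude, longitude) tuples.
    lookup_state: callable mapping (latitude, longitude) to a state code,
        or None when the location cannot be resolved.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for score, lat, lon in records:
        state = lookup_state(lat, lon)
        if state is None:
            continue  # Skip tweets that do not resolve to a known state.
        totals[state] += score
        counts[state] += 1
    return {state: totals[state] / counts[state] for state in totals}


def toy_lookup(lat, lon):
    """Hypothetical stand-in for reverse geocoding: crude bounding boxes."""
    if 32.5 <= lat <= 42.0 and -124.5 <= lon <= -114.1:
        return "US-CA"
    if 40.5 <= lat <= 45.0 and -79.8 <= lon <= -71.8:
        return "US-NY"
    return None


records = [
    (2.0, 37.77, -122.42),  # San Francisco
    (4.0, 34.05, -118.24),  # Los Angeles
    (1.0, 40.71, -74.01),   # New York City
    (3.0, 0.0, 0.0),        # Unresolvable location, skipped
]
print(average_sentiment_by_state(records, toy_lookup))
# {'US-CA': 3.0, 'US-NY': 1.0}
```

Passing the lookup in as a callable keeps the averaging logic independent of whichever geocoding service is used.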

###Tweet Stats

Tweet Stats is a web app hosted on a Python Django server.

It currently shows a geo-chart of the average sentiment score of people in different states. The project is explained in detail on the website.

##Running the Programs

###TopTweetAndHashtagCollector.java and TwitterSentimentAnalyzer.java

Each program takes two command-line arguments:

1. Path to Twitter credential file: full path to a text file that contains the Twitter login credentials. See twitter4j.properties.template. If you need these keys, please refer to How to get API Keys and Tokens for Twitter.
2. Output file path: full path to a text file to write the sentiment analysis output to.

Dependencies are listed in pom.xml.

Some sample output files from TopTweetAndHashtagCollector: Top Tweet and Hashtags 1, Top Tweet and Hashtags 2, Top Tweet and Hashtags 3.

Some sample output files from TwitterSentimentAnalyzer: 200_lines, 3000_lines.

###get_sentiment_by_state.py

Takes one command-line argument:

1. Path to input file: a file whose lines contain comma-separated scores, latitudes, and longitudes. This input file is produced by TwitterSentimentAnalyzer.

You can use 200_lines or 3000_lines.

Output with these sample inputs: output_200_lines or output_3000_lines.
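Reading such an input file can be sketched as below. This is a sketch under assumptions, not the script's actual code: it assumes each line holds exactly three numeric fields in the order score, latitude, longitude, and the helper names `parse_line` and `read_records` are hypothetical.

```python
def parse_line(line):
    """Parse 'score,latitude,longitude' into a tuple of floats.

    Returns None for blank or malformed lines so callers can skip them.
    """
    parts = line.strip().split(",")
    if len(parts) != 3:
        return None
    try:
        score, lat, lon = (float(part) for part in parts)
    except ValueError:
        return None
    return score, lat, lon


def read_records(path):
    """Read every well-formed record from the file at path."""
    with open(path, encoding="utf-8") as handle:
        parsed = (parse_line(line) for line in handle)
        return [record for record in parsed if record is not None]


print(parse_line("3,37.77,-122.42"))  # (3.0, 37.77, -122.42)
print(parse_line("not a record"))     # None
```

Calling `read_records` with the path given on the command line would return a list of tuples ready for per-state aggregation.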
