tweet-sentiment-pyspark

Tweet sentiment classification using PySpark's NaiveBayes. This project performs sentiment analysis on a tweets dataset with a binary NaiveBayes classification model, using the bag-of-words technique to build the feature vectors fed to NaiveBayes.
Tweets are classified as positive (1) or negative (0).
The dataset contains 1,578,627 classified tweets; each row is marked 1 for positive sentiment and 0 for negative sentiment. The dataset can be downloaded from this Tweets Dataset.
With this script I achieved up to 60% accuracy on the unlabeled test dataset.
It took at most about 20 minutes on my i5-2.3u, 4 GB RAM machine to train on 90% of the dataset and test on the remaining 10%.
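
To make the idea concrete, here is a minimal pure-Python sketch of bag-of-words features feeding a multinomial Naive Bayes classifier with Laplace smoothing. It is not the code in tweet_sentiment_analysis.py and does not use Spark; the function names (`tokenize`, `train_naive_bayes`, `predict`), the tokenizing regex, and the toy tweets are all assumptions made for illustration. PySpark's NaiveBayes applies the same kind of model to the full dataset.

```python
from __future__ import division, print_function

import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a tweet and split it into word tokens (word order is ignored)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def train_naive_bayes(labeled_tweets, smoothing=1.0):
    """Fit a multinomial Naive Bayes model on (text, label) pairs."""
    class_counts = Counter()            # number of tweets per label
    word_counts = defaultdict(Counter)  # per-label bag-of-words counts
    vocabulary = set()
    for text, label in labeled_tweets:
        class_counts[label] += 1
        for token in tokenize(text):
            word_counts[label][token] += 1
            vocabulary.add(token)

    total_docs = sum(class_counts.values())
    model = {"vocabulary": vocabulary, "log_prior": {}, "log_likelihood": {}}
    for label, doc_count in class_counts.items():
        model["log_prior"][label] = math.log(doc_count / total_docs)
        # Laplace smoothing keeps unseen (word, label) pairs from zeroing the score.
        denominator = sum(word_counts[label].values()) + smoothing * len(vocabulary)
        model["log_likelihood"][label] = {
            word: math.log((word_counts[label][word] + smoothing) / denominator)
            for word in vocabulary
        }
    return model


def predict(model, text):
    """Return the label with the highest log-posterior score for the tweet."""
    # Words never seen during training carry no information and are skipped.
    tokens = [t for t in tokenize(text) if t in model["vocabulary"]]
    scores = {}
    for label, log_prior in model["log_prior"].items():
        likelihood = model["log_likelihood"][label]
        scores[label] = log_prior + sum(likelihood[t] for t in tokens)
    return max(scores, key=scores.get)


# Hypothetical toy data: 1 = positive, 0 = negative, as in the dataset.
toy_tweets = [
    ("i love this phone", 1),
    ("what a great happy day", 1),
    ("love the great weather", 1),
    ("i hate this traffic", 0),
    ("what a sad terrible day", 0),
    ("hate the terrible weather", 0),
]

model = train_naive_bayes(toy_tweets)
print(predict(model, "great day, love it"))         # prints 1
print(predict(model, "terrible traffic, hate it"))  # prints 0
```

The sketch works in log space so that multiplying many small word probabilities does not underflow, which matters once tweets and vocabularies grow beyond toy size.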

Dependencies

Apache Spark and pyspark
Pandas
Python 2.7

TODO
Accuracy can be improved further by lemmatising the dataset and using the word2vec technique, which I am still working on. You can also try different classification models such as Random Forest or SVM, or even deep learning models such as CNNs and RNNs.

sohaibomr/tweet-sentiment-pyspark
