Sentiment analysis

Sentiment analysis is the task of classifying the polarity of a given text.

IMDb

The IMDb dataset is a binary sentiment analysis dataset consisting of 50,000 reviews from the Internet Movie Database (IMDb) labeled as positive or negative. Models are evaluated based on accuracy.
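Accuracy here is the fraction of test reviews whose predicted label matches the gold label, and the leaderboard reports it as a percentage. Below is a minimal sketch of that computation. The integer label encoding (1 = positive, 0 = negative) and the function name `accuracy` are illustrative assumptions, not part of any official evaluation script.

```python
from typing import Sequence

def accuracy(gold: Sequence[int], pred: Sequence[int]) -> float:
    """Fraction of reviews whose predicted label matches the gold label."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(g == p for g, p in zip(gold, pred))
    return correct / len(gold)

# Hypothetical labels: 1 = positive, 0 = negative.
gold = [1, 0, 1, 1]
pred = [1, 0, 0, 1]
print(f"{100 * accuracy(gold, pred):.1f}")  # prints 75.0
```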

Model	Accuracy	Paper / Source
ULMFiT (Howard and Ruder, 2018)	95.4	Universal Language Model Fine-tuning for Text Classification
Block-sparse LSTM (Gray et al., 2017)	94.99	GPU Kernels for Block-Sparse Weights
oh-LSTM (Johnson and Zhang, 2016)	94.1	Supervised and Semi-Supervised Text Categorization using LSTM for Region Embeddings
Virtual adversarial training (Miyato et al., 2016)	94.1	Adversarial Training Methods for Semi-Supervised Text Classification
BCN+Char+CoVe (McCann et al., 2017)	91.8	Learned in Translation: Contextualized Word Vectors

SST

The Stanford Sentiment Treebank consists of 215,154 phrases with fine-grained sentiment labels, taken from the parse trees of 11,855 sentences drawn from movie reviews. Models are evaluated on either fine-grained (five-way) or binary classification, based on accuracy.
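The binary setting is commonly derived from the fine-grained labels by merging the two negative classes and the two positive classes and discarding neutral items. A minimal sketch of that mapping, assuming labels are encoded as integers from 0 (very negative) to 4 (very positive); the encoding and the function name `to_binary` are illustrative, not the dataset's official tooling:

```python
from typing import Optional

# Assumed integer encoding of the five fine-grained classes.
VERY_NEGATIVE, NEGATIVE, NEUTRAL, POSITIVE, VERY_POSITIVE = range(5)

def to_binary(fine_label: int) -> Optional[int]:
    """Map a five-way label to binary (0 = negative, 1 = positive).

    Returns None for neutral items, which are dropped in the binary setting.
    """
    if fine_label not in range(5):
        raise ValueError(f"unexpected label {fine_label}")
    if fine_label == NEUTRAL:
        return None
    return 1 if fine_label > NEUTRAL else 0

fine = [0, 1, 2, 3, 4]
binary = [b for b in map(to_binary, fine) if b is not None]
print(binary)  # prints [0, 0, 1, 1]
```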

Fine-grained classification:

Model	Accuracy	Paper / Source
BCN+ELMo (Peters et al., 2018)	54.7	Deep contextualized word representations
BCN+Char+CoVe (McCann et al., 2017)	53.7	Learned in Translation: Contextualized Word Vectors

Binary classification:

Model	Accuracy	Paper / Source
Block-sparse LSTM (Gray et al., 2017)	93.2	GPU Kernels for Block-Sparse Weights
bmLSTM (Radford et al., 2017)	91.8	Learning to Generate Reviews and Discovering Sentiment
BCN+Char+CoVe (McCann et al., 2017)	90.3	Learned in Translation: Contextualized Word Vectors
Neural Semantic Encoder (Munkhdalai and Yu, 2017)	89.7	Neural Semantic Encoders
BLSTM-2DCNN (Zhou et al., 2017)	89.5	Text Classification Improved by Integrating Bidirectional LSTM with Two-dimensional Max Pooling

Yelp

The Yelp Review dataset consists of more than 500,000 Yelp reviews. There are both binary and fine-grained (five-class) versions of the dataset. Models are evaluated based on error rate, reported in percent (100 minus accuracy; lower is better).
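Since the Yelp tables report error rather than accuracy, a score of 29.98 corresponds to an accuracy of 70.02%. A minimal sketch of computing the error rate directly from predictions; the star-rating labels in the example and the function name `error_rate` are hypothetical.

```python
from typing import Sequence

def error_rate(gold: Sequence[int], pred: Sequence[int]) -> float:
    """Percentage of misclassified reviews, i.e. 100 * (1 - accuracy)."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("need two non-empty sequences of equal length")
    wrong = sum(g != p for g, p in zip(gold, pred))
    return 100.0 * wrong / len(gold)

# Hypothetical star ratings (1-5) for the fine-grained version.
gold = [5, 4, 1, 3, 2]
pred = [5, 3, 1, 3, 2]
print(error_rate(gold, pred))  # prints 20.0
```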

Fine-grained classification:

Model	Error (%)	Paper / Source
ULMFiT (Howard and Ruder, 2018)	29.98	Universal Language Model Fine-tuning for Text Classification
DPCNN (Johnson and Zhang, 2017)	30.58	Deep Pyramid Convolutional Neural Networks for Text Categorization
CNN (Johnson and Zhang, 2016)	32.39	Supervised and Semi-Supervised Text Categorization using LSTM for Region Embeddings
Char-level CNN (Zhang et al., 2015)	37.95	Character-level Convolutional Networks for Text Classification

Binary classification:

Model	Error (%)	Paper / Source
ULMFiT (Howard and Ruder, 2018)	2.16	Universal Language Model Fine-tuning for Text Classification
DPCNN (Johnson and Zhang, 2017)	2.64	Deep Pyramid Convolutional Neural Networks for Text Categorization
CNN (Johnson and Zhang, 2016)	2.90	Supervised and Semi-Supervised Text Categorization using LSTM for Region Embeddings
Char-level CNN (Zhang et al., 2015)	4.88	Character-level Convolutional Networks for Text Classification

SemEval

SemEval (International Workshop on Semantic Evaluation) includes a dedicated sentiment analysis task. An overview of the most recent edition of this task (Task 4) is available at: http://www.aclweb.org/anthology/S17-2088

SemEval-2017 Task 4 consists of five subtasks, each offered for both Arabic and English:

Subtask A: Given a tweet, decide whether it expresses POSITIVE, NEGATIVE or NEUTRAL sentiment.
Subtask B: Given a tweet and a topic, classify the sentiment conveyed towards that topic on a two-point scale: POSITIVE vs. NEGATIVE.
Subtask C: Given a tweet and a topic, classify the sentiment conveyed in the tweet towards that topic on a five-point scale: STRONGLYPOSITIVE, WEAKLYPOSITIVE, NEUTRAL, WEAKLYNEGATIVE, and STRONGLYNEGATIVE.
Subtask D: Given a set of tweets about a topic, estimate the distribution of tweets across the POSITIVE and NEGATIVE classes.
Subtask E: Given a set of tweets about a topic, estimate the distribution of tweets across the five classes: STRONGLYPOSITIVE, WEAKLYPOSITIVE, NEUTRAL, WEAKLYNEGATIVE, and STRONGLYNEGATIVE.
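Subtasks D and E are quantification problems: the output is a class distribution for a set of tweets rather than a label per tweet. A simple baseline, not one prescribed by the task, is "classify and count": label each tweet with a per-tweet classifier, then report the share of tweets in each class. The sketch below assumes the per-tweet predictions are already available as strings; the function name `classify_and_count` is hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Sequence

def classify_and_count(predicted: Iterable[str], classes: Sequence[str]) -> Dict[str, float]:
    """Estimate each class's prevalence as its share of the predicted labels.

    Predictions outside `classes` are ignored when normalising.
    """
    counts = Counter(predicted)
    total = sum(counts[c] for c in classes)
    if total == 0:
        raise ValueError("no predictions for the given classes")
    return {c: counts[c] / total for c in classes}

# Hypothetical per-tweet predictions for one topic (Subtask D setting).
preds = ["POSITIVE", "NEGATIVE", "POSITIVE", "POSITIVE"]
print(classify_and_count(preds, ["POSITIVE", "NEGATIVE"]))
# prints {'POSITIVE': 0.75, 'NEGATIVE': 0.25}
```

For Subtask E the same function applies with the five-point label set passed as `classes`.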

Subtask A results:

Model	F1-score	Paper / Source
LSTMs+CNNs ensemble with multiple conv. ops (Cliche, 2017)	0.685	BB twtr at SemEval-2017 Task 4: Twitter Sentiment Analysis with CNNs and LSTMs
Deep Bi-LSTM+attention (Baziotis et al., 2017)	0.677	DataStories at SemEval-2017 Task 4: Deep LSTM with Attention for Message-level and Topic-based Sentiment Analysis
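The scores above appear to match the F1 measure averaged over the POSITIVE and NEGATIVE classes (often written F1^PN), in which NEUTRAL still affects precision and recall but is not itself averaged in. Treat that reading as an assumption. Under it, the score can be sketched as follows; the function names are hypothetical.

```python
from typing import Sequence

def f1_for_class(gold: Sequence[str], pred: Sequence[str], label: str) -> float:
    """Per-class F1; returns 0.0 when there are no true positives."""
    tp = sum(g == label and p == label for g, p in zip(gold, pred))
    fp = sum(g != label and p == label for g, p in zip(gold, pred))
    fn = sum(g == label and p != label for g, p in zip(gold, pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def f1_pn(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Average of the POSITIVE and NEGATIVE F1 scores (NEUTRAL is not averaged in)."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    return (f1_for_class(gold, pred, "POSITIVE") + f1_for_class(gold, pred, "NEGATIVE")) / 2

gold = ["POSITIVE", "POSITIVE", "NEGATIVE", "NEUTRAL"]
pred = ["POSITIVE", "NEUTRAL", "NEGATIVE", "POSITIVE"]
print(f1_pn(gold, pred))  # prints 0.75
```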

Aspect-based sentiment analysis

Sentihood

Sentihood is a dataset for targeted aspect-based sentiment analysis (TABSA), which aims to identify fine-grained polarity towards a specific aspect. The dataset consists of 5,215 sentences, 3,862 of which contain a single target; the remainder contain multiple targets. F1 is used as the evaluation metric for aspect detection and accuracy as the evaluation metric for sentiment classification.

Model	Aspect (F1)	Sentiment (accuracy)	Paper / Source
Liu et al. (2018)	78.5	91.0	Recurrent Entity Networks with Delayed Memory Update for Targeted Aspect-based Sentiment Analysis
SenticLSTM (Ma et al., 2018)	78.2	89.3	Targeted Aspect-Based Sentiment Analysis via Embedding Commonsense Knowledge into an Attentive LSTM
LSTM-LOC (Saeidi et al., 2016)	69.3	81.9	Sentihood: Targeted aspect based sentiment analysis dataset for urban neighbourhoods
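The two metrics can be illustrated with one plausible reading of the protocol: aspect detection is scored as F1 over (sentence, target, aspect) triples, and sentiment as accuracy of the predicted polarity on the gold triples. This is a sketch under those assumptions, not the exact script used by the papers above. The triple representation, the example target and aspect strings, and the function names are hypothetical.

```python
from typing import Dict, Set, Tuple

# Hypothetical key identifying one opinion: (sentence id, target, aspect).
Key = Tuple[int, str, str]

def aspect_f1(gold: Set[Key], pred: Set[Key]) -> float:
    """Micro F1 over detected (sentence, target, aspect) triples."""
    tp = len(gold & pred)
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)

def sentiment_accuracy(gold: Dict[Key, str], pred: Dict[Key, str]) -> float:
    """Polarity accuracy on gold triples; a missing prediction counts as wrong."""
    if not gold:
        raise ValueError("no gold triples")
    correct = sum(pred.get(k) == polarity for k, polarity in gold.items())
    return correct / len(gold)

gold = {
    (0, "LOCATION1", "safety"): "Negative",
    (0, "LOCATION1", "price"): "Positive",
}
pred = {
    (0, "LOCATION1", "safety"): "Negative",
    (0, "LOCATION1", "transit-location"): "Positive",
}
print(aspect_f1(set(gold), set(pred)))  # prints 0.5
print(sentiment_accuracy(gold, pred))   # prints 0.5
```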

