Skip to content

Latest commit

 

History

History
114 lines (81 loc) · 7.28 KB

sentiment_analysis.md

File metadata and controls

114 lines (81 loc) · 7.28 KB

Sentiment analysis

Sentiment analysis is the task of classifying the polarity of a given text.

IMDb

The IMDb dataset is a binary sentiment analysis dataset consisting of 50,000 reviews from the Internet Movie Database (IMDb) labeled as positive or negative. Models are evaluated based on accuracy.

Model Score Paper / Source
ULMFiT (Howard and Ruder, 2018) 95.4 Universal Language Model Fine-tuning for Text Classification
Block-sparse LSTM (Gray et al., 2017) 94.99 GPU Kernels for Block-Sparse Weights
oh-LSTM (Johnson and Zhang, 2016) 94.1 Supervised and Semi-Supervised Text Categorization using LSTM for Region Embeddings
Virtual adversarial training (Miyato et al., 2016) 94.1 Adversarial Training Methods for Semi-Supervised Text Classification
BCN+Char+CoVe (McCann et al., 2017) 91.8 Learned in Translation: Contextualized Word Vectors

SST

The Stanford Sentiment Treebank contains of 215,154 phrases with fine-grained sentiment labels in the parse trees of 11,855 sentences in movie reviews. Models are evaluated either on fine-grained (five-way) or binary classification based on accuracy.

Fine-grained classification:

Model Accuracy Paper / Source
BCN+ELMo (Peters et al., 2018) 54.7 Deep contextualized word representations
BCN+Char+CoVe (McCann et al., 2017) 53.7 Learned in Translation: Contextualized Word Vectors

Binary classification:

Model Accuracy Paper / Source
Block-sparse LSTM (Gray et al., 2017) 93.2 GPU Kernels for Block-Sparse Weights
bmLSTM (Radford et al., 2017) 91.8 Learning to Generate Reviews and Discovering Sentiment
BCN+Char+CoVe (McCann et al., 2017) 90.3 Learned in Translation: Contextualized Word Vectors
Neural Semantic Encoder (Munkhdalai and Yu, 2017) 89.7 Neural Semantic Encoders
BLSTM-2DCNN (Zhou et al., 2017) 89.5 Text Classification Improved by Integrating Bidirectional LSTM with Two-dimensional Max Pooling

Yelp

The Yelp Review dataset consists of more than 500,000 Yelp reviews. There is both a binary and a fine-grained (five-class) version of the dataset. Models are evaluated based on error (1 - accuracy; lower is better).

Fine-grained classification:

Model Error Paper / Source
ULMFiT (Howard and Ruder, 2018) 29.98 Universal Language Model Fine-tuning for Text Classification
DPCNN (Johnson and Zhang, 2017) 30.58 Deep Pyramid Convolutional Neural Networks for Text Categorization
CNN (Johnson and Zhang, 2016) 32.39 Supervised and Semi-Supervised Text Categorization using LSTM for Region Embeddings
Char-level CNN (Zhang et al., 2015) 37.95 Character-level Convolutional Networks for Text Classification

Binary classification:

Model Error Paper / Source
ULMFiT (Howard and Ruder, 2018) 2.16 Universal Language Model Fine-tuning for Text Classification
DPCNN (Johnson and Zhang, 2017) 2.64 Deep Pyramid Convolutional Neural Networks for Text Categorization
CNN (Johnson and Zhang, 2016) 2.90 Supervised and Semi-Supervised Text Categorization using LSTM for Region Embeddings
Char-level CNN (Zhang et al., 2015) 4.88 Character-level Convolutional Networks for Text Classification

SemEval

SemEval (International Workshop on Semantic Evaluation) has a specific task for Sentiment analysis. Latest year overview of such task (Task 4) can be reached at: http://www.aclweb.org/anthology/S17-2088

SemEval-2017 Task 4 consists of five subtasks, each offered for both Arabic and English:

  1. Subtask A: Given a tweet, decide whether it expresses POSITIVE, NEGATIVE or NEUTRAL sentiment.

  2. Subtask B: Given a tweet and a topic, classify the sentiment conveyed towards that topic on a two-point scale: POSITIVE vs. NEGATIVE.

  3. Subtask C: Given a tweet and a topic, classify the sentiment conveyed in the tweet towards that topic on a five-point scale: STRONGLYPOSITIVE, WEAKLYPOSITIVE, NEUTRAL, WEAKLYNEGATIVE, and STRONGLYNEGATIVE.

  4. Subtask D: Given a set of tweets about a topic, estimate the distribution of tweets across the POSITIVE and NEGATIVE classes.

  5. Subtask E: Given a set of tweets about a topic, estimate the distribution of tweets across the five classes: STRONGLYPOSITIVE, WEAKLYPOSITIVE, NEUTRAL, WEAKLYNEGATIVE, and STRONGLYNEGATIVE.

Subtask A results:

Model F1-score Paper / Source
LSTMs+CNNs ensemble with multiple conv. ops (Cliche. 2017) 0.685 BB twtr at SemEval-2017 Task 4: Twitter Sentiment Analysis with CNNs and LSTMs
Deep Bi-LSTM+attention (Baziotis et al., 2017) 0.677 DataStories at SemEval-2017 Task 4: Deep LSTM with Attention for Message-level and Topic-based Sentiment Analysis

Aspect-based sentiment analysis

Sentihood

Sentihood is a dataset for targeted aspect-based sentiment analysis (TABSA), which aims to identify fine-grained polarity towards a specific aspect. The dataset consists of 5,215 sentences, 3,862 of which contain a single target, and the remainder multiple targets. F1 is used as evaluation metric for aspect detection and accuracy as evaluation metric for sentiment analysis.

Model Aspect Sentiment Paper / Source
Liu et al. (2018) 78.5 91.0 Recurrent Entity Networks with Delayed Memory Update for Targeted Aspect-based Sentiment Analysis
SenticLSTM (Ma et al., 2018) 78.2 89.3 Targeted Aspect-Based Sentiment Analysis via Embedding Commonsense Knowledge into an Attentive LSTM
LSTM-LOC (Saeidi et al., 2016) 69.3 81.9 Sentihood: Targeted aspect based sentiment analysis dataset for urban neighbourhoods

Go back to the README