Clustering BERT Embedding via Dot Product (CBERTdp)

In this study, we explore strategies to reduce the computational complexity of sentiment analysis methods, which commonly rely on resource-intensive neural networks. Our investigation focuses on leveraging BERT-extracted embeddings and clustering techniques to streamline the sentiment classification process. Specifically, we propose a novel approach where we cluster BERT embeddings and classify sentiment by computing the dot product between a new sentence's embedding and cluster centroids. We present three variants of this approach, each offering a different trade-off between computational efficiency and accuracy.

Our findings reveal that only the variant incorporating an attention layer achieves satisfactory sentiment classification accuracy. Its computational cost is moderate compared to the baselines.

Overall, our study sheds light on promising avenues for reducing the computational overhead of sentiment analysis, highlighting the potential of leveraging clustering techniques and attention mechanisms for more efficient and effective sentiment classification.
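
The core idea described above, classifying a sentence by the dot product between its embedding and cluster centroids, can be sketched as follows. This is only an illustration under stated assumptions, not the repository's implementation: it assumes one centroid per sentiment label, computed as the mean of the training embeddings with that label, and it uses toy 2-dimensional vectors in place of real BERT embeddings. The names `centroids_by_label`, `dot`, and `classify` are hypothetical.

```python
from collections import defaultdict


def centroids_by_label(embeddings, labels):
    """Average the embeddings of each label into one centroid per label.

    Assumption: clusters are formed by grouping on the sentiment label.
    """
    groups = defaultdict(list)
    for vector, label in zip(embeddings, labels):
        groups[label].append(vector)
    return {
        label: [sum(column) / len(vectors) for column in zip(*vectors)]
        for label, vectors in groups.items()
    }


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def classify(embedding, centroids):
    """Return the label whose centroid has the largest dot product."""
    return max(centroids, key=lambda label: dot(embedding, centroids[label]))


# Toy 2-dimensional vectors stand in for real BERT sentence embeddings.
train_embeddings = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.8]]
train_labels = ["positive", "positive", "negative", "negative"]

centroids = centroids_by_label(train_embeddings, train_labels)
prediction = classify([0.8, 0.3], centroids)
print(prediction)
```

Once the centroids are stored, classifying a new sentence costs one dot product per centroid. The embedding extraction itself is not shown here.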

Datasets Used

Environment Setup

conda create --name <env> --file requirements.txt

Note before running the application

Extracting the base embeddings for all three datasets takes a long time because of their size, especially for yelp_polarity.

How to Run

usage: main.py [-h] -s {our_approaches,competitors,baselines}
               [{our_approaches,competitors,baselines} ...] -a ABLATIONS -m
               {BERT,DISTILBERT}

optional arguments:
  -h, --help            show this help message and exit
  -s {our_approaches,competitors,baselines} [{our_approaches,competitors,baselines} ...], --strategies {our_approaches,competitors,baselines} [{our_approaches,competitors,baselines} ...]
                        Possible strategies to run
  -a ABLATIONS, --ablations ABLATIONS
                        Bool ablations
  -m {BERT,DISTILBERT}, --model {BERT,DISTILBERT}
                        Pretrained BERT model from Huggingface
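
To make the help text above concrete, here is a hypothetical reconstruction of a parser consistent with it, followed by one example argument list. It is a sketch, not the code in `main.py`: in particular, how `ABLATIONS` is converted to a boolean is an assumption (the helper `parse_bool` and the accepted `true`/`false` spellings are invented for illustration).

```python
import argparse

STRATEGIES = ["our_approaches", "competitors", "baselines"]
MODELS = ["BERT", "DISTILBERT"]


def parse_bool(text):
    """Hypothetical helper: accept 'true' or 'false' in any letter case."""
    lowered = text.lower()
    if lowered not in ("true", "false"):
        raise argparse.ArgumentTypeError(f"expected true or false, got {text!r}")
    return lowered == "true"


def build_parser():
    """Build a parser whose options mirror the usage text."""
    parser = argparse.ArgumentParser(prog="main.py")
    parser.add_argument("-s", "--strategies", nargs="+", choices=STRATEGIES,
                        required=True, help="Possible strategies to run")
    parser.add_argument("-a", "--ablations", type=parse_bool,
                        required=True, help="Bool ablations")
    parser.add_argument("-m", "--model", choices=MODELS,
                        required=True, help="Pretrained BERT model")
    return parser


# Example: run two strategy groups with ablations enabled on DistilBERT.
args = build_parser().parse_args(
    ["-s", "our_approaches", "baselines", "-a", "true", "-m", "DISTILBERT"]
)
print(args.strategies, args.ablations, args.model)
```

Under these assumptions the corresponding command line would be `python main.py -s our_approaches baselines -a true -m DISTILBERT`. Check the value the real script expects for `-a` before relying on this.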

Results

We have uploaded a .csv file containing the results for every strategy, including the baselines, competitors, ablations, and our approaches. You can find it at this link.

Documentation

Proposal: the initial idea of what we wanted to design and how we planned to evaluate it
Poster: the poster we presented at a class poster session
Paper Report: the final paper report of the project
Presentation: the slides used to present the work

Cite Us

@online{CBERTdp,
    author = "Thomas Vecchiato and Riccardo Zuliani and Alice Schirrmeister and Isabel Marie Ritter",
    title = "Clustering BERT Embedding via Dot Product (CBERTdp)",
    url  = "https://github.com/zuliani99/CBERTdp/blob/main/Project_Report_NLP.pdf",
}
