Exploring the Impact of the Attention Mechanism in Arabic Text Classification with AraBert

About

This project investigates how the attention mechanism improves Arabic text classification. It compares two deep learning models that do not use attention, LSTM and CNN, with AraBERT, a Transformer-based model that does.
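For readers unfamiliar with attention, the following is a minimal sketch of scaled dot-product attention, the core operation in Transformer models such as AraBERT. It is an illustration in plain Python and is not taken from the project's notebooks; the function names `softmax` and `scaled_dot_product_attention` are hypothetical, and real models add learned projections and multiple heads on top of this.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Return (outputs, weights) for lists of equal-length vectors.

    Illustrative helper (hypothetical name): each query is compared with
    every key, and the resulting weights mix the value vectors.
    """
    dim = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(dim).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
            for k in keys
        ]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        out = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights
```

Each output is a weighted average of all value vectors, so every token can draw on context from the whole sequence. LSTM and CNN models without attention have no such direct mechanism.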

Research Report

The full research report for this project is available in report.pdf.

Dataset

The project uses the following datasets:

CNN Arabic news dataset: cnn-arabic-utf8 (primary dataset)
BBC Arabic news dataset: bbc-arabic-utf8

Models

The following deep learning models were tested:

LSTM Model: A long short-term memory (LSTM) network for text classification.
CNN Model: A convolutional neural network model for text classification.
AraBERT Model: A transformer-based model specifically designed for Arabic text.

Notebooks

processing-files.ipynb: Creates the .csv dataset from the downloaded .txt files.
preprocessing.ipynb: Exploratory data analysis (EDA) and preprocessing steps, performed on the primary dataset (CNN).
Data_augmentation.ipynb: Data augmentation techniques applied to the dataset.
arabic_text_classification_LSTM_CNN.ipynb: CNN and LSTM models implementation for Arabic text classification.
text_classification_with_AraBert.ipynb: AraBERT model implementation for Arabic text classification.
arabic-sentiment-analysis-lstm-cnn (1).ipynb: Arabic tweets sentiment analysis using LSTM and CNN.
AraBERT_text_classification_and_sentiment_analysis.ipynb: AraBERT model implementation for Arabic text classification + Arabic tweets sentiment analysis.

Files

dataset.csv: The .csv dataset created from original .txt files.
bbc-arabic-utf8_folder_form.rar: BBC Arabic news dataset in a folder structure. Each folder corresponds to a specific category and contains the .txt files for that category.
cnn-arabic-utf8_csv_form.zip: CNN dataset in .csv format.
train_data.csv: The dataset after data augmentation.
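As an illustration of how a folder-per-category dataset can be flattened into a single .csv file, here is a minimal sketch using only the Python standard library. It is not the code from processing-files.ipynb; the function name `folders_to_csv` and the column names `text` and `label` are assumptions made for this example.

```python
import csv
from pathlib import Path


def folders_to_csv(root_dir, csv_path, encoding="utf-8"):
    """Write one (text, label) row per .txt file and return the row count.

    Assumes root_dir contains one sub-folder per category, each holding
    .txt files. The column names are illustrative, not the project's.
    """
    root = Path(root_dir)
    count = 0
    with open(csv_path, "w", newline="", encoding=encoding) as handle:
        writer = csv.writer(handle)
        writer.writerow(["text", "label"])
        # Sort folders and files so the output order is reproducible.
        for category_dir in sorted(p for p in root.iterdir() if p.is_dir()):
            for txt_file in sorted(category_dir.glob("*.txt")):
                text = txt_file.read_text(encoding=encoding).strip()
                # The folder name serves as the class label.
                writer.writerow([text, category_dir.name])
                count += 1
    return count
```

A call such as `folders_to_csv("bbc-arabic-utf8", "dataset.csv")` would apply once the archive has been extracted, assuming the extracted folder has that name.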

License

This project is licensed under the MIT License - see the LICENSE.md file for details.

Acknowledgments

The datasets used were sourced from Arabic Corpora on SourceForge.

Contact

For any inquiries or contributions, please contact [[email protected]].

Repository: yuguerten/Exploring-the-impact-of-contextual-attention-on-Arabic-text-classification

Folders and files

Latest commit

History

Repository files navigation

Exploring the Impact of the Attention Mechanism in Arabic Text Classification with AraBert

About

Research Report

Dataset

Models

Notebooks

Files

License

Acknowledgments

Contact

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages