This repository contains the source code used in the paper [State of the Art Models for Fake News Detection Tasks](https://ieeexplore.ieee.org/document/9089487) and in the QICC fake news competition, by Team FAR-NLP and Team AI Musketeers.

For the transformer-based code, we used Google Colab with a GPU-accelerated instance. All transformers are based on Hugging Face's implementation.

The code is left as-is from the competition, with minor editing.
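All transformer models (mBERT, XLNet, RoBERTa) were fine-tuned with the hyper-parameters listed in Tables 1 and 2 below. As a rough orientation, a minimal sketch of that setup with the current Hugging Face `transformers` API might look as follows; the competition notebooks predate this API, so treat it as illustrative rather than the exact original code:

```python
import torch
from torch.utils.data import DataLoader
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# mBERT as in the tables; XLNet and RoBERTa checkpoints drop in the same way.
MODEL_NAME = "bert-base-multilingual-cased"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)  # LR: 2e-5

def encode(texts):
    # Pad/truncate to the 128-token maximum sequence length from the tables.
    return tokenizer(texts, padding="max_length", truncation=True,
                     max_length=128, return_tensors="pt")

def train(dataset, epochs=5, batch_size=32):  # up to 5 epochs, batch size 32
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)
    model.train()
    for _ in range(epochs):
        for batch in loader:  # each batch holds input_ids, attention_mask, labels
            optimizer.zero_grad()
            loss = model(**batch).loss  # cross-entropy over the two classes
            loss.backward()
            optimizer.step()
```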
- Wissam Antoun [email protected] (Transformer-based models for fake news and news domain identification, and news domain comparison models)
- Fady Baly [email protected] (Feature-based fake news and news domain detection models)
- Rim Achour [email protected] (Feature-based fake news and news domain detection models)
- Amir Hussein [email protected] (Feature-based fake news and news domain detection models, and feature importance)
Table 1: Fake News Detection Hyper-Parameters

| Model | Hyper-Parameters |
|---|---|
| NB | smoothing parameter = 10 |
| SVM | penalty parameter = 21, kernel = RBF |
| RF | estimators = 271 |
| XGBoost | estimators = 10, learning rate = 1, gamma = 0.5 |
| mBERT-base, XLNet-base, RoBERTa-base | max sequence length: 128, learning rate: 2e-5, batch size: 32, epochs: up to 5 |
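For reference, the classical models in Table 1 map to scikit-learn / XGBoost roughly as follows. Reading "smoothing parameter" as `MultinomialNB`'s `alpha` and "penalty parameter" as `SVC`'s `C` is our assumption; the feature-based code in this repository is the authoritative source:

```python
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier

models = {
    "NB": MultinomialNB(alpha=10),                   # smoothing parameter = 10 (assumed alpha)
    "SVM": SVC(C=21, kernel="rbf"),                  # penalty parameter = 21 (assumed C), RBF kernel
    "RF": RandomForestClassifier(n_estimators=271),  # 271 estimators
    "XGBoost": XGBClassifier(n_estimators=10, learning_rate=1.0, gamma=0.5),
}
```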
Table 2: News Domain Detection Hyper-Parameters

| Model | Hyper-Parameters |
|---|---|
| TF-IDF | 1- to 4-grams and 1- to 6-grams |
| CNN* | emb size: 300, 2 stacked CNN layers (256 kernels of size 5, then 64 kernels of size 5), dropout: 0.1, epochs: max of 40 |
| 3CCNN* | emb size: 300, 512 kernels of sizes 3, 4, and 5, dropout: 0.3, epochs: max of 40 |
| LSTM* | emb size: 300, hidden size: 300, dropout: 0.05, epochs: max of 40 |
| GRU* | same as LSTM |
| Bi-LSTM* | same as LSTM |
| Bi-LSTM with attention* | same as LSTM |
| Bi-LSTM with attention** | same as LSTM |
| mBERT-base, XLNet-base, RoBERTa-base | max sequence length: 128, learning rate: 2e-5, batch size: 32, epochs: up to 5 |
| RMDL | emb size: 50, max sequence length: 500, max nb words: 5000, combination of 10 DNNs, 10 RNNs, and 10 CNNs, epochs: 100 each, DNN: default parameters except max nodes: 512, RNN & CNN: default parameters, Adam optimizer, dropout: 0.07 |
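The Bi-LSTM-with-attention configuration in Table 2 (embedding size 300, hidden size 300, dropout 0.05) corresponds to roughly the following PyTorch model. This is a hedged reconstruction for orientation only; the actual architecture lives in `topic/model.py`:

```python
import torch
import torch.nn as nn

class BiLSTMAttention(nn.Module):
    # Hyper-parameters default to the Table 2 values.
    def __init__(self, vocab_size, num_classes, emb_size=300, hidden=300, dropout=0.05):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_size)
        self.lstm = nn.LSTM(emb_size, hidden, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)   # scores each timestep
        self.dropout = nn.Dropout(dropout)
        self.out = nn.Linear(2 * hidden, num_classes)

    def forward(self, token_ids):
        h, _ = self.lstm(self.embedding(token_ids))    # (batch, seq, 2*hidden)
        weights = torch.softmax(self.attn(h), dim=1)   # attention over timesteps
        context = (weights * h).sum(dim=1)             # weighted sum of states
        return self.out(self.dropout(context))
```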
Repository structure:

- Transformers-Team_AI_Musketeers/
  - Data_preperation.ipynb: Notebook used for dataset preparation for the transformer-based models
  - Fake_News_BERT_RoBERTa.ipynb and Fake_News_Model_XLNet.ipynb: Notebooks used for transformer model training on the fake news dataset on Google Colab
  - Fake_News_Models_with_BERTViz.ipynb: Tests the BERTViz visualization tool for better interpretability
  - News_Domain_BERT_RoBERTa_XLNet.ipynb: Notebook used for training the transformer models on the domain identification task on Google Colab
  - News_Domain_ML.ipynb: Notebook used to evaluate other models on the news domain identification task
  - modeling.py: Extends BERT, XLNet, and RoBERTa to support multi-label classification (the usual change is sketched after this list)
  - multiutils.py: Data preparation and fine-tuning data creation for multi-label classification
  - utils.py: Data preparation and fine-tuning data creation for binary classification
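The usual way to extend a single-label transformer classifier to multi-label classification, as `modeling.py` does, is to keep the linear head but train with a per-label sigmoid and binary cross-entropy instead of a softmax cross-entropy. A minimal sketch of that change (not the repository's exact code):

```python
import torch
import torch.nn as nn
from transformers import AutoModel

class MultiLabelClassifier(nn.Module):
    def __init__(self, model_name, num_labels):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(model_name)
        self.head = nn.Linear(self.encoder.config.hidden_size, num_labels)
        self.loss_fn = nn.BCEWithLogitsLoss()  # independent binary decision per label

    def forward(self, input_ids, attention_mask, labels=None):
        hidden = self.encoder(input_ids, attention_mask=attention_mask).last_hidden_state
        logits = self.head(hidden[:, 0])  # pool via the first token (BERT/RoBERTa-style)
        loss = self.loss_fn(logits, labels.float()) if labels is not None else None
        return loss, logits
```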
- Feature_based-Team_FAR_NLP/
  - fake_news/
    - google_search.py: Google Search API access code for search result extraction
    - preprocessing.py: Preprocessing source code for the fake news articles
  - topic/
    - get_entities.py: Google Cloud Natural Language API access code for extracting entities from the text
    - model.py: PyTorch-based Bi-LSTM with attention model
    - train_topic.py: Training script
  - fake_news/
    - Twitter_bot.ipynb: Notebook with the preprocessing and model used for the Twitter bot detection task
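The TF-IDF baseline in Table 2 ("1- to 4-grams and 1- to 6-grams") maps to scikit-learn roughly as below. Whether those ranges are word- or character-level is not stated in the table, so both readings are shown; the feature-based code above is the authoritative source:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from scipy.sparse import hstack

word_tfidf = TfidfVectorizer(analyzer="word", ngram_range=(1, 4))  # 1- to 4-grams
char_tfidf = TfidfVectorizer(analyzer="char", ngram_range=(1, 6))  # 1- to 6-grams

def featurize(train_texts, test_texts):
    # Fit on training articles only, then stack both views side by side.
    X_train = hstack([word_tfidf.fit_transform(train_texts),
                      char_tfidf.fit_transform(train_texts)])
    X_test = hstack([word_tfidf.transform(test_texts),
                     char_tfidf.transform(test_texts)])
    return X_train, X_test
```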
Cite our paper as:
```bibtex
@INPROCEEDINGS{Antoun2020:State,
  AUTHOR="Wissam Antoun and Fady Baly and Rim Achour and Amir Hussein and Hazem Hajj",
  TITLE="State of the Art Models for Fake News Detection Tasks",
  BOOKTITLE="2020 IEEE International Conference on Informatics, IoT, and Enabling Technologies (ICIoT) (ICIoT'2020)",
  ADDRESS="Doha, Qatar",
  DAYS=14,
  MONTH=mar,
  YEAR=2020,
}
```