- Create a new conda environment with all required dependencies: `conda env create -f environment.yml`
- Activate the environment: `conda activate mami-san`
- Download the dataset here and put all files in the `./MAMI DATASET/` folder
- Download the ResNet weights here and put them in the main directory
- Download the BERT weights here and put them in the main directory
- To run everything with one command, run `python main.py`. Alternatively, go through it step by step as described below.
The model used for image classification is the pretrained Wide ResNet-50-2 from “Wide Residual Networks”.
ResNet weights can be found here
It achieves an accuracy of 65.8% on the test set and 75.9% on the training set after just 3 epochs.
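As a rough sketch, the image branch could be set up along these lines (torchvision is assumed; the two-class output head and the `resnet_weights.pth` file name are illustrative placeholders, not necessarily what the scripts use):

```python
import torch
import torchvision

# Pretrained Wide ResNet-50-2 backbone (ImageNet weights)
model = torchvision.models.wide_resnet50_2(pretrained=True)

# Replace the final fully connected layer with a two-class head
# (assumed binary misogynous / not-misogynous label)
model.fc = torch.nn.Linear(model.fc.in_features, 2)

# Load the downloaded fine-tuned weights (placeholder file name)
model.load_state_dict(torch.load("resnet_weights.pth", map_location="cpu"))
model.eval()
```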
The model used for text classification is the pretrained cased BERT (Bidirectional Encoder Representations from Transformers), which achieves 53.8% accuracy on the test set after 3 epochs.
BERT weights can be found here
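A minimal sketch of the text branch, assuming the Hugging Face `transformers` API; the `bert-base-cased` checkpoint and the `bert_weights.pth` file name are assumptions, not taken from the scripts:

```python
import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
model = BertForSequenceClassification.from_pretrained("bert-base-cased", num_labels=2)

# Load the downloaded fine-tuned weights (placeholder file name)
model.load_state_dict(torch.load("bert_weights.pth", map_location="cpu"))
model.eval()

# Classify a single meme caption
inputs = tokenizer("example meme text", return_tensors="pt", truncation=True, padding=True)
with torch.no_grad():
    prediction = model(**inputs).logits.argmax(dim=-1).item()
```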
The combined accuracy of the two models on the whole test set is 62.8%, and the macro-averaged F1 score is 58.2.
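For reference, these metrics can be recomputed from model predictions with scikit-learn; the label lists below are placeholders for the real test labels and combined predictions:

```python
from sklearn.metrics import accuracy_score, f1_score

# Placeholder labels: substitute the real test labels and combined predictions
y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]

print("accuracy:", accuracy_score(y_true, y_pred))
print("macro F1:", f1_score(y_true, y_pred, average="macro"))
```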
To preprocess the dataset, run `python read_dataset.py` once.
The command unzips the data into the `data` folder in the working directory (creating it if necessary), preprocesses the files, and builds the dataloaders for training and testing.
Two files named `labels.csv` are created, one in the train folder and one in the test folder. A sketch of how such a file could feed a dataloader follows below.
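As an illustration of how such a `labels.csv` could be turned into a PyTorch dataloader (the column names `file_name` and `misogynous` and the `data/train` path are assumptions, not necessarily what `read_dataset.py` produces):

```python
import pandas as pd
from PIL import Image
from torch.utils.data import Dataset, DataLoader
from torchvision import transforms

class MemeImageDataset(Dataset):
    """Wraps a labels.csv file and the corresponding image folder."""

    def __init__(self, csv_path, image_dir):
        self.labels = pd.read_csv(csv_path)
        self.image_dir = image_dir
        self.transform = transforms.Compose([
            transforms.Resize((224, 224)),
            transforms.ToTensor(),
        ])

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        row = self.labels.iloc[idx]
        # "file_name" and "misogynous" are assumed column names
        image = Image.open(f"{self.image_dir}/{row['file_name']}").convert("RGB")
        return self.transform(image), int(row["misogynous"])

train_loader = DataLoader(MemeImageDataset("data/train/labels.csv", "data/train"),
                          batch_size=32, shuffle=True)
```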
To run the training script for the image classifier, execute `python resnet.py`.
To run the training script for the text classifier, execute `python bert.py`.
To load the image classifier weights and run the evaluation on 4 random images from the test set, run `python vizualize.py`.
To load both classifiers and compute their accuracy, run `python combined.py`.
Data preprocessing, exploration and visualization, as well as all the code from the scripts with their outputs, can be found in the Jupyter notebook `bert_resnet.ipynb`.