- MLP
- ResNet18
- MLP-Mixer
```
python main.py --model <model_name> --model_config <path_to_json> --logdir <result_dir> ...
```
See `config.py` for the full list of command-line options.
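As a rough illustration of what such a `config.py` might expose, here is a hedged argparse sketch. The flag names are taken from the commands in this README; the defaults and choices are assumptions, not the repo's actual values.

```python
# Hypothetical sketch of the kind of argument parsing config.py may do;
# flag names come from the example commands, defaults are illustrative.
import argparse

def build_parser():
    parser = argparse.ArgumentParser(description="Train MLP / ResNet18 / MLP-Mixer")
    parser.add_argument("--model", choices=["mlp", "resnet18", "mlpmixer"], required=True)
    parser.add_argument("--model_config", type=str, required=True,
                        help="Path to the model's JSON config file")
    parser.add_argument("--batch_size", type=int, default=128)
    parser.add_argument("--epochs", type=int, default=100)
    parser.add_argument("--optimizer", type=str, default="sgd")
    parser.add_argument("--lr", type=float, default=0.1)
    parser.add_argument("--weight_decay", type=float, default=0.0)
    parser.add_argument("--logdir", type=str, default="./train_log")
    parser.add_argument("--device", type=str, default="cuda")
    return parser

# Parse one of the example commands from this README.
args = build_parser().parse_args(
    ["--model", "mlp", "--model_config", "./model_configs/mlp.json", "--lr", "0.001"]
)
print(args.model, args.lr)  # mlp 0.001
```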
```
python main.py --batch_size 128 --model mlp --model_config ./model_configs/mlp.json --epochs 100 --logdir ./train_log/mlp --device cuda
python main.py --batch_size 128 --model resnet18 --model_config ./model_configs/resnet18.json --epochs 100 --optimizer adam --lr 0.001 --logdir ./train_log/resnet --device cuda
python main.py --batch_size 128 --model mlpmixer --model_config ./model_configs/mlpmixer.json --epochs 15 --lr 0.1 --logdir ./train_log/mixer --device cuda
```
Comparison of the best results for the three models:
Hyperparameters used by the above models:
| Model | Hyperparameters |
|---|---|
| MLP | default |
| ResNet18 | batch size=64, weight decay=1e-3, epochs=100 |
| MLP-Mixer | learning rate=1e-3, embedding dimension=512, number of blocks=6, epochs=84 |
We first min-max normalized all the kernel weights of the first layer of ResNet18 (mapping them into [0, 1]), then averaged each kernel over its input channels for visualization.
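The normalization-and-averaging step can be sketched as below. A random tensor stands in for the actual trained `conv1` weights; the kernel shape 64×3×3×3 is an assumption (a CIFAR-style ResNet18 commonly uses 3×3 first-layer kernels, while the standard ImageNet variant uses 7×7).

```python
import numpy as np

# Stand-in for ResNet18's first-layer kernels, shape
# (out_channels, in_channels, height, width) -- assumed 3x3 kernels here.
rng = np.random.default_rng(0)
kernels = rng.normal(size=(64, 3, 3, 3))

# Min-max normalize all weights jointly into [0, 1] so they can be
# rendered as image intensities.
k_min, k_max = kernels.min(), kernels.max()
kernels_norm = (kernels - k_min) / (k_max - k_min)

# Average each kernel over its input channels, giving one
# single-channel 3x3 map per filter for visualization.
kernel_maps = kernels_norm.mean(axis=1)  # shape (64, 3, 3)
```

Each of the 64 maps can then be tiled into a grid (e.g. with `matplotlib`) to inspect what the first layer has learned.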
The first layer of the token-mixing MLP in the first block of MLP-Mixer has a dimension of
The first layer of the token-mixing MLP resembles the first layer of ResNet18 in that both learn many pairs of feature detectors with opposite phases. However, because token mixing allows global communication between different spatial locations, some of its learned features operate on the entire image while others act on smaller regions. In contrast, ResNet18 tends to learn only detectors that act on pixels in local regions of the image.
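To make the local-versus-global contrast concrete, here is a small shape sketch. The patch size (4) and hidden width (256) are illustrative assumptions, not values read from the config files; only the 32×32 CIFAR image size is standard.

```python
import numpy as np

# Illustrative MLP-Mixer shapes for a 32x32 image, assuming patch size 4
# (patch size and hidden width are assumptions for this sketch).
image_size, patch_size = 32, 4
grid = image_size // patch_size          # 8 patches per side
num_patches = grid ** 2                  # 64 tokens
hidden_width = 256

# The first token-mixing layer acts across tokens, so its weight matrix
# has shape (hidden_width, num_patches): every row assigns one weight to
# each spatial location of the image.
rng = np.random.default_rng(0)
w_token_mixing = rng.normal(size=(hidden_width, num_patches))

# Reshaping one row back to the patch grid yields an 8x8 map that covers
# the whole image -- unlike a ResNet18 first-layer kernel, which only
# covers a small local neighborhood of pixels.
spatial_map = w_token_mixing[0].reshape(grid, grid)
print(spatial_map.shape)  # (8, 8)
```

This is why rows of the token-mixing weight matrix can be visualized as full-image filters and compared against the first-layer convolution kernels.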