This is the code used for "The Role of Pre-training Data in Transfer Learning".
Unless otherwise mentioned, our CLIP models are trained from scratch on each of the pre-training datasets, following the training code from the OpenCLIP GitHub repository. CLIP models are trained using the AdamW optimizer with default PyTorch parameters
Our SimCLR implementation closely follows the training code from the SLIP
SimCLR models are also trained for 16 epochs from scratch using the AdamW optimizer~\citep{loshchilov2017decoupled} with
Each pretrained model is fine-tuned on the specific downstream task for 128 epochs with a batch size of 128. The initial learning rate is chosen from {0.0001, 0.0003, 0.001, 0.003} and follows a cosine-annealing schedule with 500 warm-up steps. For each fine-tuning run, we choose the best-performing result on the test set from this grid search. We use the implementation from the WiSE-FT GitHub repository for fine-tuning, where we have only one model and
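The schedule and grid search described above can be sketched in plain Python. This is a minimal illustration, not the repository's code: it assumes a linear warm-up and a cosine decay to zero, and the names `lr_at_step` and `pick_best` are hypothetical helpers introduced only for this example.

```python
import math

# Learning-rate grid and warm-up length taken from the description above.
LEARNING_RATES = [0.0001, 0.0003, 0.001, 0.003]
WARMUP_STEPS = 500


def lr_at_step(step, base_lr, total_steps, warmup_steps=WARMUP_STEPS):
    """Learning rate at a given optimizer step.

    Assumption: linear warm-up to base_lr, then cosine annealing to zero.
    """
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))


def pick_best(results):
    """Return the (learning_rate, accuracy) pair with the highest accuracy.

    `results` maps each learning rate in the grid to its measured accuracy.
    """
    return max(results.items(), key=lambda item: item[1])
```

In this sketch, `total_steps` would be the number of epochs times the number of batches per epoch, and `pick_best` would be called once per downstream task after all four learning rates have been evaluated.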
conda env create
conda activate DataDisributionTransferLearning
cd DataDisributionTransferLearning
export PYTHONPATH="$PYTHONPATH:$PWD"
Most experiments in this repository were run using Caliban. Caliban is a tool for developing research workflows and notebooks in an isolated Docker environment and submitting those isolated environments to Google Compute Cloud. You can use the commands in run.sh to launch the different experiments. Each run loads its hyperparameters from config.json and saves its results in the Google Bucket. Below is a short step-by-step guide to running Caliban on GCP:
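As a rough illustration of how a run might read its hyperparameters, the sketch below loads a JSON file into a dictionary. The layout of config.json is not documented here, so the only assumption made is that the file holds a JSON object at the top level; `load_hyperparameters` is a hypothetical helper, not a function from this repository.

```python
import json
from pathlib import Path


def load_hyperparameters(path="config.json"):
    """Read a JSON file and return its contents as a dict.

    Assumption: the file contains a single JSON object at the top level.
    """
    with Path(path).open("r", encoding="utf-8") as handle:
        config = json.load(handle)
    if not isinstance(config, dict):
        raise ValueError(f"{path} must contain a JSON object at the top level")
    return config
```

Failing early on an unexpected file shape keeps a misconfigured job from being submitted to the cloud before the problem is noticed.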
- sudo apt-get install python3 python3-venv python3-pip
- sudo usermod -a -G docker ${USER}
- Install Docker with NVIDIA support. Note: first check whether it is already installed: sudo apt-get install -y nvidia-docker2. If it is not, continue with https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#docker
- sudo pkill -SIGHUP dockerd
- python3 -m pip install --user pipx
- python3 -m pipx ensurepath
- source ~/.bashrc (or re-login for the PATH changes to take effect)
- pipx install caliban
To check if all is well, run caliban --help
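Besides running caliban --help, the installation steps above can be checked with a small script that looks for each tool on the PATH. This is only a sketch: the tool list is taken from the steps above, and `find_missing_tools` is a hypothetical helper name.

```python
import shutil

# Command-line tools that the installation steps above are expected to provide.
REQUIRED_TOOLS = ["docker", "pipx", "caliban"]


def find_missing_tools(tools=REQUIRED_TOOLS):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Missing tools:", ", ".join(missing))
    else:
        print("All required tools were found on PATH.")
```

If a tool installed through pipx is reported as missing, the PATH change from pipx ensurepath has probably not taken effect yet in the current shell.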
- Grant the service account owner access: go to VM details > API and identity management > Service account to find its name, then add that service account as an owner under IAM & Admin in the Google Cloud console.
- Also add this service account to the bucket as Storage Object Admin if you are using Google Bucket.
- gcloud init
- Select the account
- Set default zone to some zone e.g. europe-west4-a (number 14)
- Add the following lines to the end of “~/.bashrc”:
  export REGION="your region, e.g. europe-west4"
  export PROJECT_ID="your project ID"
source ~/.bashrc
Test your Environment: gcloud auth list
- Follow these steps to get a JSON file for credentials
- Move the JSON file to a path of your choice
- Add the following to the end of “~/.bashrc”: export GOOGLE_APPLICATION_CREDENTIALS="path to the JSON file"
- source ~/.bashrc
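Once ~/.bashrc has been reloaded, the variables set in the steps above can be verified from Python before any job is submitted. The sketch below is an assumption-labeled convenience check, not part of the repository: `check_gcp_environment` is a hypothetical helper, and it only tests that the variables are set and that the credentials path points to an existing file.

```python
import os
from pathlib import Path

# Environment variables exported in the steps above.
REQUIRED_VARIABLES = ["REGION", "PROJECT_ID", "GOOGLE_APPLICATION_CREDENTIALS"]


def check_gcp_environment(environ=None):
    """Return a list of problems found; an empty list means every check passed."""
    environ = os.environ if environ is None else environ
    problems = [
        f"{name} is not set" for name in REQUIRED_VARIABLES if not environ.get(name)
    ]
    credentials = environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if credentials and not Path(credentials).is_file():
        problems.append(
            f"GOOGLE_APPLICATION_CREDENTIALS points to a missing file: {credentials}"
        )
    return problems


if __name__ == "__main__":
    for problem in check_gcp_environment():
        print(problem)
```

The check does not validate the contents of the JSON key; gcloud auth list remains the way to confirm which account is active.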
You can then run Caliban either locally or in the cloud using GCP Training jobs.