
Creating a submission on Kaggle

Barata Magnus edited this page Oct 8, 2021 · 2 revisions

This page explains how to use this repository to create a submission on Kaggle.

Preparation

  1. Stash or commit your work on the current branch (:warning: you will lose any uncommitted work if you skip this step!). If the experiment to be submitted lives on a different, unmerged branch, check out that branch first.
  2. Run script/repo2kaggle.sh from the repository root to create the Kaggle dataset. The output will be written to the kaggle_dataset directory.
    script/repo2kaggle.sh # Latest changes will be used
    script/repo2kaggle.sh EXP-000 # Use the repo state on EXP-000
    Note that if the experiment's results (trained models, etc.) are available in the exps directory, the script automatically adds them to kaggle_dataset.
  3. Upload the kaggle_dataset directory to Kaggle as a custom dataset. If a Remove Duplicates warning appears, ignore it and upload including the duplicates.
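If you prefer the command line over the web UI, the preparation steps above can be sketched as follows. This is only a rough outline: it assumes the official kaggle CLI is installed and authenticated, treats EXP-000 as a checkout-able git ref, and requires you to fill in the title and slug that `kaggle datasets init` writes to dataset-metadata.json before uploading.

```shell
# Save any uncommitted work, then switch to the experiment's branch if needed
git stash
git checkout EXP-000   # assumption: EXP-000 is a branch or tag in this repo

# Build the Kaggle dataset from the repo state
script/repo2kaggle.sh EXP-000

# Optional: upload with the kaggle CLI instead of the web UI.
# `kaggle datasets init` writes a dataset-metadata.json template to edit first.
kaggle datasets init -p kaggle_dataset
kaggle datasets create -p kaggle_dataset --dir-mode zip
```

The web-UI upload from step 3 and the CLI upload are interchangeable; both produce the same custom dataset.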

Installing additional packages

In some Kaggle code competitions, internet access is disabled for the submission kernel. This section shows how to work around that limitation.

  1. Create a new notebook.
  2. Download the additional packages so their wheels are saved with the notebook output:
    !touch requirements.txt
    !echo "pkg1==version" >> requirements.txt
    !echo "pkg2==version" >> requirements.txt
    ...
    
    !pip download -r requirements.txt
  3. Save and commit your notebook.

Submitting

  1. Create a notebook and attach the needed data sources (the competition dataset, the uploaded custom dataset containing this repo's code and trained models, the additional-packages notebook, etc.).
  2. Install the additional packages offline, pointing pip at the package notebook's output:
    !pip install --no-index --find-links /kaggle/input/pkg-install-notebook -r requirements.txt
  3. Append the repo's path to Python's import paths.
    import sys
    sys.path.append('/kaggle/input/your-uploaded-dataset/kaggle_dataset/src')
    
    # From here you can import this repo's code just as you would in a local environment
    from datagens.vol_datagen import VolumeDatagen
    import utils
    ...
  4. Write your code to run predictions on the competition's data.
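Step 3 works because Python resolves imports by scanning sys.path in order. A minimal, self-contained illustration of the mechanism, using a throwaway module in a temporary directory instead of the Kaggle paths:

```python
import pathlib
import sys
import tempfile

# Create a stand-in for the uploaded dataset's src/ directory
src_dir = pathlib.Path(tempfile.mkdtemp())
(src_dir / 'mymod.py').write_text('ANSWER = 42\n')

# Once the directory is on sys.path, its modules become importable
sys.path.append(str(src_dir))
import mymod

print(mymod.ANSWER)  # 42
```

On Kaggle the appended path is the uploaded dataset's src directory, so `from datagens.vol_datagen import VolumeDatagen` resolves against the repo code inside the dataset.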

A boilerplate for the submission notebook is provided below; modify it to match your needs.

# %%
import os
import sys
import pandas as pd
import tensorflow as tf

DATA_DIR = '/kaggle/input/rsna-miccai-brain-tumor-radiogenomic-classification'
EXP_DIR = '/kaggle/input/your-uploaded-dataset/kaggle_dataset'
CASES = sorted([f.name for f in os.scandir(f'{DATA_DIR}/test') if f.is_dir()])

# Add the uploaded repo code to Python's import paths
sys.path.append(f'{EXP_DIR}/src')

# %%
from datagens.vol_datagen import VolumeDatagen
import utils

params = utils.Hyperparams(f'{EXP_DIR}/train_params.json')
if params.ensemble:
    seq_types = ['FLAIR', 'T1w', 'T1wCE', 'T2w']
else:
    seq_types = [params.data.seq_type]

preds = []
for seq_type in seq_types:
    print(f'==========  Predicting {seq_type}  ==========')
    params.data.seq_type = seq_type
    exp_dir = EXP_DIR
    if params.ensemble:
        exp_dir += f'/{seq_type}'
    
    datagen = VolumeDatagen(
        CASES,
        batch_size=params.data.batch_size,
        volume_size=params.data.volume_size,
        seq_type=params.data.seq_type,
        datadir=DATA_DIR,
        shuffle=False
    )
    model = tf.keras.models.load_model(f'{exp_dir}/model_best.h5')
    preds.append(model.predict(datagen, verbose=1))

# %%
# Final prediction
preds_final = sum(preds) / len(preds)
pd.DataFrame({'BraTS21ID': CASES, 'MGMT_value': preds_final[:,1]}).to_csv('/kaggle/working/submission.csv', index=False)
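The final cell averages the per-sequence model outputs element-wise before taking the positive-class column. A tiny sketch of that ensembling step with made-up numbers (two hypothetical models, three cases, two classes):

```python
import numpy as np

# Hypothetical softmax outputs from two sequence models, shape (3 cases, 2 classes)
p_flair = np.array([[0.6, 0.4], [0.2, 0.8], [0.9, 0.1]])
p_t1w   = np.array([[0.4, 0.6], [0.4, 0.6], [0.7, 0.3]])

preds = [p_flair, p_t1w]
preds_final = sum(preds) / len(preds)   # element-wise mean across models

print(preds_final[:, 1])  # positive-class probability per case: [0.5 0.7 0.2]
```

With a single (non-ensemble) model, `preds` holds one array and the division leaves it unchanged, so the same two lines cover both branches of the boilerplate.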