Creating a submission on Kaggle
This page will guide you on how to use this repository and create a submission on Kaggle.
- Stash/commit your work on the current branch (:warning: you will lose your current work if you skip this step!). Also, if the experiment to be submitted is on a different, unmerged branch, it is recommended to check out the branch containing the experiment (see the example after this list).
- Run `script/repo2kaggle.sh` from the repository root to create the Kaggle dataset. The output will be placed in the `kaggle_dataset` directory. Note that if the experiment results (trained models, etc.) are available in the `exps` directory, they are automatically added to the `kaggle_dataset` directory.

  ```sh
  script/repo2kaggle.sh         # Latest changes will be used
  script/repo2kaggle.sh EXP-000 # Use the repo state on EXP-000
  ```

- Upload the `kaggle_dataset` directory to Kaggle as a custom dataset. If a `Remove Duplicates` warning shows up, ignore it and upload including the duplicates.
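For the first step, a minimal sketch of the stash-and-checkout flow, assuming the experiment state is reachable as a branch or tag named `EXP-000` (the same name used in the example above):

```sh
# Save uncommitted work so it is not lost when switching branches
git stash

# Switch to the branch/tag containing the experiment to be submitted
git checkout EXP-000

# Later, restore the stashed work on your original branch with:
#   git checkout <your-branch> && git stash pop
```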
In some Kaggle code competitions, internet access is prohibited for the submission kernel. The following steps work around this limitation by downloading the additional packages in a separate notebook and installing them offline in the submission notebook.
- Create a new notebook.
- Download the additional packages:

  ```sh
  !touch requirements.txt
  !echo "pkg1==version" >> requirements.txt
  !echo "pkg2==version" >> requirements.txt
  ...
  !pip download -r requirements.txt
  ```

- Save and commit your notebook (see the note below).
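A note on the steps above: `pip download` saves the package files (wheels/sdists) to the notebook's current working directory, which on Kaggle should be `/kaggle/working`. That directory becomes the notebook's output when the notebook is committed, and is what gets mounted under `/kaggle/input/<notebook-name>` when you later add the notebook as a data source. You can confirm what will be saved with:

```sh
# List the downloaded package files that will become the notebook's output
!ls /kaggle/working
```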
- Create a notebook with the needed data sources (the competition dataset, the uploaded custom dataset containing this repo's code and trained models, the additional-package notebook from above, etc.).
- Install the additional packages:

  ```sh
  !pip install --no-index --find-links /kaggle/input/pkg-install-notebook -r requirements.txt
  ```

- Append the repo's path to Python's import paths:

  ```python
  import sys
  sys.path.append('/kaggle/input/your-uploaded-dataset/kaggle_dataset/src')

  # From here you can import this repo's code just like you would on a local environment
  from datagens.vol_datagen import VolumeDatagen
  import utils
  ...
  ```

- Write your code to run predictions on the competition's data.
A boilerplate for the submission notebook is provided below. Modify it to match your needs.
```python
# %%
import os
import sys
import pandas as pd
import tensorflow as tf

DATA_DIR = '/kaggle/input/rsna-miccai-brain-tumor-radiogenomic-classification'
EXP_DIR = '/kaggle/input/your-uploaded-dataset/kaggle_dataset'
CASES = sorted([f.name for f in os.scandir(f'{DATA_DIR}/test') if f.is_dir()])

# Adding custom package
sys.path.append(f'{EXP_DIR}/src')

# %%
from datagens.vol_datagen import VolumeDatagen
import utils

params = utils.Hyperparams(f'{EXP_DIR}/train_params.json')
if params.ensemble:
    seq_types = ['FLAIR', 'T1w', 'T1wCE', 'T2w']
else:
    seq_types = [params.data.seq_type]

preds = []
for seq_type in seq_types:
    print(f'========== Predicting {seq_type} ==========')
    params.data.seq_type = seq_type
    exp_dir = EXP_DIR
    if params.ensemble:
        exp_dir += f'/{seq_type}'

    datagen = VolumeDatagen(
        CASES,
        batch_size=params.data.batch_size,
        volume_size=params.data.volume_size,
        seq_type=params.data.seq_type,
        datadir=DATA_DIR,
        shuffle=False
    )

    model = tf.keras.models.load_model(f'{exp_dir}/model_best.h5')
    preds.append(model.predict(datagen, verbose=1))

# %%
# Final prediction
preds_final = sum(preds) / len(preds)
pd.DataFrame({'BraTS21ID': CASES, 'MGMT_value': preds_final[:, 1]}).to_csv('/kaggle/working/submission.csv', index=False)
```
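After the boilerplate has run, it can be worth taking a quick look at the generated file before committing the submission notebook, for example:

```sh
# Peek at the first few rows of the generated submission
!head /kaggle/working/submission.csv
```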