Missing icenet CLI training argument in template script #54

bnubald · 2024-08-29T14:45:14Z

IceNet version: v0.3.0_dev
Pipeline version: 0.3.0_dev

Getting an error when attempting training. The workers flag, -w, seems to have been removed in icenet v0.3.0_dev, which is causing this.

Pipeline script run:

./run_train_ensemble.sh -b $BATCH_SIZE -e 10 -f $FILTER_FACTOR -p $PREP_SCRIPT -q 4 ${TRAIN_DATA_NAME}_${HEMI} ${TRAIN_DATA_NAME}_${HEMI} ${FORECAST}_${HEMI}

Resulting in this error in the ensemble run:

usage: icenet_train_tensorflow [-h] [-o OUTPUT_PATH] [-v] [-b BATCH_SIZE]
                               [-ca CHECKPOINT_MODE] [-cm CHECKPOINT_MONITOR]
                               [-ds [ADDITIONAL ...]] [-e EPOCHS]
                               [--early-stopping EARLY_STOPPING] [-p PRELOAD]
                               [-r RATIO] [--shuffle-train] [--lr LR]
                               [--lr_10e_decay_fac LR_10E_DECAY_FAC]
                               [--lr_decay_start LR_DECAY_START]
                               [--lr_decay_end LR_DECAY_END] [-f FILTER_SIZE]
                               [-n N_FILTERS_FACTOR]
                               [-s {default,mirrored,central}] [-nw] [-wo]
                               [-wp WANDB_PROJECT] [-wu WANDB_USER]
                               dataset run_name seed
icenet_train_tensorflow: error: ambiguous option: -w could match -wo, -wp, -wu

@JimCircadian, looking through icenet v0.3.0_dev, removing -w seems intended. Should we remove -w {{ run.ntasks }} from the template here to fix this?

https://github.com/icenet-ai/icenet-pipeline/blob/4f98699d819245fe1ed5d40d9c30b64d1efebfa5/ensemble/template/icenet_train.sh.j2#L40C171-L40C190
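
For context on why the message says "ambiguous option" rather than "unrecognized arguments": Python's argparse tries to match an unknown single-dash option against registered options that start with the same prefix. The sketch below reproduces that behaviour with a hypothetical parser that copies only a few flags from the usage message above. The name train_sketch and the helper functions are assumptions for illustration, not the real icenet code.

```python
import argparse
import contextlib
import io


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical subset of the flags in the usage message; not the icenet parser.
    parser = argparse.ArgumentParser(prog="train_sketch")
    parser.add_argument("-wo", action="store_true")
    parser.add_argument("-wp", dest="wandb_project")
    parser.add_argument("-wu", dest="wandb_user")
    parser.add_argument("dataset")
    parser.add_argument("run_name")
    parser.add_argument("seed")
    return parser


def try_parse(argv):
    """Return (exit_code, stderr_text); exit_code is None when parsing succeeds."""
    stderr = io.StringIO()
    try:
        with contextlib.redirect_stderr(stderr):
            build_parser().parse_args(argv)
    except SystemExit as exc:
        # argparse reports usage errors by printing to stderr and exiting.
        return exc.code, stderr.getvalue()
    return None, stderr.getvalue()


# "-w" is no longer registered, but it is a prefix of -wo, -wp and -wu.
code, message = try_parse(["-w", "4", "ds", "run", "42"])
print(code)
print(message)

# Dropping the stale flag lets the same positionals parse cleanly.
ok_code, _ = try_parse(["ds", "run", "42"])
print(ok_code)
```

With the stale -w removed from the command line, the remaining arguments parse without error in this sketch, which matches the fix proposed above.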


JimCircadian · 2024-08-29T14:53:39Z

@bnubald I've marked the comment above and reported it as a phishing attempt.

As I recall, the -w flag was for multiprocessing that wasn't needed or enabled for tf.data usage (it is applied to the model.fit call), so yes, removing it would work. Interesting that the template is still using icenet_train too, unless that's substituted? Apologies, this has gotten a bit stale in my brain now.

bnubald · 2024-08-29T14:55:14Z

Yes, I've switched to icenet_train_tensorflow for now, and I'm also picking up more missing args:

❯ t ensemble/torch_unet_south/torch_unet_south-0/train.6233459.node022.42.err 
                               [--early-stopping EARLY_STOPPING] [-p PRELOAD]
                               [-r RATIO] [--shuffle-train] [--lr LR]
                               [--lr_10e_decay_fac LR_10E_DECAY_FAC]
                               [--lr_decay_start LR_DECAY_START]
                               [--lr_decay_end LR_DECAY_END] [-f FILTER_SIZE]
                               [-n N_FILTERS_FACTOR]
                               [-s {default,mirrored,central}] [-nw] [-wo]
                               [-wp WANDB_PROJECT] [-wu WANDB_USER]
                               dataset run_name seed
icenet_train_tensorflow: error: unrecognized arguments: -m -qs 4
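
One way to find every stale flag in a rendered command in a single pass, rather than one failed run at a time, is argparse's parse_known_args, which returns the leftover arguments instead of exiting. This is a minimal sketch against a hypothetical parser. The stale_flags helper and the reduced flag set are assumptions, not part of icenet or the pipeline.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical subset of the current CLI, based on the usage message above.
    parser = argparse.ArgumentParser(prog="train_sketch")
    parser.add_argument("-b", dest="batch_size")
    parser.add_argument("-e", dest="epochs")
    parser.add_argument("dataset")
    parser.add_argument("run_name")
    parser.add_argument("seed")
    return parser


def stale_flags(argv):
    """Return the arguments the parser does not recognise."""
    _, unknown = build_parser().parse_known_args(argv)
    return unknown


# Positionals come first here so the value after -qs is not taken as a positional.
leftover = stale_flags(["ds", "run", "42", "-b", "4", "-e", "10", "-m", "-qs", "4"])
print(leftover)
```

Note that a stale flag that happens to be a prefix of several current options (as -w was) still raises the ambiguity error, so this check only catches flags that match nothing at all.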

JimCircadian · 2024-08-29T15:13:51Z

Both related and can go @bnubald!

bnubald added the bug label Aug 29, 2024

bnubald added this to the v0.3.0 milestone Aug 29, 2024
