Detected call of lr_scheduler.step() before optimizer.step(). #37

Closed
zhibeiyou135 opened this issue Jan 18, 2024 · 5 comments

Comments

@zhibeiyou135

Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`. Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate

If this warning appears during training, how should it be resolved?

@magehrig
Contributor

not really an issue since it only affects a single step (out of thousands)
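
For reference, a minimal sketch of the call order the PyTorch warning asks for. The model, optimizer, scheduler, and fake data below are placeholders for illustration, not RVT's actual training setup:

import torch

# Recommended ordering in PyTorch 1.1.0+: step the optimizer before the scheduler.
model = torch.nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30)

for epoch in range(2):
    for _ in range(5):  # stand-in for iterating over a real dataloader
        optimizer.zero_grad()
        loss = model(torch.randn(4, 10)).sum()
        loss.backward()
        optimizer.step()      # update the weights first ...
    scheduler.step()          # ... then advance the learning-rate schedule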

@zhibeiyou135
Author

Sure, thank you for your response. I ran into the following problem during training: after one epoch, the run threw an error and stopped. The cause seems to be related to WandB (Weights & Biases). I want to disable it, but the get_ckpt_path function relies on WandB, and I'm not sure how to work around this. Below is the console output from the end of the training run:

Epoch 0: : 144434it [8:41:59, 4.61it/s, loss=2.71, v_num=c31u]creating index...
index created!aLoader 0: : 2342it [03:38, 10.69it/s]
Loading and preparing results...
DONE (t=0.16s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type bbox
DONE (t=3.05s).
Accumulating evaluation results...
DONE (t=0.91s).
Epoch 0: : 144434it [8:42:05, 4.61it/s, loss=2.71, v_num=c31u]Epoch 0, global step 142092: 'val/AP' reached 0.42226 (best 0.42226), saving model to '/home/pe/projects/yxl/RVT/RVT-master/RVT/s9ijc31u/checkpoints/epoch000step142092val_AP0.42.ckpt' as top 1
Error executing job with overrides: ['model=rnndet', 'dataset=gen1', 'dataset.path=/home/pe/projects/yxl/gen1', 'wandb.project_name=RVT', 'wandb.group_name=gen1', '+experiment/gen1=tiny.yaml', 'hardware.gpus=0', 'batch_size.train=8', 'batch_size.eval=8', 'hardware.num_workers.train=6', 'hardware.num_workers.eval=2']
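
Regarding disabling WandB mentioned above, a minimal sketch of one common workaround, assuming RVT picks up the standard WandB environment settings; whether get_ckpt_path still works in these modes is not verified here:

import os

# Assumption: set before WandB initializes. "offline" keeps local run files
# but skips network syncing; "disabled" turns wandb calls into no-ops.
os.environ["WANDB_MODE"] = "offline"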

@magehrig
Contributor

that seems to be a different, unrelated issue (for which it is better to open a separate issue).
Which command did you use to execute this run?

@zhibeiyou135
Author

I'm sorry; next time I will open a separate issue. I ran the training with the following command:

GPU_IDS=0
BATCH_SIZE_PER_GPU=8
TRAIN_WORKERS_PER_GPU=6
EVAL_WORKERS_PER_GPU=2
python train.py model=rnndet dataset=gen1 dataset.path=${DATA_DIR} wandb.project_name=RVT \
wandb.group_name=gen1 +experiment/gen1="${MDL_CFG}.yaml" hardware.gpus=${GPU_IDS} \
batch_size.train=${BATCH_SIZE_PER_GPU} batch_size.eval=${BATCH_SIZE_PER_GPU} \
hardware.num_workers.train=${TRAIN_WORKERS_PER_GPU} hardware.num_workers.eval=${EVAL_WORKERS_PER_GPU}

@magehrig
Contributor

Closing this issue; the discussion continues in #39.
