Hello, I am a master's student with a strong interest in autonomous driving. I am not trying to fully reproduce your model yet; I first want to run through your model code using part of the LMDrive data. I plan to use Town01 and Town02 for training and Town03 for validation. After setting the number of GPUs and the dataset path in train.sh, I ran train.sh, but training did not proceed normally. The output is as follows:
(interfuser) zdy@zhaojh:~/InterFuser-main/interfuser/scripts$ bash train.sh
/home/zdy/anaconda3/envs/interfuser/lib/python3.7/site-packages/torch/distributed/launch.py:188: FutureWarning: The module torch.distributed.launch is deprecated
and will be removed in future. Use torchrun.
Note that --use_env is set by default in torchrun.
If your script expects `--local_rank` argument to be set, please
change it to read from `os.environ['LOCAL_RANK']` instead. See
https://pytorch.org/docs/stable/distributed.html#launch-utility for
further instructions
FutureWarning,
WARNING:torch.distributed.run:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
Added key: store_based_barrier_key:1 to store for rank: 0
Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes.
Training in distributed mode with multiple processes, 1 GPU per process. Process 0, total 2.
Added key: store_based_barrier_key:1 to store for rank: 1
Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes.
Training in distributed mode with multiple processes, 1 GPU per process. Process 1, total 2.
Loading pretrained weights from url (https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-weights/resnet50d_ra2-464e36ba.pth)
Loading pretrained weights from url (https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-weights/resnet50d_ra2-464e36ba.pth)
Model interfuser_baseline created, param count:52935567
Data processing configuration for current model + dataset:
input_size: (3, 224, 224)
interpolation: bicubic
mean: (0.485, 0.456, 0.406)
std: (0.229, 0.224, 0.225)
crop_pct: 0.875
CNN backbone and transformer blocks using different learning rates!
165 weights in the cnn backbone, 274 weights in other modules
AMP not enabled. Training in float32.
Using native Torch DistributedDataParallel.
Sub route dir nums: 0
Scheduled epochs: 35
Sub route dir nums: 0
Sub route dir nums: 0
Sub route dir nums: 0
Current checkpoints:
('./output/20241028-100256-interfuser_baseline-224-interfuser_baseline/checkpoint-0.pth.tar', 0)
Current checkpoints:
('./output/20241028-100256-interfuser_baseline-224-interfuser_baseline/checkpoint-0.pth.tar', 0)
('./output/20241028-100256-interfuser_baseline-224-interfuser_baseline/checkpoint-1.pth.tar', 0)
Current checkpoints:
('./output/20241028-100256-interfuser_baseline-224-interfuser_baseline/checkpoint-0.pth.tar', 0)
('./output/20241028-100256-interfuser_baseline-224-interfuser_baseline/checkpoint-1.pth.tar', 0)
('./output/20241028-100256-interfuser_baseline-224-interfuser_baseline/checkpoint-2.pth.tar', 0)
Current checkpoints:
('./output/20241028-100256-interfuser_baseline-224-interfuser_baseline/checkpoint-0.pth.tar', 0)
('./output/20241028-100256-interfuser_baseline-224-interfuser_baseline/checkpoint-1.pth.tar', 0)
('./output/20241028-100256-interfuser_baseline-224-interfuser_baseline/checkpoint-2.pth.tar', 0)
('./output/20241028-100256-interfuser_baseline-224-interfuser_baseline/checkpoint-3.pth.tar', 0)
Current checkpoints:
('./output/20241028-100256-interfuser_baseline-224-interfuser_baseline/checkpoint-0.pth.tar', 0)
('./output/20241028-100256-interfuser_baseline-224-interfuser_baseline/checkpoint-1.pth.tar', 0)
('./output/20241028-100256-interfuser_baseline-224-interfuser_baseline/checkpoint-2.pth.tar', 0)
('./output/20241028-100256-interfuser_baseline-224-interfuser_baseline/checkpoint-3.pth.tar', 0)
('./output/20241028-100256-interfuser_baseline-224-interfuser_baseline/checkpoint-4.pth.tar', 0)
*** Best metric: 0 (epoch 0)
How can I solve this problem? I hope to get your reply.
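One thing that stands out in the log is that every `Sub route dir nums` line reports 0, and the best metric stays at 0, which may mean the data loader did not find any route directories under the configured dataset path. As a quick sanity check, here is a minimal sketch that lists the immediate subdirectories of a dataset root. It assumes a layout where each route is a subdirectory of the root; the function name `list_route_dirs` and that layout are assumptions for illustration, not taken from the InterFuser code, which may select routes in a different way.

```python
import os
import sys


def list_route_dirs(dataset_root):
    """Return the sorted names of immediate subdirectories of dataset_root.

    Hypothetical helper: it only inspects the filesystem and assumes that
    each route is stored as one subdirectory of the dataset root.
    """
    if not os.path.isdir(dataset_root):
        # A wrong or mistyped path is reported as "no routes found".
        return []
    return sorted(
        name
        for name in os.listdir(dataset_root)
        if os.path.isdir(os.path.join(dataset_root, name))
    )


if __name__ == "__main__":
    # Pass the same dataset path that was set in train.sh.
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    routes = list_route_dirs(root)
    print(f"{len(routes)} subdirectories found under {root}")
    for name in routes[:10]:
        print("  ", name)
```

If this also reports 0 for the path set in train.sh, the path itself is probably wrong. If it reports a non-zero count, the directory names or index files may not match what the training script expects.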