
Unable to reproduce the results with the provided model weights and code on LEVIR dataset. #5

Open
Ujjwal238 opened this issue Dec 26, 2024 · 5 comments

Comments

@Ujjwal238

To reproduce the error

Steps to reproduce the behavior:

  1. Added the provided weights to checkpoints/elgcnet_levir/ as best_checkpoint.pt
  2. Ran eval_cd.py

Error received

  File "/content/elgcnet/eval_cd.py", line 55, in main
    model.eval_models(checkpoint_name=args.checkpoint_name)
  File "/content/elgcnet/models/evaluator.py", line 178, in eval_models
    self._load_checkpoint(checkpoint_name)
  File "/content/elgcnet/models/evaluator.py", line 66, in _load_checkpoint
    checkpoint = torch.load(os.path.join(self.checkpoint_dir, checkpoint_name), map_location='cpu')
  File "/usr/local/lib/python3.10/dist-packages/torch/serialization.py", line 713, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/usr/local/lib/python3.10/dist-packages/torch/serialization.py", line 920, in _legacy_load
    magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: invalid load key, '\x27'
@techmn
Owner

techmn commented Dec 26, 2024

  1. Make sure that you are using the correct file path.
  2. It's common practice to inspect the state dict and its keys before loading the model, to avoid errors caused by key mismatches.

Go to evaluator.py and look at the _load_checkpoint function. Update the function to load the model weights correctly:

def _load_checkpoint(self, checkpoint_name='best_ckpt.pt'):
    if os.path.exists(os.path.join(self.checkpoint_dir, checkpoint_name)):
        self.logger.write('loading last checkpoint...\n')
        # load the entire checkpoint
        checkpoint = torch.load(os.path.join(self.checkpoint_dir, checkpoint_name), map_location='cpu')

        # here the checkpoint is the state dict itself, not a wrapper dict
        if isinstance(self.net_G, torch.nn.DataParallel):
            msg = self.net_G.module.load_state_dict(checkpoint)  # ['model_G_state_dict']
        else:
            msg = self.net_G.load_state_dict(checkpoint)  # ['model_G_state_dict']
        print(msg)

        self.net_G.to(self.device)
        self.logger.write('\n')
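The key-inspection step suggested above can be sketched as follows. This is a minimal, self-contained illustration (the tiny Linear model and the /tmp path are placeholders, not part of the repo), showing how to print a checkpoint's top-level keys before calling load_state_dict:

```python
import torch

# Minimal sketch: save a tiny state dict, then inspect its keys before
# loading it into a model -- the same check recommended for
# best_checkpoint.pt. The model and path here are illustrative only.
net = torch.nn.Linear(4, 2)
torch.save(net.state_dict(), '/tmp/demo_ckpt.pt')

checkpoint = torch.load('/tmp/demo_ckpt.pt', map_location='cpu')

# A checkpoint may be the state dict itself, or a wrapper dict such as
# {'model_G_state_dict': ...}; printing the keys reveals which case applies.
print(sorted(checkpoint.keys()))  # ['bias', 'weight']

# If the keys match the model's own state dict, loading succeeds.
msg = net.load_state_dict(checkpoint)
print(msg)  # <All keys matched successfully>
```

If the printed keys are wrapped (e.g. under 'model_G_state_dict'), index into the dict first; if they are the parameter names themselves, pass the checkpoint directly, as in the snippet above.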

Hope it works!

@Ujjwal238
Author

I tried making the suggested changes but still received the same error as before. Please look into it.

@techmn
Owner

techmn commented Dec 27, 2024

There might be a problem with your downloaded files or the libraries you are using.
I am not getting any error. Here is the log:

checkpoint_name: elgcnet_levir_ckpt.pt checkpoint_dir: ./elgcnet_levir
<All keys matched successfully>

Begin evaluation...
Is_training: False. [1,2048],  running_mf1: 0.96269
Is_training: False. [101,2048],  running_mf1: 0.50000
Is_training: False. [201,2048],  running_mf1: 0.97121
Is_training: False. [301,2048],  running_mf1: 0.99011
Is_training: False. [401,2048],  running_mf1: 0.50000
Is_training: False. [501,2048],  running_mf1: 0.50000
Is_training: False. [601,2048],  running_mf1: 0.50000
Is_training: False. [701,2048],  running_mf1: 0.88616
Is_training: False. [801,2048],  running_mf1: 0.92438
Is_training: False. [901,2048],  running_mf1: 0.89096
Is_training: False. [1001,2048],  running_mf1: 0.97063
Is_training: False. [1101,2048],  running_mf1: 0.92529
Is_training: False. [1201,2048],  running_mf1: 0.49982
Is_training: False. [1301,2048],  running_mf1: 0.88886
Is_training: False. [1401,2048],  running_mf1: 0.50000
Is_training: False. [1501,2048],  running_mf1: 0.97076
Is_training: False. [1601,2048],  running_mf1: 0.97004
Is_training: False. [1701,2048],  running_mf1: 0.49924
Is_training: False. [1801,2048],  running_mf1: 0.50000
Is_training: False. [1901,2048],  running_mf1: 0.49871
Is_training: False. [2001,2048],  running_mf1: 0.98605
acc: 0.99118 miou: 0.91450 mf1: 0.95368 iou_0: 0.99076 iou_1: 0.83825 F1_0: 0.99536 F1_1: 0.91201

You can get help from here.

@Ujjwal238
Author

I went through the link you provided. The .pt file I obtained after training is 127 MB, while the uploaded .pt file is only 44 MB, so I believe there's a mismatch somewhere.
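One hedged diagnostic that may help here (an assumption about the cause, not a confirmed fix): the error "invalid load key, '\x27'" means the first byte of the file is an apostrophe, i.e. the file begins with plain text rather than a pickle or zip header, which often indicates an incomplete download or a Git LFS pointer file instead of the real weights. The helper below (a hypothetical name, not part of the repo) checks the file's magic bytes:

```python
# Sketch: check whether a .pt file plausibly contains torch weights.
# New-style torch.save files are zip archives starting with b'PK';
# legacy saves start with the pickle protocol byte b'\x80'. A file
# starting with anything else (e.g. text) will raise UnpicklingError.
def looks_like_torch_checkpoint(path):
    with open(path, 'rb') as f:
        head = f.read(4)
    return head[:2] == b'PK' or head[:1] == b'\x80'
```

If this returns False for the downloaded file, re-downloading the checkpoint (or fetching the actual LFS object) may be worth trying before debugging the loading code.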

@Ujjwal238
Author

Could you please share the precision and recall of the pre-change and post-change classes as well?
