Out of memory with batch_size 1 and 4GB VRAM #49

Open
fjodborg opened this issue Feb 22, 2021 · 0 comments

fjodborg commented Feb 22, 2021

Hello, I start running out of memory at some point when running python train.py --threed_match_dir ~/dataset/threedmatch/ --batch_size 1.
At first I tried batch_size 2, but that was too much for my GPU, so I changed it to 1. After training for a while (iteration 1440 of 7317 in epoch 1 in the log below), I started getting "out of memory" errors like:

INFO - 2021-02-22 12:51:28,348 - trainer - Train Epoch: 1 [1440/7317], Current Loss: 1.157e+00 Pos: 0.365 Neg: 0.792	Data time: 0.0536, Train time: 0.5614, Iter time: 0.6150
Traceback (most recent call last):
  File "train.py", line 84, in <module>
    main(config)
  File "train.py", line 63, in main
    trainer.train()
  File "/home/f/repos/FCGF/lib/trainer.py", line 132, in train
    self._train_epoch(epoch)
  File "/home/f/repos/FCGF/lib/trainer.py", line 492, in _train_epoch
    self.config.batch_size)
  File "/home/f/repos/FCGF/lib/trainer.py", line 427, in contrastive_hardest_negative_loss
    D01 = pdist(posF0, subF1, dist_type='L2')
  File "/home/f/repos/FCGF/lib/metrics.py", line 24, in pdist
    D2 = torch.sum((A.unsqueeze(1) - B.unsqueeze(0)).pow(2), 2)
RuntimeError: CUDA out of memory. Tried to allocate 32.00 MiB (GPU 0; 3.82 GiB total capacity; 744.27 MiB already allocated; 43.38 MiB free; 814.00 MiB reserved in total by PyTorch)

Currently my system takes up 500 MiB of VRAM on my GTX 1650 (4 GB), and the rest is used by PyTorch. I'm running PyTorch 1.7 in a Python 3.7 conda environment, and I tried compiling MinkowskiEngine for CUDA 11.2 and 10.2, but both gave the same error.
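
As a rough sanity check on the figures in the error message, here is a minimal sketch in plain Python. The shapes used (1024 positive features, 256 subsampled candidates, 32 feature dimensions, float32) are assumptions, not values reported in this issue; they are picked only because they reproduce the 32.00 MiB allocation from the traceback. The helper name broadcast_diff_bytes is hypothetical.

```python
MIB = 1024 ** 2


def broadcast_diff_bytes(n, m, d, bytes_per_elem=4):
    """Bytes needed for the (n, m, d) tensor that
    A.unsqueeze(1) - B.unsqueeze(0) materializes before the sum."""
    return n * m * d * bytes_per_elem


# ASSUMED shapes (not stated in the issue): posF0 is (1024, 32), subF1 is (256, 32).
n_pos, n_sub, feat_dim = 1024, 256, 32
intermediate_mib = broadcast_diff_bytes(n_pos, n_sub, feat_dim) / MIB

# Figures copied from the RuntimeError message above.
total_mib = 3.82 * 1024   # "3.82 GiB total capacity"
reserved_mib = 814.00     # "814.00 MiB reserved in total by PyTorch"
free_mib = 43.38          # "43.38 MiB free"

# Memory that is neither free nor held in PyTorch's reserved pool.
outside_pytorch_mib = total_mib - reserved_mib - free_mib

print(f"broadcast intermediate: {intermediate_mib:.2f} MiB")
print(f"in use outside PyTorch's reserved pool: {outside_pytorch_mib:.2f} MiB")
```

Under these assumed shapes the broadcast intermediate alone comes to 32 MiB, and the numbers in the error message leave roughly 3 GiB of the card neither free nor reserved by PyTorch, a figure that includes the roughly 500 MiB the system already uses.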
