CUDA out of memory #66

Open
SerhiiPostupaiev opened this issue Apr 27, 2023 · 1 comment

SerhiiPostupaiev commented Apr 27, 2023

Hello, @Paper99, @LGYoung, @NK-CS-ZZL
I am trying to run the evaluation script on a CUDA GPU.

I confirmed that PyTorch detects the GPU on my PC:

>>> import torch

>>> torch.cuda.is_available()
True

>>> torch.cuda.device_count()
1

>>> torch.cuda.current_device()
0

>>> torch.cuda.device(0)
<torch.cuda.device at 0x7efce0b03be0>

>>> torch.cuda.get_device_name(0)
'GeForce GTX 950M'
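
These checks show that the device is visible to PyTorch, but not how much memory it has. A minimal sketch for reporting that is given here. The helper name `gpu_memory_report` is made up for illustration, and `torch.cuda.memory_reserved` assumes PyTorch 1.4 or later (earlier releases named it `memory_cached`).

```python
def gpu_memory_report(index=0):
    """Return memory figures (in MiB) for one CUDA device, or None if unavailable.

    Hypothetical helper, not part of E2FGVI or PyTorch.
    """
    try:
        import torch
    except ImportError:
        return None  # PyTorch is not installed in this environment
    if not torch.cuda.is_available():
        return None  # no usable CUDA device

    mib = 1024 ** 2
    props = torch.cuda.get_device_properties(index)
    return {
        "name": props.name,
        # Physical memory on the card.
        "total_mib": props.total_memory / mib,
        # Memory currently occupied by live tensors.
        "allocated_mib": torch.cuda.memory_allocated(index) / mib,
        # Memory held by PyTorch's caching allocator (live tensors plus cache).
        "reserved_mib": torch.cuda.memory_reserved(index) / mib,
    }


if __name__ == "__main__":
    print(gpu_memory_report(0))
```

Calling it right before the model's forward pass would show how much of the card is already occupied at that point.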

I am using Windows 11

When I run the evaluation script, I get the following error:

(ttt) F:\training\E2FGVI>python evaluate.py --model e2fgvi --dataset davis --data_root datasets/ --ckpt release_model/E2FGVI-CVPR22.pth
C:\Users\postu\miniconda3\envs\ttt\lib\site-packages\mmcv\__init__.py:21: UserWarning: On January 1, 2023, MMCV will release v2.0.0, in which it will remove components related to the training process and add a data transformation module. In addition, it will rename the package names mmcv to mmcv-lite and mmcv-full to mmcv. See https://github.com/open-mmlab/mmcv/blob/master/docs/en/compatibility.md for more details.
  'On January 1, 2023, MMCV will release v2.0.0, in which it will remove '
load pretrained SPyNet...
load checkpoint from http path: https://download.openmmlab.com/mmediting/restorers/basicvsr/spynet_20210409-c6c1bd09.pth
Loading from: release_model/E2FGVI-CVPR22.pth
Start evaluation...
[Loading I3D model from ./release_model/i3d_rgb_imagenet.pt for FID score ..]
Traceback (most recent call last):
  File "evaluate.py", line 176, in <module>
    main_worker(args)
  File "evaluate.py", line 92, in main_worker
    pred_img, _ = model(masked_frames, len(neighbor_ids))
  File "C:\Users\postu\miniconda3\envs\ttt\lib\site-packages\torch\nn\modules\module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "F:\training\E2FGVI\model\e2fgvi.py", line 255, in forward
    trans_feat = self.transformer(trans_feat)
  File "C:\Users\postu\miniconda3\envs\ttt\lib\site-packages\torch\nn\modules\module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\postu\miniconda3\envs\ttt\lib\site-packages\torch\nn\modules\container.py", line 100, in forward
    input = module(input)
  File "C:\Users\postu\miniconda3\envs\ttt\lib\site-packages\torch\nn\modules\module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "F:\training\E2FGVI\model\modules\tfocal_transformer.py", line 523, in forward
    mask_all=x_window_masks_all)  # nW*B, T*window_size*window_size, C
  File "C:\Users\postu\miniconda3\envs\ttt\lib\site-packages\torch\nn\modules\module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "F:\training\E2FGVI\model\modules\tfocal_transformer.py", line 394, in forward
    attn = self.softmax(attn)
  File "C:\Users\postu\miniconda3\envs\ttt\lib\site-packages\torch\nn\modules\module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\postu\miniconda3\envs\ttt\lib\site-packages\torch\nn\modules\activation.py", line 1044, in forward
    return F.softmax(input, self.dim, _stacklevel=5)
  File "C:\Users\postu\miniconda3\envs\ttt\lib\site-packages\torch\nn\functional.py", line 1442, in softmax
    ret = input.softmax(dim)
RuntimeError: CUDA out of memory. Tried to allocate 668.00 MiB (GPU 0; 4.00 GiB total capacity; 2.36 GiB already allocated; 0 bytes free; 3.05 GiB reserved in total by PyTorch)
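
The figures in that last line can be put on a common scale to see where the 4 GiB went. The sketch below is only arithmetic on the message text; `parse_oom` is a hypothetical helper, and the regular expressions assume the message layout shown above, which differs between PyTorch versions.

```python
import re

OOM_MESSAGE = (
    "RuntimeError: CUDA out of memory. Tried to allocate 668.00 MiB "
    "(GPU 0; 4.00 GiB total capacity; 2.36 GiB already allocated; "
    "0 bytes free; 3.05 GiB reserved in total by PyTorch)"
)

# Conversion factors from the units used in the message to MiB.
UNIT_TO_MIB = {"bytes": 1 / 1024 ** 2, "KiB": 1 / 1024, "MiB": 1.0, "GiB": 1024.0}


def parse_oom(message):
    """Extract the memory figures from an OOM message, converted to MiB."""
    patterns = {
        "requested": r"Tried to allocate ([\d.]+) (\w+)",
        "total": r"([\d.]+) (\w+) total capacity",
        "allocated": r"([\d.]+) (\w+) already allocated",
        "free": r"([\d.]+) (\w+) free",
        "reserved": r"([\d.]+) (\w+) reserved",
    }
    figures = {}
    for key, pattern in patterns.items():
        value, unit = re.search(pattern, message).groups()
        figures[key] = float(value) * UNIT_TO_MIB[unit]
    return figures


if __name__ == "__main__":
    f = parse_oom(OOM_MESSAGE)
    # Reserved by PyTorch's allocator but not occupied by live tensors.
    print(f"cached, unused: {f['reserved'] - f['allocated']:.0f} MiB")
    # Not managed by PyTorch's allocator at all.
    print(f"outside PyTorch: {f['total'] - f['reserved']:.0f} MiB")
    print(f"requested: {f['requested']:.0f} MiB")
```

On these numbers, roughly 707 MiB is reserved by PyTorch but not allocated, and roughly 973 MiB of the card is outside PyTorch's allocator. The first figure is only slightly larger than the 668 MiB request, so one possible reading is that the cached memory is split into blocks too small to serve it; the second is presumably taken by the CUDA context and by other users of the GPU such as the desktop. Both readings are assumptions, not something the message states.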

Is there anything that can be done about this?
Or is my PC hardware too weak to run the evaluation?

@cmn1565080456

The likely cause is that your GPU does not have enough memory to run the evaluation: the error reports a 4 GiB card with no free memory left. Upgrading to a graphics card with more video memory should solve the problem.
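
Short of new hardware, it may help to see why the softmax in the attention layer is the step that fails. The traceback comment gives the window input as `nW*B, T*window_size*window_size, C`, so the attention score tensor grows with the square of the number of frames `T`. The sketch below estimates that size under loud assumptions: plain window attention with as many keys as queries (the focal variant in `tfocal_transformer.py` may attend to more), float32 scores, and made-up token grid, window, and head counts that are not taken from the E2FGVI configuration.

```python
def window_attention_mib(frames, tokens_h, tokens_w, window_h, window_w,
                         heads, bytes_per_elem=4):
    """Rough size in MiB of one attention score tensor for windowed attention.

    Hypothetical estimator: assumes keys == queries within each window.
    """
    num_windows = (tokens_h // window_h) * (tokens_w // window_w)
    # Each window holds tokens from every frame, as in T*window_size*window_size.
    tokens_per_window = frames * window_h * window_w
    elements = num_windows * heads * tokens_per_window ** 2
    return elements * bytes_per_elem / 1024 ** 2


if __name__ == "__main__":
    # All numbers below are illustrative assumptions.
    base = dict(tokens_h=20, tokens_w=36, window_h=5, window_w=9, heads=4)
    full = window_attention_mib(frames=20, **base)
    fewer_frames = window_attention_mib(frames=10, **base)
    half_precision = window_attention_mib(frames=20, bytes_per_elem=2, **base)
    print(f"20 frames, float32: {full:.1f} MiB")
    print(f"10 frames, float32: {fewer_frames:.1f} MiB")
    print(f"20 frames, float16: {half_precision:.1f} MiB")
```

Under these assumptions, halving the number of frames per forward pass cuts the score tensor to a quarter, while half precision cuts it to a half. Whether `evaluate.py` exposes a setting for the number of frames or the input resolution is not visible from the traceback, so treat this as a direction to investigate rather than a confirmed fix.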
