RuntimeError: CUDA error: unknown error #9

Open
Moltennn opened this issue Jun 30, 2022 · 3 comments

@Moltennn

I can't figure out why I'm getting this error:

python sample.py --model_path finetune.pt --batch_size 1 --num_batches 1 --text "a cyberpunk girl with a scifi neuralink device on her head"

Using device: cuda:0
Traceback (most recent call last):
  File "sample.py", line 284, in <module>
    ldm.to(device)
  File "/home/moltenn/anaconda3/envs/ldm/lib/python3.8/site-packages/pytorch_lightning/core/mixins/device_dtype_mixin.py", line 121, in to
    return super().to(*args, **kwargs)
  File "/home/moltenn/anaconda3/envs/ldm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 927, in to
    return self._apply(convert)
  File "/home/moltenn/anaconda3/envs/ldm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 579, in _apply
    module._apply(fn)
  File "/home/moltenn/anaconda3/envs/ldm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 579, in _apply
    module._apply(fn)
  File "/home/moltenn/anaconda3/envs/ldm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 579, in _apply
    module._apply(fn)
  [Previous line repeated 3 more times]
  File "/home/moltenn/anaconda3/envs/ldm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 602, in _apply
    param_applied = fn(param)
  File "/home/moltenn/anaconda3/envs/ldm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 925, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
RuntimeError: CUDA error: unknown error
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

Trying to run with CUDA_LAUNCH_BLOCKING enabled

CUDA_LAUNCH_BLOCKING=1 python sample.py --model_path finetune.pt --batch_size 1 --num_batches 1 --text "a cyberpunk girl with a scifi neuralink device on her head"

Using device: cuda:0
Traceback (most recent call last):
  File "sample.py", line 284, in <module>
    ldm.to(device)
  File "/home/moltenn/anaconda3/envs/ldm/lib/python3.8/site-packages/pytorch_lightning/core/mixins/device_dtype_mixin.py", line 121, in to
    return super().to(*args, **kwargs)
  File "/home/moltenn/anaconda3/envs/ldm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 927, in to
    return self._apply(convert)
  File "/home/moltenn/anaconda3/envs/ldm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 579, in _apply
    module._apply(fn)
  File "/home/moltenn/anaconda3/envs/ldm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 579, in _apply
    module._apply(fn)
  File "/home/moltenn/anaconda3/envs/ldm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 579, in _apply
    module._apply(fn)
  [Previous line repeated 3 more times]
  File "/home/moltenn/anaconda3/envs/ldm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 602, in _apply
    param_applied = fn(param)
  File "/home/moltenn/anaconda3/envs/ldm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 925, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
RuntimeError: CUDA error: unknown error
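
Since both runs fail while the model is being moved to the GPU, a quick check outside sample.py may help tell a broken CUDA setup apart from a problem in the script. A minimal sketch, assuming nothing about glid-3-xl itself (the name `cuda_report` is made up for illustration):

```python
def cuda_report():
    """Collect basic facts about the local PyTorch/CUDA setup (hypothetical helper)."""
    report = {"torch_installed": False}
    try:
        import torch
    except ImportError:
        return report

    report["torch_installed"] = True
    report["torch_version"] = torch.__version__
    # None here means a CPU-only build of PyTorch is installed.
    report["built_with_cuda"] = torch.version.cuda
    report["cuda_available"] = torch.cuda.is_available()

    if report["cuda_available"]:
        props = torch.cuda.get_device_properties(0)
        report["device_name"] = props.name
        report["total_memory_gib"] = round(props.total_memory / 2**30, 2)
        try:
            # Smallest possible transfer; synchronize so errors surface here.
            torch.ones(1, device="cuda:0")
            torch.cuda.synchronize()
            report["small_alloc_ok"] = True
        except RuntimeError as exc:
            report["small_alloc_ok"] = False
            report["error"] = str(exc)
    return report


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

If even the one-element tensor fails, the problem is likely in the driver/CUDA/PyTorch combination rather than in the model code.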

pip freeze

absl-py==1.1.0
aiohttp==3.8.1
aiosignal==1.2.0
albumentations==0.4.3
altair==4.2.0
antlr4-python3-runtime==4.8
argon2-cffi==21.3.0
argon2-cffi-bindings==21.2.0
asttokens==2.0.5
async-timeout==4.0.2
attrs==21.4.0
axial-positional-embedding==0.2.1
backcall==0.2.0
backports.zoneinfo==0.2.1
beautifulsoup4==4.11.1
bleach==5.0.1
blinker==1.4
blobfile==1.3.1
braceexpand==0.1.7
brotlipy @ file:///home/conda/feedstock_root/build_artifacts/brotlipy_1648854175163/work
cachetools==5.2.0
certifi==2022.6.15
cffi==1.15.0
charset-normalizer @ file:///home/conda/feedstock_root/build_artifacts/charset-normalizer_1655906222726/work
click==8.1.3
-e git+https://github.com/openai/CLIP.git@b46f5ac7587d2e1862f8b7b1573179d80dcdd620#egg=clip
commonmark==0.9.1
cryptography @ file:///home/conda/feedstock_root/build_artifacts/cryptography_1652967113783/work
DALL-E==0.1
dalle-pytorch==1.6.4
debugpy==1.6.0
decorator==5.1.1
defusedxml==0.7.1
einops==0.4.1
entrypoints==0.4
executing==0.8.3
fastjsonschema==2.15.3
filelock==3.7.1
frozenlist==1.3.0
fsspec==2022.5.0
ftfy==6.1.1
future==0.18.2
gitdb==4.0.9
GitPython==3.1.27
google-auth==2.9.0
google-auth-oauthlib==0.4.6
grpcio==1.47.0
-e git+https://github.com/Jack000/glid-3-xl@a0b5be4b04378d4d4779240d3e0a599360c1a133#egg=guided_diffusion
idna @ file:///home/conda/feedstock_root/build_artifacts/idna_1642433548627/work
imageio==2.9.0
imageio-ffmpeg==0.4.2
imgaug==0.2.6
importlib-metadata==4.12.0
importlib-resources==5.8.0
iniconfig==1.1.1
ipykernel==6.15.0
ipython==8.4.0
ipython-genutils==0.2.0
ipywidgets==7.7.1
jedi==0.18.1
Jinja2==3.1.2
joblib==1.1.0
jsonschema==4.6.1
jupyter-client==7.3.4
jupyter-core==4.10.0
jupyterlab-pygments==0.2.2
jupyterlab-widgets==1.1.1
-e git+https://github.com/CompVis/latent-diffusion.git@5a6571e384f9a9b492bbfaca594a2b00cad55279#egg=latent_diffusion
Markdown==3.3.7
MarkupSafe==2.1.1
matplotlib-inline==0.1.3
mistune==0.8.4
mkl-fft==1.3.1
mkl-random @ file:///tmp/build/80754af9/mkl_random_1626186064646/work
mkl-service==2.4.0
multidict==6.0.2
mypy==0.961
mypy-extensions==0.4.3
nbclient==0.6.5
nbconvert==6.5.0
nbformat==5.4.0
nest-asyncio==1.5.5
networkx==2.8.4
notebook==6.4.12
numpy @ file:///opt/conda/conda-bld/numpy_and_numpy_base_1654872176621/work
oauthlib==3.2.0
omegaconf==2.1.1
opencv-python==4.1.2.30
opencv-python-headless==4.6.0.66
packaging==21.3
pandas==1.4.3
pandocfilters==1.5.0
parso==0.8.3
pexpect==4.8.0
pickleshare==0.7.5
Pillow==9.0.1
pluggy==1.0.0
prometheus-client==0.14.1
prompt-toolkit==3.0.30
protobuf==3.19.4
psutil==5.9.1
ptyprocess==0.7.0
pudb==2019.2
pure-eval==0.2.2
py==1.11.0
pyarrow==8.0.0
pyasn1==0.4.8
pyasn1-modules==0.2.8
pycparser @ file:///home/conda/feedstock_root/build_artifacts/pycparser_1636257122734/work
pycryptodomex==3.15.0
pydeck==0.7.1
pyDeprecate==0.3.2
Pygments==2.12.0
Pympler==1.0.1
pyOpenSSL @ file:///home/conda/feedstock_root/build_artifacts/pyopenssl_1643496850550/work
pyparsing==3.0.9
pyrsistent==0.18.1
PySocks @ file:///home/conda/feedstock_root/build_artifacts/pysocks_1648857275402/work
pytest==7.1.2
python-dateutil==2.8.2
pytorch-lightning==1.6.4
pytz==2022.1
pytz-deprecation-shim==0.1.0.post0
PyWavelets==1.3.0
PyYAML==6.0
pyzmq==23.2.0
regex==2022.6.2
requests @ file:///home/conda/feedstock_root/build_artifacts/requests_1656534056640/work
requests-oauthlib==1.3.1
rich==12.4.4
rotary-embedding-torch==0.1.5
rsa==4.8
sacremoses==0.0.53
scikit-image==0.19.3
scipy==1.8.1
semver==2.13.0
Send2Trash==1.8.0
six @ file:///tmp/build/80754af9/six_1644875935023/work
smmap==5.0.0
soupsieve==2.3.2.post1
stack-data==0.3.0
streamlit==1.10.0
-e git+https://github.com/CompVis/taming-transformers.git@24268930bf1dce879235a7fddd0b2355b84d7ea6#egg=taming_transformers
taming-transformers-rom1504==0.0.6
tensorboard==2.9.1
tensorboard-data-server==0.6.1
tensorboard-plugin-wit==1.8.1
terminado==0.15.0
test-tube==0.7.5
tifffile==2022.5.4
tinycss2==1.1.1
tokenizers==0.10.3
toml==0.10.2
tomli==2.0.1
toolz==0.11.2
torch==1.12.0
torch-fidelity==0.3.0
torchaudio==0.12.0
torchmetrics==0.9.2
torchvision==0.13.0
tornado==6.1
tqdm==4.64.0
traitlets==5.3.0
transformers==4.3.1
typing-extensions @ file:///opt/conda/conda-bld/typing_extensions_1647553014482/work
tzdata==2022.1
tzlocal==4.2
urllib3 @ file:///home/conda/feedstock_root/build_artifacts/urllib3_1647489083693/work
urwid==2.1.2
validators==0.20.0
watchdog==2.1.9
wcwidth==0.2.5
webdataset==0.2.5
webencodings==0.5.1
Werkzeug==2.1.2
widgetsnbextension==3.6.1
xmltodict==0.12.0
yarl==1.7.2
youtokentome==1.0.6
zipp==3.8.0
@limiteinductive
Contributor

Your ldm model should not be using pytorch-lightning to load... maybe try uninstalling pytorch-lightning.

@Moltennn
Author

That didn't work. It just said something like "missing module pytorch-lightning".
I also tried purging the whole container (or whatever those are called) and reinstalling, with no success either.
All of this was done on WSL Ubuntu.

So I decided to install everything on Windows instead, and got the error below. I guess my poor old GTX 970 isn't fit for this :D

python sample.py --model_path finetune.pt --batch_size 1 --num_batches 1 --text "a cyberpunk girl with a scifi neuralink device on her head"

Using device: cuda:0
Traceback (most recent call last):
  File "sample.py", line 284, in <module>
    ldm.to(device)
  File "C:\ProgramData\Anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\core\mixins\device_dtype_mixin.py", line 111, in to
    return super().to(*args, **kwargs)
  File "C:\ProgramData\Anaconda3\envs\ldm\lib\site-packages\torch\nn\modules\module.py", line 927, in to
    return self._apply(convert)
  File "C:\ProgramData\Anaconda3\envs\ldm\lib\site-packages\torch\nn\modules\module.py", line 579, in _apply
    module._apply(fn)
  File "C:\ProgramData\Anaconda3\envs\ldm\lib\site-packages\torch\nn\modules\module.py", line 579, in _apply
    module._apply(fn)
  File "C:\ProgramData\Anaconda3\envs\ldm\lib\site-packages\torch\nn\modules\module.py", line 579, in _apply
    module._apply(fn)
  [Previous line repeated 3 more times]
  File "C:\ProgramData\Anaconda3\envs\ldm\lib\site-packages\torch\nn\modules\module.py", line 602, in _apply
    param_applied = fn(param)
  File "C:\ProgramData\Anaconda3\envs\ldm\lib\site-packages\torch\nn\modules\module.py", line 925, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 4.00 GiB total capacity; 3.47 GiB already allocated; 0 bytes free; 3.55 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
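
The hint about max_split_size_mb at the end of that message applies when reserved memory is much larger than allocated memory. A small sketch that parses the message above and compares the two numbers (`parse_oom` and `reserved_to_allocated` are hypothetical helper names, and the regexes assume this exact message layout):

```python
import re

# The message from the traceback above, without the trailing hint.
OOM_MESSAGE = (
    "CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 4.00 GiB total "
    "capacity; 3.47 GiB already allocated; 0 bytes free; 3.55 GiB reserved "
    "in total by PyTorch)"
)

_UNITS = {"bytes": 1, "KiB": 2**10, "MiB": 2**20, "GiB": 2**30}
_SIZE = r"([\d.]+) (bytes|KiB|MiB|GiB)"


def parse_oom(message):
    """Extract the sizes (in bytes) mentioned in a PyTorch OOM message."""
    patterns = {
        "requested": r"Tried to allocate " + _SIZE,
        "total": _SIZE + r" total capacity",
        "allocated": _SIZE + r" already allocated",
        "free": _SIZE + r" free",
        "reserved": _SIZE + r" reserved",
    }
    stats = {}
    for key, pattern in patterns.items():
        match = re.search(pattern, message)
        if match:
            value, unit = match.groups()
            stats[key] = float(value) * _UNITS[unit]
    return stats


def reserved_to_allocated(stats):
    """A ratio far above 1 would point at fragmentation; near 1 does not."""
    return stats["reserved"] / stats["allocated"]


if __name__ == "__main__":
    stats = parse_oom(OOM_MESSAGE)
    print(f"reserved / allocated = {reserved_to_allocated(stats):.2f}")
```

For this message the ratio comes out at roughly 1.02, so fragmentation is probably not the culprit; the 4 GiB card is simply nearly full before the 20 MiB request arrives.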

Then I tried the --cpu parameter to see how it would go:

python sample.py --cpu --model_path finetune.pt --batch_size 1 --num_batches 1 --text "a cyberpunk girl with a scifi neuralink device on her head"

Using device: cpu
Traceback (most recent call last):
  File "sample.py", line 522, in <module>
    do_run()
  File "sample.py", line 307, in do_run
    text_emb = bert.encode([args.text]*args.batch_size).to(device).float()
  File "C:\Users\Administrator\txt2img\glid-3-xl\encoders\modules.py", line 99, in encode
    return self(text)
  File "C:\ProgramData\Anaconda3\envs\ldm\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\Administrator\txt2img\glid-3-xl\encoders\modules.py", line 94, in forward
    z = self.transformer(tokens, return_embeddings=True)
  File "C:\ProgramData\Anaconda3\envs\ldm\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\Administrator\txt2img\glid-3-xl\encoders\x_transformer.py", line 609, in forward
    x = self.token_emb(x)
  File "C:\ProgramData\Anaconda3\envs\ldm\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\ProgramData\Anaconda3\envs\ldm\lib\site-packages\torch\nn\modules\sparse.py", line 158, in forward
    return F.embedding(
  File "C:\ProgramData\Anaconda3\envs\ldm\lib\site-packages\torch\nn\functional.py", line 2199, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument index in method wrapper__index_select)
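
This error says a tensor still ends up on cuda:0 even though --cpu was passed, which usually means something along the code path is pinned to CUDA regardless of the flag. One way to look for such places is to scan the source tree for hard-coded CUDA placements. A rough sketch (`find_hardcoded_cuda` is a hypothetical helper, and the regex only catches the most common spellings):

```python
import re
from pathlib import Path

# Matches `.cuda(`, `device="cuda...` and `.to("cuda...` (either quote style).
HARDCODED_CUDA = re.compile(
    r"""\.cuda\(|device\s*=\s*["']cuda|\.to\(\s*["']cuda"""
)


def find_hardcoded_cuda(root):
    """Return (path, line_number, line) for lines that pin something to CUDA."""
    hits = []
    for path in sorted(Path(root).rglob("*.py")):
        try:
            text = path.read_text(encoding="utf-8", errors="replace")
        except OSError:
            continue  # unreadable file, skip it
        for number, line in enumerate(text.splitlines(), start=1):
            if HARDCODED_CUDA.search(line):
                hits.append((str(path), number, line.strip()))
    return hits


if __name__ == "__main__":
    # "." is a placeholder; point it at the checkout you want to inspect.
    for path, number, line in find_hardcoded_cuda("."):
        print(f"{path}:{number}: {line}")
```

Any hits in the modules named in the traceback are candidates for switching over to the device that sample.py selected.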

@Sylvainsbrr

Sylvainsbrr commented Jul 3, 2022

It's not your GPU; I have the same issue with a 3090.
These versions resolved the issue:
pip install torch==1.11.0+cu113 torchvision==0.12.0+cu113 -f https://download.pytorch.org/whl/torch_stable.html
