(I run it by just adding paths to my dataset), I got following error
RuntimeError: cuDNN error: CUDNN_STATUS_INTERNAL_ERROR
There are several factors that could cause this error.
Could you please try the tutorial without changing the data path and see if it can work properly?
If that works, could you please check your data first and see if they have all the proper shapes and labels?
Thanks!
Sir, the BraTS dataset can't be accessed directly on Google Colab. It has to be downloaded from Kaggle or Synapse (after getting permission, in the case of the 2023 dataset), so it is necessary to set the dataset path manually.
Modality: MRI. Size: 1470 3D volumes (1251 Training + 219 Validation).
Each of the 1251 training samples contains four 3D modalities and one 3D segmentation mask (1251 * 5 = 6255 images in total).
@Mgithus the error suggests that an index goes out of bounds here:
File "/home/dlrs/.local/lib/python3.8/site-packages/monai/networks/nets/swin_unetr.py", line 779, in compute_mask
img_mask[:, d, h, w, :] = cnt
RuntimeError: CUDA error: an illegal instruction was encountered
You may get a more informative error if you run everything on the CPU; the message may be more explicit, for example "N is out of bounds".
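A minimal sketch of how both hints could be combined: the `CUDA_LAUNCH_BLOCKING=1` suggestion from the error message and the CPU suggestion above. The helper name `pick_debug_device` is hypothetical and not part of the tutorial, and the guarded import is only there so the sketch also runs where PyTorch is not installed.

```python
import os

# Set this before the first CUDA call so that kernel launches are synchronous
# and the Python stack trace points at the operation that actually failed.
os.environ.setdefault("CUDA_LAUNCH_BLOCKING", "1")

try:
    import torch
except ImportError:  # keeps the sketch runnable without PyTorch
    torch = None


def pick_debug_device(force_cpu: bool = True) -> str:
    """Return the device string to use while debugging (hypothetical helper)."""
    if force_cpu or torch is None or not torch.cuda.is_available():
        return "cpu"
    return "cuda"


device = pick_debug_device(force_cpu=True)
print(f"debugging on: {device}")
```

In the script, the model and every batch would then be moved to `device` instead of a hard-coded `"cuda"`. Running on the CPU is much slower, so a single validation sample is probably enough to see whether a clearer error appears.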
I am working on a tumor segmentation task on the BraTS 2021 dataset, using the Swin UNETR model. For now, I am using just 5 samples. Each sample has 4 modalities, and each modality is a 3D volume (240x240x155). I am trying to run the following notebook without any changes:
[https://github.com/Project-MONAI/tutorials/blob/main/3d_segmentation/swin_unetr_brats21_segmentation_3d.ipynb]
I am trying to run it on a machine with the following properties:
OS Name: Ubuntu 20.04.6 LTS
Processor: Intel® Xeon(R) CPU E5504 @ 2.00GHz × 4
Graphics: NVIDIA Corporation GP102 [GeForce GTX 1080 Ti]
11 GB GPU memory
At first, with CUDA version 12.2 and torch version 1.7+cu110, I was getting the following error:
RuntimeError: CUDA error: an illegal instruction was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
Then I reinstalled CUDA version 11.0, as shown by:
nvcc --version:
(base) dlrs@spml3:~$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Wed_Jul_22_19:09:09_PDT_2020
Cuda compilation tools, release 11.0, V11.0.221
Build cuda_11.0_bu.TC445_37.28845127_0
nvidia-smi:
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.86.10              Driver Version: 535.86.10    CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce GTX 1080 Ti     On  | 00000000:03:00.0  On |                  N/A |
|  0%   35C    P8              19W / 275W |    238MiB / 11264MiB |      1%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A       803      G   /usr/lib/xorg/Xorg                          128MiB |
|    0   N/A  N/A      1106      G   /usr/bin/gnome-shell                         20MiB |
|    0   N/A  N/A      1171      G   /opt/teamviewer/tv_bin/TeamViewer             2MiB |
|    0   N/A  N/A      2364      G   /usr/lib/firefox/firefox                     13MiB |
|    0   N/A  N/A      4646      G   ...sion,SpareRendererForSitePerProcess       35MiB |
|    0   N/A  N/A     15064      G   ...959815738,826468481227333041,262144       31MiB |
|    0   N/A  N/A     55236      G   gnome-control-center                          2MiB |
+---------------------------------------------------------------------------------------+
Along with:
torch version: 1.7.1+cu110
python==3.8
monai==1.0.0
I ran the notebook after only adding the paths to my dataset, and I got the following error:
RuntimeError: cuDNN error: CUDNN_STATUS_INTERNAL_ERROR
You can try to repro this exception using the following code snippet. If that doesn't trigger the error, please include your original repro script when reporting this issue.
import torch
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.benchmark = True
torch.backends.cudnn.deterministic = False
torch.backends.cudnn.allow_tf32 = True
data = torch.randn([2, 48, 128, 128, 128], dtype=torch.float, device='cuda', requires_grad=True)
net = torch.nn.Conv3d(48, 48, kernel_size=[3, 3, 3], padding=[1, 1, 1], stride=[1, 1, 1], dilation=[1, 1, 1], groups=1)
net = net.cuda().float()
out = net(data)
out.backward(torch.randn_like(out))
torch.cuda.synchronize()
ConvolutionParams
data_type = CUDNN_DATA_FLOAT
padding = [1, 1, 1]
stride = [1, 1, 1]
dilation = [1, 1, 1]
groups = 1
deterministic = false
allow_tf32 = true
input: TensorDescriptor 0x6c7e490
type = CUDNN_DATA_FLOAT
nbDims = 5
dimA = 2, 48, 128, 128, 128,
strideA = 100663296, 2097152, 16384, 128, 1,
output: TensorDescriptor 0xa210820
type = CUDNN_DATA_FLOAT
nbDims = 5
dimA = 2, 48, 128, 128, 128,
strideA = 100663296, 2097152, 16384, 128, 1,
weight: FilterDescriptor 0x6a8c150
type = CUDNN_DATA_FLOAT
tensor_format = CUDNN_TENSOR_NCHW
nbDims = 5
dimA = 48, 48, 3, 3, 3,
Pointer addresses:
input: 0x7fbb38000000
output: 0x7fbb68000000
weight: 0x7fbd159a9600
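As a rough check of how much memory the tensors in this repro snippet need, the arithmetic can be sketched as follows. The factor of 4 is an assumption (input, output, and one gradient for each held at the same time); it ignores the cuDNN workspace and everything else the full model allocates.

```python
from math import prod

shape = (2, 48, 128, 128, 128)  # dimA from the tensor descriptors above
BYTES_PER_FLOAT32 = 4


def tensor_mib(shape, itemsize=BYTES_PER_FLOAT32):
    """Size of a dense tensor in MiB."""
    return prod(shape) * itemsize / 2**20


one_tensor = tensor_mib(shape)
# Assumption: input, output, and a gradient for each are alive together.
rough_total = 4 * one_tensor
print(f"one tensor: {one_tensor:.0f} MiB, rough total: {rough_total / 1024:.1f} GiB")
```

This prints `one tensor: 768 MiB, rough total: 3.0 GiB`, so a single 48-channel convolution at the original (128, 128, 128) ROI with batch size 2 already takes a noticeable share of the 11264 MiB reported by nvidia-smi.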
I then changed the settings as follows:
roi = (64, 64, 64) (from (128, 128, 128))
batch_size = 1 (from 2)
sw_batch_size = 1 (from 4)
fold = 1
infer_overlap = 0.5
max_epochs = 4 (from 100)
val_every = 2 (from 10)
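To see what these settings mean for validation, the number of sliding-window positions per volume can be estimated as below. This is only a sketch: it assumes the windows are placed with a stride of `int(roi * (1 - overlap))` along each axis and that validation runs on the full 240x240x155 volume. The function names are hypothetical.

```python
import math


def windows_per_axis(image: int, roi: int, overlap: float) -> int:
    """Number of window positions along one axis (assumed stride rule)."""
    if image <= roi:
        return 1
    stride = max(int(roi * (1 - overlap)), 1)
    return math.ceil((image - roi) / stride) + 1


def total_windows(image_size, roi_size, overlap):
    """Total number of window positions over all spatial axes."""
    return math.prod(
        windows_per_axis(i, r, overlap) for i, r in zip(image_size, roi_size)
    )


volume = (240, 240, 155)
print(total_windows(volume, (64, 64, 64), 0.5))     # 7 * 7 * 4 = 196
print(total_windows(volume, (128, 128, 128), 0.5))  # 3 * 3 * 2 = 18
```

Under these assumptions, the smaller ROI needs far less memory per forward pass but roughly ten times as many passes per validation volume, and with sw_batch_size = 1 each window is a separate forward pass.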
When I ran the code with these changes, the machine became unresponsive.
Then I cleared the cache with torch.cuda.empty_cache() and ran the script again, and the following error was encountered:
/tmp/tmp7q0y5gok
Fri Aug 11 07:06:10 2023 Epoch: 0
Epoch 0/4 0/4 loss: 0.9950 time 11.59s
Epoch 0/4 1/4 loss: 0.9968 time 0.43s
Epoch 0/4 2/4 loss: 0.9979 time 0.43s
Epoch 0/4 3/4 loss: 0.9984 time 0.43s
Final training 0/3 loss: 0.9984 time 13.05s
None of the inputs have requires_grad=True. Gradients will be None
Traceback (most recent call last):
  File "notebook_of_swin_unetr.py", line 429, in <module>
    ) = trainer(
  File "notebook_of_swin_unetr.py", line 363, in trainer
    val_acc = val_epoch(
  File "notebook_of_swin_unetr.py", line 289, in val_epoch
    logits = model_inferer(data)
  File "/home/dlrs/.local/lib/python3.8/site-packages/monai/inferers/utils.py", line 180, in sliding_window_inference
    seg_prob_out = predictor(window_data, *args, **kwargs)  # batched patch segmentation
  File "/home/dlrs/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/dlrs/.local/lib/python3.8/site-packages/monai/networks/nets/swin_unetr.py", line 297, in forward
    hidden_states_out = self.swinViT(x_in, self.normalize)
  File "/home/dlrs/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/dlrs/.local/lib/python3.8/site-packages/monai/networks/nets/swin_unetr.py", line 1017, in forward
    x4 = self.layers4[0](x3.contiguous())
  File "/home/dlrs/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/dlrs/.local/lib/python3.8/site-packages/monai/networks/nets/swin_unetr.py", line 874, in forward
    attn_mask = compute_mask([dp, hp, wp], window_size, shift_size, x.device)
  File "/home/dlrs/.local/lib/python3.8/site-packages/monai/networks/nets/swin_unetr.py", line 779, in compute_mask
    img_mask[:, d, h, w, :] = cnt
RuntimeError: CUDA error: an illegal instruction was encountered
and the machine became unresponsive again. I restarted the terminal and printed the MONAI debug info:
(aug10) dlrs@spml3:~/Desktop/jul_25$ python -c 'import monai; monai.config.print_debug_info()'
"sox" backend is being deprecated. The default backend will be changed to "sox_io" backend in 0.8.0 and "sox" backend will be removed in 0.9.0. Please migrate to "sox_io" backend. Please refer to pytorch/audio#903 for the detail.
Printing MONAI config...
MONAI version: 1.0.0
Numpy version: 1.21.6
Pytorch version: 1.7.1+cu110
MONAI flags: HAS_EXT = False, USE_COMPILED = False, USE_META_DICT = False
MONAI rev id: 1700933
MONAI file: /home/dlrs/.local/lib/python3.8/site-packages/monai/__init__.py
Optional dependencies:
Pytorch Ignite version: 0.4.8
Nibabel version: 5.1.0
scikit-image version: 0.21.0
Pillow version: 10.0.0
Tensorboard version: 2.14.0
gdown version: 4.7.1
TorchVision version: 0.8.2+cu110
tqdm version: 4.66.1
lmdb version: 1.4.1
psutil version: 5.9.5
pandas version: 2.0.3
einops version: 0.6.1
transformers version: 4.31.0
mlflow version: 2.5.0
pynrrd version: 1.0.0
For details about installing the optional dependencies, please visit:
https://docs.monai.io/en/latest/installation.html#installing-the-recommended-dependencies
================================
Printing system config...
System: Linux
Linux version: Ubuntu 20.04.6 LTS
Platform: Linux-5.15.0-78-generic-x86_64-with-glibc2.17
Processor: x86_64
Machine: x86_64
Python version: 3.8.17
Process name: python
Command: ['python', '-c', 'import monai; monai.config.print_debug_info()']
Open files: [popenfile(path='/home/dlrs/.anaconda/navigator/Code/logs/20230811T061228/ptyhost.log', fd=39, position=0, mode='a', flags=33793), popenfile(path='/snap/code/136/usr/share/code/resources/app/node_modules.asar', fd=41, position=64064, mode='r', flags=32768), popenfile(path='/snap/code/136/usr/share/code/v8_context_snapshot.bin', fd=103, position=0, mode='r', flags=32768)]
Num physical CPUs: 4
Num logical CPUs: 4
Num usable CPUs: 4
CPU usage (%): [100.0, 30.4, 35.8, 78.6]
CPU freq. (MHz): 1995
Load avg. in last 1, 5, 15 mins (%): [54.5, 78.2, 69.5]
Disk usage (%): 41.9
Avg. sensor temp. (Celsius): UNKNOWN for given OS
Total physical memory (GB): 15.6
Available memory (GB): 6.6
Used memory (GB): 8.3
================================
Printing GPU config...
Num GPUs: 1
Has CUDA: True
CUDA version: 11.0
cuDNN enabled: True
cuDNN version: 8005
Current device: 0
Library compiled for CUDA architectures: ['sm_37', 'sm_50', 'sm_60', 'sm_70', 'sm_75', 'sm_80']
GPU 0 Name: NVIDIA GeForce GTX 1080 Ti
GPU 0 Is integrated: False
GPU 0 Is multi GPU board: False
GPU 0 Multi processor count: 28
GPU 0 Total memory (GB): 10.9
GPU 0 CUDA capability (maj.min): 6.1
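One thing the GPU config above makes it possible to check is whether the installed PyTorch build ships binaries that the card can run. The sketch below applies the CUDA binary compatibility rule (code built for `sm_XY` runs on a device with the same major version X and a minor version of at least Y). It is a simplified check: it ignores PTX just-in-time compilation, and the function name `arch_supported` is hypothetical.

```python
def arch_supported(capability, compiled_archs):
    """True if some compiled sm_XY binary has the device's major version
    and a minor version not above the device's minor version."""
    major, minor = capability
    for arch in compiled_archs:
        digits = arch.split("_", 1)[1]
        arch_major, arch_minor = int(digits[:-1]), int(digits[-1])
        if arch_major == major and arch_minor <= minor:
            return True
    return False


# Values copied from the debug output above.
compiled = ["sm_37", "sm_50", "sm_60", "sm_70", "sm_75", "sm_80"]
print(arch_supported((6, 1), compiled))  # True: sm_60 code runs on a 6.1 device
```

On a live system the two inputs could be read from `torch.cuda.get_device_capability()` and `torch.cuda.get_arch_list()`. For this setup the check passes, which suggests the build's architecture list is not the problem, although the check cannot rule out driver or hardware issues.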