Code corrupted silently when import decord before torch #293

zhangbw17 · 2024-03-13T15:28:48Z

This issue occurs when I import decord before torch, and then place nn.Module on the GPU.

import decord
import torch

torch.nn.Linear(3, 3).cuda()

The process exits silently when run with python3 debug.py, and reports Segmentation fault (core dumped) when run in a terminal.
By contrast, the following code runs fine:

import torch
import decord

torch.nn.Linear(3, 3).cuda()
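Because the crash kills the interpreter, it can help to run each import order in a fresh child process and look at how that process exited. The following is a minimal sketch and not part of the original report: `run_snippet` is a hypothetical helper name, and it assumes a POSIX system, where a child killed by a signal is reported with a negative return code. The `-X faulthandler` option asks CPython to print a Python traceback to stderr on a fatal signal such as SIGSEGV.

```python
import signal
import subprocess
import sys


def run_snippet(code: str) -> str:
    """Run `code` in a fresh interpreter and describe how it exited.

    Hypothetical helper for illustration; the child's stderr is inherited,
    so any faulthandler traceback appears in the parent's terminal.
    """
    proc = subprocess.run([sys.executable, "-X", "faulthandler", "-c", code])
    if proc.returncode == 0:
        return "ok"
    if proc.returncode < 0:
        # POSIX: a negative return code means "killed by signal -returncode".
        return f"killed by {signal.Signals(-proc.returncode).name}"
    return f"exit code {proc.returncode}"
```

To compare the two orders from this report, one could call `run_snippet("import decord\nimport torch\ntorch.nn.Linear(3, 3).cuda()")` and then the same string with the two imports swapped. What each call returns depends on the installed versions and hardware.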


YinAoXiong · 2024-04-30T14:10:49Z

same problem

tongda · 2024-05-13T12:23:47Z

Same here. Versions:

python: 3.10.14
decord: 0.6.0
pytorch: 2.3.0

CUDA-related libraries installed by pip:
nvidia-cublas-cu12 12.1.3.1
nvidia-cuda-cupti-cu12 12.1.105
nvidia-cuda-nvrtc-cu12 12.1.105
nvidia-cuda-runtime-cu12 12.1.105
nvidia-cudnn-cu12 8.9.2.26
nvidia-cufft-cu12 11.0.2.54
nvidia-curand-cu12 10.3.2.106
nvidia-cusolver-cu12 11.4.5.107
nvidia-cusparse-cu12 12.1.0.106
nvidia-nccl-cu12 2.20.5
nvidia-nvjitlink-cu12 12.4.127
nvidia-nvtx-cu12 12.1.105
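Version lists like the one above can be gathered programmatically without importing the packages, which avoids triggering the crash while collecting them. This is a small sketch using the standard library's `importlib.metadata`; `report_versions` is a hypothetical helper name, and the distribution names passed to it are assumed to match what pip shows on your system.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional


def report_versions(distributions: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent.

    Reads package metadata only; none of the packages are imported.
    """
    versions: Dict[str, Optional[str]] = {}
    for name in distributions:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions
```

For example, `report_versions(["decord", "torch", "nvidia-cudnn-cu12"])` returns a dictionary that can be pasted into a bug report.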

Leojc · 2024-07-03T11:35:03Z

same problem.

python: 3.10.13
pytorch: 2.3.0
decord: 0.6.0

losehu · 2024-07-25T05:00:41Z

me too

python=3.10
pytorch=1.13.1
decord==0.6.0

ellemcfarlane · 2024-09-09T16:07:25Z

Also happens to me, specifically when both of the following hold:

1. importing the CPU version of decord before torch via import decord (I have not tested the GPU version)
2. moving my model to the GPU, e.g. model.to(torch.device("cuda"), dtype=torch.float16)

Note1: this occurs even when just importing decord but not actually using it to do anything
Note2: it does not occur when moving the model to cpu e.g. model.to(torch.device("cpu"), dtype=torch.float16), so given that I'm using the cpu-version of decord, there might be a connection there, but regardless, this should not happen.

Issue fixed when: simply importing torch before decord

Specific log when running with huggingface accelerate:
subprocess.CalledProcessError: Command '['python3', 'train.py', <placeholder-args>]' died with <Signals.SIGSEGV: 11>.

without accelerate:
train_script.sh: line 3: 2131749 Segmentation fault (core dumped) python3 train.py <placeholder-args>

versions:

python 3.10.14
cuda 12.1
decord 0.6.0
torch 2.4.0
torchvision 0.19.0
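Since the workaround reported here depends only on import order, a project could check that order early at startup. The sketch below is an illustrative heuristic and not something decord or torch provides: `imported_before` is a hypothetical helper, and it assumes that the position of a name in `sys.modules` reflects when the module was first imported (dicts preserve insertion order in Python 3.7+, but entries can be removed and re-added).

```python
import sys
from typing import Mapping, Optional


def imported_before(
    first: str,
    second: str,
    modules: Optional[Mapping[str, object]] = None,
) -> Optional[bool]:
    """Return True if `first` entered the module table before `second`.

    Returns None when either module has not been imported yet.
    `modules` defaults to sys.modules; pass a mapping to test the logic.
    """
    table = sys.modules if modules is None else modules
    if first not in table or second not in table:
        return None
    names = list(table)  # insertion order approximates import order
    return names.index(first) < names.index(second)
```

A training script could call `imported_before("torch", "decord")` after its imports and log a warning when the result is `False`.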

a-r-r-o-w · 2024-09-28T08:14:46Z

> Issue fixed when: simply importing torch before decord

Thank you for saving my time @ellemcfarlane! I was stuck on this for quite a bit - very weird/unexpected how this works

lhoestq · 2024-10-28T14:38:16Z

This also happened to me with duckdb, which needs to be imported before decord; otherwise it crashes on import.

losehu mentioned this issue Jul 25, 2024

Training error Akaneqwq/360DVD#9

Open

XiaobingSuper mentioned this issue Nov 6, 2024

[Bug]: Segment fault when import decord before import vllm vllm-project/vllm#9993

Closed
