
RuntimeError: Failed to fetch video idx 168596 from /data/k400/train/salsa_dancing/EY6MSW3zkr8_000048_000058.avi; after 99 trials #558

Open
Christinepan881 opened this issue Jun 16, 2022 · 12 comments

Comments

@Christinepan881

When I use the MViT config to run the code on the K400 dataset, I get the following errors:
...
Failed to decode video idx 31483 from /data/k400/train/changing_oil/csJFMaPl9Og_000370_000380.avi; trial 3
Failed to decode video idx 138602 from /data/k400/train/playing_monopoly/Hn_o3mu9peY_000040_000050.avi; trial 5
Failed to decode video idx 72108 from /data/k400/train/filling_eyebrows/1m50SSGbG2k_000148_000158.avi; trial 99
Failed to decode video idx 170537 from /data/k400/train/scuba_diving/dQQK-KSp_pE_000044_000054.avi; trial 15
Failed to decode video idx 139676 from /data/k400/train/playing_paintball/coNWv_D7Fyk_000135_000145.avi; trial 95
Failed to decode video idx 138602 from /data/k400/train/playing_monopoly/Hn_o3mu9peY_000040_000050.avi; trial 6
Failed to decode video idx 205437 from /data/k400/train/taking_a_shower/U540GFOTF6U_000002_000012.avi; trial 99
Failed to decode video idx 170537 from /data/k400/train/scuba_diving/dQQK-KSp_pE_000044_000054.avi; trial 16
Failed to decode video idx 138602 from /data/k400/train/playing_monopoly/Hn_o3mu9peY_000040_000050.avi; trial 7
Failed to decode video idx 154000 from /data/k400/train/punching_bag/BNwpN8GFixE_000010_000020.avi; trial 0
Failed to decode video idx 139676 from /data/k400/train/playing_paintball/coNWv_D7Fyk_000135_000145.avi; trial 96
Failed to decode video idx 31483 from /data/k400/train/changing_oil/csJFMaPl9Og_000370_000380.avi; trial 4
Failed to decode video idx 170537 from /data/k400/train/scuba_diving/dQQK-KSp_pE_000044_000054.avi; trial 17
Failed to decode video idx 138602 from /data/k400/train/playing_monopoly/Hn_o3mu9peY_000040_000050.avi; trial 8
Failed to decode video idx 31483 from /data/k400/train/changing_oil/csJFMaPl9Og_000370_000380.avi; trial 5
Failed to decode video idx 86337 from /data/k400/train/headbanging/c6JhdcwPHQU_000002_000012.avi; trial 97
Failed to decode video idx 170537 from /data/k400/train/scuba_diving/dQQK-KSp_pE_000044_000054.avi; trial 18
Failed to decode video idx 31483 from /data/k400/train/changing_oil/csJFMaPl9Og_000370_000380.avi; trial 6
Failed to decode video idx 204993 from /data/k400/train/tai_chi/qV7j-jQCH3M_000027_000037.avi; trial 0
Failed to decode video idx 31483 from /data/k400/train/changing_oil/csJFMaPl9Og_000370_000380.avi; trial 7
Failed to decode video idx 86337 from /data/k400/train/headbanging/c6JhdcwPHQU_000002_000012.avi; trial 98
Traceback (most recent call last):
File "tools/run_net.py", line 45, in
main()
File "tools/run_net.py", line 26, in main
launch_job(cfg=cfg, init_method=args.init_method, func=train)
File "/data/home/SlowFast/slowfast/utils/misc.py", line 296, in launch_job
torch.multiprocessing.spawn(
File "/data/home/miniconda/envs/test0/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 230, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
File "/data/home/miniconda/envs/test0/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 188, in start_processes
while not context.join():
File "/data/home/miniconda/envs/test0/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 150, in join
raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:

-- Process 2 terminated with the following error:
Traceback (most recent call last):
File "/data/home/miniconda/envs/test0/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
fn(i, *args)
File "/data/home/SlowFast/slowfast/utils/multiprocessing.py", line 60, in run
ret = func(cfg)
File "/data/home/SlowFast/tools/train_net.py", line 708, in train
train_epoch(
File "/data/home/SlowFast/tools/train_net.py", line 86, in train_epoch
for cur_iter, (inputs, labels, index, time, meta) in enumerate(
File "/data/home/miniconda/envs/test0/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 521, in next
data = self._next_data()
File "/data/home/miniconda/envs/test0/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1203, in _next_data
return self._process_data(data)
File "/data/home/miniconda/envs/test0/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1229, in _process_data
data.reraise()
File "/data/home/miniconda/envs/test0/lib/python3.8/site-packages/torch/_utils.py", line 434, in reraise
raise exception
RuntimeError: Caught RuntimeError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/data/home/miniconda/envs/test0/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
data = fetcher.fetch(index)
File "/data/home/miniconda/envs/test0/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/data/home/miniconda/envs/test0/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 49, in
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/data/home/SlowFast/slowfast/datasets/kinetics.py", line 488, in getitem
raise RuntimeError(
RuntimeError: Failed to fetch video idx 168596 from /data/k400/train/salsa_dancing/EY6MSW3zkr8_000048_000058.avi; after 99 trials

I have checked the data paths, and there is no problem with them.

Does anyone know the reason?
Thanks!

@kkk55596

Hi, have you solved this problem yet? I am also running into it.

@alpargun

alpargun commented Jul 14, 2022

This is due to the torchvision backend used for video decoding. Some people mentioned that building torchvision from source solves this issue; however, I haven't been able to fix it that way yet.
This issue already discusses the problem, and a possible solution is to switch the video decoding backend to PyAV instead. In the YAML config file, you can add:

DATA:
  DECODING_BACKEND: pyav

to switch to the PyAV backend. However, the PyAV backend introduces another error related to changed data types, caused by a recent commit; this pull request already solves that problem. I applied the changes from that pull request and can now run the framework with the PyAV backend.
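
For reference, here is a minimal sketch (not part of SlowFast; the path is just the one from the error log) of how you could check with PyAV directly which clips fail to decode, before switching the backend:

# Minimal check of which clips PyAV can decode; assumes the `av` package is installed.
import av

video_paths = [
    "/data/k400/train/salsa_dancing/EY6MSW3zkr8_000048_000058.avi",  # placeholder path from the log
]

for path in video_paths:
    try:
        with av.open(path) as container:
            n_frames = sum(1 for _ in container.decode(video=0))
        print(f"OK    {path} ({n_frames} frames)")
    except Exception as exc:  # PyAV raises decode/IO errors here for broken clips
        print(f"FAIL  {path}: {exc}")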

@haooooooqi
Contributor

Thanks for playing with pysf.
You might get the issue fixed if you preprocess the video to the same format?
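
For anyone who wants to try that suggestion, here is a minimal sketch (not part of SlowFast) that re-encodes all clips to a uniform H.264/MP4 format; it assumes ffmpeg is available on PATH, and the directories are placeholders:

# Re-encode every .avi clip under src_dir to H.264 .mp4 under dst_dir.
# Assumes ffmpeg is installed and on PATH; adjust the placeholder paths.
import subprocess
from pathlib import Path

src_dir = Path("/data/k400/train")
dst_dir = Path("/data/k400_mp4/train")

for src in src_dir.rglob("*.avi"):
    dst = dst_dir / src.relative_to(src_dir).with_suffix(".mp4")
    dst.parent.mkdir(parents=True, exist_ok=True)
    subprocess.run(
        ["ffmpeg", "-y", "-i", str(src),
         "-c:v", "libx264", "-pix_fmt", "yuv420p", "-an", str(dst)],
        check=False,  # keep going even if a single clip fails to transcode
    )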

@kkk55596

kkk55596 commented Aug 3, 2022

I solved this problem by re-installing torchvision from source.
After that, I can use the following setting:

DATA:
  DECODING_BACKEND: torchvision
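
If you take this route, a quick sanity check like the sketch below (not part of SlowFast; the path is a placeholder from the error log) can confirm that the rebuilt torchvision can actually read one of the failing clips before you launch a full training run:

# Verify that torchvision's video reader can decode a clip.
# Assumes torchvision was built with video support; the path is a placeholder.
import torchvision
from torchvision.io import read_video

torchvision.set_video_backend("video_reader")  # the backend compiled when building from source
video, audio, info = read_video(
    "/data/k400/train/salsa_dancing/EY6MSW3zkr8_000048_000058.avi",
    pts_unit="sec",
)
print(video.shape, info)  # expect a (T, H, W, C) tensor and the clip's fps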

@alpargun

alpargun commented Aug 5, 2022

Which torch and torchvision versions are you using? Thanks!

@poincarelee

The pull request you mentioned did solve the problem.
However, I ran into another issue: the top-1 error (and the top-5 error as well) does not decrease steadily. In one epoch the top-1 error was 37.5%, while in a later epoch it rose to 50%, and the final top-1 accuracy is 42.14% (top-5: 72.81%), which is much lower than reported in the paper, as shown below:

[screenshot of training results]
I trained X3D on the HMDB51 dataset.
Is there anything wrong with the training code?

@alpargun

alpargun commented Sep 9, 2022

I haven't trained on the HMDB51 dataset yet, but I see two possibilities:

  • Check the paper again to see whether they pre-trained on another dataset to obtain the published results.
  • HMDB51 does not have a config file in the SlowFast repo, so the parameter values in your config can affect performance; for the other datasets, SlowFast already provides configs tuned for the corresponding data.

@poincarelee

You are right.
The Kinetics and AVA datasets are preferred. I referred to another dataset's config file (Kinetics') and adapted it for HMDB51. K400 is quite a bit larger, so training takes much longer. I am now working on K400 and use about 10% of it for training, which still takes about 3 days.

@poincarelee

poincarelee commented Sep 16, 2022

@alpargun
Hi, I have trained on the K400 dataset, but the top-1 and top-5 errors seem weird.

[screenshot of training log]
As shown in the picture above, at epoch 105 the top-1 error is still 81.25% on some batches, while on others it is 56% or 43%. Most batches within an epoch are near 50%, but there are always a few at 80% or 70%; the top-5 error also fluctuates but does not show such a trend. Have you encountered this problem before?

@Patrick-CH

Thanks for playing with pysf. You might get the issue fixed if you preprocess the video to the same format?

I have tried that. Even after preprocessing the videos to the same .mp4 format, the problem still exists.

@alpargun

Hi, you might find the INSTALL.md file in my SlowFast fork useful for updated installation steps. I would suggest PyTorch <= 1.13.1, as I had similar problems with 2.0.

Following the INSTALL.md file, I suggest installing PyTorch together with TorchVision. I recently set up SlowFast on multiple Ubuntu 20.04 machines and a MacBook following this updated INSTALL.md, and had no problems.
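
As a quick sanity check after installing, a short snippet like this sketch prints the installed versions and confirms that a video backend can be selected:

# Print the installed torch/torchvision versions and check the video backend.
import torch
import torchvision

print("torch:", torch.__version__)
print("torchvision:", torchvision.__version__)

# Raises an error if PyAV is not available in this environment.
torchvision.set_video_backend("pyav")
print("video backend:", torchvision.get_video_backend())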

@ConvAndConv

I face the same issue with torch==2.0.0 and torchvision==0.15.1, using the Kinetics config slowfast_8x8_r50.yaml. How can I fix it without downgrading the torch version? Thanks!
