Torch Errors #16

Open
FireElementalNE opened this issue Dec 8, 2021 · 4 comments

@FireElementalNE

Hello!

I am trying to get this to work and am getting some weird torch errors. I am newish to ML so was a bit confused.

To get it running I had to make some changes to nerface_code/nerf-pytorch/nerf/train_utils.py; hopefully I did not break something 😅

ray_directions_ablation is used here, but it is not passed when run_one_iter_of_nerf is called here. The YML file in
the README sets options.dataset.no_ndc to True, so it fails. I also commented out some other lines that seemed to
be used for ablation runs:

  • Following the comment here, I commented out the paragraph here
  • Commented out a line here
  • Changed ray_dirs_fake to None here

I am guessing that these were all for ablation studies?

The final error I am getting is this (I included the stdout from the program, and obfuscated the directory structure in the errors):

before signal registration
after registration
starting data loading
Done with data loading
done loading data
loading GT background to condition on
bg shape torch.Size([256, 256, 3])
should be  torch.Size([256, 256, 3])
initialized latent codes with shape 551 X 32
computing boundix boxes probability maps
Starting loop
  0%|          | 0/1000000 [00:00<?, ?it/s]$HOME/miniconda3/envs/new_nerf/lib/python3.7/site-packages/torch/functional.py:445: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at  /opt/conda/conda-bld/pytorch_1634272092750/work/aten/src/ATen/native/TensorShape.cpp:2157.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
  0%|          | 0/1000000 [00:02<?, ?it/s]
Traceback (most recent call last):
  File "$REPODIR/4D-Facial-Avatars/nerface_code/nerf-pytorch/train_transformed_rays.py", line 593, in <module>
    main()
  File "$REPODIR/4D-Facial-Avatars/nerface_code/nerf-pytorch/train_transformed_rays.py", line 398, in main
    loss_total.backward()
  File "$HOME/miniconda3/envs/new_nerf/lib/python3.7/site-packages/torch/_tensor.py", line 307, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "$HOME/miniconda3/envs/new_nerf/lib/python3.7/site-packages/torch/autograd/__init__.py", line 156, in backward
    allow_unreachable=True, accumulate_grad=True)  # allow_unreachable flag
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [2048, 128]], which is output 0 of ReluBackward0, is at version 2; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

I am hoping it is just a versioning issue, but I am not sure.
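
For readers wondering whether this is a versioning issue: the message itself describes an in-place write to a tensor that autograd had saved for the backward pass. Below is a minimal sketch that should reproduce the same class of error in a recent PyTorch, independent of this repository. The tensor names and shapes are illustrative assumptions, not taken from the project code.

```python
import torch

# Illustrative stand-ins; the real tensors in the repository are not shown here.
x = torch.randn(4, 3, requires_grad=True)
h = torch.relu(x)      # ReLU keeps its output around for the backward pass
h[:, -1] += 1e-6       # in-place write changes that saved output

message = None
try:
    h.sum().backward()
except RuntimeError as err:
    # Autograd notices the saved ReLU output no longer matches what it recorded.
    message = str(err)

print(message)
```

A `+=` on a slice is typically recorded as two in-place updates (the add on the slice view, then the write-back), which would be consistent with the "version 2; expected version 0" wording in the traceback above. Whether this is the actual cause in this repository is not confirmed here.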

This is very cool work and I would love to get it working! I would also echo others and ask whether there is a pretrained model floating around somewhere that I (and others) could take a look at!

Thanks!!

@qiliux

qiliux commented Dec 23, 2021

I also face this error...

I commented out this line, and then it worked. However, I am still waiting for the training result.

sigma_a[:,-1] += 1e-6 # todo commented this for FCB demo !!!!!!
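
If that line is the culprit, an out-of-place version keeps the small offset without touching the tensor autograd saved. This is a minimal sketch, assuming sigma_a is the direct output of a ReLU (an assumption; the surrounding repository code is not shown in this thread) and using small illustrative shapes:

```python
import torch

# Illustrative stand-ins for the real inputs (assumed, not from the repository).
x = torch.randn(4, 3, requires_grad=True)
relu_out = torch.relu(x)

# Build an offset that is 1e-6 in the last column and 0 elsewhere,
# then add it out of place so the ReLU output itself is never modified.
offset = torch.zeros_like(relu_out)
offset[:, -1] = 1e-6
sigma_a = relu_out + offset

# Backward now runs because the saved ReLU output is unchanged.
sigma_a.sum().backward()
print(x.grad.shape)
```

Calling sigma_a = sigma_a.clone() before the original in-place line should have a similar effect, since the write then lands on the copy rather than on the ReLU output.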

@HyunsooCha

I also got the same error. Commenting out sigma_a[:,-1] += 1e-6 makes no difference for me.

@gafniguy
Owner

gafniguy commented Feb 4, 2022

Sorry for leaving all the last-minute ablation mess there. You were right to remove, comment out, or set to None anything related to ablation.

seriousran added a commit to seriousran/4D-Facial-Avatars that referenced this issue Sep 2, 2022
To solve gafniguy#16 gafniguy#24 gafniguy#44 issues
if ray_directions_ablation is None, skip following steps about ablation
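
The guard described in that commit message can be sketched roughly as follows. The helper name select_ray_directions and its behaviour are hypothetical and only illustrate the "skip when None" pattern; they are not the repository's actual code.

```python
from typing import List, Optional


def select_ray_directions(
    ray_directions: List[float],
    ray_directions_ablation: Optional[List[float]] = None,
) -> List[float]:
    """Hypothetical helper: take the ablation path only when data was passed."""
    if ray_directions_ablation is None:
        # Normal training: nothing was passed, so skip the ablation steps.
        return ray_directions
    # Ablation run (assumed behaviour): use the explicitly supplied directions.
    return ray_directions_ablation


normal = select_ray_directions([0.0, 0.0, 1.0])
ablated = select_ray_directions([0.0, 0.0, 1.0], [1.0, 0.0, 0.0])
```

Defaulting the argument to None means existing call sites that never pass it keep working without changes.
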
@wuzuyin

wuzuyin commented Sep 30, 2023

I also face this error...

I commented out this line, and then it worked. However, I am still waiting for the training result.

sigma_a[:,-1] += 1e-6 # todo commented this for FCB demo !!!!!!

May I ask how long it took you to run this code? I don't know why it shows that I need to run it for over 200 hours.
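
As a rough sanity check on that figure: the progress bar in the log earlier in this thread counts to 1,000,000 iterations, so an estimate of about 200 hours corresponds to roughly 0.72 seconds per iteration. A small sketch of the arithmetic, where the 0.72 s/it rate is a hypothetical value picked to match the 200-hour figure, not a measurement:

```python
def eta_hours(total_iters: int, done_iters: int, seconds_per_iter: float) -> float:
    """Remaining wall-clock time in hours at a constant iteration rate."""
    return (total_iters - done_iters) * seconds_per_iter / 3600.0


# 1_000_000 is the total shown by the progress bar in the log above.
# 0.72 s/it is a hypothetical rate, not a measured one.
estimate = eta_hours(1_000_000, 0, 0.72)
print(round(estimate, 1))
```

The estimate covers the full iteration count in the progress bar; this thread does not say whether fewer iterations would be enough.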
