error in projection forward: no kernel image is available for execution on the device #16
Comments
Thank you for making the code available, Xiuming. I've run into the same error while trying to train MarrNet with ShapeNet examples. Is there a solution here? Hao, did you ever figure this out? Thanks again,
@weeoooweeooo would you mind sharing your detailed error message? I can't seem to reproduce this. I suspect it might be caused by an improper install of the CUDA kernels; I'll update the install script for this.
@ztzhang Thank you for responding so quickly.
==> Training
The issues originally arose, however, while I was trying to work around the deprecation of torch.utils.ffi in PyTorch 1.0. I'm using an RTX GPU, which requires PyTorch 1.0 and CUDA 10, but I don't understand _wrap_function or create_extension well enough to rewrite those sections. The original errors follow; it seems the solution isn't a drop-in replacement. Do you have any ideas?
==> Parsing arguments
@weeoooweeooo I think your problem is different from the OP's. @royalhao3zZ I just pushed a fix with clean_toolbox_build.sh and build_toolbox.sh. Would you mind cleaning the previous build first and then rebuilding the toolbox? Thanks! As for @weeoooweeooo, I think I might have a quick fix in hand; please stay tuned.
@weeoooweeooo I figured out a quick fix to make it compile. However, we would like to keep the original repo consistent, so I don't plan to push this to the repo. Here's what I did:
Then you could first run setup.sh to build the .so files and run
@weeoooweeooo would you mind letting us know if this works for you? Thanks!
@ztzhang Thank you so much for helping me with this specific workaround.
==> Training
I've tried to troubleshoot a bit. Everything appears smooth except for a warning when compiling: /srv/git/GenRe-ShapeHD/toolbox/cam_bp
Hi, after a more careful reading of the docs, it seems the build system now relies on pybind11 to expose the C++ function calls; I'm guessing this is why the error happens.
Hi @ztzhang, thank you for your guidance on using pybind11 to expose the C++ functions. I have been trying to do so. Here, I have modified their example with your build.py script:

    from setuptools import setup, Extension
    version = '0.0.1'
    class get_pybind_include(object):
    ...
    extra_compile_args = list()  # ['...
    ffi_params = {
    ...
    ext_modules = [
    ...
    # As of Python 3.6, CCompiler has a ...
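For comparison, PyTorch 1.0 replaces the deprecated torch.utils.ffi workflow with `torch.utils.cpp_extension`; as far as I know, `CUDAExtension` already adds PyTorch's include directories (which bundle the pybind11 headers), so a hand-written `get_pybind_include` helper should not be needed. Below is a minimal `setup.py` sketch under stated assumptions: the package and module names (`cam_bp`, `cam_bp._ext`) and the source file `back_projection.cpp` are hypothetical, and the existing C wrapper would first have to be ported from the TH/THC ffi API to ATen tensors, with its functions registered via `PYBIND11_MODULE(TORCH_EXTENSION_NAME, m)`. The `compute_75`/`sm_75` target is only an example for an RTX 20-series card.

```python
# setup.py: a sketch, not the repository's actual build script.
from setuptools import setup
from torch.utils.cpp_extension import BuildExtension, CUDAExtension

setup(
    name="cam_bp",  # hypothetical package name
    ext_modules=[
        CUDAExtension(
            # Hypothetical module name; the .cpp file must register its
            # functions with PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) { m.def(...); }
            name="cam_bp._ext",
            sources=[
                "cam_bp/src/back_projection.cpp",  # assumed port of back_projection.c
                "cam_bp/src/back_projection_kernel.cu",
            ],
            extra_compile_args={
                "cxx": ["-O2"],
                # Native code for the local GPU; 7.5 (Turing/RTX) is an example.
                "nvcc": ["-O2", "-gencode", "arch=compute_75,code=sm_75"],
            },
        )
    ],
    # BuildExtension drives the mixed C++/CUDA compilation.
    cmdclass={"build_ext": BuildExtension},
)
```

Such a script would be built with `python setup.py build_ext --inplace` (or installed with `pip install .`).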
@weeoooweeooo I might have some time to look at this, though I'm not sure it is still cost-efficient to hack it. I'll try to rewrite some of those kernels against the current C++ API, and I think some of the ops are already included in PyTorch's built-in functions.
That would be very helpful, @ztzhang. I'd appreciate any input you might have. Thanks!
I am having the same issue as @royalhao3zZ. I ran
@dannygelman1 would you mind sharing your compile-time messages as well as the error messages?
Yes! Thank you for looking into this!
Would you mind cleaning the build and recompiling the CUDA kernels? Please also post the corresponding output so that I can help track this down. Thanks.
After I run
Since it is saying
Yes. Can you post the compile messages as well?
On Tue, Oct 15, 2019 at 12:18 PM Danny Gelman wrote:
After I run ./clean_toolbox_build.sh I get the following
Directory calc_prob/__pycache__ removed
Directory calc_prob/_ext removed
Directory calc_prob/functions/__pycache__ removed
File cam_bp/src/back_projection_kernel.cu.o removed
__pycache__ not found
dist not found
build not found
pytorch_camera_back_projection.egg-info not found
.cache not found
Directory cam_bp/__pycache__ removed
Directory cam_bp/_ext removed
Directory cam_bp/functions/__pycache__ removed
Directory cam_bp/modules/__pycache__ removed
Since it is saying build not found, among other files, does that mean I
am not creating all the necessary files?
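To read the output above: a cleanup step of this kind presumably prints "removed" for artifacts that existed and "not found" for paths that were absent when it ran. A minimal Python sketch of such a step follows; the function name `clean` and the artifact list (copied from the cam_bp entries above) are assumptions, and the actual clean_toolbox_build.sh may behave differently.

```python
import shutil
from pathlib import Path

# Hypothetical artifact list modeled on the cam_bp output above.
ARTIFACTS = [
    "cam_bp/__pycache__",
    "cam_bp/_ext",
    "cam_bp/src/back_projection_kernel.cu.o",
    "build",
    "dist",
    "pytorch_camera_back_projection.egg-info",
]


def clean(root, artifacts=ARTIFACTS):
    """Delete build artifacts under root and return one status line per entry."""
    report = []
    for rel in artifacts:
        path = Path(root) / rel
        if path.is_dir():
            shutil.rmtree(path)
            report.append(f"Directory {rel} removed")
        elif path.exists():
            path.unlink()
            report.append(f"File {rel} removed")
        else:
            # Nothing to delete: this path did not exist when the step ran.
            report.append(f"{rel} not found")
    return report
```

In this sketch, "not found" only says the path was absent at cleanup time; whether that matters depends on whether the build is expected to create it.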
Here are all the messages after I run
Hello, thank you for looking into my issue! I just wanted to follow up on this and make sure I provided the messages you wanted to see. Are these the compile messages you wanted? Also, I am an MIT undergraduate trying to use this repo as part of my project in the Media Lab. I pass by CSAIL often and was wondering whether, if you are free, we could meet in person to discuss the issue I am running into. Thank you!
Sorry for the late reply; happy to chat! I can help with the issue if you can show me your setup as well!
No worries! My supervisor @gbernal and I would be happy to chat with you! You are welcome to come by Fluid Interfaces in the Media Lab so we can show you our setup, or we can come by CSAIL if that's easier for you. What days/times are good for you?
Was there ever a resolution to this? I'm getting the same errors.
@weeoooweeooo I am getting the same errors. Did you find a solution?
@colinqian I didn't manage to get past these errors despite trying the suggested workarounds. It seems the deprecations in PyTorch 1.0 require some non-trivial changes to the code here.
I can get GenRe running on machines with CUDA 9.2 and PyTorch 0.4.1. The key pieces are making sure I add the GPU arch specification to the setup.sh scripts in toolbox/, and setting these environment variables (modify as necessary for your machine):

    export CPATH=$CPATH:/usr/local/cuda-9.2/include

Installing PyTorch 0.4.1 is no longer trivial: besides the correct CUDA version, it requires a specific gcc version, but once I had those, I found installing with conda not too bad.
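The "GPU arch specification" is presumably nvcc's `-gencode` (or `-arch`) option. Below is a small Python sketch that builds those flags for one GPU and rejects toolkits too old to target it. The helper name and its partial minimum-CUDA table are assumptions, not part of the toolbox; check NVIDIA's CUDA release notes for the authoritative list. It also illustrates why a CUDA 9.x setup cannot build native code for an RTX 20-series card (compute capability 7.5), which needs CUDA 10.

```python
# Minimum CUDA toolkit that can compile native code for a few compute
# capabilities. Partial table; verify against NVIDIA's release notes.
MIN_CUDA = {
    (6, 0): (8, 0),
    (6, 1): (8, 0),
    (7, 0): (9, 0),
    (7, 5): (10, 0),
    (8, 0): (11, 0),
    (8, 6): (11, 1),
}


def gencode_flags(capability, cuda_version):
    """Return nvcc -gencode flags for one GPU, or raise if the toolkit is too old."""
    if capability not in MIN_CUDA:
        raise KeyError(f"capability {capability} is not in this partial table")
    needed = MIN_CUDA[capability]
    # Tuples compare element-wise, so (9, 2) < (10, 0) is True.
    if cuda_version < needed:
        raise ValueError(
            f"sm_{capability[0]}{capability[1]} needs CUDA >= {needed[0]}.{needed[1]}, "
            f"got {cuda_version[0]}.{cuda_version[1]}"
        )
    cc = f"{capability[0]}{capability[1]}"
    # Native code (SASS) for this GPU, plus PTX so newer GPUs can JIT-compile it.
    return [
        "-gencode", f"arch=compute_{cc},code=sm_{cc}",
        "-gencode", f"arch=compute_{cc},code=compute_{cc}",
    ]
```

For example, `gencode_flags((6, 1), (9, 2))` returns flags for a GTX 10-series (Pascal) card with CUDA 9.2, while `gencode_flags((7, 5), (9, 2))` raises `ValueError`.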
@wagnew3 It works now. I got it running with CUDA 9.0 and PyTorch 0.4.1. I upgraded gcc to the latest version and added some environment variables. Thank you.
@colinqian Which version of GCC did you update to? I'm getting the same error, also running with CUDA 9.0 and PyTorch 0.4.1.
I've been stuck on this issue for several days.
While running test_genre.sh, I got the following error:
Traceback (most recent call last):
  File "test.py", line 95, in <module>
    model.test_on_batch(i, batch)
  File "/home/zhanghao/models/genre_full_model.py", line 182, in test_on_batch
    pred = self.forward_with_trimesh(batch)
  File "/home/zhanghao/models/genre_full_model.py", line 207, in forward_with_trimesh
    proj = self.net.depth_and_inpaint.proj_depth(pred_abs_depth)
  File "/media/zhanghao/娱乐/anaconda3/envs/shaperecon/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/zhanghao/toolbox/cam_bp/cam_bp/modules/camera_backprojection_module.py", line 22, in forward
    df = CameraBackProjection.apply(depth_t, fl, cam_dist, self.res)
  File "/home/zhanghao/toolbox/cam_bp/cam_bp/functions/cam_back_projection.py", line 25, in forward
    cam_bp_lib.back_projection_forward(depth_t, cam_dist, fl, tdf, cnt)
  File "/media/zhanghao/娱乐/anaconda3/envs/shaperecon/lib/python3.6/site-packages/torch/utils/ffi/__init__.py", line 202, in safe_call
    result = torch._C._safe_call(*args, **kwargs)
torch.FatalError: aborting at /data/vision/billf/scratch/ztzhang/shape_oneshot/ShapeRecon/toolbox/cam_bp/cam_bp/src/back_projection.c:14
Does anyone have a solution for this? Thanks.
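For context on the title's error: CUDA reports "no kernel image is available for execution on the device" when the loaded binary contains neither native code (SASS) that runs on the GPU nor PTX that the driver can JIT-compile for it. Below is a simplified Python sketch of that compatibility rule; the function name is hypothetical, and it ignores special cases such as architecture-specific targets. The rule it encodes is that SASS built for `sm_XY` runs on GPUs with the same major version X and a minor version of at least Y, while PTX for `compute_XY` can be JIT-compiled for any GPU of capability X.Y or higher.

```python
def kernel_image_available(device_cap, sass_caps=(), ptx_caps=()):
    """Simplified CUDA compatibility check for a compiled extension.

    device_cap: (major, minor) of the GPU, e.g. (7, 5) for an RTX 20-series card.
    sass_caps:  capabilities with native code embedded (code=sm_XY).
    ptx_caps:   capabilities with PTX embedded (code=compute_XY).
    """
    major, minor = device_cap
    # Native code: same major version, equal or lower minor version.
    if any(m == major and n <= minor for m, n in sass_caps):
        return True
    # PTX: JIT-compilable for any device at or above its capability.
    return any(tuple(c) <= (major, minor) for c in ptx_caps)
```

With PyTorch installed, `torch.cuda.get_device_capability()` returns the `(major, minor)` pair to pass as `device_cap`.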