error in projection foward: no kernel image is available for execution on the device #16

Open
royalhao3zZ opened this issue Apr 26, 2019 · 27 comments
@royalhao3zZ

I've been stuck on this issue for several days.
While running test_genre.sh, I got the following error:
Traceback (most recent call last):
  File "test.py", line 95, in <module>
    model.test_on_batch(i, batch)
  File "/home/zhanghao/models/genre_full_model.py", line 182, in test_on_batch
    pred = self.forward_with_trimesh(batch)
  File "/home/zhanghao/models/genre_full_model.py", line 207, in forward_with_trimesh
    proj = self.net.depth_and_inpaint.proj_depth(pred_abs_depth)
  File "/media/zhanghao/娱乐/anaconda3/envs/shaperecon/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/zhanghao/toolbox/cam_bp/cam_bp/modules/camera_backprojection_module.py", line 22, in forward
    df = CameraBackProjection.apply(depth_t, fl, cam_dist, self.res)
  File "/home/zhanghao/toolbox/cam_bp/cam_bp/functions/cam_back_projection.py", line 25, in forward
    cam_bp_lib.back_projection_forward(depth_t, cam_dist, fl, tdf, cnt)
  File "/media/zhanghao/娱乐/anaconda3/envs/shaperecon/lib/python3.6/site-packages/torch/utils/ffi/__init__.py", line 202, in safe_call
    result = torch._C._safe_call(*args, **kwargs)
torch.FatalError: aborting at /data/vision/billf/scratch/ztzhang/shape_oneshot/ShapeRecon/toolbox/cam_bp/cam_bp/src/back_projection.c:14

Does anyone have a solution for this? Thanks.

@weeoooweeooo

Thank you for making the code available, Xiuming.

I've run into the same error while trying to train MarrNet on ShapeNet examples. Is there a solution?

Hao, did you ever figure this out?

Thanks again,
Jeff

@ztzhang
Collaborator

ztzhang commented Jun 26, 2019

@weeoooweeooo would you mind sharing your detailed error message? I can't reproduce this. I suspect it is caused by an improper install of the CUDA kernels; I'll add an install script for this.
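
For context, CUDA generally reports "no kernel image is available for execution on the device" when the compiled binary contains no code for the GPU's compute capability. Below is a minimal sketch of that check, assuming the toolbox was built only for the sm_30, sm_35, sm_52 and sm_61 targets that appear in the nvcc commands in the build log further down this thread. The helper names (parse_sm, is_covered, gencode_flag) are hypothetical and not part of the toolbox, and the coverage rule (same major version, compiled minor version no newer than the device's) is my reading of CUDA's binary compatibility rules.

```python
# Hypothetical helpers; not part of the GenRe-ShapeHD toolbox.
# Assumption: kernels were built as SASS only (code=sm_XY, no PTX),
# as in the nvcc commands shown in the build log in this thread.

COMPILED_ARCHS = ["sm_30", "sm_35", "sm_52", "sm_61"]


def parse_sm(arch):
    """Turn a target such as 'sm_61' into the tuple (6, 1)."""
    digits = arch.split("_", 1)[1]
    return int(digits[:-1]), int(digits[-1])


def is_covered(capability, compiled_archs=COMPILED_ARCHS):
    """True if some compiled target has the same major version and a
    minor version no newer than the device's."""
    major, minor = capability
    return any(
        m == major and n <= minor
        for m, n in map(parse_sm, compiled_archs)
    )


def gencode_flag(capability):
    """Build the nvcc -gencode flag for one compute capability."""
    major, minor = capability
    return "-gencode arch=compute_{0}{1},code=sm_{0}{1}".format(major, minor)


if __name__ == "__main__":
    try:
        import torch
        cap = torch.cuda.get_device_capability(0)
    except Exception:
        cap = (7, 5)  # assumed example value: a Turing card such as an RTX 20-series
    print(cap, "covered" if is_covered(cap) else "needs " + gencode_flag(cap))
```

If the device turns out not to be covered, the printed flag is what would need to be added to the nvcc lines in each setup.sh before cleaning and rebuilding, provided the installed CUDA toolkit supports that architecture.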

@weeoooweeooo

@ztzhang Thank you for responding so quickly.
Reinstalling the kernels is exactly what I'm in the process of trying.

==> Training
Epoch 1/1000
10000/10000 [==============================] - 188s - loss: 1549.6328 - depth: 614.3428 - silhou: 483.5301 - normal: 451.7600 - depth_minmax: 2138.4353
Eval 1/1000
error in projection foward: no kernel image is available for execution on the device
Traceback (most recent call last):
  File "train.py", line 216, in <module>
    eval_at_start=opt.eval_at_start
  File "/srv/git/GenRe-ShapeHD/models/netinterface.py", line 287, in train_epoch
    _eval(epoch)
  File "/srv/git/GenRe-ShapeHD/models/netinterface.py", line 270, in _eval
    batch_log = self._vali_on_batch(epoch, i, data)
  File "/srv/git/GenRe-ShapeHD/models/marrnet1.py", line 69, in _vali_on_batch
    output = self.pack_output(pred, batch)
  File "/srv/git/GenRe-ShapeHD/models/marrnet1.py", line 94, in pack_output
    out['proj_depth'] = self.proj_depth(pred_abs_depth).cpu().numpy()
  File "/srv/git/GenRe-ShapeHD/models/marrnetbase.py", line 154, in proj_depth
    proj_depth = self.cam_bp(abs_depth)
  File "/home/gsq/anaconda3/envs/shaperecon/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/srv/git/GenRe-ShapeHD/models/marrnetbase.py", line 174, in forward
    return CameraBackProjection.apply(depth_t, fl, cam_dist, self.res)
  File "/srv/git/GenRe-ShapeHD/toolbox/cam_bp/cam_bp/functions/cam_back_projection.py", line 25, in forward
    cam_bp_lib.back_projection_forward(depth_t, cam_dist, fl, tdf, cnt)
  File "/srv/git/GenRe-ShapeHD/toolbox/cam_bp/cam_bp/_ext/cam_bp_lib/__init__.py", line 175, in safe_call
    result = torch._C._safe_call(*args, **kwargs)
torch.FatalError: aborting at /data/vision/billf/scratch/ztzhang/shape_oneshot/ShapeRecon/toolbox/cam_bp/cam_bp/src/back_projection.c:14

The issue originally arose while I was trying to work around the deprecation of torch.utils.ffi in PyTorch 1.0. I'm using an RTX GPU, which requires PyTorch 1.0 and CUDA 10, but I don't understand _wrap_function or create_extension well enough to rewrite those sections. The original errors follow. It seems the fix isn't a drop-in replacement. Do you have any ideas?

==> Parsing arguments
Traceback (most recent call last):
  File "train.py", line 18, in <module>
    opt, unique_opt_params = options_train.parse()
  File "/srv/git/GenRe-ShapeHD/options/options_train.py", line 118, in parse
    parser, unique_params_model = get_model(net_name).add_arguments(parser)
  File "/srv/git/GenRe-ShapeHD/models/__init__.py", line 5, in get_model
    module = importlib.import_module('models.' + alias)
  File "/home/gsq/anaconda3/envs/shaperecon/lib/python3.6/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 994, in _gcd_import
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 665, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 678, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/srv/git/GenRe-ShapeHD/models/marrnet1.py", line 8, in <module>
    from .marrnetbase import MarrnetBaseModel
  File "/srv/git/GenRe-ShapeHD/models/marrnetbase.py", line 7, in <module>
    from toolbox.cam_bp.cam_bp.functions import CameraBackProjection
  File "/srv/git/GenRe-ShapeHD/toolbox/cam_bp/cam_bp/functions/__init__.py", line 1, in <module>
    from .cam_back_projection import CameraBackProjection
  File "/srv/git/GenRe-ShapeHD/toolbox/cam_bp/cam_bp/functions/cam_back_projection.py", line 4, in <module>
    from .._ext import cam_bp_lib
  File "/srv/git/GenRe-ShapeHD/toolbox/cam_bp/cam_bp/_ext/cam_bp_lib/__init__.py", line 1, in <module>
    from torch.utils.ffi import _wrap_function
  File "/home/gsq/anaconda3/envs/shaperecon/lib/python3.6/site-packages/torch/utils/ffi/__init__.py", line 1, in <module>
    raise ImportError("torch.utils.ffi is deprecated. Please use cpp extensions instead.")
ImportError: torch.utils.ffi is deprecated. Please use cpp extensions instead.

@ztzhang
Collaborator

ztzhang commented Jun 26, 2019

@weeoooweeooo I think your problem is different from OP. @royalhao3zZ I just pushed a fix with clean_toolbox_build.sh and build_toolbox.sh. Would you mind cleaning the previous build and then rebuilding the toolbox? Thanks!

As for @weeoooweeooo, I think I have a quick fix for that; please stay tuned.

@ztzhang
Collaborator

ztzhang commented Jun 26, 2019

@weeoooweeooo I figured out a quick fix to make it compile. However, we would like to keep the original repo consistent, so I don't plan to push this to the repo.

Here's what I did:

  1. Copy all .c files as .cpp files.
  2. In each setup.sh, comment out lines 34-42.
  3. Modify build.py as follows (shown only for calc_prob):
import os
import sys
import torch
from torch.utils.cpp_extension import CppExtension, BuildExtension, include_paths

# Directory containing this build.py; relative paths below are resolved against it.
this_file = os.path.dirname(os.path.realpath(__file__))
print(this_file)

extra_compile_args = list()

# The extension links a prebuilt CUDA object, so fail early if CUDA is unavailable.
assert torch.cuda.is_available()
sources = ['calc_prob/src/calc_prob.cpp']
headers = ['calc_prob/src/calc_prob.h']
defines = [('WITH_CUDA', True)]
with_cuda = True

# CUDA kernel object compiled beforehand by setup.sh.
extra_objects = ['calc_prob/src/calc_prob_kernel.cu.o']
extra_objects = [os.path.join(this_file, fname) for fname in extra_objects]

# Keyword arguments for CppExtension. The commented-out keys were used by
# torch.utils.ffi.create_extension and are not passed to CppExtension.
ffi_params = {
    #'headers': headers,
    'sources': sources,
    'define_macros': defines,
    #'relative_to': __file__,
    #'with_cuda': with_cuda,
    'extra_objects': extra_objects,
    'include_dirs': [os.path.join(this_file, 'calc_prob/src')] + include_paths(True),
    'extra_compile_args': extra_compile_args,
}


if __name__ == '__main__':
    ext = CppExtension(
        'calc_prob._ext.calc_prob_lib',
        # package=False,
        **ffi_params)

    from setuptools import setup
    setup(name='calc_prob', ext_modules=[ext], cmdclass={'build_ext': BuildExtension})

Then run setup.sh first to build the .so files, and run python build.py build_ext to build the extensions you need. After that, you may need to copy or soft-link the built _ext directory from the build folder (it may sit under parent folders named after your OS and Python version) to calc_prob/calc_prob/_ext.
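
The last step above (locating the built _ext directory and linking it into the package) could be scripted roughly as follows. This is only a sketch: the function names are hypothetical, and it assumes the layout build/lib.&lt;platform&gt;-&lt;pyversion&gt;/calc_prob/_ext that setuptools typically produces, with the script's project_root being the toolbox/calc_prob directory.

```python
import glob
import os


def find_built_ext_dirs(project_root, package="calc_prob"):
    """Return the _ext directories that build_ext left under build/lib.*."""
    pattern = os.path.join(project_root, "build", "lib.*", package, "_ext")
    return sorted(glob.glob(pattern))


def link_ext(project_root, package="calc_prob"):
    """Symlink the first built _ext directory to <project_root>/<package>/_ext.

    Assumption: <project_root>/<package>/_ext is where the Python wrappers
    import the compiled library from.
    """
    built = find_built_ext_dirs(project_root, package)
    if not built:
        raise FileNotFoundError("no built _ext found; run build_ext first")
    target = os.path.join(project_root, package, "_ext")
    if os.path.lexists(target):
        raise FileExistsError(target + " already exists; remove it first")
    os.makedirs(os.path.dirname(target), exist_ok=True)
    os.symlink(os.path.abspath(built[0]), target)
    return target
```

Calling link_ext("toolbox/calc_prob") from the repository root would then create the link; the function refuses to overwrite an existing _ext so a stale build is not silently kept.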

@ztzhang
Collaborator

ztzhang commented Jun 26, 2019

@weeoooweeooo would you mind letting us know if this works for you? Thanks!

@weeoooweeooo

@ztzhang Thank you so much for helping me with this specific workaround.
I have tried your suggestions, but I now get this error when calling functions from the newly built extension:

==> Training
Epoch 1/1000
10000/10000 [==============================] - 191s - loss: 1574.3203 - depth: 618.3826 - silhou: 500.1726 - normal: 455.7651 - depth_minmax: 1982.9009
Eval 1/1000
Traceback (most recent call last):
  File "train.py", line 213, in <module>
    eval_at_start=opt.eval_at_start
  File "/srv/git/GenRe-ShapeHD/models/netinterface.py", line 285, in train_epoch
    _eval(epoch)
  File "/srv/git/GenRe-ShapeHD/models/netinterface.py", line 268, in _eval
    batch_log = self._vali_on_batch(epoch, i, data)
  File "/srv/git/GenRe-ShapeHD/models/marrnet1.py", line 69, in _vali_on_batch
    output = self.pack_output(pred, batch)
  File "/srv/git/GenRe-ShapeHD/models/marrnet1.py", line 94, in pack_output
    out['proj_depth'] = self.proj_depth(pred_abs_depth).cpu().numpy()
  File "/srv/git/GenRe-ShapeHD/models/marrnetbase.py", line 154, in proj_depth
    proj_depth = self.cam_bp(abs_depth)
  File "/home/gsq/anaconda3/envs/shaperecon/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/srv/git/GenRe-ShapeHD/models/marrnetbase.py", line 174, in forward
    return CameraBackProjection.apply(depth_t, fl, cam_dist, self.res)
  File "/srv/git/GenRe-ShapeHD/toolbox/cam_bp/cam_bp/functions/cam_back_projection.py", line 25, in forward
    cam_bp_lib.back_projection_forward(depth_t, cam_dist, fl, tdf, cnt)
AttributeError: module 'toolbox.cam_bp.cam_bp._ext.cam_bp_lib' has no attribute 'back_projection_forward'

I've tried to troubleshoot a bit. The build appears to go smoothly, apart from one compiler warning:

/srv/git/GenRe-ShapeHD/toolbox/cam_bp
running build_ext
building 'cam_bp._ext.cam_bp_lib' extension
creating build
creating build/temp.linux-x86_64-3.6
creating build/temp.linux-x86_64-3.6/cam_bp
creating build/temp.linux-x86_64-3.6/cam_bp/src
gcc -pthread -B /home/gsq/anaconda3/envs/shaperecon/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -DWITH_CUDA=True -I/srv/git/GenRe-ShapeHD/toolbox/cam_bp/cam_bp/src -I/home/gsq/anaconda3/envs/shaperecon/lib/python3.6/site-packages/torch/include -I/home/gsq/anaconda3/envs/shaperecon/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -I/home/gsq/anaconda3/envs/shaperecon/lib/python3.6/site-packages/torch/include/TH -I/home/gsq/anaconda3/envs/shaperecon/lib/python3.6/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/gsq/anaconda3/envs/shaperecon/lib/python3.6/site-packages/torch/include -I/home/gsq/anaconda3/envs/shaperecon/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -I/home/gsq/anaconda3/envs/shaperecon/lib/python3.6/site-packages/torch/include/TH -I/home/gsq/anaconda3/envs/shaperecon/lib/python3.6/site-packages/torch/include/THC -I/home/gsq/anaconda3/envs/shaperecon/include/python3.6m -c cam_bp/src/back_projection.cpp -o build/temp.linux-x86_64-3.6/cam_bp/src/back_projection.o -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=cam_bp_lib -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++11
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
creating build/lib.linux-x86_64-3.6
creating build/lib.linux-x86_64-3.6/cam_bp
creating build/lib.linux-x86_64-3.6/cam_bp/_ext
g++ -pthread -shared -B /home/gsq/anaconda3/envs/shaperecon/compiler_compat -L/home/gsq/anaconda3/envs/shaperecon/lib -Wl,-rpath=/home/gsq/anaconda3/envs/shaperecon/lib -Wl,--no-as-needed -Wl,--sysroot=/ build/temp.linux-x86_64-3.6/cam_bp/src/back_projection.o /srv/git/GenRe-ShapeHD/toolbox/cam_bp/cam_bp/src/back_projection_kernel.cu.o -o build/lib.linux-x86_64-3.6/cam_bp/_ext/cam_bp_lib.cpython-36m-x86_64-linux-gnu.so

@ztzhang
Collaborator

ztzhang commented Jun 27, 2019

Hi, after reading the docs more carefully, it seems the build system now relies on pybind11 to expose the C++ function calls; I'm guessing this is why the error happens.
I don't think we need to rewrite everything, since the C API is still maintained; we only need to add pybind11 bindings to the C++ functions. Sorry, I may not have much capacity to fix this particular issue, but I would suggest adding pybind11 bindings to the .cpp files to see if that works.
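
One way to confirm this diagnosis before touching the C++ side is to check which names the rebuilt module actually exposes: with cpp_extension, a compiled function is only visible from Python if it has been registered in a pybind11 module definition. A minimal sketch follows; missing_functions is a hypothetical helper, and the stand-in module only simulates a rebuilt extension with no bindings registered.

```python
import types


def missing_functions(module, expected):
    """Return the expected names that are absent or not callable on `module`."""
    return [name for name in expected
            if not callable(getattr(module, name, None))]


# Stand-in for a rebuilt extension that has no bindings registered.
fake_ext = types.ModuleType("cam_bp_lib")
print(missing_functions(fake_ext, ["back_projection_forward"]))
# prints ['back_projection_forward']
```

Running the same check against the real cam_bp_lib after import would show whether back_projection_forward is missing because of the build or because of the import path.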

@weeoooweeooo

Hi @ztzhang, thank you for your guidance about using pybind11 to expose the C++ functions. I have been trying to do so. Here I have adapted the pybind11 example using your build.py script:

from setuptools import setup, Extension
from setuptools.command.build_ext import build_ext
import sys
import setuptools
import os
import torch
from torch.utils.cpp_extension import CppExtension, BuildExtension, include_paths

version = '0.0.1'
this_file = os.path.dirname(os.path.realpath(__file__))
print(this_file)

class get_pybind_include(object):
    """Helper class to determine the pybind11 include path
    The purpose of this class is to postpone importing pybind11
    until it is actually installed, so that the get_include()
    method can be invoked. """

    def __init__(self, user=False):
        self.user = user

    def __str__(self):
        import pybind11
        return pybind11.get_include(self.user)

extra_compile_args =list() # ['python3 -m pybind11 --includes']
extra_objects = list()
assert(torch.cuda.is_available())
sources = ['cam_bp/src/back_projection.cpp']
headers = ['cam_bp/src/back_projection.h']
defines = [('WITH_CUDA', True)]
with_cuda = True
extra_objects = ['cam_bp/src/back_projection_kernel.cu.o']
extra_objects = [os.path.join(this_file, fname) for fname in extra_objects]

ffi_params = {
    # 'headers': headers,
    # 'sources': sources,
    'define_macros': defines,
    # 'relative_to': __file__,
    # 'with_cuda': with_cuda,
    'extra_objects': extra_objects,
    'extra_compile_args': extra_compile_args,
}

ext_modules = [
    CppExtension(
        'cam_bp_lib',
        ['cam_bp/src/back_projection.cpp'],
        include_dirs=[
            os.path.join(this_file, 'cam_bp/src'),
            # Path to pybind11 headers
            get_pybind_include(),
            get_pybind_include(user=True),
            '/usr/local/cuda-10.0/targets/x86_64-linux/include'],
        language='c++',
        **ffi_params
    ),
]

# As of Python 3.6, CCompiler has a `has_flag` method.
# cf http://bugs.python.org/issue26689
def has_flag(compiler, flagname):
    """Return a boolean indicating whether a flag name is supported on
    the specified compiler.
    """
    import tempfile
    with tempfile.NamedTemporaryFile('w', suffix='.cpp') as f:
        f.write('int main (int argc, char **argv) { return 0; }')
        try:
            compiler.compile([f.name], extra_postargs=[flagname])
        except setuptools.distutils.errors.CompileError:
            return False
    return True

def cpp_flag(compiler):
    """Return the -std=c++[11/14/17] compiler flag
    The newer version is preferred over c++11 (when it is available).
    """
    flags = ['-std=c++17', '-std=c++14', '-std=c++11']

    for flag in flags:
        if has_flag(compiler, flag): return flag

    raise RuntimeError('Unsupported compiler -- at least C++11 support '
                       'is needed!')

class BuildExt(build_ext):
    """A custom build extension for adding compiler-specific options."""
    c_opts = {
        'msvc': ['/EHsc'],
        'unix': [],
    }
    l_opts = {
        'msvc': [],
        'unix': [],
    }

    if sys.platform == 'darwin':
        darwin_opts = ['-stdlib=libc++', '-mmacosx-version-min=10.7']
        c_opts['unix'] += darwin_opts
        l_opts['unix'] += darwin_opts

    def build_extensions(self):
        ct = self.compiler.compiler_type
        opts = self.c_opts.get(ct, [])
        link_opts = self.l_opts.get(ct, [])
        if ct == 'unix':
            opts.append('-DVERSION_INFO="%s"' % self.distribution.get_version())
            opts.append(cpp_flag(self.compiler))
            if has_flag(self.compiler, '-fvisibility=hidden'):
                opts.append('-fvisibility=hidden')
        elif ct == 'msvc':
            opts.append('/DVERSION_INFO=\\"%s\\"' % self.distribution.get_version())
        for ext in self.extensions:
            ext.extra_compile_args = opts
            ext.extra_link_args = link_opts
        build_ext.build_extensions(self)

setup(
    name='cam_bp_lib',
    version=version,
    ext_modules=ext_modules,
    install_requires=['pybind11>=2.3'],
    setup_requires=['pybind11>=2.3'],
    cmdclass={'build_ext': BuildExtension},
    zip_safe=False,
)

Although I am able to expose simpler functions with this setup, I have so far been unable to get it working for your toolboxes. Currently, I get this error when trying to import the compiled toolbox:

>>> import cam_bp_lib
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: /home/gsq/anaconda3/envs/shaperecon/lib/python3.6/site-packages/cam_bp_lib.cpython-36m-x86_64-linux-gnu.so: undefined symbol: _ZTIN3c1010TensorImplE

I suspect the main issue has to do with linking the compiled CUDA library together with the .cpp file. Do you have any insight into this?
Should I try compiling the CUDA code with CUDAExtension?
Or maybe share the library in this manner: https://devtalk.nvidia.com/default/topic/759162/shared-library-separate-compilation-c-c-/ ?
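
As an aside on reading that ImportError: the undefined symbol is a mangled C++ name, and decoding it shows which library is missing at load time. The sketch below is a deliberately tiny decoder for this one symbol shape (a typeinfo symbol for a nested name), not a general demangler; demangle_typeinfo is a hypothetical helper written for illustration.

```python
def demangle_typeinfo(symbol):
    """Demangle a `_ZTIN...E` symbol (typeinfo for a nested name).

    Handles only plain length-prefixed identifiers; templates,
    substitutions and other Itanium ABI features are out of scope.
    """
    prefix = "_ZTIN"
    if not (symbol.startswith(prefix) and symbol.endswith("E")):
        raise ValueError("unsupported symbol: " + symbol)
    body = symbol[len(prefix):-1]
    parts = []
    i = 0
    while i < len(body):
        # Read the decimal length prefix, then that many characters.
        j = i
        while j < len(body) and body[j].isdigit():
            j += 1
        if j == i:
            raise ValueError("expected a length prefix at offset %d" % i)
        length = int(body[i:j])
        parts.append(body[j:j + length])
        i = j + length
    return "typeinfo for " + "::".join(parts)


print(demangle_typeinfo("_ZTIN3c1010TensorImplE"))
# prints: typeinfo for c10::TensorImpl
```

The symbol therefore belongs to PyTorch's c10 library rather than to the CUDA kernel object. One common cause of this kind of error, offered here as an assumption rather than a diagnosis, is importing the extension before running import torch, so the PyTorch shared libraries are not yet loaded. The c++filt tool gives the same decoding for arbitrary symbols.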

@ztzhang
Collaborator

ztzhang commented Jul 5, 2019

@weeoooweeooo I might have some time to look at this over the weekend next week. My guess is that we also need to add extern "C" to the wrapper functions.

I'm not sure it's still cost-efficient to hack around it, though; I'll try to port some of those kernels to the current C++ API, and I think some of the ops are already included in PyTorch's built-in functions.

@ztzhang ztzhang self-assigned this Jul 5, 2019
@ztzhang ztzhang added the enhancement New feature or request label Jul 5, 2019
@weeoooweeooo

That would be very helpful, @ztzhang. I'd appreciate any input you might have. Thanks!
I'm afraid I haven't had much exposure to C++/CUDA, but I am very interested in trying your model on some medical images. Please let me know what I can do.

@dannygelman1

dannygelman1 commented Oct 14, 2019

> @weeoooweeooo I think your problem is different from OP. @royalhao3zZ I just pushed a fix with clean_toolbox_build.sh and build_toolbox.sh. Would you mind cleaning the previous build and then rebuilding the toolbox? Thanks!
>
> As for @weeoooweeooo, I think I have a quick fix for that; please stay tuned.

I am having the same issue as @royalhao3zZ. I ran ./clean_toolbox_build.sh and then ./build_toolbox.sh again, but I'm still getting the same issue when trying to run scripts/test_genre.sh. If you could provide any insight into this error, or any potential fixes, I would really appreciate it! Thank you!

@ztzhang
Collaborator

ztzhang commented Oct 14, 2019

@dannygelman1 would you mind sharing your compile-time output as well as the error messages?

@dannygelman1

Yes! Thank you for looking into this!
This is everything that prints after I run scripts/test_genre.sh 0
(The zero is to indicate the index of the gpu I want to use. Since my machine only has one gpu, it is at index 0)

==> Parsing arguments
Namespace(adam_beta1=0.5, adam_beta2=0.9, batch_size=1, classes='chair', dataset=None, epoch=0, epoch_batches=None, eval_at_start=False, eval_batches=None, expr_id=0, full_logdir=None, gpu='0', inpaint_path=None, input_mask='./downloads/data/test/genre/*_silhouette.*', input_rgb='./downloads/data/test/genre/*_rgb.*', joint_train=False, load_offline=False, log_batch=False, log_time=False, logdir=None, lr=0.0001, manual_seed=None, net='genre_full_model', net1_path=None, net_file='./downloads/models/full_model.pt', optim='adam', output_dir='./output/test', overwrite=True, padding_margin=16, pred_depth_minmax=True, resume=0, save_net=1, save_net_opt=False, sgd_dampening=0, sgd_momentum=0.9, suffix='{net}', surface_weight=1.0, tensorboard=False, vis_batches_train=10, vis_batches_vali=10, vis_every_train=1, vis_every_vali=1, vis_param_f=None, vis_workers=4, wdecay=0.0, workers=0)
==> Setting device
[Warning] Designated GPU in use: id=0, util=11%, memory in use: 450 MiB
==> Setting up output directory
==> Setting up loggers
==> Setting up models
[Warning] Model loaded without optimizer states. 
Testing GenRe
# model parameters: 100,204,619
==> Setting up data loaders
[Verbose] Time spent in data IO initialization: 0.00s
[Verbose] # test points: 4
[Verbose] # test batches: 4
==> Testing
  0%|          | 0/4 [00:00<?, ?it/s]
error in projection foward: no kernel image is available for execution on the device

Traceback (most recent call last):
  File "test.py", line 94, in <module>
    model.test_on_batch(i, batch)
  File "/home/guillermo/PycharmProjects/Fluid_Research/GenRe-ShapeHD/models/genre_full_model.py", line 182, in test_on_batch
    pred = self.forward_with_trimesh(batch)
  File "/home/guillermo/PycharmProjects/Fluid_Research/GenRe-ShapeHD/models/genre_full_model.py", line 207, in forward_with_trimesh
    proj = self.net.depth_and_inpaint.proj_depth(pred_abs_depth)
  File "/home/guillermo/anaconda3/envs/shaperecon/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/guillermo/PycharmProjects/Fluid_Research/GenRe-ShapeHD/toolbox/cam_bp/cam_bp/modules/camera_backprojection_module.py", line 22, in forward
    df = CameraBackProjection.apply(depth_t, fl, cam_dist, self.res)
  File "/home/guillermo/PycharmProjects/Fluid_Research/GenRe-ShapeHD/toolbox/cam_bp/cam_bp/functions/cam_back_projection.py", line 25, in forward
    cam_bp_lib.back_projection_forward(depth_t, cam_dist, fl, tdf, cnt)
  File "/home/guillermo/anaconda3/envs/shaperecon/lib/python3.6/site-packages/torch/utils/ffi/__init__.py", line 202, in safe_call
    result = torch._C._safe_call(*args, **kwargs)
torch.FatalError: aborting at /home/guillermo/PycharmProjects/Fluid_Research/GenRe-ShapeHD/toolbox/cam_bp/cam_bp/src/back_projection.c:14

@ztzhang
Collaborator

ztzhang commented Oct 15, 2019

Would you mind cleaning the build and recompiling the CUDA kernels? Please post the corresponding output so that I can help track this down. Thanks.

@dannygelman1

After I run ./clean_toolbox_build.sh I get the following

Directory calc_prob/__pycache__ removed
Directory calc_prob/_ext removed
Directory calc_prob/functions/__pycache__ removed
File cam_bp/src/back_projection_kernel.cu.o removed
__pycache__ not found
dist not found
build not found
pytorch_camera_back_projection.egg-info not found
.cache not found
Directory cam_bp/__pycache__ removed
Directory cam_bp/_ext removed
Directory cam_bp/functions/__pycache__ removed
Directory cam_bp/modules/__pycache__ removed

Since it says "build not found", along with several other entries, does that mean I am not creating all the necessary files?

@ztzhang
Collaborator

ztzhang commented Oct 15, 2019 via email

@dannygelman1

Here are all the messages after I run ./build_toolbox.sh

Add -gencode to match all the GPU architectures you have.
Check 'https://en.wikipedia.org/wiki/CUDA#GPUs_supported' for list of architecture.
Check 'http://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html' for GPU compilation based on architecture.
/home/guillermo/anaconda3/envs/shaperecon/bin/python
setup.sh: line 9: /home/guillermo/anaconda3/envs/shaperecon/bin:/home/guillermo/anaconda3/condabin:/home/guillermo/.local/bin:/home/guillermo/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/usr/local/cuda-9.2/bin: No such file or directory
nvcc -c -o calc_prob_kernel.cu.o calc_prob_kernel.cu -x cu -Xcompiler -fPIC -I /home/guillermo/anaconda3/envs/shaperecon/lib/python3.6/site-packages/torch/lib/include -I /home/guillermo/anaconda3/envs/shaperecon/lib/python3.6/site-packages/torch/lib/include/TH -I /home/guillermo/anaconda3/envs/shaperecon/lib/python3.6/site-packages/torch/lib/include/THC -I /home/guillermo/PycharmProjects/Fluid_Research/GenRe-ShapeHD/toolbox/calc_prob/calc_prob/src         -gencode arch=compute_30,code=sm_30         -gencode arch=compute_35,code=sm_35         -gencode arch=compute_52,code=sm_52         -gencode arch=compute_61,code=sm_61
/home/guillermo/PycharmProjects/Fluid_Research/GenRe-ShapeHD/toolbox/calc_prob
generating /tmp/tmpx30pui_m/_calc_prob_lib.c
setting the current directory to '/tmp/tmpx30pui_m'
running build_ext
building '_calc_prob_lib' extension
creating home
creating home/guillermo
creating home/guillermo/PycharmProjects
creating home/guillermo/PycharmProjects/Fluid_Research
creating home/guillermo/PycharmProjects/Fluid_Research/GenRe-ShapeHD
creating home/guillermo/PycharmProjects/Fluid_Research/GenRe-ShapeHD/toolbox
creating home/guillermo/PycharmProjects/Fluid_Research/GenRe-ShapeHD/toolbox/calc_prob
creating home/guillermo/PycharmProjects/Fluid_Research/GenRe-ShapeHD/toolbox/calc_prob/calc_prob
creating home/guillermo/PycharmProjects/Fluid_Research/GenRe-ShapeHD/toolbox/calc_prob/calc_prob/src
gcc -pthread -B /home/guillermo/anaconda3/envs/shaperecon/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -DWITH_CUDA=True -I/home/guillermo/anaconda3/envs/shaperecon/lib/python3.6/site-packages/torch/utils/ffi/../../lib/include -I/home/guillermo/anaconda3/envs/shaperecon/lib/python3.6/site-packages/torch/utils/ffi/../../lib/include/TH -I/home/guillermo/anaconda3/envs/shaperecon/lib/python3.6/site-packages/torch/utils/ffi/../../lib/include/THC -I/usr/local/cuda/include -I/home/guillermo/PycharmProjects/Fluid_Research/GenRe-ShapeHD/toolbox/calc_prob/calc_prob/src -I/home/guillermo/anaconda3/envs/shaperecon/include/python3.6m -c _calc_prob_lib.c -o ./_calc_prob_lib.o -std=c99
gcc -pthread -B /home/guillermo/anaconda3/envs/shaperecon/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -DWITH_CUDA=True -I/home/guillermo/anaconda3/envs/shaperecon/lib/python3.6/site-packages/torch/utils/ffi/../../lib/include -I/home/guillermo/anaconda3/envs/shaperecon/lib/python3.6/site-packages/torch/utils/ffi/../../lib/include/TH -I/home/guillermo/anaconda3/envs/shaperecon/lib/python3.6/site-packages/torch/utils/ffi/../../lib/include/THC -I/usr/local/cuda/include -I/home/guillermo/PycharmProjects/Fluid_Research/GenRe-ShapeHD/toolbox/calc_prob/calc_prob/src -I/home/guillermo/anaconda3/envs/shaperecon/include/python3.6m -c /home/guillermo/PycharmProjects/Fluid_Research/GenRe-ShapeHD/toolbox/calc_prob/calc_prob/src/calc_prob.c -o ./home/guillermo/PycharmProjects/Fluid_Research/GenRe-ShapeHD/toolbox/calc_prob/calc_prob/src/calc_prob.o -std=c99
gcc -pthread -shared -B /home/guillermo/anaconda3/envs/shaperecon/compiler_compat -L/home/guillermo/anaconda3/envs/shaperecon/lib -Wl,-rpath=/home/guillermo/anaconda3/envs/shaperecon/lib -Wl,--no-as-needed -Wl,--sysroot=/ ./_calc_prob_lib.o ./home/guillermo/PycharmProjects/Fluid_Research/GenRe-ShapeHD/toolbox/calc_prob/calc_prob/src/calc_prob.o /home/guillermo/PycharmProjects/Fluid_Research/GenRe-ShapeHD/toolbox/calc_prob/calc_prob/src/calc_prob_kernel.cu.o -o ./_calc_prob_lib.so
Add -gencode to match all the GPU architectures you have.
Check 'https://en.wikipedia.org/wiki/CUDA#GPUs_supported' for list of architecture.
Check 'http://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html' for GPU compilation based on architecture.
/home/guillermo/anaconda3/envs/shaperecon/bin/python
setup.sh: line 17: /home/guillermo/anaconda3/envs/shaperecon/bin:/home/guillermo/anaconda3/condabin:/home/guillermo/.local/bin:/home/guillermo/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/usr/local/cuda-9.2/bin: No such file or directory
nvcc -c -o nnd_cuda.cu.o nnd_cuda.cu -x cu -Xcompiler -fPIC -I /home/guillermo/anaconda3/envs/shaperecon/lib/python3.6/site-packages/torch/lib/include/TH -I /home/guillermo/anaconda3/envs/shaperecon/lib/python3.6/site-packages/torch/lib/include/THC -I /home/guillermo/PycharmProjects/Fluid_Research/GenRe-ShapeHD/toolbox/nndistance/src -I /home/guillermo/anaconda3/envs/shaperecon/lib/python3.6/site-packages/torch/lib/include        -gencode arch=compute_30,code=sm_30         -gencode arch=compute_35,code=sm_35         -gencode arch=compute_52,code=sm_52         -gencode arch=compute_61,code=sm_61
Including CUDA code.
/home/guillermo/PycharmProjects/Fluid_Research/GenRe-ShapeHD/toolbox/nndistance
generating /tmp/tmp4vs_ng9i/_my_lib.c
setting the current directory to '/tmp/tmp4vs_ng9i'
running build_ext
building '_my_lib' extension
creating home
creating home/guillermo
creating home/guillermo/PycharmProjects
creating home/guillermo/PycharmProjects/Fluid_Research
creating home/guillermo/PycharmProjects/Fluid_Research/GenRe-ShapeHD
creating home/guillermo/PycharmProjects/Fluid_Research/GenRe-ShapeHD/toolbox
creating home/guillermo/PycharmProjects/Fluid_Research/GenRe-ShapeHD/toolbox/nndistance
creating home/guillermo/PycharmProjects/Fluid_Research/GenRe-ShapeHD/toolbox/nndistance/src
gcc -pthread -B /home/guillermo/anaconda3/envs/shaperecon/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -DWITH_CUDA -I/home/guillermo/anaconda3/envs/shaperecon/lib/python3.6/site-packages/torch/utils/ffi/../../lib/include -I/home/guillermo/anaconda3/envs/shaperecon/lib/python3.6/site-packages/torch/utils/ffi/../../lib/include/TH -I/home/guillermo/anaconda3/envs/shaperecon/lib/python3.6/site-packages/torch/utils/ffi/../../lib/include/THC -I/usr/local/cuda/include -I/home/guillermo/anaconda3/envs/shaperecon/include/python3.6m -c _my_lib.c -o ./_my_lib.o -std=c99
gcc -pthread -B /home/guillermo/anaconda3/envs/shaperecon/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -DWITH_CUDA -I/home/guillermo/anaconda3/envs/shaperecon/lib/python3.6/site-packages/torch/utils/ffi/../../lib/include -I/home/guillermo/anaconda3/envs/shaperecon/lib/python3.6/site-packages/torch/utils/ffi/../../lib/include/TH -I/home/guillermo/anaconda3/envs/shaperecon/lib/python3.6/site-packages/torch/utils/ffi/../../lib/include/THC -I/usr/local/cuda/include -I/home/guillermo/anaconda3/envs/shaperecon/include/python3.6m -c /home/guillermo/PycharmProjects/Fluid_Research/GenRe-ShapeHD/toolbox/nndistance/src/my_lib.c -o ./home/guillermo/PycharmProjects/Fluid_Research/GenRe-ShapeHD/toolbox/nndistance/src/my_lib.o -std=c99
gcc -pthread -B /home/guillermo/anaconda3/envs/shaperecon/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -DWITH_CUDA -I/home/guillermo/anaconda3/envs/shaperecon/lib/python3.6/site-packages/torch/utils/ffi/../../lib/include -I/home/guillermo/anaconda3/envs/shaperecon/lib/python3.6/site-packages/torch/utils/ffi/../../lib/include/TH -I/home/guillermo/anaconda3/envs/shaperecon/lib/python3.6/site-packages/torch/utils/ffi/../../lib/include/THC -I/usr/local/cuda/include -I/home/guillermo/anaconda3/envs/shaperecon/include/python3.6m -c /home/guillermo/PycharmProjects/Fluid_Research/GenRe-ShapeHD/toolbox/nndistance/src/my_lib_cuda.c -o ./home/guillermo/PycharmProjects/Fluid_Research/GenRe-ShapeHD/toolbox/nndistance/src/my_lib_cuda.o -std=c99
gcc -pthread -shared -B /home/guillermo/anaconda3/envs/shaperecon/compiler_compat -L/home/guillermo/anaconda3/envs/shaperecon/lib -Wl,-rpath=/home/guillermo/anaconda3/envs/shaperecon/lib -Wl,--no-as-needed -Wl,--sysroot=/ ./_my_lib.o ./home/guillermo/PycharmProjects/Fluid_Research/GenRe-ShapeHD/toolbox/nndistance/src/my_lib.o ./home/guillermo/PycharmProjects/Fluid_Research/GenRe-ShapeHD/toolbox/nndistance/src/my_lib_cuda.o /home/guillermo/PycharmProjects/Fluid_Research/GenRe-ShapeHD/toolbox/nndistance/src/nnd_cuda.cu.o -o ./_my_lib.so
Add -gencode to match all the GPU architectures you have.
Check 'https://en.wikipedia.org/wiki/CUDA#GPUs_supported' for list of architecture.
Check 'http://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html' for GPU compilation based on architecture.
/home/guillermo/anaconda3/envs/shaperecon/bin/python
setup.sh: line 9: /home/guillermo/anaconda3/envs/shaperecon/bin:/home/guillermo/anaconda3/condabin:/home/guillermo/.local/bin:/home/guillermo/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/usr/local/cuda-9.2/bin: No such file or directory
nvcc -c -o back_projection_kernel.cu.o back_projection_kernel.cu -x cu -Xcompiler -fPIC -I /home/guillermo/anaconda3/envs/shaperecon/lib/python3.6/site-packages/torch/lib/include/TH -I /home/guillermo/anaconda3/envs/shaperecon/lib/python3.6/site-packages/torch/lib/include -I /home/guillermo/anaconda3/envs/shaperecon/lib/python3.6/site-packages/torch/lib/include/THC -I /home/guillermo/PycharmProjects/Fluid_Research/GenRe-ShapeHD/toolbox/cam_bp/cam_bp/src -I /home/guillermo/anaconda3/envs/shaperecon/lib/python3.6/site-packages/torch/lib/include        -gencode arch=compute_30,code=sm_30         -gencode arch=compute_35,code=sm_35         -gencode arch=compute_52,code=sm_52         -gencode arch=compute_61,code=sm_61
/home/guillermo/PycharmProjects/Fluid_Research/GenRe-ShapeHD/toolbox/cam_bp
/home/guillermo/PycharmProjects/Fluid_Research/GenRe-ShapeHD/toolbox/cam_bp
generating /tmp/tmpymjotthd/_cam_bp_lib.c
setting the current directory to '/tmp/tmpymjotthd'
running build_ext
building '_cam_bp_lib' extension
creating home
creating home/guillermo
creating home/guillermo/PycharmProjects
creating home/guillermo/PycharmProjects/Fluid_Research
creating home/guillermo/PycharmProjects/Fluid_Research/GenRe-ShapeHD
creating home/guillermo/PycharmProjects/Fluid_Research/GenRe-ShapeHD/toolbox
creating home/guillermo/PycharmProjects/Fluid_Research/GenRe-ShapeHD/toolbox/cam_bp
creating home/guillermo/PycharmProjects/Fluid_Research/GenRe-ShapeHD/toolbox/cam_bp/cam_bp
creating home/guillermo/PycharmProjects/Fluid_Research/GenRe-ShapeHD/toolbox/cam_bp/cam_bp/src
gcc -pthread -B /home/guillermo/anaconda3/envs/shaperecon/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -DWITH_CUDA=True -I/home/guillermo/anaconda3/envs/shaperecon/lib/python3.6/site-packages/torch/utils/ffi/../../lib/include -I/home/guillermo/anaconda3/envs/shaperecon/lib/python3.6/site-packages/torch/utils/ffi/../../lib/include/TH -I/home/guillermo/anaconda3/envs/shaperecon/lib/python3.6/site-packages/torch/utils/ffi/../../lib/include/THC -I/usr/local/cuda/include -I/home/guillermo/PycharmProjects/Fluid_Research/GenRe-ShapeHD/toolbox/cam_bp/cam_bp/src -I/home/guillermo/anaconda3/envs/shaperecon/include/python3.6m -c _cam_bp_lib.c -o ./_cam_bp_lib.o -std=c99
gcc -pthread -B /home/guillermo/anaconda3/envs/shaperecon/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -DWITH_CUDA=True -I/home/guillermo/anaconda3/envs/shaperecon/lib/python3.6/site-packages/torch/utils/ffi/../../lib/include -I/home/guillermo/anaconda3/envs/shaperecon/lib/python3.6/site-packages/torch/utils/ffi/../../lib/include/TH -I/home/guillermo/anaconda3/envs/shaperecon/lib/python3.6/site-packages/torch/utils/ffi/../../lib/include/THC -I/usr/local/cuda/include -I/home/guillermo/PycharmProjects/Fluid_Research/GenRe-ShapeHD/toolbox/cam_bp/cam_bp/src -I/home/guillermo/anaconda3/envs/shaperecon/include/python3.6m -c /home/guillermo/PycharmProjects/Fluid_Research/GenRe-ShapeHD/toolbox/cam_bp/cam_bp/src/back_projection.c -o ./home/guillermo/PycharmProjects/Fluid_Research/GenRe-ShapeHD/toolbox/cam_bp/cam_bp/src/back_projection.o -std=c99
gcc -pthread -shared -B /home/guillermo/anaconda3/envs/shaperecon/compiler_compat -L/home/guillermo/anaconda3/envs/shaperecon/lib -Wl,-rpath=/home/guillermo/anaconda3/envs/shaperecon/lib -Wl,--no-as-needed -Wl,--sysroot=/ ./_cam_bp_lib.o ./home/guillermo/PycharmProjects/Fluid_Research/GenRe-ShapeHD/toolbox/cam_bp/cam_bp/src/back_projection.o /home/guillermo/PycharmProjects/Fluid_Research/GenRe-ShapeHD/toolbox/cam_bp/cam_bp/src/back_projection_kernel.cu.o -o ./_cam_bp_lib.so

@dannygelman1

Hello,

Thank you for looking into my issue! I just wanted to follow up on this and make sure I provided the messages you wanted to see. Are these the compile messages you wanted?

Also, I am an MIT undergraduate trying to use this repo as part of my project in the Media Lab. I pass by CSAIL often and was wondering whether, if you are free, we could meet in person to discuss the issue I am running into.

Thank you!

@ztzhang
Collaborator

ztzhang commented Nov 19, 2019

Sorry for the late reply, happy to chat! I can help with the issue if you can show me your setup as well!

@dannygelman1

No worries! My supervisor @gbernal and I would be happy to chat with you! You are welcome to come by Fluid Interfaces in the Media Lab so we can show you our setup, or we can come by CSAIL if that's easier for you. What days/times are good for you?

@wagnew3

wagnew3 commented Jan 15, 2020

Was there ever a resolution on this? I'm getting the same errors.

@colinqian

@weeoooweeooo I am getting the same errors. Did you get any solution to that?

@weeoooweeooo

@colinqian I did not manage to get beyond these errors, despite trying the suggested workarounds. It seems the deprecations in pytorch 1.0 require some non-trivial changes to the code here.

@wagnew3

wagnew3 commented May 11, 2020

I can get GenRe running on machines with CUDA 9.2 and pytorch 0.4.1. The key pieces are making sure I add the gpu arch specification to the setup.sh scripts in toolbox/, and setting these environment variables (modify as necessary for your machine):

export CPATH=$CPATH:/usr/local/cuda-9.2/include
export PATH=/usr/local/cuda-9.2/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-9.2/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

Installing pytorch 0.4.1 is itself no longer trivial: besides the correct CUDA version, it requires a specific gcc version. Once I had these, though, installing with conda was not too bad.
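
For anyone unsure what the "gpu arch specification" should look like, here is a minimal sketch, not the repo's own code. It assumes you already know your card's compute capability (for example from NVIDIA's published list), and `gencode_flag` is a hypothetical helper name. It only prints a flag in the same form as the `-gencode arch=compute_61,code=sm_61` entries visible in the nvcc command in the build log above; you would still have to add that flag by hand to the nvcc lines of the setup.sh scripts in toolbox/.

```shell
# Hypothetical helper: turn a compute capability such as "7.0"
# into the matching nvcc -gencode flag.
gencode_flag() {
    # Strip the dot: "7.0" -> "70"
    cc=$(printf '%s' "$1" | tr -d '.')
    printf '%s\n' "-gencode arch=compute_${cc},code=sm_${cc}"
}

# Example: flags for a compute capability 6.1 card and a 7.0 card.
gencode_flag 6.1
gencode_flag 7.0
```

If the capability of your GPU is not among the `-gencode` entries passed to nvcc, the compiled kernels will not contain code for it, which is consistent with the "no kernel image is available" error in this issue. After editing the scripts, the extensions need to be rebuilt.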

@colinqian

@wagnew3 It works now. I got it running with CUDA 9.0 and pytorch 0.4.1. I upgraded gcc to the latest version and added some environment variables. Thank you.

@hanseungwook

@colinqian Which version of GCC did you happen to update it to? I'm getting the same error, running with CUDA 9.0 and pytorch 0.4.1 as well.
