
CUDA error at /home/chris/anti/cuda/render_to_screen.cpp:113 code=999(cudaErrorUnknown) #22

Open
windingwind opened this issue Mar 5, 2022 · 9 comments


@windingwind

windingwind commented Mar 5, 2022

Hi! I ran into this CUDA error while running render_to_screen.sh:
CUDA error at /home/chris/anti/cuda/render_to_screen.cpp:113 code=999(cudaErrorUnknown) "cudaGraphicsGLRegisterBuffer(&cuda_pbo_resource, pbo, cudaGraphicsMapFlagsWriteDiscard)" render_to_screen.sh: line 3: 28062 Segmentation fault (core dumped) python run_nerf.py cfgs/paper/finetune/$DATASET.yaml -rcfg cfgs/render/render_to_screen.yaml

I'm running KiloNeRF on Ubuntu 18.04 with CUDA 11.1 on an RTX A6000.
Could you please help me with this? Thank you very much!

Here's the output:

(kilonerf) nesc525@nesc525:~/drivers/5/kilonerf$ bash render_to_screen.sh
auto log path: logs/paper/finetune/Synthetic_NeRF_Lego
{'checkpoint_interval': 50000, 'chunk_size': 40000, 'distilled_cfg_path': 'cfgs/paper/distill/Synthetic_NeRF_Lego.yaml', 'distilled_checkpoint_path': 'logs/paper/distill/Synthetic_NeRF_Lego/checkpoint.pth', 'initial_learning_rate': 0.001, 'iterations': 1000000, 'l2_regularization_lambda': 1e-06, 'learing_rate_decay_rate': 500, 'no_batching': True, 'num_rays_per_batch': 8192, 'num_samples_per_ray': 384, 'occupancy_cfg_path': 'cfgs/paper/pretrain_occupancy/Synthetic_NeRF_Lego.yaml', 'occupancy_log_path': 'logs/paper/pretrain_occupancy/Synthetic_NeRF_Lego/occupancy.pth', 'perturb': 1.0, 'precrop_fraction': 0.5, 'precrop_iterations': 0, 'raw_noise_std': 0.0, 'render_only': False, 'no_color_sigmoid': False, 'render_test': True, 'render_factor': 0, 'testskip': 8, 'deepvoxels_shape': 'greek', 'blender_white_background': True, 'blender_half_res': False, 'llff_factor': 8, 'llff_no_ndc': False, 'llff_lindisp': False, 'llff_spherify': False, 'llff_hold': False, 'print_interval': 100, 'render_testset_interval': 10000, 'render_video_interval': 100000000, 'network_chunk_size': 65536, 'rng_seed': 0, 'use_same_initialization_for_all_networks': False, 'use_initialization_fix': False, 'num_importance_samples_per_ray': 0, 'model_type': 'multi_network', 'random_direction_probability': -1, 'von_mises_kappa': -1, 'view_dependent_dropout_probability': -1}
Using GPU: RTX A6000
/home/nesc525/drivers/5/kilonerf/utils.py:254: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
  return np.array([[float(w) for w in line.strip().split()] for line in open(path)]).astype(np.float32)
Loaded a NSVF-style dataset (138, 800, 800, 4) (138, 4, 4) (0,) data/nsvf/Synthetic_NeRF/Lego
(100,) (13,) (25,)
Converting alpha to white.
global_domain_min: [-0.67 -1.2  -0.37], global_domain_max: [0.67 1.2  1.03], near: 2.0, far: 6.0, background_color: tensor([1., 1., 1.])
Loading logs/paper/finetune/Synthetic_NeRF_Lego/checkpoint_1000000.pth
Loading occupancy grid from logs/paper/pretrain_occupancy/Synthetic_NeRF_Lego/occupancy.pth
CUDA error at /home/chris/anti/cuda/render_to_screen.cpp:113 code=999(cudaErrorUnknown) "cudaGraphicsGLRegisterBuffer(&cuda_pbo_resource, pbo, cudaGraphicsMapFlagsWriteDiscard)" 
render_to_screen.sh: line 3: 28062 Segmentation fault      (core dumped) python run_nerf.py cfgs/paper/finetune/$DATASET.yaml -rcfg cfgs/render/render_to_screen.yaml
@windingwind
Author

Following the suggestion here: https://forums.developer.nvidia.com/t/cudaerrorunknown-cudagraphicsglregisterbuffer/64406/12

After adding __NV_PRIME_RENDER_OFFLOAD=1 __GLX_VENDOR_LIBRARY_NAME=nvidia, the output is as follows.
(I tried with and without these environment variables on different GPUs, including the A6000 and a 3090.)
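The same environment can also be set from Python, for example when launching the renderer from another script. This is a minimal sketch: build_render_env and run_render_to_screen are hypothetical helper names, not part of KiloNeRF. As far as I know, __NV_PRIME_RENDER_OFFLOAD and __GLX_VENDOR_LIBRARY_NAME are intended for PRIME render offload on hybrid-graphics systems, so they are not necessarily relevant on a desktop where the display already runs on the NVIDIA GPU.

```python
import os
import subprocess
from typing import Dict, Mapping, Optional

# Variables taken from the NVIDIA forum suggestion linked above.
PRIME_OFFLOAD_VARS = {
    "__NV_PRIME_RENDER_OFFLOAD": "1",
    "__GLX_VENDOR_LIBRARY_NAME": "nvidia",
}


def build_render_env(base: Mapping[str, str],
                     gpu_index: Optional[int] = None) -> Dict[str, str]:
    """Return a copy of `base` with the PRIME offload variables set.

    If gpu_index is given, CUDA_VISIBLE_DEVICES restricts CUDA to that GPU.
    """
    env = dict(base)  # copy, so the caller's mapping is left untouched
    env.update(PRIME_OFFLOAD_VARS)
    if gpu_index is not None:
        env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    return env


def run_render_to_screen(gpu_index: Optional[int] = None) -> int:
    """Hypothetical launcher: run render_to_screen.sh with the variables above."""
    env = build_render_env(os.environ, gpu_index)
    return subprocess.run(["bash", "render_to_screen.sh"], env=env).returncode
```

Calling run_render_to_screen(gpu_index=1) from the KiloNeRF directory mirrors the shell one-liner below. Passing a copy via env= scopes the variables to the child process instead of mutating os.environ.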

(kilonerf) nesc525@nesc525:~/drivers/5/kilonerf$ __NV_PRIME_RENDER_OFFLOAD=1 __GLX_VENDOR_LIBRARY_NAME=nvidia CUDA_VISIBLE_DEVICES=1 bash render_to_screen.sh 
auto log path: logs/paper/finetune/Synthetic_NeRF_Lego
{'checkpoint_interval': 50000, 'chunk_size': 40000, 'distilled_cfg_path': 'cfgs/paper/distill/Synthetic_NeRF_Lego.yaml', 'distilled_checkpoint_path': 'logs/paper/distill/Synthetic_NeRF_Lego/checkpoint.pth', 'initial_learning_rate': 0.001, 'iterations': 1000000, 'l2_regularization_lambda': 1e-06, 'learing_rate_decay_rate': 500, 'no_batching': True, 'num_rays_per_batch': 8192, 'num_samples_per_ray': 384, 'occupancy_cfg_path': 'cfgs/paper/pretrain_occupancy/Synthetic_NeRF_Lego.yaml', 'occupancy_log_path': 'logs/paper/pretrain_occupancy/Synthetic_NeRF_Lego/occupancy.pth', 'perturb': 1.0, 'precrop_fraction': 0.5, 'precrop_iterations': 0, 'raw_noise_std': 0.0, 'render_only': False, 'no_color_sigmoid': False, 'render_test': True, 'render_factor': 0, 'testskip': 8, 'deepvoxels_shape': 'greek', 'blender_white_background': True, 'blender_half_res': False, 'llff_factor': 8, 'llff_no_ndc': False, 'llff_lindisp': False, 'llff_spherify': False, 'llff_hold': False, 'print_interval': 100, 'render_testset_interval': 10000, 'render_video_interval': 100000000, 'network_chunk_size': 65536, 'rng_seed': 0, 'use_same_initialization_for_all_networks': False, 'use_initialization_fix': False, 'num_importance_samples_per_ray': 0, 'model_type': 'multi_network', 'random_direction_probability': -1, 'von_mises_kappa': -1, 'view_dependent_dropout_probability': -1}
Using GPU: GeForce RTX 3090
/home/nesc525/drivers/5/kilonerf/utils.py:254: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
  return np.array([[float(w) for w in line.strip().split()] for line in open(path)]).astype(np.float32)
Loaded a NSVF-style dataset (138, 800, 800, 4) (138, 4, 4) (0,) data/nsvf/Synthetic_NeRF/Lego
(100,) (13,) (25,)
Converting alpha to white.
global_domain_min: [-0.67 -1.2  -0.37], global_domain_max: [0.67 1.2  1.03], near: 2.0, far: 6.0, background_color: tensor([1., 1., 1.])
Loading logs/paper/finetune/Synthetic_NeRF_Lego/checkpoint_1000000.pth
Loading occupancy grid from logs/paper/pretrain_occupancy/Synthetic_NeRF_Lego/occupancy.pth
X Error of failed request:  BadValue (integer parameter out of range for operation)
  Major opcode of failed request:  154 (GLX)
  Minor opcode of failed request:  24 (X_GLXCreateNewContext)
  Value in failed request:  0x0
  Serial number of failed request:  31
  Current serial number in output stream:  32

@windingwind
Author

I switched to another machine (Ubuntu 20.04, NVIDIA-SMI 460.67, CUDA 11.2, RTX 3090) and ran bash render_to_screen.sh. The error message is the same.

The error seems to be related to GLUT. However, I tested GLUT with a ray-tracing program and it displayed the result fine; only the KiloNeRF renderer fails.
TAT

@Quyans

Quyans commented Mar 9, 2022

I ran into the same problem. It seems like the author hard-coded an absolute path from his own computer into the CUDA extension, since we don't have /home/chris/anti.
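A note on that path: /home/chris/anti/... is most likely just the source location baked in at compile time (via __FILE__) by an error-checking macro, because the message follows the same "CUDA error at <file>:<line> code=<n>(<name>) "<call>"" format as checkCudaErrors in the CUDA samples' helper_cuda.h. If so, the directory does not need to exist on your machine; I have not confirmed this against the KiloNeRF sources. The minimal sketch below (parse_cuda_error is a hypothetical helper, not part of KiloNeRF) splits such a message into its fields, which makes it clear that the real failure is the cudaGraphicsGLRegisterBuffer call reported at line 113.

```python
import re
from typing import NamedTuple, Optional

# Matches messages of the form:
#   CUDA error at <file>:<line> code=<num>(<name>) "<call>"
CUDA_ERROR_RE = re.compile(
    r'CUDA error at (?P<file>\S+):(?P<line>\d+) '
    r'code=(?P<code>\d+)\((?P<name>\w+)\) "(?P<call>[^"]*)"'
)


class CudaError(NamedTuple):
    file: str  # source file path as recorded when the extension was compiled
    line: int  # line number inside that source file
    code: int  # numeric cudaError_t value
    name: str  # symbolic error name
    call: str  # the CUDA runtime call that returned the error


def parse_cuda_error(message: str) -> Optional[CudaError]:
    """Extract the fields of a checkCudaErrors-style message, or return None."""
    m = CUDA_ERROR_RE.search(message)
    if m is None:
        return None
    return CudaError(m["file"], int(m["line"]), int(m["code"]), m["name"], m["call"])
```

Applied to the first error line of the log above, it returns file='/home/chris/anti/cuda/render_to_screen.cpp', line=113, code=999 and name='cudaErrorUnknown'.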

@Quyans

Quyans commented Mar 9, 2022

Hey, I just got it working. I used a physical monitor connected to the GPU. I guess it doesn't work over a remote session.
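If the remote-session theory is right, a quick sanity check before launching the renderer is whether the X display is local. This is a heuristic sketch assuming that (a) ssh -X/-Y forwarding usually sets DISPLAY to something like localhost:10.0 and (b) SSH sets SSH_CONNECTION in the remote shell; display_warnings is a hypothetical helper, and passing these checks does not guarantee that CUDA/OpenGL interop will work.

```python
from typing import List, Mapping


def display_warnings(env: Mapping[str, str]) -> List[str]:
    """Heuristic checks for an X setup that may break CUDA/OpenGL interop."""
    warnings = []
    display = env.get("DISPLAY", "")
    if not display:
        warnings.append("DISPLAY is not set: no X server to open a window on.")
    elif display.split(":", 1)[0] not in ("", "unix"):
        # A host part (e.g. "localhost:10.0") usually means a forwarded or
        # remote X server, so GL contexts are not created on the local GPU.
        warnings.append(f"DISPLAY={display} points at a remote/forwarded X server.")
    if "SSH_CONNECTION" in env:
        warnings.append("Running inside an SSH session.")
    return warnings
```

For example, display_warnings(os.environ) on a local desktop session with DISPLAY=:0 should return an empty list.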

@windingwind
Author

> Hey, I just got it working. I used a physical monitor connected to the GPU. I guess it doesn't work over a remote session.

I tried with a physical monitor and got the same error.

@Quyans

Quyans commented Mar 13, 2022

Did you connect the monitor to the integrated graphics? You're supposed to connect it directly to the discrete graphics card.
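To check which GPU actually drives the display's OpenGL contexts, glxinfo (from the mesa-utils package on Ubuntu) prints an "OpenGL renderer string". My understanding, not verified against KiloNeRF, is that cudaGraphicsGLRegisterBuffer needs the current OpenGL context to come from the NVIDIA driver on a GPU that CUDA can see, so a renderer such as llvmpipe or an integrated GPU would be a red flag. The helpers below are hypothetical and only parse that one line.

```python
import re
import subprocess
from typing import Optional

RENDERER_RE = re.compile(r"^\s*OpenGL renderer string:\s*(.+)$", re.MULTILINE)


def opengl_renderer(glxinfo_output: str) -> Optional[str]:
    """Return the 'OpenGL renderer string' value from glxinfo output, if any."""
    m = RENDERER_RE.search(glxinfo_output)
    return m.group(1).strip() if m else None


def renderer_is_nvidia(glxinfo_output: str) -> bool:
    """True if the OpenGL renderer reported by glxinfo mentions NVIDIA."""
    renderer = opengl_renderer(glxinfo_output)
    return renderer is not None and "NVIDIA" in renderer


def query_glxinfo() -> str:
    """Run `glxinfo -B` (brief output); needs glxinfo and a working DISPLAY."""
    result = subprocess.run(["glxinfo", "-B"], capture_output=True,
                            text=True, check=True)
    return result.stdout
```

If this theory holds, renderer_is_nvidia(query_glxinfo()) run in the same session as render_to_screen.sh should return True on a working setup. Note that restricting CUDA with CUDA_VISIBLE_DEVICES to a different GPU than the one driving the display may also matter.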

@windingwind
Author

> Did you connect the monitor to the integrated graphics? You're supposed to connect it directly to the discrete graphics card.

It was connected to the A6000. I'll try other GPUs later. Thanks!

@Quyans

Quyans commented Mar 13, 2022 via email

@Ataraxiaecho

Hello, I'm currently experiencing the same problem. How did you solve it?
