[Bug] EMD loss cannot handle input size less than 4096 either #21

Open
Lotayou opened this issue Jan 22, 2019 · 2 comments

Lotayou commented Jan 22, 2019

@yulequan I just found that the EMD loss module also crashes even when the input size is smaller than 4096.

Here is the error message:

Warning: Input parameter 2048 has been switched to 1722 for dyna_patch dataset...
vcl-dl-3
Namespace(batch_size=1, dataset='dyna_patch', gpu='0', learning_rate=0.001, log_dir='../model/debug', max_epoch=120, num_point=2048, phase='train', test_dir='../data/test_data/our_collected_data/MC_5k', up_ratio=2)
Traceback (most recent call last):
  File "main.py", line 277, in <module>
    assert not os.path.exists(os.path.join(MODEL_DIR, 'code/'))
AssertionError
(yanglingbo) ylb@vcl-dl-3:~/projects/3D_mesh_SR/PU-Net/code$ sh train_dyna_patch.sh
Warning: Input parameter 2048 has been switched to 1722 for dyna_patch dataset...
vcl-dl-3
Namespace(batch_size=1, dataset='dyna_patch', gpu='0', learning_rate=0.001, log_dir='../model/debug', max_epoch=120, num_point=2048, phase='train', test_dir='../data/test_data/our_collected_data/MC_5k', up_ratio=2)
2019-01-22 15:43:58.696137: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2019-01-22 15:44:00.175833: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1105] Found device 0 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.645
pciBusID: 0000:02:00.0
totalMemory: 10.92GiB freeMemory: 10.76GiB
2019-01-22 15:44:00.176297: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1195] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:02:00.0, compute capability: 6.1)
use randominput, input h5 file is: ../h5_data/dyna_patch_dataset_pu_net.h5
Normalization the data
total 10220 samples
NUM_BATCH is 10220
True True
**** EPOCH 000 ****
2019-01-22 15:44:24.039949: E tensorflow/stream_executor/cuda/cuda_event.cc:49] Error polling for event status: failed to query event: CUDA_ERROR_ILLEGAL_ADDRESS
2019-01-22 15:44:24.040244: F tensorflow/core/common_runtime/gpu/gpu_event_mgr.cc:203] Unexpected Event status: 1
Aborted (core dumped)

Data Processing
My original mesh contains 6890 points. To cope with the EMD size constraint, I split each human into left and right halves of 3444 points each and chose a downsampling ratio of r=2, so the downsampled input contains 1722 points and the output should contain 3444 points. However, the error still occurs, just as it does when my input exceeds 4096 points. Meanwhile, training on the author-provided 4096-point dataset works without problems.
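
For reference, the point counts above can be reproduced with a short sketch. The function name `split_and_downsample_counts` is hypothetical and not part of the PU-Net code, and trimming the odd leftover point so that each half is divisible by the ratio is an assumption about how 3444 (rather than 3445) is obtained. It only checks the arithmetic and says nothing about why the EMD kernel crashes.

```python
def split_and_downsample_counts(total_points, num_parts, up_ratio):
    """Return (points_per_part, input_points) for an even split of a mesh
    followed by downsampling each part by up_ratio.

    Assumption: leftover points from an uneven split are dropped so that
    points_per_part == up_ratio * input_points holds exactly.
    """
    points_per_part = total_points // num_parts
    # Trim so the per-part count is divisible by the upsampling ratio.
    points_per_part -= points_per_part % up_ratio
    input_points = points_per_part // up_ratio
    return points_per_part, input_points


per_half, net_input = split_and_downsample_counts(6890, num_parts=2, up_ratio=2)
print(per_half, net_input)  # 3444 1722
print(per_half < 4096)      # True: the output size is below the 4096 limit
```

Both the 1722-point input and the 3444-point output are below 4096, so the size limit alone does not account for the crash.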

Configuration

CUDA 9.0
cuDNN 7005
Python 3.6
TensorFlow 1.5.1

See also #3.

@MrXiaoZhen

Have you solved this problem?

@MrXiaoZhen

@Lotayou
