Multiple GPU issue #76

hnhuang · 2017-08-25T18:01:53Z

Hi,

My model runs on a single GPU, but it fails on multiple GPUs. Here is my code:

x_train, y_train = batch_reader.get_batch()
gpu_list = ["gpu(0)", "gpu(1)", "gpu(2)", "gpu(3)"]
model_dist.compile(loss=losses.dist_loss_cls(C.max_radius), optimizer=optimizer, context=gpu_list)
model_dist.fit(x_train, y_train, batch_size=20, nb_epoch = num_epochs, callbacks=[checkpoint_fixed_name])

The error I got was:

RuntimeError: simple_bind error. Arguments:
input_1: (5, 1L, 32L, 32L, 32L)
[13:36:31] src/storage/storage.cc:59: Check failed: e == cudaSuccess || e == cudaErrorCudartUnloading CUDA: invalid device ordinal

Would anyone please help me? Thanks.

sandeep-krishnamurthy · 2017-08-25T18:40:47Z

The issue seems to be that you don't have that many GPUs.
Maybe you could run the "nvidia-smi" command in a terminal and report whether you have 4 GPUs?
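
For context, CUDA generally reports "invalid device ordinal" when a requested device index is not among the devices visible to the process, which can happen even on a 4-GPU machine if CUDA_VISIBLE_DEVICES restricts the list. A minimal sketch of that check follows. It assumes `nvidia-smi -L` prints one `GPU <n>: ...` line per device; the helper names (`count_gpus`, `visible_gpu_count`, `build_context`) are hypothetical and not part of Keras or MXNet, and the CUDA_VISIBLE_DEVICES handling is simplified.

```python
import os
import re
import subprocess


def count_gpus(listing):
    """Count lines of the form 'GPU <n>: ...' in `nvidia-smi -L` output."""
    return len(re.findall(r"^GPU \d+:", listing, flags=re.MULTILINE))


def visible_gpu_count(total, cuda_visible_devices=None):
    """Estimate how many GPUs the process can address.

    Simplification (assumption): every id listed in CUDA_VISIBLE_DEVICES
    is taken to be valid. None means the variable is unset.
    """
    if cuda_visible_devices is None:
        return total
    ids = [d for d in cuda_visible_devices.split(",") if d.strip()]
    return min(total, len(ids))


def build_context(n):
    """Build a context list in the same form as gpu_list in the question."""
    return ["gpu(%d)" % i for i in range(n)]


if __name__ == "__main__":
    try:
        listing = subprocess.check_output(["nvidia-smi", "-L"]).decode()
    except (OSError, subprocess.CalledProcessError):
        listing = ""  # nvidia-smi missing or failed: treat as no GPUs
    total = count_gpus(listing)
    usable = visible_gpu_count(total, os.environ.get("CUDA_VISIBLE_DEVICES"))
    print("GPUs listed by nvidia-smi:", total)
    print("GPUs visible to this process (estimate):", usable)
    print("context to try:", build_context(usable))
```

If the estimate is lower than 4, passing a context list no longer than that number to compile() would be the first thing to try.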

hnhuang · 2017-08-25T19:11:15Z

I do have 4 GPUs.

sandeep-krishnamurthy · 2017-08-28T03:03:31Z

I tried the Resnet50 example here - https://github.com/dmlc/keras/blob/master/examples/cifar10_resnet50.py with multiple GPUs and things seem to work fine. Can you please share more details on your setup: the version of MXNet, any CUDA-specific environment variables that are set, and the code you are using?
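
One possible way to gather those details in a single step is sketched below. The function name `collect_setup_info` is hypothetical, and the sketch only assumes that MXNet, when installed, exposes `mxnet.__version__`.

```python
import os
import platform


def collect_setup_info(env=None):
    """Return a dict with the setup details requested above."""
    if env is None:
        env = os.environ
    info = {
        "python": platform.python_version(),
        "CUDA_VISIBLE_DEVICES": env.get("CUDA_VISIBLE_DEVICES", "<unset>"),
    }
    try:
        import mxnet
        info["mxnet"] = mxnet.__version__
    except Exception:
        # MXNet is not installed or failed to load in this environment.
        info["mxnet"] = "<not importable>"
    return info


if __name__ == "__main__":
    for key, value in sorted(collect_setup_info().items()):
        print("%s: %s" % (key, value))
```

Pasting the printed lines into the thread, together with the full `nvidia-smi` output, would cover most of what is being asked for.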
