
Multiple GPU issue #76

Open
hnhuang opened this issue Aug 25, 2017 · 3 comments

Comments

@hnhuang

hnhuang commented Aug 25, 2017

Hi,

My model runs on a single GPU, but it fails on multiple GPUs. Here is my code:

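```python
# batch_reader, model_dist, losses, C, optimizer, num_epochs and
# checkpoint_fixed_name are defined elsewhere in the script
x_train, y_train = batch_reader.get_batch()
# one MXNet context string per GPU
gpu_list = ["gpu(0)", "gpu(1)", "gpu(2)", "gpu(3)"]
model_dist.compile(loss=losses.dist_loss_cls(C.max_radius), optimizer=optimizer, context=gpu_list)
model_dist.fit(x_train, y_train, batch_size=20, nb_epoch=num_epochs, callbacks=[checkpoint_fixed_name])
```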

The error I got was:

```
RuntimeError: simple_bind error. Arguments:
input_1: (5, 1L, 32L, 32L, 32L)
[13:36:31] src/storage/storage.cc:59: Check failed: e == cudaSuccess || e == cudaErrorCudartUnloading CUDA: invalid device ordinal
```
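The leading dimension of `input_1` in the error, 5, is consistent with a data-parallel split: a batch of 20 divided evenly over the four contexts in `gpu_list` gives 5 samples per device, which suggests the per-device shapes were computed before the CUDA call failed. The sketch below only illustrates that arithmetic; `split_batch` is a hypothetical helper, not MXNet's internal slicing code, and it assumes work is divided as equally as possible.

```python
from typing import List


def split_batch(batch_size: int, num_devices: int) -> List[int]:
    """Split batch_size samples as evenly as possible over num_devices.

    Hypothetical helper: when the batch does not divide evenly, the
    first devices each receive one extra sample.
    """
    if num_devices <= 0:
        raise ValueError("need at least one device")
    base, extra = divmod(batch_size, num_devices)
    return [base + (1 if i < extra else 0) for i in range(num_devices)]


gpu_list = ["gpu(0)", "gpu(1)", "gpu(2)", "gpu(3)"]
per_device = split_batch(20, len(gpu_list))
for ctx, n in zip(gpu_list, per_device):
    # With batch_size=20 and four GPUs every device gets 5 samples,
    # matching the leading dimension of input_1 in the error message.
    print(ctx, (n, 1, 32, 32, 32))
```

An uneven batch such as 10 over four devices would be split as `[3, 3, 2, 2]` under this assumption.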

Would anyone please help me? Thanks.

@sandeep-krishnamurthy

The issue seems to be that you don't have that many GPUs.
Maybe you could run the "nvidia-smi" command in a terminal and report whether you have 4 GPUs?
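One way to do this check from Python is to count the lines printed by `nvidia-smi -L`, which lists one GPU per line in the form `GPU 0: <name> (UUID: ...)`. This is a minimal sketch assuming `nvidia-smi` is on the PATH; the helper names are hypothetical. Note that `nvidia-smi` reports the physically installed GPUs and is not affected by `CUDA_VISIBLE_DEVICES`, so it answers "how many GPUs are installed" rather than "how many this process can use".

```python
import re
import shutil
import subprocess
from typing import Optional

# Matches lines such as "GPU 0: Tesla K80 (UUID: GPU-...)" from `nvidia-smi -L`.
_GPU_LINE = re.compile(r"^GPU \d+:")


def count_gpus_from_listing(listing: str) -> int:
    """Count the GPU entries in text printed by `nvidia-smi -L`."""
    return sum(1 for line in listing.splitlines() if _GPU_LINE.match(line.strip()))


def query_gpu_count() -> Optional[int]:
    """Run `nvidia-smi -L`; return None if the tool is missing or fails."""
    if shutil.which("nvidia-smi") is None:
        return None
    try:
        out = subprocess.run(["nvidia-smi", "-L"], capture_output=True, text=True, check=True)
    except (subprocess.CalledProcessError, OSError):
        return None
    return count_gpus_from_listing(out.stdout)
```

On a machine with four working GPUs, `query_gpu_count()` should return 4; a smaller number would explain an "invalid device ordinal" error for `gpu(3)`.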

@hnhuang
Author

hnhuang commented Aug 25, 2017

I do have 4 GPUs.

@sandeep-krishnamurthy

I tried the ResNet50 example at https://github.com/dmlc/keras/blob/master/examples/cifar10_resnet50.py with multiple GPUs, and it seems to work fine. Could you share more details about your setup: the MXNet version, any CUDA-specific environment variables you have set, and the code you are using?
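The environment-variable question matters because CUDA renumbers devices: when `CUDA_VISIBLE_DEVICES` is set, only the listed GPUs are visible, numbered from 0, so `gpu(2)` is an invalid ordinal if only two devices are exposed, even on a four-GPU machine. The sketch below uses a hypothetical helper, `visible_gpu_contexts`, to build the context list from that variable instead of hard-coding four entries. It is a simplified model of the CUDA rules: it assumes a comma-separated list of integer indices, stops at the first invalid entry, ignores UUID entries, and does not treat duplicate indices specially.

```python
import os
from typing import List, Mapping, Optional


def visible_gpu_contexts(physical_count: int, env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return MXNet-style context strings for the GPUs this process can see.

    Simplified model of CUDA_VISIBLE_DEVICES (integer indices only):
    unset -> all physical GPUs; otherwise listed indices are kept until
    the first invalid entry, and the survivors are renumbered from 0.
    """
    env = os.environ if env is None else env
    raw = env.get("CUDA_VISIBLE_DEVICES")
    if raw is None:
        return ["gpu(%d)" % i for i in range(physical_count)]
    visible = 0
    for token in raw.split(","):
        token = token.strip()
        # Stop at the first entry that is not a valid physical index.
        if not token.isdigit() or int(token) >= physical_count:
            break
        visible += 1
    # The surviving devices are exposed as ordinals 0..visible-1.
    return ["gpu(%d)" % i for i in range(visible)]
```

For example, with `CUDA_VISIBLE_DEVICES=2,3` on a four-GPU machine this returns `["gpu(0)", "gpu(1)"]`, so the hard-coded `gpu(2)` and `gpu(3)` in the original snippet would not exist from the process's point of view.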


2 participants