Hi,
When I run this code (https://saturncloud.io/docs/examples/python/pytorch/qs-03-pytorch-gpu-dask-single-model/), I get this error:
daskcluster-worker-1 | 2022-11-13 17:01:17,386 - distributed.worker - WARNING - Compute Failed
daskcluster-worker-1 | Key: dispatch_with_ddp-cbbbf432f092a3807b25cc40c48f7660
daskcluster-worker-1 | Function: dispatch_with_ddp
daskcluster-worker-1 | args: ()
daskcluster-worker-1 | kwargs: {'pytorch_function': <function train at 0x7f06b9bba040>, 'master_addr': '172.23.0.4', 'master_port': 12345, 'rank': 1, 'world_size': 2, 'backend': 'nccl'}
daskcluster-worker-1 | Exception: 'AssertionError()'
daskcluster-worker-1 |
daskcluster-worker-2 | 2022-11-13 17:01:17,387 - distributed.worker - WARNING - Compute Failed
daskcluster-worker-2 | Key: dispatch_with_ddp-9ce4ce0b9f5f85ff8ead8f6f2e9a9bcf
daskcluster-worker-2 | Function: dispatch_with_ddp
daskcluster-worker-2 | args: ()
daskcluster-worker-2 | kwargs: {'pytorch_function': <function train at 0x7fa21dd38940>, 'master_addr': '172.23.0.4', 'master_port': 12345, 'rank': 0, 'world_size': 2, 'backend': 'nccl'}
daskcluster-worker-2 | Exception: 'AssertionError()'
Why did I get this error? Can you help me?
Thank you.
It's been a while, but are you by any chance trying to train on CPU? If so, you have to set the backend to gloo like so:
futures = dispatch.run(client, train_function, backend='gloo')
By default the backend is nccl, which is for GPU.
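As a minimal sketch of that choice, the helper below picks the backend from whether CUDA is available. `choose_backend` is a hypothetical name, not part of dask_pytorch_ddp. The usage comment assumes the `client` and `train_function` from the snippet above, and uses Dask's `Client.run` to ask every worker whether it sees a GPU, so the decision reflects the workers rather than the machine running the client.

```python
# Hypothetical helper (not part of dask_pytorch_ddp): choose the
# torch.distributed backend based on whether the workers have CUDA.
def choose_backend(cuda_available: bool) -> str:
    """Return 'nccl' for GPU workers and 'gloo' for CPU-only workers."""
    # The NCCL backend only works with CUDA tensors, so CPU-only
    # workers need gloo instead.
    return "nccl" if cuda_available else "gloo"


# Usage sketch (assumes an existing dask.distributed `client` and the
# `train_function` from the snippet above):
#
#     import torch
#     from dask_pytorch_ddp import dispatch
#
#     # Client.run returns {worker_address: result} for every worker.
#     gpu_on_all_workers = all(client.run(torch.cuda.is_available).values())
#     backend = choose_backend(gpu_on_all_workers)
#     futures = dispatch.run(client, train_function, backend=backend)

if __name__ == "__main__":
    print(choose_backend(False))
```

Checking on the workers matters because a client on a laptop and workers on GPU machines (or the reverse) can disagree about `torch.cuda.is_available()`.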