Define "GPU" as a worker resource #1401

Open
wants to merge 2 commits into
base: branch-24.12


pentschev
Member

Add "GPU" as a worker resource to each CUDA worker. This should support users in identifying whether the workers available contain a GPU resource.
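Once workers advertise the resource, a task could be restricted to them through Distributed's `resources=` keyword on `Client.submit`. The following is a minimal sketch, not part of this PR: `submit_on_gpu` is a hypothetical helper name, and `client` is assumed to be a `distributed.Client` connected to a cluster whose CUDA workers were started by Dask-CUDA.

```python
def submit_on_gpu(client, func, *args, **kwargs):
    """Submit ``func`` so that it only runs on workers advertising "GPU".

    ``client`` is assumed to be a ``distributed.Client``. The task reserves
    one unit of the "GPU" resource on the worker for as long as it runs.
    """
    return client.submit(func, *args, resources={"GPU": 1}, **kwargs)
```

Workers that do not define a "GPU" resource would never be selected for such a task, which is how the resource lets users distinguish CUDA workers from others.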

@pentschev pentschev requested a review from a team as a code owner October 23, 2024 20:52
@github-actions github-actions bot added the python python code needed label Oct 23, 2024
resources = dict(pair.split("=") for pair in resources)
resources = valmap(float, resources)
gpu_resources = valmap(int, itemfilter(lambda x: x != "GPU", resources))
resources = valmap(float, itemfilter(lambda x: x == "GPU", resources))
Member Author

@jacobtomlinson you originally wrote this section. I presume values were mapped to float to support a definition such as "MEMORY", but I don't see any tests or other explicit mention of that in Dask-CUDA. Could you confirm whether that's right, and can you think of a more robust way for us to handle types here than the "GPU" versus not-"GPU" split I wrote above?

Member

Honestly, I don't remember the reasoning. I think Dask handles these values as floats; as you say, it's to support values like MEMORY or other arbitrary quantities.
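One way to make the type handling explicit is to branch on the resource name while parsing, so that "GPU" stays an integer count and every other resource becomes a float. This is only a sketch under that assumption; `parse_resources` is a hypothetical name, not a Dask-CUDA function. Note that `toolz.itemfilter` passes `(key, value)` pairs to its predicate, so a split by name would need `toolz.keyfilter` or an explicit key check like the one below.

```python
def parse_resources(pairs):
    """Parse strings such as ``"GPU=1"`` or ``"MEMORY=10e9"`` into a dict."""
    parsed = {}
    for pair in pairs:
        name, _, value = pair.partition("=")
        # Assumption: "GPU" is a whole-device count; any other resource is an
        # arbitrary quantity stored as a float.
        parsed[name] = int(value) if name == "GPU" else float(value)
    return parsed
```

With this split, a non-integer value such as `"GPU=0.5"` raises `ValueError` instead of being silently accepted, which may or may not be the desired behavior.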

if "GPU" not in worker_kwargs["resources"]:
worker_kwargs["GPU"] = 1
else:
worker_kwargs["resources"] = {"GPU": 1}
Member

I wonder if it makes sense to use {"cuda": 1} (i.e. use "cuda" as the resource name for a CUDA-capable GPU)?
This would align with the convention used in deep learning frameworks (e.g. PyTorch).

Member Author

I wouldn't mind personally, but the keyword GPU is documented in the Distributed worker resources docs, so it seems like the more universal and better-documented option.

Member

Okay nice - That's good enough for me.
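The defaulting behavior discussed in this thread can be sketched as a small helper that adds "GPU" only when the user has not already set it. This is an illustration under assumptions, not the PR's code: `with_gpu_resource` is a hypothetical name, and `worker_kwargs` is assumed to be a plain dict that may or may not contain a "resources" entry.

```python
def with_gpu_resource(worker_kwargs):
    """Return a copy of ``worker_kwargs`` whose resources include "GPU"."""
    # Copy so the caller's dict (and its nested resources) is left untouched.
    resources = dict(worker_kwargs.get("resources") or {})
    # Keep a user-supplied "GPU" value; otherwise default to one GPU.
    resources.setdefault("GPU", 1)
    return {**worker_kwargs, "resources": resources}
```

Using `setdefault` covers both cases in one step: a missing "resources" entry and an existing one that lacks "GPU".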

@pentschev pentschev added 3 - Ready for Review Ready for review by team feature request New feature or request non-breaking Non-breaking change labels Oct 24, 2024
4 participants