
fix: add cuda backend support for to_raggedtensor and from_raggedtensor functions #3263

Open
wants to merge 11 commits into main
Conversation

@maxymnaumchyk (Collaborator) commented Oct 1, 2024

No description provided.

codecov bot commented Oct 1, 2024

Codecov Report

Attention: Patch coverage is 11.36364% with 39 lines in your changes missing coverage. Please review.

Project coverage is 82.18%. Comparing base (b749e49) to head (5cfccda).
Report is 176 commits behind head on main.

Files with missing lines                          Patch %   Lines
src/awkward/operations/ak_to_raggedtensor.py      13.33%    26 Missing ⚠️
src/awkward/operations/ak_from_raggedtensor.py     7.14%    13 Missing ⚠️

Additional details and impacted files

Files with missing lines                          Coverage          Δ
src/awkward/operations/ak_from_raggedtensor.py    23.07% <7.14%>    (ø)
src/awkward/operations/ak_to_raggedtensor.py      21.81% <13.33%>   (ø)

... and 157 files with indirect coverage changes

@maxymnaumchyk (Collaborator, Author) commented Oct 1, 2024

@jpivarski while trying to make the to_raggedtensor function keep the device of the original Awkward Array, I stumbled upon an issue. The thing is, TensorFlow automatically selects the GPU for computation if one is available. If I run the following code on a GPU machine, it does correctly return a tensor on the CPU:

import tensorflow as tf

def function():
    with tf.device('CPU:0'):
        a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
        return a

a = function()
a.device

>>/job:localhost/replica:0/task:0/device:CPU:0

However, if I try to do the same with the to_raggedtensor function, the intermediate ragged tensor is allocated on the CPU (the print on line 78 of the file says that it's on the CPU), but the resulting tensor is allocated on the GPU:

to_raggedtensor(ak.Array([[[1.1, 2.2], [3.3]], [], [[4.4, 5.5]]]))[0][0].device
>>/job:localhost/replica:0/task:0/device:GPU:0

Should I make the function use a TensorFlow policy and automatically select a device or create some kind of workaround?

@jpivarski (Member) commented:

ak.to_raggedtensor should return a RaggedTensor on the same device as the Awkward Array, as a view (no copy) if possible. That may mean that the implementation needs to specify non-default arguments of the RaggedTensor constructor (or use the with block) in order to control it.

If this is not possible and TensorFlow returns an object whose backend depends on what hardware is available (a terrible practice! shame on TensorFlow!), then we'll have to explain that (apologetically) in our documentation.
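For illustration, a minimal sketch of that kind of device pinning, assuming the Awkward Array's backend determines the target device; the helper name _to_ragged_on_device and its arguments are hypothetical, not the PR's actual code:

import awkward as ak
import tensorflow as tf

def _to_ragged_on_device(array, content, offsets):
    # Hypothetical helper: pick the TensorFlow device that matches the
    # Awkward Array's backend ("cuda" -> GPU, anything else -> CPU).
    # "GPU:0" hardcodes the first GPU; a real implementation might map
    # the array's device index, too.
    device = "GPU:0" if ak.backend(array) == "cuda" else "CPU:0"
    # Constructing inside tf.device pins the result, so TensorFlow's
    # automatic GPU placement cannot override it.
    with tf.device(device):
        return tf.RaggedTensor.from_row_splits(values=content, row_splits=offsets)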

@maxymnaumchyk marked this pull request as ready for review October 16, 2024 15:09
@jpivarski (Member) left a comment:

This is looking good! I added some possible changes—actually, "things to think about" because you know the TensorFlow situation better than I do.

This could also use tests. Would it be sufficient to copy the to/from raggedtensor tests from the tests/ directory to tests-cuda/ and replace NumPy arrays with CuPy arrays?

Just as you can run the normal tests with

python -m pytest tests

you can run the CUDA tests with

python -m pytest tests-cuda

on a computer with an Nvidia GPU.
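A sketch of what one such copied test might look like, reusing the example array from earlier in this thread; the test name and assertions are assumptions, not the PR's actual tests:

import pytest

import awkward as ak

tf = pytest.importorskip("tensorflow")

def test_to_raggedtensor_round_trip_cuda():
    # Build the array on the CUDA backend instead of the default CPU one.
    array = ak.Array([[[1.1, 2.2], [3.3]], [], [[4.4, 5.5]]], backend="cuda")
    ragged = ak.to_raggedtensor(array)
    # Round-trip back to Awkward and compare contents as plain lists.
    assert ak.to_list(ak.from_raggedtensor(ragged)) == ak.to_list(array)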

src/awkward/operations/ak_from_raggedtensor.py: two review threads (outdated, resolved)
@jpivarski (Member) left a comment:

This is good! Except maybe for the case of more than 10 GPUs: see below. Once that's fixed, this would be ready to merge.

@@ -79,9 +63,18 @@ def _impl(array):


def _tensor_to_np_or_cp(array, device):
    import tensorflow as tf

    if device.endswith("GPU", 0, -2):
@jpivarski (Member):

I had to check the documentation on str.endswith, but it seems that this is equivalent to

    if device[:-2].endswith("GPU"):

(though I think the latter is easier to understand because slicing is more well-known than the extra arguments of str.endswith).

However, are you assuming that the GPU number is one digit? That is, will the above code break for a computer with 10 GPUs?
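A quick check confirms the concern (the device strings here follow the format quoted later in this thread):

one_digit = "/job:localhost/replica:0/task:0/device:GPU:9"
two_digits = "/job:localhost/replica:0/task:0/device:GPU:14"

# endswith("GPU", 0, -2) ignores only the last two characters, which is
# exactly ":9" in the one-digit case but only "14" in the two-digit case,
# leaving a trailing ":" before the comparison.
print(one_digit.endswith("GPU", 0, -2))   # True
print(two_digits.endswith("GPU", 0, -2))  # False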

If the format for the 15th GPU is "GPU-14" (zero-indexed), then maybe you want

Suggested change:
-if device.endswith("GPU", 0, -2):
+if device.split("-")[0] == "GPU":

(and if lowercase is possible, you can also add a .upper() in the chain).

But before you accept the suggestion above, is it really a hyphen? If there's only one GPU, would there be no hyphen? (Note that device.split("-")[0] is equal to device if there is no hyphen, so the same code may be fine.)

@maxymnaumchyk (Collaborator, Author) commented Oct 21, 2024:

Thanks for catching that; I hadn't thought about that case! If there's only one GPU, then the device looks like this:
/job:localhost/replica:0/task:0/device:GPU:0
So I think if device.split(":")[-2].upper() == "GPU": will work for all cases.
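For completeness, how that expression behaves on the device strings quoted in this thread:

for device in (
    "/job:localhost/replica:0/task:0/device:CPU:0",
    "/job:localhost/replica:0/task:0/device:GPU:0",
    "/job:localhost/replica:0/task:0/device:GPU:14",
):
    # split(":")[-2] grabs the second-to-last colon-separated field,
    # which is the device type regardless of how many digits the index has.
    print(device.split(":")[-2].upper() == "GPU")
# prints: False, True, True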

@jpivarski (Member):

It will, but it relies on TensorFlow never changing the text to end with "GPU" rather than "GPU:0". All of this is about trying to write something defensively, so that neither our incomplete knowledge of the upstream library (TensorFlow) nor possible changes in that library will cause our code to break. By "break," I mean "do the wrong thing without an error message." Failing with an error message if TensorFlow changes would be fine.

Given that what we expect from TensorFlow is a string like

/job:localhost/replica:0/task:0/device:CPU:0

or

/job:localhost/replica:0/task:0/device:GPU:0

or

/job:localhost/replica:0/task:0/device:GPU:14

this would be a safe way to catch it:

import re

m = re.match(".*:(CPU|GPU):[0-9]+", device)
if m is None:
    raise NotImplementedError(f"TensorFlow device has an unexpected format: {device!r}")
if m.groups()[0] == "GPU":
    ...

It also expresses to the future maintainer (or code reviewer) what you know about what TensorFlow gives you. (The import needs to be in the import section.)
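For illustration, the pattern applied to the example strings above behaves as follows:

import re

for device in (
    "/job:localhost/replica:0/task:0/device:CPU:0",
    "/job:localhost/replica:0/task:0/device:GPU:14",
    "some-unexpected-format",
):
    m = re.match(".*:(CPU|GPU):[0-9]+", device)
    # Prints the captured device type, or None when the format is unexpected
    # (the real code raises NotImplementedError in that case).
    print(None if m is None else m.groups()[0])
# prints: CPU, GPU, None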
