
Add support for compiling to SoA ABI for CUDA #1103

Open
wants to merge 14 commits into base: branch-24.03

Conversation


@gmarkall gmarkall commented Dec 6, 2023

This adds a new function, compile_ptx_soa(), which works similarly to Numba's compile_ptx() function, but the compiled code writes the elements of tuple return types into SoA data structures passed in as pointers.

Example

The Python function:

def addsub(x, y):
    return x + y, x - y

compiled with the compile_ptx_soa() function where:

  • Arguments are of int32 type
  • The return type is a tuple of (int32, int32)

compiles to PTX equivalent to the C function:

void (int32_t *r1, int32_t *r2, int32_t x, int32_t y)
{
  *r1 = x + y;
  *r2 = x - y;
}

Or, as the actual PTX produced:

.visible .func addsub(
	.param .b64 addsub_param_0,
	.param .b64 addsub_param_1,
	.param .b32 addsub_param_2,
	.param .b32 addsub_param_3
)
{
	.reg .b32 	%r<5>;
	.reg .b64 	%rd<3>;


	ld.param.u64 	%rd1, [addsub_param_0];
	ld.param.u64 	%rd2, [addsub_param_1];
	ld.param.u32 	%r1, [addsub_param_2];
	ld.param.u32 	%r2, [addsub_param_3];
	add.s32 	%r3, %r2, %r1;
	sub.s32 	%r4, %r1, %r2;
	st.u32 	[%rd1], %r3;
	st.u32 	[%rd2], %r4;
	ret;

}

Returning a heterogeneous tuple is also possible. For example, where the return
type is specified as a tuple of (int32, float32), we get:

.visible .func addsub(
	.param .b64 addsub_param_0,
	.param .b64 addsub_param_1,
	.param .b32 addsub_param_2,
	.param .b32 addsub_param_3
)
{
	.reg .f32 	%f<2>;
	.reg .b32 	%r<4>;
	.reg .b64 	%rd<6>;


	ld.param.u64 	%rd1, [addsub_param_0];
	ld.param.u64 	%rd2, [addsub_param_1];
	ld.param.u32 	%r1, [addsub_param_2];
	ld.param.u32 	%r2, [addsub_param_3];
	add.s32 	%r3, %r2, %r1;
	cvt.s64.s32 	%rd3, %r1;
	cvt.s64.s32 	%rd4, %r2;
	sub.s64 	%rd5, %rd3, %rd4;
	cvt.rn.f32.s64 	%f1, %rd5;
	st.u32 	[%rd1], %r3;
	st.f32 	[%rd2], %f1;
	ret;
}

(Note the st.u32 for the first return value vs. st.f32 for the second).
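
For reference, here is a minimal sketch of how the new function might be invoked. It assumes compile_ptx_soa() accepts a Numba signature the way compile_ptx() does, and that it lives in the numba_utils module added by this PR; treat the import path and keyword names as assumptions rather than the final API:

from numba import types

# Assumed location of the new function (added in this PR)
from cunumeric.numba_utils import compile_ptx_soa


def addsub(x, y):
    return x + y, x - y


# A signature whose return type is a (int32, int32) tuple; under the
# SoA ABI each tuple element is written through a separate pointer
# parameter in the emitted PTX.
sig = types.UniTuple(types.int32, 2)(types.int32, types.int32)

# Only device functions are supported, since a CUDA kernel cannot
# return a value.
ptx, resty = compile_ptx_soa(addsub, sig, device=True)
print(ptx)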


copy-pr-bot bot commented Dec 6, 2023

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@magnatelee
Contributor

/ok to test

@magnatelee magnatelee added the category:new-feature PR introduces a new feature and will be classified as such in release notes label Dec 6, 2023
    self,
    agg: Value,
    idx: Union[Iterable[int], int],
    name: Optional[str] = "",
Contributor

Nit: Frustratingly, Optional[str] is not a reference to the fact that the parameter has a default, it's just an alias for Union[str, None]. So if the function doesn't actually accept None (which would typically be the case if it's set to "" if not provided), it would be more appropriate to type it simply as str.
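
For example (hypothetical function names, just to illustrate the alias):

from typing import Optional, Union

# Optional[str] is literally Union[str, None]; it says nothing about
# whether the parameter has a default.
assert Optional[str] == Union[str, None]

def f(name: Optional[str] = "") -> None: ...  # declares that None is accepted
def g(name: str = "") -> None: ...            # has a default, but rejects None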

@@ -0,0 +1 @@
from llvmlite import ir
Contributor

I think these typings files should also get copyright headers?

# values as SoA and some simplifications to keep it short
if not device:
    raise NotImplementedError(
        "Only device functions can be compiled for " "the SoA ABI"
Contributor

Suggested change
"Only device functions can be compiled for " "the SoA ABI"
"Only device functions can be compiled for the SoA ABI"

@gmarkall
Author

@manopapad Many thanks for having a look over this - I'll resolve the issues you noted shortly.

It looks like tests also failed because Numba is not a dependency of cuNumeric, but it will need to be (at least for numba_utils to work). I have looked through the various actions scripts and bits and pieces, but I can't see the correct way to ensure Numba is installed - could you suggest how I can make sure Numba gets installed as required, please?

@manopapad
Contributor

could you suggest how I can make sure Numba gets installed as required please?

Can you try adding numba around here? https://github.com/nv-legate/cunumeric/blob/branch-24.01/conda/conda-build/meta.yaml#L148
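
For reference, a hypothetical sketch of that addition (exact placement within the file and any version pinning are assumptions):

requirements:
  run:
    # ...existing run requirements...
    - numba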

@gmarkall
Author

Thanks, just giving that a try now.

@gmarkall
Author

/ok to test

@RAMitchell

What are the python dependencies being introduced here? Just a note that we may want to make some features optional if they significantly increase packaging complexity.

@gmarkall
Author

The dependency being added is Numba, which also depends on llvmlite. These are both available as wheels on PyPI and as conda-forge packages for Linux (x86_64, ppc64le, AArch64), macOS (x86_64 and arm64), and Windows (x86_64).

In general I think projects adding Numba as a dependency don't tend to have issues with increased packaging complexity - are there any special cuNumeric packaging requirements I should think about / elaborate on here?

@manopapad
Contributor

As @gmarkall noted, numba is widely available, so I'm not very worried about making it a hard dependency. That said, if there's pushback we can certainly make it optional (only necessary if the user wants to use np.vectorize or similar UDF-accepting functions).

@marcinz marcinz changed the base branch from branch-24.01 to branch-24.03 February 22, 2024 01:03
@manopapad
Contributor

@bryevdv we've gotten a bit bogged down with mypy errors on this - would you have some bandwidth to help resolve them?

@bryevdv
Contributor

bryevdv commented Feb 26, 2024

@manopapad These errors are because the stubs under typings/numba are not sufficiently fleshed out. For instance, this change:

diff --git a/typings/numba/core/codegen.pyi b/typings/numba/core/codegen.pyi
index a4288cce..409c94c0 100644
--- a/typings/numba/core/codegen.pyi
+++ b/typings/numba/core/codegen.pyi
@@ -1,4 +1,7 @@
 class CodeLibrary:
     codegen: "Codegen"
 
+    @property
+    def name(self) -> str: ...
+
 class Codegen: ...

fixes this error:

cunumeric/numba_utils.py:153:9: error: "CodeLibrary" has no attribute "name"  [attr-defined]
            f"{lib.name}_function_",
            ^~~~~~~~~~~

Either the stubs need to be expanded to cover the usage in this file, or else this file needs to be added to the override exclusions in pyproject.toml.

@manopapad
Contributor

@gmarkall Maybe for now it would be best to add the numba modules that are causing issues to the [[tool.mypy.overrides]] > module section of pyproject.toml, and you can later decide if you want to continue the effort of providing type hints for numba modules?
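
A minimal sketch of what such an override might look like (the module name is taken from the mypy error above; the exact settings are assumptions):

[[tool.mypy.overrides]]
module = "cunumeric.numba_utils"
ignore_errors = true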

@gmarkall
Author

gmarkall commented May 1, 2024

In the end I couldn't figure out how to make the overrides work so I ended up fixing up the typing. I've merged branch 24.03 and got CI green, so I think this is now ready for consideration / feedback.

@gmarkall gmarkall changed the title [WIP] Add support for compiling to SoA ABI for CUDA Add support for compiling to SoA ABI for CUDA May 1, 2024
@gmarkall gmarkall marked this pull request as ready for review May 1, 2024 15:41
@bryevdv
Contributor

bryevdv commented May 1, 2024

@manopapad This is fairly self-contained, so it should be straightforward to port to internal without much conflict, but at this point, should it just land in internal to begin with?
