**NEVER MERGE!** Comparison: `rocm-dev` vs `branch-24.06` (#1)
base: `branch-24.06`
Conversation
* Add ucx-py dependency to CI
* To eliminate hard-coding, generalize the GHA workflow logic to select one build for testing. This should simplify future Dask-CUDA updates. xref: rapidsai/build-planning#25. Authors: https://github.com/jakirkham. Approvers: AJ Schmidt (https://github.com/ajschmidt8), Ray Douglass (https://github.com/raydouglass). URL: rapidsai#1318
* NumPy 2 is expected to be released in the near future. For the RAPIDS 24.04 release, we will pin to `numpy>=1.23,<2.0a0`. This PR adds an upper bound to affected RAPIDS repositories. xref: rapidsai/build-planning#29. Authors: Bradley Dice (https://github.com/bdice). Approvers: Peter Andreas Entschev (https://github.com/pentschev), Ray Douglass (https://github.com/raydouglass). URL: rapidsai#1320
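As a hypothetical aside (not part of this PR), the effect of the `numpy>=1.23,<2.0a0` bound can be checked with the `packaging` library to see which versions it admits:

```python
# Hypothetical illustration (not part of this PR): what the RAPIDS 24.04 pin
# numpy>=1.23,<2.0a0 accepts and rejects, checked with the packaging library.
from packaging.specifiers import SpecifierSet

spec = SpecifierSet(">=1.23,<2.0a0")
print("1.26.4" in spec)  # True  -- NumPy 1.x releases stay allowed
print("2.0.0" in spec)   # False -- NumPy 2.x is excluded by the upper bound
```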
```
@@ -10,7 +32,11 @@
from typing import Optional

import numpy as np
import pynvml
from dask_cuda import DASK_USE_ROCM
```
@younseojava IMO you could simplify

```python
from dask_cuda import DASK_USE_ROCM

if DASK_USE_ROCM:
    from pyrsmi import rocml as pynvml
else:
    import pynvml
```

to

```python
from pyrsmi import rocml as pynvml
```

since users with a CUDA device will likely use the upstream `rapidsai/dask_cuda` package, and the NVIDIA `rapidsai` org will likely not accept a version that supports both backends.
dask_cuda/__init__.py (Outdated)

```python
import os


def is_amd_gpu_available():
```
@younseojava IMO you could assume that an AMD GPU is available when someone uses this code.
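For reference, a detection helper along these lines might look roughly like the sketch below. This is a hypothetical implementation, not the branch's code, and it assumes `pyrsmi` exposes `rocml.smi_initialize()` and `rocml.smi_get_device_count()`; verify the names against the installed version.

```python
# Hypothetical sketch of an AMD GPU detection helper; not this branch's code.
# Assumes pyrsmi exposes rocml.smi_initialize() and rocml.smi_get_device_count().
def is_amd_gpu_available() -> bool:
    try:
        from pyrsmi import rocml

        rocml.smi_initialize()
        return rocml.smi_get_device_count() > 0
    except Exception:
        # pyrsmi not installed or the ROCm runtime is unusable
        return False
```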
dask_cuda/__init__.py (Outdated)

```python
    return False


DASK_USE_ROCM = is_amd_gpu_available()
```
@younseojava IMO you could assume that an AMD GPU is available when someone uses this code.
```python
DASK_USE_ROCM = is_amd_gpu_available()
print("ROCM device found") if DASK_USE_ROCM else print("ROCM device not found")
```
@younseojava IMO you could assume that an AMD GPU is available when someone uses this code.
```python
import logging
import os

import click
import numba.cuda
from dask_cuda import DASK_USE_ROCM
```
@younseojava IMO you could assume that an AMD GPU is available when someone uses this code. So

```python
if DASK_USE_ROCM:
    from hip import hip as hiprt
else:
    import numba.cuda
```

could be simplified to:

```python
from hip import hip as hiprt
```
```python
        numba.cuda.current_context()
    except numba.cuda.cudadrv.error.CudaSupportError:
        pass
    if DASK_USE_ROCM:
```
@younseojava IMO you could assume that an AMD GPU is available when someone uses this code.
```python
    else:
        numba.cuda.current_context()
    if int(os.environ.get("DASK_CUDA_TEST_SINGLE_GPU", "0")) != 0:
```
@younseojava Why is this check not relevant for the AMD GPU version?
Maybe I don't fully understand the purpose of the environment variable. Also, with the HIP port of Numba, how can this function be changed? Can it be further simplified?
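For discussion, one possible shape of the ROCm side of this check, using the hip-python bindings already imported elsewhere in this branch, is sketched below. This is an assumption-laden illustration, not the PR's code; the `(err, ...)` tuple returns and the exact function names should be verified against the installed `hip-python` version.

```python
# Hypothetical sketch (not this PR's code): initialize a ROCm device,
# roughly mirroring numba.cuda.current_context() on the CUDA side.
# Assumes hip-python calls return (err, ...) tuples; verify signatures.
from hip import hip as hiprt


def _create_rocm_context(device_index: int = 0) -> bool:
    err, count = hiprt.hipGetDeviceCount()
    if err != hiprt.hipError_t.hipSuccess or count == 0:
        return False  # analogous to swallowing CudaSupportError above
    hiprt.hipSetDevice(device_index)  # select the device
    hiprt.hipDeviceSynchronize()      # force lazy runtime initialization
    return True
```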
```diff
@@ -399,7 +422,7 @@ def new_worker_spec(self):
             "plugins": {
                 CPUAffinity(
                     get_cpu_affinity(nvml_device_index(0, visible_devices))
-                ),
+                ) if not DASK_USE_ROCM else None,
```
@younseojava IMO you could assume that an AMD GPU is available when someone uses this code. So:

```python
) if False else None,
```
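If the AMD-only assumption is adopted, an alternative to leaving a literal `None` in the set is to omit the entry entirely. The sketch below is a hypothetical, standalone illustration of that pattern; `CPUAffinity` here is a stand-in class, not the dask_cuda plugin.

```python
# Hypothetical illustration: conditionally include a plugin in the set
# instead of keeping a None placeholder. CPUAffinity is a stand-in here.
class CPUAffinity:
    def __init__(self, cores):
        self.cores = cores


DASK_USE_ROCM = True  # assumption for the sketch

plugins = set()
if not DASK_USE_ROCM:
    plugins.add(CPUAffinity(cores=[0, 1, 2, 3]))

print(plugins)  # set() when targeting ROCm only
```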
```
@@ -4,6 +4,9 @@
import signal
import time
from functools import partial
import signal
```
@younseojava These imports are a repetition of the previous 3.
dask_cuda/utils.py (Outdated)

```
@@ -1,3 +1,25 @@
# Apache License
#
# Copyright (c) 2023 Advanced Micro Devices, Inc.
```
- Should be `Modifications Copyright (c) 2024 ...`
- Prepend the NVIDIA copyright (one possible layout is sketched below).
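For example, a header along these lines; years and exact wording are placeholders, not mandated text:

```python
# Copyright (c) 2024, NVIDIA CORPORATION.
# Modifications Copyright (c) 2024 Advanced Micro Devices, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# (remainder of the original dask_cuda Apache-2.0 header unchanged)
```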
dask_cuda/local_cuda_cluster.py (Outdated)

```
@@ -1,3 +1,25 @@
# Apache License
#
# Copyright (c) 2023 Advanced Micro Devices, Inc.
```
- Should be `Modifications Copyright (c) 2024 ...`
- Prepend the NVIDIA copyright.
dask_cuda/initialize.py (Outdated)

```
@@ -1,8 +1,34 @@
# Apache License
#
# Copyright (c) 2023 Advanced Micro Devices, Inc.
```
- Should be `Modifications Copyright (c) 2024 ...`
- Prepend the NVIDIA copyright.
dask_cuda/__init__.py (Outdated)

```
@@ -1,3 +1,25 @@
# Apache License
#
# Copyright (c) 2023 Advanced Micro Devices, Inc.
```
- Should be `Modifications Copyright (c) 2024 ...`
- Prepend the NVIDIA copyright.
ci/gpu/build.sh (Outdated)

```
@@ -0,0 +1,106 @@
#!/bin/bash
# Copyright (c) 2018, NVIDIA CORPORATION.
```
- `Modifications Copyright (c) 2024 Advanced Micro Devices, Inc.` is missing.
- AMD license missing (if different). Ideally include the original `dask_cuda` license below the NVIDIA copyright note.
build_rocm.sh (Outdated)

```
@@ -0,0 +1,33 @@
# Apache License
#
# Copyright (c) 2023 Advanced Micro Devices, Inc.
```
@younseojava 2024
WARNING: This PR exists only to compare the hipified `rocm-dev` branch vs. the `rapidsai/dask_cuda` branch `branch-24.06`. Do not merge it!