Add language clarifying GPU address in Spark on YARN with isolation (#93)

Explain why the user observes GPU index 0 for all executors, and limitations of using MIG in a common section regardless of the approach.

Signed-off-by: Gera Shegalov [email protected]
gerashegalov authored Feb 1, 2022
1 parent d554061 commit d6db92b
Showing 3 changed files with 65 additions and 13 deletions.
10 changes: 2 additions & 8 deletions README.md
@@ -45,14 +45,8 @@ This is an example of the GPU accelerated PCA algorithm running on Spark. For de
[guide](/examples/Spark-cuML/pca/README.md).

### 5. MIG support
We provide some guides about the Multi-Instance GPU (MIG) feature based on the NVIDIA Ampere architecture (such as NVIDIA A100 and A30) GPU.
- [YARN 3.3.0+ MIG GPU Plugin](/examples/MIG-Support/device-plugins/gpu-mig) for adding a Java-based plugin for MIG
on top of the Pluggable Device Framework
- [YARN 3.1.2 until YARN 3.3.0 MIG GPU Support](/examples/MIG-Support/resource-types/gpu-mig) for
patching and rebuilding YARN code base to support MIG devices.
- [YARN 3.1.2+ MIG GPU Support without modifying YARN / Device Plugin Code](/examples/MIG-Support/yarn-unpatched)
relying on installing nvidia CLI wrappers written in `bash`, but unlike the solutions above without
any Java code changes.
We provide some [guides](/examples/MIG-Support/README.md) for the Multi-Instance GPU (MIG) feature
of NVIDIA Ampere architecture GPUs (such as the NVIDIA A100 and A30).

## API
### 1. Xgboost examples API
61 changes: 61 additions & 0 deletions examples/MIG-Support/README.md
@@ -0,0 +1,61 @@
# Multi-Instance GPU (MIG) support in Apache Hadoop YARN

There are multiple solutions for MIG scheduling on YARN that you can choose based on your environment and
deployment requirements:

- [YARN 3.3.0+ MIG GPU Plugin](/examples/MIG-Support/device-plugins/gpu-mig) for adding a Java-based plugin for MIG
on top of the Pluggable Device Framework.
- [YARN 3.1.2 until YARN 3.3.0 MIG GPU Support](/examples/MIG-Support/resource-types/gpu-mig) for
patching and rebuilding the YARN code base to support MIG devices.
- [YARN 3.1.2+ MIG GPU Support without modifying YARN / Device Plugin Code](/examples/MIG-Support/yarn-unpatched),
which relies on installing `bash` wrappers around the NVIDIA CLI but, unlike the solutions above, requires
no Java code changes.

## Limitations and Caveats

Note that there are some common caveats that apply to all of the solutions above.

### Single MIG GPU per Container

Please see the [MIG Application Considerations](https://docs.nvidia.com/datacenter/tesla/mig-user-guide/#app-considerations)
and [CUDA Device Enumeration](https://docs.nvidia.com/datacenter/tesla/mig-user-guide/index.html#cuda-visible-devices).

It is important to note that CUDA 11 only supports enumeration of a single MIG instance.
It is therefore recommended that you configure YARN to allow only a single GPU to be requested
per container. See the YARN config `yarn.resource-types.nvidia/miggpu.maximum-allocation` for the
[Pluggable Device Framework](/examples/MIG-Support/device-plugins/gpu-mig) solution and
`yarn.resource-types.yarn.io/gpu.maximum-allocation` for the other MIG support options above.
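
On the Spark side, the same constraint is typically expressed by requesting exactly one GPU per
executor. A minimal, illustrative `spark-submit` sketch (config names assume Spark 3.x GPU
scheduling; the discovery script path and application jar are placeholders, not part of these guides):

```bash
# Illustrative only: request a single GPU per executor so that each YARN container
# maps to exactly one (MIG) device. Config names assume Spark 3.x GPU scheduling;
# the discovery script path and application jar are placeholders.
spark-submit \
  --master yarn \
  --conf spark.executor.resource.gpu.amount=1 \
  --conf spark.task.resource.gpu.amount=1 \
  --conf spark.executor.resource.gpu.discoveryScript=/path/to/getGpusResources.sh \
  your-application.jar
```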

### Metrics
Some metrics are not, and cannot be, broken down by MIG device. For example, `utilization` is the
aggregate utilization of the parent GPU, and `temperature` cannot be attributed to a
particular MIG device.
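
As a hedged illustration, querying `nvidia-smi` for temperature returns one reading per parent GPU,
which all of that GPU's MIG slices share (the query fields below are standard `nvidia-smi` properties):

```bash
# Temperature (and, where reported, utilization) is attributed to the parent GPU,
# not to individual MIG devices; all MIG slices of a GPU share the same reading.
nvidia-smi --query-gpu=index,uuid,temperature.gpu --format=csv
```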

### GPU index / address as reported by Apache Spark in logs and UI

With YARN isolation, the NVIDIA Container Runtime exposes a single visible device to each Docker
container running a Spark executor, so every executor sees a disjoint device list consisting of
exactly one device. As a result, the user observes GPU index 0 in use by all executors, even though
each executor's index 0 refers to a different GPU/MIG instance. You can verify this by running
something like the following on the host OS of a YARN worker node:

```bash
for cid in $(sudo docker ps -q); do sudo docker exec $cid bash -c "printenv | grep VISIBLE; nvidia-smi -L"; done
NVIDIA_VISIBLE_DEVICES=3
GPU 0: NVIDIA A30 (UUID: GPU-05aa99be-b706-0dc1-ab62-dd12f2227b7d)
MIG 1g.6gb Device 0: (UUID: MIG-70dc024a-e8d7-587c-81dd-57ad493b1d91)
NVIDIA_VISIBLE_DEVICES=1
GPU 0: NVIDIA A30 (UUID: GPU-05aa99be-b706-0dc1-ab62-dd12f2227b7d)
MIG 1c.2g.12gb Device 0: (UUID: MIG-54cc2421-6f2d-59e9-b074-20707aadd71e)
NVIDIA_VISIBLE_DEVICES=2
GPU 0: NVIDIA A30 (UUID: GPU-05aa99be-b706-0dc1-ab62-dd12f2227b7d)
MIG 1g.6gb Device 0: (UUID: MIG-7e5552bf-d328-57a8-b091-0720d4530ffb)
NVIDIA_VISIBLE_DEVICES=0
GPU 0: NVIDIA A30 (UUID: GPU-05aa99be-b706-0dc1-ab62-dd12f2227b7d)
MIG 1c.2g.12gb Device 0: (UUID: MIG-e6af58f0-9af8-594f-825e-74d23e1a68c1)
```




7 changes: 2 additions & 5 deletions examples/MIG-Support/yarn-unpatched/README.md
@@ -20,7 +20,8 @@ to discover GPUs. It replaces MIG-enabled GPUs with the list of `<gpu>` elements
## Installation

These instructions assume the NVIDIA Container Toolkit (nvidia-docker2) and YARN are already installed
and configured with [CGroups enabled](https://hadoop.apache.org/docs/r3.1.2/hadoop-yarn/hadoop-yarn-site/UsingGpus.html).
and configured with GPU Scheduling and
[CGroups enabled](https://hadoop.apache.org/docs/r3.1.2/hadoop-yarn/hadoop-yarn-site/UsingGpus.html).

Enable and configure your [GPUs with MIG](https://docs.nvidia.com/datacenter/tesla/mig-user-guide/index.html) on all of the
applicable nodes.
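
A rough, hedged sketch of enabling MIG and carving instances with `nvidia-smi` follows (the GPU
index and the `1g.6gb` profile are only examples matching the A30 output elsewhere on this page;
the exact profiles, and whether a GPU reset or reboot is needed, depend on your GPU and driver):

```bash
# Illustrative only -- follow the NVIDIA MIG user guide for your GPU and driver.
sudo nvidia-smi -i 0 -mig 1                 # enable MIG mode on GPU 0
sudo nvidia-smi --gpu-reset -i 0            # may be required before MIG mode takes effect
sudo nvidia-smi mig -cgi 1g.6gb,1g.6gb -C   # create two GPU instances plus compute instances
nvidia-smi -L                               # confirm the MIG devices are visible
```
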
@@ -76,7 +77,3 @@ environment = [ "MIG_AS_GPU_ENABLED=1", "REAL_NVIDIA_SMI_PATH=/if/non-default/p
Note that the values for `MIG_AS_GPU_ENABLED`, `REAL_NVIDIA_SMI_PATH`, and `ENABLE_NON_MIG_GPUS` should be
identical to the ones specified in `yarn-env.sh`.
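
For illustration, a hypothetical `yarn-env.sh` excerpt mirroring the `container-executor.cfg`
entry above; the values and the `nvidia-smi` path are placeholders and must match whatever you
actually configured:

```bash
# Hypothetical yarn-env.sh excerpt -- keep these values identical to the ones in
# container-executor.cfg. The path and values below are placeholders, not required settings.
export MIG_AS_GPU_ENABLED=1
export REAL_NVIDIA_SMI_PATH=/usr/local/bin/real-nvidia-smi
export ENABLE_NON_MIG_GPUS=1
```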

## Limitations and Caveats
Some metrics are not and cannot be broken down by MIG device. For example, `utilization` is the
aggregate utilization of the parent GPU, and there is no attribution of `temperature` to a
particular MIG device.
