From d6db92bf7ad9364b30ba278f00719edfeedaf2df Mon Sep 17 00:00:00 2001
From: Gera Shegalov
Date: Tue, 1 Feb 2022 09:48:14 -0800
Subject: [PATCH] Add language clarifying GPU address in Spark on YARN with isolation (#93)

Explain why the user observes GPU index 0 for all executors, and limitations of using MIG in a common section regardless of the approach.

Signed-off-by: Gera Shegalov <gera@apache.org>
---
 README.md                                     | 10 +--
 examples/MIG-Support/README.md                | 61 +++++++++++++++++++
 examples/MIG-Support/yarn-unpatched/README.md |  7 +--
 3 files changed, 65 insertions(+), 13 deletions(-)
 create mode 100644 examples/MIG-Support/README.md

diff --git a/README.md b/README.md
index 0a0f80aea..bad587561 100644
--- a/README.md
+++ b/README.md
@@ -45,14 +45,8 @@ This is an example of the GPU accelerated PCA algorithm running on Spark. For de
 [guide](/examples/Spark-cuML/pca/README.md).
 ### 5. MIG support
-We provide some guides about the Multi-Instance GPU (MIG) feature based on the NVIDIA Ampere architecture (such as NVIDIA A100 and A30) GPU.
-- [YARN 3.3.0+ MIG GPU Plugin](/examples/MIG-Support/device-plugins/gpu-mig) for adding a Java-based plugin for MIG
-on top of the Pluggable Device Framework
-- [YARN 3.1.2 until YARN 3.3.0 MIG GPU Support](/examples/MIG-Support/resource-types/gpu-mig) for
-patching and rebuilding YARN code base to support MIG devices.
-- [YARN 3.1.2+ MIG GPU Support without modifying YARN / Device Plugin Code](/examples/MIG-Support/yarn-unpatched)
-relying on installing nvidia CLI wrappers written in `bash`, but unlike the solutions above without
-any Java code changes.
+We provide some [guides](/examples/MIG-Support/README.md) about the Multi-Instance GPU (MIG) feature based on
+the NVIDIA Ampere architecture (such as NVIDIA A100 and A30) GPU.
 ## API
 ### 1. Xgboost examples API
diff --git a/examples/MIG-Support/README.md b/examples/MIG-Support/README.md
new file mode 100644
index 000000000..0e9c36ded
--- /dev/null
+++ b/examples/MIG-Support/README.md
@@ -0,0 +1,61 @@
+# Multi-Instance GPU (MIG) support in Apache Hadoop YARN
+
+There are multiple solutions for MIG scheduling on YARN that you can choose based on your environment and
+deployment requirements:
+
+- [YARN 3.3.0+ MIG GPU Plugin](/examples/MIG-Support/device-plugins/gpu-mig) for adding a Java-based plugin for MIG
+on top of the Pluggable Device Framework
+- [YARN 3.1.2 until YARN 3.3.0 MIG GPU Support](/examples/MIG-Support/resource-types/gpu-mig) for
+patching and rebuilding YARN code base to support MIG devices.
+- [YARN 3.1.2+ MIG GPU Support without modifying YARN / Device Plugin Code](/examples/MIG-Support/yarn-unpatched)
+relying on installing nvidia CLI wrappers written in `bash`, but unlike the solutions above without
+any Java code changes.
+
+## Limitations and Caveats
+
+Note that there are some common caveats for the solutions above.
+
+### Single MIG GPU per Container
+
+Please see the [MIG Application Considerations](https://docs.nvidia.com/datacenter/tesla/mig-user-guide/#app-considerations)
+and [CUDA Device Enumeration](https://docs.nvidia.com/datacenter/tesla/mig-user-guide/index.html#cuda-visible-devices).
+
+It is important to note that CUDA 11 only supports enumeration of a single MIG instance.
+It is recommended that you configure YARN to only allow a single GPU to be requested. See
+the YARN config `yarn.resource-types.nvidia/miggpu.maximum-allocation` for the
+[Pluggable Device Framework](/examples/MIG-Support/device-plugins/gpu-mig) solution and
+`yarn.resource-types.yarn.io/gpu.maximum-allocation` for the remainder of MIG Support options above, respectively.
+
+### Metrics
+Some metrics are not and cannot be broken down by MIG device. For example, `utilization` is the
+aggregate utilization of the parent GPU, and there is no attribution of `temperature` to a
+particular MIG device.
+
+### GPU index / address as reported by Apache Spark in logs and UI
+
+With YARN isolation using NVIDIA Container Runtime ensuring a single visible device
+per Docker container running a Spark Executor, each Executor will see a disjoint list comprising
+a single device.
+Therefore, the user will end up observing index 0 being used by all executors. However, they refer
+to different GPU/MIG instances. You can verify this by running something like the following on a
+YARN worker node host OS:
+
+```bash
+for cid in $(sudo docker ps -q); do sudo docker exec $cid bash -c "printenv | grep VISIBLE; nvidia-smi -L"; done
+NVIDIA_VISIBLE_DEVICES=3
+GPU 0: NVIDIA A30 (UUID: GPU-05aa99be-b706-0dc1-ab62-dd12f2227b7d)
+  MIG 1g.6gb Device 0: (UUID: MIG-70dc024a-e8d7-587c-81dd-57ad493b1d91)
+NVIDIA_VISIBLE_DEVICES=1
+GPU 0: NVIDIA A30 (UUID: GPU-05aa99be-b706-0dc1-ab62-dd12f2227b7d)
+  MIG 1c.2g.12gb Device 0: (UUID: MIG-54cc2421-6f2d-59e9-b074-20707aadd71e)
+NVIDIA_VISIBLE_DEVICES=2
+GPU 0: NVIDIA A30 (UUID: GPU-05aa99be-b706-0dc1-ab62-dd12f2227b7d)
+  MIG 1g.6gb Device 0: (UUID: MIG-7e5552bf-d328-57a8-b091-0720d4530ffb)
+NVIDIA_VISIBLE_DEVICES=0
+GPU 0: NVIDIA A30 (UUID: GPU-05aa99be-b706-0dc1-ab62-dd12f2227b7d)
+  MIG 1c.2g.12gb Device 0: (UUID: MIG-e6af58f0-9af8-594f-825e-74d23e1a68c1)
+```
+
+
+
+
diff --git a/examples/MIG-Support/yarn-unpatched/README.md b/examples/MIG-Support/yarn-unpatched/README.md
index 4ec3279c4..c36e7f043 100644
--- a/examples/MIG-Support/yarn-unpatched/README.md
+++ b/examples/MIG-Support/yarn-unpatched/README.md
@@ -20,7 +20,8 @@ to discover GPUs. It replaces MIG-enabled GPUs with the list of `` elements
 ## Installation
 These instructions assume NVIDIA Container Toolkit (nvidia-docker2) and YARN is already installed
-and configured with [CGroups enabled](https://hadoop.apache.org/docs/r3.1.2/hadoop-yarn/hadoop-yarn-site/UsingGpus.html).
+and configured with GPU Scheduling and
+[CGroups enabled](https://hadoop.apache.org/docs/r3.1.2/hadoop-yarn/hadoop-yarn-site/UsingGpus.html).
 Enable and configure your [GPUs with MIG](https://docs.nvidia.com/datacenter/tesla/mig-user-guide/index.html) on all of the nodes it applies to.
@@ -76,7 +77,3 @@ environment = [ "MIG_AS_GPU_ENABLED=1", "REAL_NVIDIA_SMI_PATH=/if/non-default/p
 Note, the values for `MIG_AS_GPU_ENABLED`, `REAL_NVIDIA_SMI_PATH`, `ENABLE_NON_MIG_GPUS` should be identical to the ones specified in `yarn-env.sh`.
-## Limitations and Caveats
-Some metrics are not and cannot be broken down by MIG device. For example, `utilization` is the
-aggregate utilization of the parent GPU, and there is no attribution of `temperature` to a
-particular MIG device.
\ No newline at end of file
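
Review note on the "GPU index / address" section added to `examples/MIG-Support/README.md`: the per-container check described there could also be scripted. The following is a minimal Python sketch and is not part of this patch. It assumes the combined `printenv` / `nvidia-smi -L` text has the same shape as the sample output in the README, and `parse_visible_devices` is a hypothetical helper name. It groups MIG devices by the `NVIDIA_VISIBLE_DEVICES` value printed before them, which makes it easy to see that every container reports device index 0 while the UUIDs differ.

```python
import re
from typing import Dict, List

# Line shapes are assumptions based on the sample output in the README hunk;
# other driver versions may format `nvidia-smi -L` differently.
VISIBLE_RE = re.compile(r"^NVIDIA_VISIBLE_DEVICES=(\S+)$")
MIG_RE = re.compile(r"^MIG\s+(\S+)\s+Device\s+(\d+):\s+\(UUID:\s+(MIG-[^)\s]+)\)$")


def parse_visible_devices(text: str) -> Dict[str, List[dict]]:
    """Group MIG devices by the NVIDIA_VISIBLE_DEVICES value that precedes them."""
    result: Dict[str, List[dict]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        visible = VISIBLE_RE.match(line)
        if visible:
            # A new container's output starts here.
            current = visible.group(1)
            result.setdefault(current, [])
            continue
        mig = MIG_RE.match(line)
        if mig and current is not None:
            result[current].append(
                {"profile": mig.group(1), "index": int(mig.group(2)), "uuid": mig.group(3)}
            )
    return result


# Two containers' worth of output, copied from the README sample.
SAMPLE = """\
NVIDIA_VISIBLE_DEVICES=3
GPU 0: NVIDIA A30 (UUID: GPU-05aa99be-b706-0dc1-ab62-dd12f2227b7d)
  MIG 1g.6gb Device 0: (UUID: MIG-70dc024a-e8d7-587c-81dd-57ad493b1d91)
NVIDIA_VISIBLE_DEVICES=1
GPU 0: NVIDIA A30 (UUID: GPU-05aa99be-b706-0dc1-ab62-dd12f2227b7d)
  MIG 1c.2g.12gb Device 0: (UUID: MIG-54cc2421-6f2d-59e9-b074-20707aadd71e)
"""

if __name__ == "__main__":
    for visible, devices in parse_visible_devices(SAMPLE).items():
        for dev in devices:
            print(f"NVIDIA_VISIBLE_DEVICES={visible} index={dev['index']} uuid={dev['uuid']}")
```

In practice the input text would come from the `docker exec` loop shown in the README; the sketch only covers parsing, so it can be run without GPUs or Docker.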