User tools to help with the adoption, installation, execution, and tuning of RAPIDS Accelerator for Apache Spark.
The wrapper improves end-user experience within the following dimensions:
- Qualification: Educate the CPU customer on the cost savings and acceleration potential of RAPIDS Accelerator for Apache Spark. The output shows a list of apps recommended for RAPIDS Accelerator for Apache Spark with estimated savings and speed-up.
- Bootstrap: Provide optimized RAPIDS Accelerator for Apache Spark configs based on the GPU cluster shape. The output shows the updated Spark config settings on the driver node.
- Tuning: Tune RAPIDS Accelerator for Apache Spark configs based on an initial job run, using the Spark event logs. The output shows recommended per-app RAPIDS Accelerator for Apache Spark config settings.
- Diagnostics: Run diagnostic functions to validate a Dataproc environment running RAPIDS Accelerator for Apache Spark, to make sure the cluster is healthy and ready for Spark jobs.
Set up a Python environment with a version between 3.8 and 3.10.
- Run the project in a virtual environment.
  $ python -m venv .venv
  $ source .venv/bin/activate
- Install spark-rapids-user-tools
  - Using the released package.
    $ pip install spark-rapids-user-tools
  - Install from source.
    $ pip install -e .
    Note that you can also use the optional test extra to install the dependencies required to run the unit tests:
    $ pip install -e '.[test]'
  - Using a wheel package built from the repo (see the build steps below).
    $ pip install <wheel-file>
- Make sure to install the CSP SDK if you plan to run the tool wrapper.
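The setup steps above can be sanity-checked with a short script. The following is a minimal sketch, not part of the package: the helper names are hypothetical, the version range is assumed to be inclusive on both ends, and gcloud is only an assumed example of a CSP SDK command (chosen because Dataproc is mentioned above); substitute the CLI of your own platform.

```python
import shutil
import sys
from importlib import metadata

# Assumption: "between 3.8 and 3.10" is read as inclusive on both ends.
MIN_PY = (3, 8)
MAX_PY = (3, 10)


def python_supported(version_info=None):
    """Return True if the (major, minor) version is within the assumed range."""
    major_minor = tuple((version_info or sys.version_info)[:2])
    return MIN_PY <= major_minor <= MAX_PY


def in_virtualenv():
    """Return True when running inside a venv (prefix differs from base prefix)."""
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def installed_version(dist_name="spark-rapids-user-tools"):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def cli_on_path(executable):
    """Return the resolved path of an executable, or None if it is not on PATH."""
    return shutil.which(executable)


print("Python version supported:", python_supported())
print("Running in a virtual environment:", in_virtualenv())
print("spark-rapids-user-tools:", installed_version() or "not installed")
# Assumption: gcloud stands in for whichever CSP SDK CLI your platform uses.
print("gcloud:", cli_on_path("gcloud") or "not found")
```

Running the script inside the activated .venv prints one line per check, which makes it easy to spot a skipped step before invoking the tools.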
Set up a Python environment similar to the steps above.
- Run the provided build script to compile the project.
  $> ./build.sh
- Fat Mode: Similar to a fat jar in Java, this mode addresses the case where web access is not available to download resources that have URL paths (http/https). The command builds the tools jar file, downloads the necessary dependencies, and packages them with the source code into a single 'wheel' file.
  $> ./build.sh fat
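After a build, the resulting wheel still has to be passed to pip install <wheel-file>. The sketch below shows one way to pick the most recently built wheel and assemble the install command. The helper names are hypothetical, and the dist directory in the usage note is an assumption about where the build script writes its output; check the build log for the actual location.

```python
import sys
from pathlib import Path


def newest_wheel(directory):
    """Return the most recently modified .whl file in directory, or None."""
    root = Path(directory)
    if not root.is_dir():
        return None
    wheels = sorted(root.glob("*.whl"), key=lambda p: p.stat().st_mtime)
    return wheels[-1] if wheels else None


def pip_install_command(wheel_path):
    """Build the argument list for installing a wheel with the current interpreter."""
    return [sys.executable, "-m", "pip", "install", str(wheel_path)]
```

For example, newest_wheel("dist") returns None when no wheel is found; otherwise the list from pip_install_command can be handed to subprocess.run. Using sys.executable keeps the install inside the active virtual environment.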
Please refer to the spark-rapids-user-tools guide for details on how to use the tools and the platform.
Please refer to CHANGELOG.md for the latest changes.