update versions for v2204 release (#149)
* update versions for v2204 release

Signed-off-by: liyuan <[email protected]>

* the RAPIDS plugin version can differ from the cuDF version because of a hotfix

Signed-off-by: liyuan <[email protected]>

* since CSV reads of some types are now supported by default, the deprecated configs are no longer needed

Signed-off-by: liyuan <[email protected]>

* update xgboost4j jars to 1.4.2-0.3.0

Signed-off-by: liyuan <[email protected]>

* revert rapids-4-spark-ml_2.12 version to 22.02

Signed-off-by: liyuan <[email protected]>

* Update examples/Spark-cuML/pca/README.md

Co-authored-by: Allen Xu <[email protected]>

* Update examples/Spark-cuML/pca/README.md

Co-authored-by: Allen Xu <[email protected]>

* Update examples/Spark-cuML/pca/README.md

Co-authored-by: Allen Xu <[email protected]>

Co-authored-by: Allen Xu <[email protected]>
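The jar names and Maven Central paths bumped in this commit all follow one naming scheme, with cuDF and the plugin versioned separately (per the hotfix note above). A sketch of how the download URLs are composed — the version values are this release's, and the variable names are illustrative, not part of the repo:

```shell
#!/bin/sh
# Compose Maven Central URLs for the RAPIDS jars from version strings.
# cuDF and the plugin are tracked separately, since a hotfix can bump
# one without the other.
CUDF_VERSION=22.04.0
RAPIDS_VERSION=22.04.0

CUDF_URL="https://repo1.maven.org/maven2/ai/rapids/cudf/${CUDF_VERSION}/cudf-${CUDF_VERSION}-cuda11.jar"
RAPIDS_URL="https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/${RAPIDS_VERSION}/rapids-4-spark_2.12-${RAPIDS_VERSION}.jar"

echo "${CUDF_URL}"
echo "${RAPIDS_URL}"
```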
nvliyuan and wjxiz1992 authored Apr 26, 2022
1 parent 88b4f23 commit d2cf00b
Showing 25 changed files with 69 additions and 83 deletions.
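With 25 files touched, a stale version string can easily survive a bump like this. A hedged sketch of a release check (a hypothetical helper, not part of the repo; note that the commit message intentionally keeps rapids-4-spark-ml_2.12 at 22.02, so hits need review rather than blind replacement):

```shell
#!/bin/sh
# find_stale: list files under a directory that still mention a given
# version string (illustrative release-check helper).
find_stale() {
  grep -rIl -- "$1" "$2" || true
}

# Demo on a scratch tree with one stale and one fresh file.
tmp=$(mktemp -d)
printf 'cudf-22.02.0-cuda11.jar\n' > "$tmp/stale.md"
printf 'cudf-22.04.0-cuda11.jar\n' > "$tmp/fresh.md"
find_stale "22.02.0" "$tmp"
```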
@@ -43,7 +43,7 @@ cluster.

- [Databricks 9.1 LTS
ML](https://docs.databricks.com/release-notes/runtime/9.1ml.html#system-environment) has CUDA 11
-installed. Users will need to use 22.02.0 or later on Databricks 9.1 LTS ML. In this case use
+installed. Users will need to use 21.12.0 or later on Databricks 9.1 LTS ML. In this case use
[generate-init-script.ipynb](generate-init-script.ipynb) which will install
the RAPIDS Spark plugin.

@@ -26,8 +26,8 @@
"cd ../../dbfs/FileStore/jars/\n",
"sudo wget -O cudf-22.04.0-cuda11.jar https://repo1.maven.org/maven2/ai/rapids/cudf/22.04.0/cudf-22.04.0-cuda11.jar\n",
"sudo wget -O rapids-4-spark_2.12-22.04.0.jar https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/22.04.0/rapids-4-spark_2.12-22.04.0.jar\n",
-"sudo wget -O xgboost4j_3.0-1.4.2-0.2.0.jar https://repo1.maven.org/maven2/com/nvidia/xgboost4j_3.0/1.4.2-0.2.0/xgboost4j_3.0-1.4.2-0.2.0.jar\n",
-"sudo wget -O xgboost4j-spark_3.0-1.4.2-0.2.0.jar https://repo1.maven.org/maven2/com/nvidia/xgboost4j-spark_3.0/1.4.2-0.2.0/xgboost4j-spark_3.0-1.4.2-0.2.0.jar\n",
+"sudo wget -O xgboost4j_3.0-1.4.2-0.3.0.jar https://repo1.maven.org/maven2/com/nvidia/xgboost4j_3.0/1.4.2-0.3.0/xgboost4j_3.0-1.4.2-0.3.0.jar\n",
+"sudo wget -O xgboost4j-spark_3.0-1.4.2-0.3.0.jar https://repo1.maven.org/maven2/com/nvidia/xgboost4j-spark_3.0/1.4.2-0.3.0/xgboost4j-spark_3.0-1.4.2-0.3.0.jar\n",
"ls -ltr\n",
"\n",
"# Your Jars are downloaded in dbfs:/FileStore/jars directory"
@@ -57,10 +57,10 @@
"source": [
"dbutils.fs.put(\"/databricks/init_scripts/init.sh\",\"\"\"\n",
"#!/bin/bash\n",
-"sudo cp /dbfs/FileStore/jars/xgboost4j_3.0-1.4.2-0.2.0.jar /databricks/jars/spark--maven-trees--ml--10.x--xgboost-gpu--ml.dmlc--xgboost4j-gpu_2.12--ml.dmlc__xgboost4j-gpu_2.12__1.5.2.jar\n",
+"sudo cp /dbfs/FileStore/jars/xgboost4j_3.0-1.4.2-0.3.0.jar /databricks/jars/spark--maven-trees--ml--10.x--xgboost-gpu--ml.dmlc--xgboost4j-gpu_2.12--ml.dmlc__xgboost4j-gpu_2.12__1.5.2.jar\n",
"sudo cp /dbfs/FileStore/jars/cudf-22.04.0-cuda11.jar /databricks/jars/\n",
"sudo cp /dbfs/FileStore/jars/rapids-4-spark_2.12-22.04.0.jar /databricks/jars/\n",
-"sudo cp /dbfs/FileStore/jars/xgboost4j-spark_3.0-1.4.2-0.2.0.jar /databricks/jars/spark--maven-trees--ml--10.x--xgboost-gpu--ml.dmlc--xgboost4j-spark-gpu_2.12--ml.dmlc__xgboost4j-spark-gpu_2.12__1.5.2.jar\"\"\", True)"
+"sudo cp /dbfs/FileStore/jars/xgboost4j-spark_3.0-1.4.2-0.3.0.jar /databricks/jars/spark--maven-trees--ml--10.x--xgboost-gpu--ml.dmlc--xgboost4j-spark-gpu_2.12--ml.dmlc__xgboost4j-spark-gpu_2.12__1.5.2.jar\"\"\", True)"
]
},
{
@@ -131,7 +131,7 @@
"\n",
"1. Edit your cluster, adding an initialization script from `dbfs:/databricks/init_scripts/init.sh` in the \"Advanced Options\" under \"Init Scripts\" tab\n",
"2. Reboot the cluster\n",
-"3. Go to \"Libraries\" tab under your cluster and install `dbfs:/FileStore/jars/xgboost4j-spark_3.0-1.4.2-0.2.0.jar` in your cluster by selecting the \"DBFS\" option for installing jars\n",
+"3. Go to \"Libraries\" tab under your cluster and install `dbfs:/FileStore/jars/xgboost4j-spark_3.0-1.4.2-0.3.0.jar` in your cluster by selecting the \"DBFS\" option for installing jars\n",
"4. Import the mortgage example notebook from `https://github.com/NVIDIA/spark-rapids-examples/blob/branch-22.04/examples/Spark-ETL+XGBoost/mortgage/notebooks/python/mortgage-gpu.ipynb`\n",
"5. Inside the mortgage example notebook, update the data paths\n",
" `train_data = reader.schema(schema).option('header', True).csv('/data/mortgage/csv/small-train.csv')`\n",
@@ -24,10 +24,10 @@
"source": [
"%sh\n",
"cd ../../dbfs/FileStore/jars/\n",
-"sudo wget -O cudf-22.02.0-cuda11.jar https://repo1.maven.org/maven2/ai/rapids/cudf/22.02.0/cudf-22.02.0-cuda11.jar\n",
-"sudo wget -O rapids-4-spark_2.12-22.02.0.jar https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/22.02.0/rapids-4-spark_2.12-22.02.0.jar\n",
-"sudo wget -O xgboost4j_3.0-1.4.2-0.2.0.jar https://repo1.maven.org/maven2/com/nvidia/xgboost4j_3.0/1.4.2-0.2.0/xgboost4j_3.0-1.4.2-0.2.0.jar\n",
-"sudo wget -O xgboost4j-spark_3.0-1.4.2-0.2.0.jar https://repo1.maven.org/maven2/com/nvidia/xgboost4j-spark_3.0/1.4.2-0.2.0/xgboost4j-spark_3.0-1.4.2-0.2.0.jar\n",
+"sudo wget -O cudf-22.04.0-cuda11.jar https://repo1.maven.org/maven2/ai/rapids/cudf/22.04.0/cudf-22.04.0-cuda11.jar\n",
+"sudo wget -O rapids-4-spark_2.12-22.04.0.jar https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/22.04.0/rapids-4-spark_2.12-22.04.0.jar\n",
+"sudo wget -O xgboost4j_3.0-1.4.2-0.3.0.jar https://repo1.maven.org/maven2/com/nvidia/xgboost4j_3.0/1.4.2-0.3.0/xgboost4j_3.0-1.4.2-0.3.0.jar\n",
+"sudo wget -O xgboost4j-spark_3.0-1.4.2-0.3.0.jar https://repo1.maven.org/maven2/com/nvidia/xgboost4j-spark_3.0/1.4.2-0.3.0/xgboost4j-spark_3.0-1.4.2-0.3.0.jar\n",
"ls -ltr\n",
"\n",
"# Your Jars are downloaded in dbfs:/FileStore/jars directory"
@@ -57,10 +57,10 @@
"source": [
"dbutils.fs.put(\"/databricks/init_scripts/init.sh\",\"\"\"\n",
"#!/bin/bash\n",
-"sudo cp /dbfs/FileStore/jars/xgboost4j_3.0-1.4.2-0.2.0.jar /databricks/jars/spark--maven-trees--ml--9.x--xgboost-gpu--ml.dmlc--xgboost4j-gpu_2.12--ml.dmlc__xgboost4j-gpu_2.12__1.4.1.jar\n",
-"sudo cp /dbfs/FileStore/jars/cudf-22.02.0-cuda11.jar /databricks/jars/\n",
-"sudo cp /dbfs/FileStore/jars/rapids-4-spark_2.12-22.02.0.jar /databricks/jars/\n",
-"sudo cp /dbfs/FileStore/jars/xgboost4j-spark_3.0-1.4.2-0.2.0.jar /databricks/jars/spark--maven-trees--ml--9.x--xgboost-gpu--ml.dmlc--xgboost4j-spark-gpu_2.12--ml.dmlc__xgboost4j-spark-gpu_2.12__1.4.1.jar\"\"\", True)"
+"sudo cp /dbfs/FileStore/jars/xgboost4j_3.0-1.4.2-0.3.0.jar /databricks/jars/spark--maven-trees--ml--9.x--xgboost-gpu--ml.dmlc--xgboost4j-gpu_2.12--ml.dmlc__xgboost4j-gpu_2.12__1.4.1.jar\n",
+"sudo cp /dbfs/FileStore/jars/cudf-22.04.0-cuda11.jar /databricks/jars/\n",
+"sudo cp /dbfs/FileStore/jars/rapids-4-spark_2.12-22.04.0.jar /databricks/jars/\n",
+"sudo cp /dbfs/FileStore/jars/xgboost4j-spark_3.0-1.4.2-0.3.0.jar /databricks/jars/spark--maven-trees--ml--9.x--xgboost-gpu--ml.dmlc--xgboost4j-spark-gpu_2.12--ml.dmlc__xgboost4j-spark-gpu_2.12__1.4.1.jar\"\"\", True)"
]
},
{
@@ -131,8 +131,8 @@
"\n",
"1. Edit your cluster, adding an initialization script from `dbfs:/databricks/init_scripts/init.sh` in the \"Advanced Options\" under \"Init Scripts\" tab\n",
"2. Reboot the cluster\n",
-"3. Go to \"Libraries\" tab under your cluster and install `dbfs:/FileStore/jars/xgboost4j-spark_3.0-1.4.2-0.2.0.jar` in your cluster by selecting the \"DBFS\" option for installing jars\n",
-"4. Import the mortgage example notebook from `https://github.com/NVIDIA/spark-rapids-examples/blob/branch-22.02/examples/Spark-ETL+XGBoost/mortgage/notebooks/python/mortgage-gpu.ipynb`\n",
+"3. Go to \"Libraries\" tab under your cluster and install `dbfs:/FileStore/jars/xgboost4j-spark_3.0-1.4.2-0.3.0.jar` in your cluster by selecting the \"DBFS\" option for installing jars\n",
+"4. Import the mortgage example notebook from `https://github.com/NVIDIA/spark-rapids-examples/blob/branch-22.04/examples/Spark-ETL+XGBoost/mortgage/notebooks/python/mortgage-gpu.ipynb`\n",
"5. Inside the mortgage example notebook, update the data paths\n",
" `train_data = reader.schema(schema).option('header', True).csv('/data/mortgage/csv/small-train.csv')`\n",
" `trans_data = reader.schema(schema).option('header', True).csv('/data/mortgage/csv/small-trans.csv')`"
@@ -40,7 +40,7 @@ export SPARK_DOCKER_IMAGE=<gpu spark docker image repo and name>
export SPARK_DOCKER_TAG=<spark docker image tag>

pushd ${SPARK_HOME}
-wget https://github.com/NVIDIA/spark-rapids-examples/raw/branch-22.02/dockerfile/Dockerfile
+wget https://github.com/NVIDIA/spark-rapids-examples/raw/branch-22.04/dockerfile/Dockerfile

# Optionally install additional jars into ${SPARK_HOME}/jars/

@@ -95,8 +95,6 @@ ${SPARK_HOME}/bin/spark-submit \
--conf spark.task.resource.gpu.amount=1 \
--conf spark.plugins=com.nvidia.spark.SQLPlugin \
--conf spark.rapids.sql.incompatibleDateFormats.enabled=true \
---conf spark.rapids.sql.csv.read.integer.enabled=true \
---conf spark.rapids.sql.csv.read.long.enabled=true \
--conf spark.rapids.sql.csv.read.double.enabled=true \
--py-files ${SAMPLE_ZIP} \
main.py \
@@ -104,8 +104,6 @@ ${SPARK_HOME}/bin/spark-submit \
--conf spark.task.resource.gpu.amount=1 \
--conf spark.plugins=com.nvidia.spark.SQLPlugin \
--conf spark.rapids.sql.incompatibleDateFormats.enabled=true \
---conf spark.rapids.sql.csv.read.integer.enabled=true \
---conf spark.rapids.sql.csv.read.long.enabled=true \
--conf spark.rapids.sql.csv.read.double.enabled=true \
--class com.nvidia.spark.examples.mortgage.ETLMain \
$SAMPLE_JAR \
@@ -5,15 +5,15 @@ For simplicity export the location to these jars. All examples assume the packag
### Download the jars

1. Download the XGBoost for Apache Spark jars
-* [XGBoost4j Package](https://repo1.maven.org/maven2/com/nvidia/xgboost4j_3.0/1.4.2-0.2.0/)
-* [XGBoost4j-Spark Package](https://repo1.maven.org/maven2/com/nvidia/xgboost4j-spark_3.0/1.4.2-0.2.0/)
+* [XGBoost4j Package](https://repo1.maven.org/maven2/com/nvidia/xgboost4j_3.0/1.4.2-0.3.0/)
+* [XGBoost4j-Spark Package](https://repo1.maven.org/maven2/com/nvidia/xgboost4j-spark_3.0/1.4.2-0.3.0/)

2. Download the RAPIDS Accelerator for Apache Spark plugin jar
-* [RAPIDS Spark Package](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/22.02.0/rapids-4-spark_2.12-22.02.0.jar)
+* [RAPIDS Spark Package](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/22.04.0/rapids-4-spark_2.12-22.04.0.jar)

Then download the version of the cudf jar that your version of the accelerator depends on.

-* [cuDF Package](https://repo1.maven.org/maven2/ai/rapids/cudf/22.02.0/cudf-22.02.0-cuda11.jar)
+* [cuDF Package](https://repo1.maven.org/maven2/ai/rapids/cudf/22.04.0/cudf-22.04.0-cuda11.jar)

### Build XGBoost Python Examples

@@ -29,10 +29,10 @@ You need to download Mortgage dataset to `/opt/xgboost` from this [site](https:/

``` bash
export SPARK_XGBOOST_DIR=/opt/xgboost
-export CUDF_JAR=${SPARK_XGBOOST_DIR}/cudf-22.02.0-cuda11.jar
-export RAPIDS_JAR=${SPARK_XGBOOST_DIR}/rapids-4-spark_2.12-22.02.0.jar
-export XGBOOST4J_JAR=${SPARK_XGBOOST_DIR}/xgboost4j_3.0-1.4.2-0.2.0.jar
-export XGBOOST4J_SPARK_JAR=${SPARK_XGBOOST_DIR}/xgboost4j-spark_3.0-1.4.2-0.2.0.jar
+export CUDF_JAR=${SPARK_XGBOOST_DIR}/cudf-22.04.0-cuda11.jar
+export RAPIDS_JAR=${SPARK_XGBOOST_DIR}/rapids-4-spark_2.12-22.04.0.jar
+export XGBOOST4J_JAR=${SPARK_XGBOOST_DIR}/xgboost4j_3.0-1.4.2-0.3.0.jar
+export XGBOOST4J_SPARK_JAR=${SPARK_XGBOOST_DIR}/xgboost4j-spark_3.0-1.4.2-0.3.0.jar
export SAMPLE_ZIP=${SPARK_XGBOOST_DIR}/samples.zip
export MAIN_PY=${SPARK_XGBOOST_DIR}/main.py
```
@@ -5,11 +5,11 @@ For simplicity export the location to these jars. All examples assume the packag
### Download the jars

1. Download the RAPIDS Accelerator for Apache Spark plugin jar
-* [RAPIDS Spark Package](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/22.02.0/rapids-4-spark_2.12-22.02.0.jar)
+* [RAPIDS Spark Package](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/22.04.0/rapids-4-spark_2.12-22.04.0.jar)

Then download the version of the cudf jar that your version of the accelerator depends on.

-* [cuDF Package](https://repo1.maven.org/maven2/ai/rapids/cudf/22.02.0/cudf-22.02.0-cuda11.jar)
+* [cuDF Package](https://repo1.maven.org/maven2/ai/rapids/cudf/22.04.0/cudf-22.04.0-cuda11.jar)

### Build XGBoost Scala Examples

@@ -25,7 +25,7 @@ You need to download mortgage dataset to `/opt/xgboost` from this [site](https:/

``` bash
export SPARK_XGBOOST_DIR=/opt/xgboost
-export CUDF_JAR=${SPARK_XGBOOST_DIR}/cudf-22.02.0-cuda11.jar
-export RAPIDS_JAR=${SPARK_XGBOOST_DIR}/rapids-4-spark_2.12-22.02.0.jar
+export CUDF_JAR=${SPARK_XGBOOST_DIR}/cudf-22.04.0-cuda11.jar
+export RAPIDS_JAR=${SPARK_XGBOOST_DIR}/rapids-4-spark_2.12-22.04.0.jar
export SAMPLE_JAR=${SPARK_XGBOOST_DIR}/sample_xgboost_apps-0.2.2-jar-with-dependencies.jar
```
4 changes: 2 additions & 2 deletions examples/RAPIDS-accelerated-UDFs/README.md
@@ -108,8 +108,8 @@ See above Prerequisites section
First finish the steps in "Building with Native Code Examples and run test cases" section, then do the following in the docker.

### Get jars from Maven Central
-[cudf-22.02.0-cuda11.jar](https://repo1.maven.org/maven2/ai/rapids/cudf/22.02.0/cudf-22.02.0-cuda11.jar)
-[rapids-4-spark_2.12-22.02.0.jar](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/22.02.0/rapids-4-spark_2.12-22.02.0.jar)
+[cudf-22.04.0-cuda11.jar](https://repo1.maven.org/maven2/ai/rapids/cudf/22.04.0/cudf-22.04.0-cuda11.jar)
+[rapids-4-spark_2.12-22.04.0.jar](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/22.04.0/rapids-4-spark_2.12-22.04.0.jar)

### Launch a local mode Spark

4 changes: 2 additions & 2 deletions examples/RAPIDS-accelerated-UDFs/pom.xml
@@ -37,9 +37,9 @@
<cuda.version>cuda11</cuda.version>
<scala.binary.version>2.12</scala.binary.version>
<!-- Update when releasing new version -->
-<cudf.version>22.02.0</cudf.version>
+<cudf.version>22.04.0</cudf.version>
<!-- Depends on release version, Snapshot version is not published to the Maven Central -->
-<rapids4spark.version>22.02.0</rapids4spark.version>
+<rapids4spark.version>22.04.0</rapids4spark.version>
<spark.version>3.1.1</spark.version>
<scala.version>2.12.15</scala.version>
<udf.native.build.path>${project.build.directory}/cpp-build</udf.native.build.path>
@@ -34,13 +34,13 @@
<dependency>
<groupId>com.nvidia</groupId>
<artifactId>xgboost4j_3.0</artifactId>
-<version>1.4.2-0.2.0</version>
+<version>1.4.2-0.3.0</version>
<scope>compile</scope>
</dependency>
<dependency>
<groupId>com.nvidia</groupId>
<artifactId>xgboost4j-spark_3.0</artifactId>
-<version>1.4.2-0.2.0</version>
+<version>1.4.2-0.3.0</version>
<scope>compile</scope>
</dependency>
<dependency>
@@ -624,9 +624,7 @@
"spark.conf.set(\"spark.rapids.sql.incompatibleDateFormats.enabled\", \"true\")\n",
"spark.conf.set(\"spark.rapids.sql.hasNans\", \"false\")\n",
"# use GPU to read CSV\n",
-"spark.conf.set(\"spark.rapids.sql.csv.read.long.enabled\", \"true\")\n",
-"spark.conf.set(\"spark.rapids.sql.csv.read.double.enabled\", \"true\")\n",
-"spark.conf.set(\"spark.rapids.sql.csv.read.integer.enabled\", \"true\")"
+"spark.conf.set(\"spark.rapids.sql.csv.read.double.enabled\", \"true\")"
]
},
{
@@ -9,16 +9,16 @@
"All data could be found at https://docs.rapids.ai/datasets/mortgage-data\n",
"\n",
"### 2. Download needed jars\n",
-"* [cudf-22.02.0-cuda11.jar](https://repo1.maven.org/maven2/ai/rapids/cudf/22.02.0/)\n",
-"* [rapids-4-spark_2.12-22.02.0.jar](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/22.02.0/rapids-4-spark_2.12-22.02.0.jar)\n",
+"* [cudf-22.04.0-cuda11.jar](https://repo1.maven.org/maven2/ai/rapids/cudf/22.04.0/)\n",
+"* [rapids-4-spark_2.12-22.04.0.jar](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/22.04.0/rapids-4-spark_2.12-22.04.0.jar)\n",
"\n",
"\n",
"### 3. Start Spark Standalone\n",
"Before running the script, please setup Spark standalone mode\n",
"\n",
"### 4. Add ENV\n",
"```\n",
-"$ export SPARK_JARS=cudf-22.02.0-cuda11.jar,rapids-4-spark_2.12-22.02.0.jar\n",
+"$ export SPARK_JARS=cudf-22.04.0-cuda11.jar,rapids-4-spark_2.12-22.04.0.jar\n",
"$ export PYSPARK_DRIVER_PYTHON=jupyter \n",
"$ export PYSPARK_DRIVER_PYTHON_OPTS=notebook\n",
"```\n",
@@ -30,8 +30,6 @@
"--jars ${SPARK_JARS} \\\n",
"--conf spark.plugins=com.nvidia.spark.SQLPlugin \\\n",
"--conf spark.rapids.sql.incompatibleDateFormats.enabled=true \\\n",
-"--conf spark.rapids.sql.csv.read.integer.enabled=true \\\n",
-"--conf spark.rapids.sql.csv.read.long.enabled=true \\\n",
"--conf spark.rapids.sql.csv.read.double.enabled=true \\\n",
"--py-files ${SPARK_PY_FILES}\n",
"```\n",
@@ -19,15 +19,15 @@
"All data could be found at https://docs.rapids.ai/datasets/mortgage-data\n",
"\n",
"### 2. Download needed jars\n",
-"* [cudf-22.02.0-cuda11.jar](https://repo1.maven.org/maven2/ai/rapids/cudf/22.02.0/)\n",
-"* [rapids-4-spark_2.12-22.02.0.jar](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/22.02.0/rapids-4-spark_2.12-22.02.0.jar)\n",
+"* [cudf-22.04.0-cuda11.jar](https://repo1.maven.org/maven2/ai/rapids/cudf/22.04.0/)\n",
+"* [rapids-4-spark_2.12-22.04.0.jar](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/22.04.0/rapids-4-spark_2.12-22.04.0.jar)\n",
"\n",
"### 3. Start Spark Standalone\n",
"Before Running the script, please setup Spark standalone mode\n",
"\n",
"### 4. Add ENV\n",
"```\n",
-"$ export SPARK_JARS=cudf-22.02.0-cuda11.jar,rapids-4-spark_2.12-22.02.0.jar\n",
+"$ export SPARK_JARS=cudf-22.04.0-cuda11.jar,rapids-4-spark_2.12-22.04.0.jar\n",
"\n",
"```\n",
"\n",
@@ -160,10 +160,10 @@
"```scala\n",
"import org.apache.spark.sql.SparkSession\n",
"val spark = SparkSession.builder().appName(\"Taxi-GPU\").getOrCreate\n",
-"%AddJar file:/data/libs/cudf-22.02.0-cuda11.jar\n",
-"%AddJar file:/data/libs/xgboost4j_3.0-1.4.2-0.2.0.jar\n",
-"%AddJar file:/data/libs/xgboost4j-spark_3.0-1.4.2-0.2.0.jar\n",
-"%AddJar file:/data/libs/rapids-4-spark_2.12-22.02.0.jar\n",
+"%AddJar file:/data/libs/cudf-22.04.0-cuda11.jar\n",
+"%AddJar file:/data/libs/xgboost4j_3.0-1.4.2-0.3.0.jar\n",
+"%AddJar file:/data/libs/xgboost4j-spark_3.0-1.4.2-0.3.0.jar\n",
+"%AddJar file:/data/libs/rapids-4-spark_2.12-22.04.0.jar\n",
"// ...\n",
"```"
]
2 changes: 1 addition & 1 deletion examples/Spark-ETL+XGBoost/pom.xml
@@ -38,7 +38,7 @@

<properties>
<encoding>UTF-8</encoding>
-<xgboost.version>1.4.2-0.2.0</xgboost.version>
+<xgboost.version>1.4.2-0.3.0</xgboost.version>
<spark.version>3.1.1</spark.version>
<scala.version>2.12.8</scala.version>
<scala.binary.version>2.12</scala.binary.version>
@@ -19,15 +19,15 @@
"All data could be found at https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page\n",
"\n",
"### 2. Download needed jars\n",
-"* [cudf-22.02.0-cuda11.jar](https://repo1.maven.org/maven2/ai/rapids/cudf/22.02.0/)\n",
-"* [rapids-4-spark_2.12-22.02.0.jar](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/22.02.0/rapids-4-spark_2.12-22.02.0.jar)\n",
+"* [cudf-22.04.0-cuda11.jar](https://repo1.maven.org/maven2/ai/rapids/cudf/22.04.0/)\n",
+"* [rapids-4-spark_2.12-22.04.0.jar](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/22.04.0/rapids-4-spark_2.12-22.04.0.jar)\n",
"\n",
"### 3. Start Spark Standalone\n",
"Before running the script, please setup Spark standalone mode\n",
"\n",
"### 4. Add ENV\n",
"```\n",
-"$ export SPARK_JARS=cudf-22.02.0-cuda11.jar,rapids-4-spark_2.12-22.02.0.jar\n",
+"$ export SPARK_JARS=cudf-22.04.0-cuda11.jar,rapids-4-spark_2.12-22.04.0.jar\n",
"$ export PYSPARK_DRIVER_PYTHON=jupyter \n",
"$ export PYSPARK_DRIVER_PYTHON_OPTS=notebook\n",
"```\n",
@@ -39,8 +39,6 @@
"--jars ${SPARK_JARS} \\\n",
"--conf spark.plugins=com.nvidia.spark.SQLPlugin \\\n",
"--conf spark.rapids.sql.incompatibleDateFormats.enabled=true \\\n",
-"--conf spark.rapids.sql.csv.read.integer.enabled=true \\\n",
-"--conf spark.rapids.sql.csv.read.long.enabled=true \\\n",
"--conf spark.rapids.sql.csv.read.double.enabled=true \\\n",
"--py-files ${SPARK_PY_FILES}\n",
"```\n",
