forked from opensearch-project/opensearch-spark
update spark-docker example with iceberg tables
update documentation

Signed-off-by: YANGDB <[email protected]>
Showing 8 changed files with 103 additions and 46 deletions.
Environment defaults for the spark-docker example; the PPL assembly JAR now comes from the sparkPPLCosmetic module:

@@ -1,4 +1,4 @@
 MASTER_UI_PORT=8080
 MASTER_PORT=7077
 UI_PORT=4040
-PPL_JAR=../../ppl-spark-integration/target/scala-2.12/ppl-spark-integration-assembly-0.7.0-SNAPSHOT.jar
+PPL_JAR=../../sparkPPLCosmetic/target/scala-2.12/opensearch-spark-ppl-assembly-0.7.0-SNAPSHOT.jar
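For reference, PPL_JAR is the assembly that the Spark SQL shell needs on its classpath when testing PPL locally. A minimal sketch of how it is typically passed follows; the exact launch command lives in the linked test instructions, so the flags, paths, and extension class below should be treated as assumptions:

```shell
# Hypothetical local launch of the Spark SQL shell with the PPL assembly on the classpath.
# The extension class follows the project's PPL-on-Spark documentation; adjust the JAR path
# to match PPL_JAR above, and drop the --conf if spark-defaults.conf already enables it.
spark-sql \
  --jars ../../sparkPPLCosmetic/target/scala-2.12/opensearch-spark-ppl-assembly-0.7.0-SNAPSHOT.jar \
  --conf "spark.sql.extensions=org.opensearch.flint.spark.FlintPPLSparkExtensions"
```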
New Dockerfile for the spark-docker example, bundling the Iceberg Spark runtime into the Bitnami Spark image:

@@ -0,0 +1,26 @@
FROM bitnami/spark:3.5.3

# Install wget
USER root
RUN apt-get update && apt-get install -y wget && rm -rf /var/lib/apt/lists/*

# Define the Iceberg version and Maven repository URL
ENV ICEBERG_VERSION=1.5.0
ENV MAVEN_REPO=https://repo1.maven.org/maven2

# Download the Iceberg runtime JAR
RUN wget $MAVEN_REPO/org/apache/iceberg/iceberg-spark-runtime-3.5_2.12/$ICEBERG_VERSION/iceberg-spark-runtime-3.5_2.12-$ICEBERG_VERSION.jar \
    -O /opt/bitnami/spark/jars/iceberg-spark-runtime-3.5.jar

# Optional: Add configuration files
COPY spark-defaults.conf /opt/bitnami/spark/conf/

# Set up environment variables for Spark
ENV SPARK_MODE=master
ENV SPARK_RPC_AUTHENTICATION_ENABLED=no
ENV SPARK_RPC_ENCRYPTION_ENABLED=no
ENV SPARK_LOCAL_STORAGE_ENCRYPTION_ENABLED=no
ENV SPARK_SSL_ENABLED=no

# Switch back to non-root user for security
USER 1001
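The Dockerfile copies a spark-defaults.conf that is not part of this diff. A minimal sketch of the Iceberg-related settings such a file would plausibly contain — key names follow the Apache Iceberg Spark documentation, and the actual file contents are an assumption:

```properties
# Assumed spark-defaults.conf entries (the real file is not shown in this commit).
# spark.sql.extensions is comma-separated, so the PPL extension can be listed alongside Iceberg's.
spark.sql.extensions                  org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
# Make Spark's built-in session catalog Iceberg-aware so `CREATE TABLE ... USING iceberg`
# works in the `default` database used by the example below.
spark.sql.catalog.spark_catalog       org.apache.iceberg.spark.SparkSessionCatalog
spark.sql.catalog.spark_catalog.type  hive
```

With the session catalog configured along these lines, the `default`.`iceberg_table` created in the sanity-test document below resolves through the regular Spark catalog.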
New sanity-test document describing how to run OpenSearch PPL against Spark with docker-compose:

@@ -0,0 +1,39 @@
# Sanity Test OpenSearch Spark PPL
This document shows how to locally test OpenSearch PPL commands on top of Spark using docker-compose.

See the instructions for running docker-compose [here](../../docs/spark-docker.md).

Once the docker services are running, [connect to the spark-sql shell](../../docs/local-spark-ppl-test-instruction.md#running-spark-shell).

In the spark-sql shell, [run the following create table statements](../../docs/local-spark-ppl-test-instruction.md#testing-ppl-commands).

PPL commands can then be [run](../../docs/local-spark-ppl-test-instruction.md#test-grok--top-commands-combination) on top of the newly created table.

### Using Iceberg Tables
The following example uses an [Apache Iceberg](https://iceberg.apache.org/) table:
```sql
CREATE TABLE iceberg_table (
  id INT,
  name STRING,
  age INT,
  city STRING
)
USING iceberg
PARTITIONED BY (city)
LOCATION 'file:/tmp/iceberg-tables/default/iceberg_table';

INSERT INTO iceberg_table VALUES
  (1, 'Alice', 30, 'New York'),
  (2, 'Bob', 25, 'San Francisco'),
  (3, 'Charlie', 35, 'New York'),
  (4, 'David', 40, 'Chicago'),
  (5, 'Eve', 28, 'San Francisco');
```

### PPL queries
```sql
source=`default`.`iceberg_table`;
source=`default`.`iceberg_table` | where age > 30 | fields id, name, age, city | sort - age;
source=`default`.`iceberg_table` | where age > 30 | stats count() by city;
source=`default`.`iceberg_table` | stats avg(age) by city;
```
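As an optional follow-up — not part of the committed file, and it assumes the table above was created successfully — plain Spark SQL can be used to confirm the table is Iceberg-backed and to peek at its metadata tables:

```sql
-- Show the table's provider, location and partitioning
DESCRIBE TABLE EXTENDED `default`.`iceberg_table`;

-- Iceberg exposes metadata tables through Spark SQL, e.g. the snapshot history
SELECT * FROM `default`.`iceberg_table`.snapshots;
```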
Renamed: ...-lang/local-spark-ppl-test-instruction.md → docs/local-spark-ppl-test-instruction.md (57 changes: 20 additions & 37 deletions)