Merge branch 'main' into ppl-tablesample-feature
# Conflicts:
#	docs/ppl-lang/README.md
YANG-DB committed Oct 22, 2024
2 parents d2e3bd6 + 0b6da30 commit a0e4b5b
Showing 12 changed files with 376 additions and 25 deletions.
9 changes: 9 additions & 0 deletions docs/ppl-lang/PPL-Example-Commands.md
@@ -1,5 +1,10 @@
## Example PPL Queries

#### **Comment**
[See additional command details](ppl-comment.md)
- `source=accounts | top gender // finds most common gender of all the accounts` (line comment)
- `source=accounts | dedup 2 gender /* dedup the document with gender field keep 2 duplication */ | fields account_number, gender` (block comment)

#### **Describe**
- `describe table` This command is equal to the `DESCRIBE EXTENDED table` SQL command
- `describe schema.table`
@@ -92,6 +97,10 @@ Assumptions: `a`, `b`, `c` are existing fields in `table`
- `source = table | eval f = case(a = 0, 'zero', a = 1, 'one', a = 2, 'two', a = 3, 'three', a = 4, 'four', a = 5, 'five', a = 6, 'six', a = 7, 'se7en', a = 8, 'eight', a = 9, 'nine')`
- `source = table | eval f = case(a = 0, 'zero', a = 1, 'one' else 'unknown')`
- `source = table | eval f = case(a = 0, 'zero', a = 1, 'one' else concat(a, ' is an incorrect binary digit'))`
- `source = table | eval digest = md5(fieldName) | fields digest`
- `source = table | eval digest = sha1(fieldName) | fields digest`
- `source = table | eval digest = sha2(fieldName,256) | fields digest`
- `source = table | eval digest = sha2(fieldName,512) | fields digest`

#### Fillnull
Assumptions: `a`, `b`, `c`, `d`, `e` are existing fields in `table`
6 changes: 5 additions & 1 deletion docs/ppl-lang/README.md
@@ -21,7 +21,9 @@ For additional examples see the next [documentation](PPL-Example-Commands.md).


* **Commands**


- [`comment`](ppl-comment.md)

- [`explain command `](PPL-Example-Commands.md/#explain)

- [`dedup command `](ppl-dedup-command.md)
@@ -81,6 +83,8 @@ For additional examples see the next [documentation](PPL-Example-Commands.md).

- [`Type Conversion Functions`](functions/ppl-conversion.md)

- [`Cryptographic Functions`](functions/ppl-cryptographic.md)


---
### PPL On Spark
77 changes: 77 additions & 0 deletions docs/ppl-lang/functions/ppl-cryptographic.md
@@ -0,0 +1,77 @@
## PPL Cryptographic Functions

### `MD5`

**Description**

Calculates the MD5 digest and returns the value as a 32 character hex string.

Usage: `md5('hello')`

**Argument type:**
- STRING
- Return type: **STRING**

Example:

os> source=people | eval `MD5('hello')` = MD5('hello') | fields `MD5('hello')`
fetched rows / total rows = 1/1
+----------------------------------+
| MD5('hello') |
|----------------------------------|
| 5d41402abc4b2a76b9719d911017c592 |
+----------------------------------+

### `SHA1`

**Description**

Returns the hex string result of the SHA-1 hash of the input.

Usage: `sha1('hello')`

**Argument type:**
- STRING
- Return type: **STRING**

Example:

os> source=people | eval `SHA1('hello')` = SHA1('hello') | fields `SHA1('hello')`
fetched rows / total rows = 1/1
+------------------------------------------+
| SHA1('hello') |
|------------------------------------------|
| aaf4c61ddcc5e8a2dabede0f3b482cd9aea9434d |
+------------------------------------------+

### `SHA2`

**Description**

Returns the hex string result of the SHA-2 family of hash functions (SHA-224, SHA-256, SHA-384, and SHA-512). The numBits argument indicates the desired bit length of the result, which must be 224, 256, 384, or 512.

Usage: `sha2('hello',256)`

Usage: `sha2('hello',512)`

**Argument type:**
- STRING, INTEGER
- Return type: **STRING**

Example:

os> source=people | eval `SHA2('hello',256)` = SHA2('hello',256) | fields `SHA2('hello',256)`
fetched rows / total rows = 1/1
+------------------------------------------------------------------+
| SHA2('hello',256) |
|------------------------------------------------------------------|
| 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824 |
+------------------------------------------------------------------+

os> source=people | eval `SHA2('hello',512)` = SHA2('hello',512) | fields `SHA2('hello',512)`
fetched rows / total rows = 1/1
+----------------------------------------------------------------------------------------------------------------------------------+
| SHA2('hello',512) |
|----------------------------------------------------------------------------------------------------------------------------------|
| 9b71d224bd62f3785d96d46ad3ea3d73319bfbc2890caadae2dff72519673ca72323c3d99ba5c11d7c7acc6e14b8c5da0c4663475c2e5c3adef46f73bcdec043 |
+----------------------------------------------------------------------------------------------------------------------------------+
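
These PPL functions delegate to Spark SQL's built-in `md5`, `sha1`, and `sha2` (see the `BuiltinFunctionTranslator` mapping later in this commit), so the documented digests can be cross-checked with plain Spark SQL. A minimal Scala sketch, assuming only a local `SparkSession`:

```scala
import org.apache.spark.sql.SparkSession

object CryptoDigestCheck {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("digest-check").getOrCreate()
    // Each expression mirrors one example table above; the printed hex strings
    // should match the documented values.
    spark.sql(
      """SELECT md5('hello')       AS md5_hex,
        |       sha1('hello')      AS sha1_hex,
        |       sha2('hello', 256) AS sha2_256_hex,
        |       sha2('hello', 512) AS sha2_512_hex""".stripMargin
    ).show(truncate = false)
    spark.stop()
  }
}
```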
34 changes: 34 additions & 0 deletions docs/ppl-lang/ppl-comment.md
@@ -0,0 +1,34 @@
## Comments

Comments are text that is not evaluated. PPL supports both line comments and block comments.

### Line Comments

Line comments begin with two slashes `//` and end with a new line.

Example:

os> source=accounts | top gender // finds most common gender of all the accounts
fetched rows / total rows = 2/2
+----------+
| gender |
|----------|
| M |
| F |
+----------+

### Block Comments

Block comments begin with a slash followed by an asterisk `/*` and end with an asterisk followed by a slash `*/`.

Example:

os> source=accounts | dedup 2 gender /* dedup the document with gender field keep 2 duplication */ | fields account_number, gender
fetched rows / total rows = 3/3
+------------------+----------+
| account_number | gender |
|------------------+----------|
| 1 | M |
| 6 | M |
| 13 | F |
+------------------+----------+
@@ -785,6 +785,42 @@ class FlintSparkPPLBuiltinFunctionITSuite
assert(results.sameElements(expectedResults))
}

test("test cryptographic hash functions - md5") {
val frame = sql(s"""
| source = $testTable | eval a = md5('Spark') = '8cde774d6f7333752ed72cacddb05126' | fields age, a
| """.stripMargin)

val results: Array[Row] = frame.collect()
val expectedResults: Array[Row] =
Array(Row(70, true), Row(30, true), Row(25, true), Row(20, true))
implicit val rowOrdering: Ordering[Row] = Ordering.by[Row, Integer](_.getAs[Integer](0))
assert(results.sorted.sameElements(expectedResults.sorted))
}

test("test cryptographic hash functions - sha1") {
val frame = sql(s"""
| source = $testTable | eval a = sha1('Spark') = '85f5955f4b27a9a4c2aab6ffe5d7189fc298b92c' | fields age, a
| """.stripMargin)

val results: Array[Row] = frame.collect()
val expectedResults: Array[Row] =
Array(Row(70, true), Row(30, true), Row(25, true), Row(20, true))
implicit val rowOrdering: Ordering[Row] = Ordering.by[Row, Integer](_.getAs[Integer](0))
assert(results.sorted.sameElements(expectedResults.sorted))
}

test("test cryptographic hash functions - sha2") {
val frame = sql(s"""
| source = $testTable | eval a = sha2('Spark',256) = '529bc3b07127ecb7e53a4dcf1991d9152c24537d919178022b2c42657f79a26b' | fields age, a
| """.stripMargin)

val results: Array[Row] = frame.collect()
val expectedResults: Array[Row] =
Array(Row(70, true), Row(30, true), Row(25, true), Row(20, true))
implicit val rowOrdering: Ordering[Row] = Ordering.by[Row, Integer](_.getAs[Integer](0))
assert(results.sorted.sameElements(expectedResults.sorted))
}

// Todo
// +---------------------------------------+
// | Below tests are not supported (cast) |
@@ -0,0 +1,79 @@
/*
* Copyright OpenSearch Contributors
* SPDX-License-Identifier: Apache-2.0
*/

package org.opensearch.flint.spark.ppl

import org.opensearch.sql.ppl.utils.DataTypeTransformer.seq

import org.apache.spark.sql.{AnalysisException, QueryTest, Row}
import org.apache.spark.sql.catalyst.analysis.{UnresolvedAttribute, UnresolvedFunction, UnresolvedRelation, UnresolvedStar}
import org.apache.spark.sql.catalyst.expressions.{Alias, And, Ascending, CaseWhen, Descending, EqualTo, GreaterThanOrEqual, LessThan, Literal, SortOrder}
import org.apache.spark.sql.catalyst.plans.logical._
import org.apache.spark.sql.streaming.StreamTest

class FlintSparkPPLCommentITSuite
extends QueryTest
with LogicalPlanTestUtils
with FlintPPLSuite
with StreamTest {

private val testTable = "spark_catalog.default.flint_ppl_test"

override def beforeAll(): Unit = {
super.beforeAll()

createPartitionedStateCountryTable(testTable)
}

protected override def afterEach(): Unit = {
super.afterEach()
spark.streams.active.foreach { job =>
job.stop()
job.awaitTermination()
}
}

test("test line comment") {
val frame = sql(s"""
| /*
| * This is a
| * multiple
| * line block
| * comment
| */
| source = /* block comment */ $testTable /* block comment */
| | eval /*
| This is a
| multiple
| line
| block
| comment
| */ col = 1
| | /* block comment */ fields name, /* block comment */ age
| /* block comment */
| """.stripMargin)

val results: Array[Row] = frame.collect()
val expectedResults: Array[Row] =
Array(Row("Jake", 70), Row("Hello", 30), Row("John", 25), Row("Jane", 20))
implicit val rowOrdering: Ordering[Row] = Ordering.by[Row, String](_.getAs[String](0))
assert(results.sorted.sameElements(expectedResults.sorted))
}

test("test block comment") {
val frame = sql(s"""
| source = $testTable //line comment
| | eval col = 1 // line comment
| | fields name, age // line comment
| /////////line comment
| """.stripMargin)

val results: Array[Row] = frame.collect()
val expectedResults: Array[Row] =
Array(Row("Jake", 70), Row("Hello", 30), Row("John", 25), Row("Jane", 20))
implicit val rowOrdering: Ordering[Row] = Ordering.by[Row, String](_.getAs[String](0))
assert(results.sorted.sameElements(expectedResults.sorted))
}
}
7 changes: 7 additions & 0 deletions ppl-spark-integration/src/main/antlr4/OpenSearchPPLLexer.g4
@@ -282,6 +282,11 @@ RADIANS: 'RADIANS';
SIN: 'SIN';
TAN: 'TAN';

// CRYPTOGRAPHIC FUNCTIONS
MD5: 'MD5';
SHA1: 'SHA1';
SHA2: 'SHA2';

// DATE AND TIME FUNCTIONS
ADDDATE: 'ADDDATE';
ADDTIME: 'ADDTIME';
@@ -443,5 +448,7 @@ SQUOTA_STRING: '\'' ('\\'. | '\'\'' | ~('\'' | '\\'))* '\''
BQUOTA_STRING: '`' ( '\\'. | '``' | ~('`'|'\\'))* '`';
fragment DEC_DIGIT: [0-9];

LINE_COMMENT: '//' ('\\\n' | ~[\r\n])* '\r'? '\n'? -> channel(HIDDEN);
BLOCK_COMMENT: '/*' .*? '*/' -> channel(HIDDEN);

ERROR_RECOGNITION: . -> channel(ERRORCHANNEL);
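
Both comment rules route their tokens to ANTLR's `HIDDEN` channel, so comments reach the token stream but never the parser. A minimal Scala sketch of how this could be checked against the generated lexer; the class name `OpenSearchPPLLexer` follows from the grammar file name, while its package and availability on the classpath are assumptions of the sketch:

```scala
import org.antlr.v4.runtime.{CharStreams, CommonTokenStream, Token}

object CommentChannelCheck {
  def main(args: Array[String]): Unit = {
    val query = "source = t // line comment\n| fields a /* block comment */"
    // OpenSearchPPLLexer is the ANTLR-generated lexer for this grammar (assumed import).
    val lexer = new OpenSearchPPLLexer(CharStreams.fromString(query))
    val tokens = new CommonTokenStream(lexer)
    tokens.fill()
    // Comments are present in the token stream but hidden from the parser.
    tokens.getTokens.forEach { t =>
      if (t.getChannel == Token.HIDDEN_CHANNEL) println(s"hidden token: ${t.getText.trim}")
    }
  }
}
```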
8 changes: 8 additions & 0 deletions ppl-spark-integration/src/main/antlr4/OpenSearchPPLParser.g4
@@ -512,6 +512,7 @@ evalFunctionName
| systemFunctionName
| positionFunctionName
| coalesceFunctionName
| cryptographicFunctionName
;

functionArgs
@@ -627,6 +628,12 @@ trigonometricFunctionName
| TAN
;

cryptographicFunctionName
: MD5
| SHA1
| SHA2
;

dateTimeFunctionName
: ADDDATE
| ADDTIME
@@ -958,6 +965,7 @@ keywordsCanBeId
| textFunctionName
| mathematicalFunctionName
| positionFunctionName
| cryptographicFunctionName
// commands
| SEARCH
| DESCRIBE
@@ -52,6 +52,11 @@ public enum BuiltinFunctionName {
SIN(FunctionName.of("sin")),
TAN(FunctionName.of("tan")),

/** Cryptographic Functions. */
MD5(FunctionName.of("md5")),
SHA1(FunctionName.of("sha1")),
SHA2(FunctionName.of("sha2")),

/** Date and Time Functions. */
ADDDATE(FunctionName.of("adddate")),
// ADDTIME(FunctionName.of("addtime")),
@@ -13,30 +13,7 @@
import java.util.List;
import java.util.Map;

import static org.opensearch.sql.expression.function.BuiltinFunctionName.ADD;
import static org.opensearch.sql.expression.function.BuiltinFunctionName.ADDDATE;
import static org.opensearch.sql.expression.function.BuiltinFunctionName.DATEDIFF;
import static org.opensearch.sql.expression.function.BuiltinFunctionName.DAY_OF_MONTH;
import static org.opensearch.sql.expression.function.BuiltinFunctionName.COALESCE;
import static org.opensearch.sql.expression.function.BuiltinFunctionName.SUBTRACT;
import static org.opensearch.sql.expression.function.BuiltinFunctionName.MULTIPLY;
import static org.opensearch.sql.expression.function.BuiltinFunctionName.DIVIDE;
import static org.opensearch.sql.expression.function.BuiltinFunctionName.MODULUS;
import static org.opensearch.sql.expression.function.BuiltinFunctionName.DAY_OF_WEEK;
import static org.opensearch.sql.expression.function.BuiltinFunctionName.DAY_OF_YEAR;
import static org.opensearch.sql.expression.function.BuiltinFunctionName.HOUR_OF_DAY;
import static org.opensearch.sql.expression.function.BuiltinFunctionName.IS_NOT_NULL;
import static org.opensearch.sql.expression.function.BuiltinFunctionName.IS_NULL;
import static org.opensearch.sql.expression.function.BuiltinFunctionName.LENGTH;
import static org.opensearch.sql.expression.function.BuiltinFunctionName.LOCALTIME;
import static org.opensearch.sql.expression.function.BuiltinFunctionName.MINUTE_OF_HOUR;
import static org.opensearch.sql.expression.function.BuiltinFunctionName.MONTH_OF_YEAR;
import static org.opensearch.sql.expression.function.BuiltinFunctionName.SECOND_OF_MINUTE;
import static org.opensearch.sql.expression.function.BuiltinFunctionName.SUBDATE;
import static org.opensearch.sql.expression.function.BuiltinFunctionName.SYSDATE;
import static org.opensearch.sql.expression.function.BuiltinFunctionName.TRIM;
import static org.opensearch.sql.expression.function.BuiltinFunctionName.WEEK;
import static org.opensearch.sql.expression.function.BuiltinFunctionName.WEEK_OF_YEAR;
import static org.opensearch.sql.expression.function.BuiltinFunctionName.*;
import static org.opensearch.sql.ppl.utils.DataTypeTransformer.seq;
import static scala.Option.empty;

@@ -68,6 +45,10 @@ public interface BuiltinFunctionTranslator {
.put(DATEDIFF, "datediff")
.put(LOCALTIME, "localtimestamp")
.put(SYSDATE, "now")
// Cryptographic functions
.put(MD5, "md5")
.put(SHA1, "sha1")
.put(SHA2, "sha2")
// condition functions
.put(IS_NULL, "isnull")
.put(IS_NOT_NULL, "isnotnull")
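
With `MD5`, `SHA1`, and `SHA2` mapped to Spark's `md5`, `sha1`, and `sha2`, a translated PPL `eval digest = md5(name)` should behave like the corresponding DataFrame expressions. A minimal Scala sketch, assuming a local `SparkSession` and a toy `name` column:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, md5, sha1, sha2}

object TranslatedDigestDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("translated-digests").getOrCreate()
    import spark.implicits._

    val df = Seq("Jake", "Jane").toDF("name")
    // String input is implicitly cast to binary before hashing, matching the SQL behaviour
    // exercised by the integration tests above.
    df.select(
      col("name"),
      md5(col("name")).as("md5_hex"),
      sha1(col("name")).as("sha1_hex"),
      sha2(col("name"), 256).as("sha2_256_hex")
    ).show(truncate = false)

    spark.stop()
  }
}
```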