Implement Cryptographic hash functions #788

Merged
merged 6 commits into from
Oct 22, 2024
Changes from 4 commits
4 changes: 4 additions & 0 deletions docs/ppl-lang/PPL-Example-Commands.md
@@ -92,6 +92,10 @@ Assumptions: `a`, `b`, `c` are existing fields in `table`
- `source = table | eval f = case(a = 0, 'zero', a = 1, 'one', a = 2, 'two', a = 3, 'three', a = 4, 'four', a = 5, 'five', a = 6, 'six', a = 7, 'se7en', a = 8, 'eight', a = 9, 'nine')`
- `source = table | eval f = case(a = 0, 'zero', a = 1, 'one' else 'unknown')`
- `source = table | eval f = case(a = 0, 'zero', a = 1, 'one' else concat(a, ' is an incorrect binary digit'))`
- `source = table | eval digest = md5(fieldName) | fields digest`
- `source = table | eval digest = sha1(fieldName) | fields digest`
- `source = table | eval digest = sha2(fieldName,256) | fields digest`
- `source = table | eval digest = sha2(fieldName,512) | fields digest`

#### Fillnull
Assumptions: `a`, `b`, `c`, `d`, `e` are existing fields in `table`
2 changes: 2 additions & 0 deletions docs/ppl-lang/README.md
@@ -79,6 +79,8 @@ For additional examples see the next [documentation](PPL-Example-Commands.md).

- [`Type Conversion Functions`](functions/ppl-conversion.md)

- [`Cryptographic Functions`](functions/ppl-cryptographic.md)


---
### PPL On Spark
77 changes: 77 additions & 0 deletions docs/ppl-lang/functions/ppl-cryptographic.md
@@ -0,0 +1,77 @@
## PPL Cryptographic Functions

### `MD5`

**Description**

Calculates the MD5 digest and returns the value as a 32-character hex string.

Usage: `md5('hello')`

**Argument type:**
- STRING
- Return type: **STRING**

Example:

    os> source=people | eval `MD5('hello')` = MD5('hello') | fields `MD5('hello')`
    fetched rows / total rows = 1/1
    +----------------------------------+
    | MD5('hello')                     |
    |----------------------------------|
    | 5d41402abc4b2a76b9719d911017c592 |
    +----------------------------------+
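
The documented value can be cross-checked outside PPL. The following is a minimal Java sketch, assuming the function hashes the UTF-8 bytes of its argument. `Md5Sketch` and `md5Hex` are hypothetical names used only for illustration and are not part of this PR.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for illustration only; not part of this PR.
final class Md5Sketch {

    // Returns the lowercase hex MD5 digest of the UTF-8 bytes of `input`.
    static String md5Hex(String input) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(input.getBytes(StandardCharsets.UTF_8));
            // MD5 produces 16 bytes, so the hex form has 32 characters.
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                // Mask to 0..255 so negative bytes do not sign-extend.
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a required algorithm on the Java platform, so this is unexpected.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(md5Hex("hello")); // 5d41402abc4b2a76b9719d911017c592
    }
}
```

The 16-byte digest is what fixes the result at 32 hex characters, regardless of input length.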

### `SHA1`

**Description**

Calculates the SHA-1 digest and returns the value as a 40-character hex string.

Usage: `sha1('hello')`

**Argument type:**
- STRING
- Return type: **STRING**

Example:

    os> source=people | eval `SHA1('hello')` = SHA1('hello') | fields `SHA1('hello')`
    fetched rows / total rows = 1/1
    +------------------------------------------+
    | SHA1('hello')                            |
    |------------------------------------------|
    | aaf4c61ddcc5e8a2dabede0f3b482cd9aea9434d |
    +------------------------------------------+
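
As a rough cross-check of the value above, here is a small Java sketch that computes SHA-1 over the UTF-8 bytes of the input and zero-pads the result to a fixed width. It assumes UTF-8 encoding of the argument. `Sha1Sketch` and `sha1Hex` are hypothetical names and are not part of this PR.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for illustration only; not part of this PR.
final class Sha1Sketch {

    // Returns the lowercase hex SHA-1 digest of the UTF-8 bytes of `input`.
    static String sha1Hex(String input) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-1")
                    .digest(input.getBytes(StandardCharsets.UTF_8));
            // SHA-1 yields 20 bytes; %040x left-pads with zeros to 40 hex characters,
            // which matters when the digest starts with zero bytes.
            return String.format("%040x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is a required algorithm on the Java platform, so this is unexpected.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sha1Hex("hello")); // aaf4c61ddcc5e8a2dabede0f3b482cd9aea9434d
    }
}
```

Formatting through `BigInteger` is one of several ways to hex-encode. The explicit width keeps the output at 40 characters even when leading digits are zero.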

### `SHA2`

**Description**

Returns the hex string result of the SHA-2 family of hash functions (SHA-224, SHA-256, SHA-384, and SHA-512). The `numBits` argument indicates the desired bit length of the result and must be 224, 256, 384, or 512.

Usage: `sha2('hello',256)`

Usage: `sha2('hello',512)`

**Argument type:**
- STRING, INTEGER
- Return type: **STRING**

Example:

    os> source=people | eval `SHA2('hello',256)` = SHA2('hello',256) | fields `SHA2('hello',256)`
    fetched rows / total rows = 1/1
    +------------------------------------------------------------------+
    | SHA2('hello',256)                                                |
    |------------------------------------------------------------------|
    | 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824 |
    +------------------------------------------------------------------+

    os> source=people | eval `SHA2('hello',512)` = SHA2('hello',512) | fields `SHA2('hello',512)`
    fetched rows / total rows = 1/1
    +----------------------------------------------------------------------------------------------------------------------------------+
    | SHA2('hello',512)                                                                                                                |
    |----------------------------------------------------------------------------------------------------------------------------------|
    | 9b71d224bd62f3785d96d46ad3ea3d73319bfbc2890caadae2dff72519673ca72323c3d99ba5c11d7c7acc6e14b8c5da0c4663475c2e5c3adef46f73bcdec043 |
    +----------------------------------------------------------------------------------------------------------------------------------+
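
To illustrate how the bit-length argument selects a SHA-2 variant, here is a hedged Java sketch. It assumes the input is hashed as UTF-8 bytes and that the JDK in use provides SHA-224, SHA-384, and SHA-512 (standard JDK distributions do). `Sha2Sketch` and `sha2Hex` are hypothetical names, and throwing on an unsupported bit length is a choice of this sketch, not a statement about how the PPL function behaves.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for illustration only; not part of this PR.
final class Sha2Sketch {

    // Returns the lowercase hex digest of `input` for the given SHA-2 bit length.
    static String sha2Hex(String input, int numBits) {
        switch (numBits) {
            case 224:
            case 256:
            case 384:
            case 512:
                break;
            default:
                // Choice made by this sketch; not a claim about the PPL function.
                throw new IllegalArgumentException("unsupported numBits: " + numBits);
        }
        try {
            // JCA algorithm names follow the pattern SHA-224, SHA-256, SHA-384, SHA-512.
            MessageDigest md = MessageDigest.getInstance("SHA-" + numBits);
            byte[] digest = md.digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                // Emit the high nibble, then the low nibble, as lowercase hex digits.
                hex.append(Character.forDigit((b >> 4) & 0xf, 16));
                hex.append(Character.forDigit(b & 0xf, 16));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sha2Hex("hello", 256));
        System.out.println(sha2Hex("hello", 512).length()); // 128 hex characters
    }
}
```

The hex string length is always `numBits / 4`, which is why the 512-bit example table is twice as wide as the 256-bit one.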
@@ -785,6 +785,75 @@ class FlintSparkPPLBuiltinFunctionITSuite
    assert(results.sameElements(expectedResults))
  }

  test("test cryptographic hash functions - md5") {
    val frame = sql(s"""
         | source = $testTable digest=md5('Spark') | fields digest
         | """.stripMargin)

    val results: Array[Row] = frame.collect()
    val expectedResults: Array[Row] = Array(Row("8cde774d6f7333752ed72cacddb05126"))
    assert(results.sameElements(expectedResults))

    val logicalPlan: LogicalPlan = frame.queryExecution.logical
    val table = UnresolvedRelation(Seq("spark_catalog", "default", "flint_ppl_test"))
    val filterExpr = EqualTo(
      UnresolvedAttribute("digest"),
      UnresolvedFunction(
        "md5",
        seq(Literal("Spark")),
        isDistinct = false))
    val filterPlan = Filter(filterExpr, table)
    val projectList = Seq(UnresolvedAttribute("digest"))
    val expectedPlan = Project(projectList, filterPlan)
    comparePlans(logicalPlan, expectedPlan, checkAnalysis = false)
  }

  test("test cryptographic hash functions - sha1") {
    val frame = sql(s"""
         | source = $testTable digest=sha1('Spark') | fields digest
         | """.stripMargin)

    val results: Array[Row] = frame.collect()
    val expectedResults: Array[Row] = Array(Row("85f5955f4b27a9a4c2aab6ffe5d7189fc298b92c"))
    assert(results.sameElements(expectedResults))

    val logicalPlan: LogicalPlan = frame.queryExecution.logical
    val table = UnresolvedRelation(Seq("spark_catalog", "default", "flint_ppl_test"))
    val filterExpr = EqualTo(
      UnresolvedAttribute("digest"),
      UnresolvedFunction(
        "sha1",
        seq(Literal("Spark")),
        isDistinct = false))
    val filterPlan = Filter(filterExpr, table)
    val projectList = Seq(UnresolvedAttribute("digest"))
    val expectedPlan = Project(projectList, filterPlan)
    comparePlans(logicalPlan, expectedPlan, checkAnalysis = false)
  }

  test("test cryptographic hash functions - sha2") {
    val frame = sql(s"""
         | source = $testTable digest=sha2('Spark',256) | fields digest
         | """.stripMargin)

    val results: Array[Row] = frame.collect()
    val expectedResults: Array[Row] = Array(Row("529bc3b07127ecb7e53a4dcf1991d9152c24537d919178022b2c42657f79a26b"))
    assert(results.sameElements(expectedResults))

    val logicalPlan: LogicalPlan = frame.queryExecution.logical
    val table = UnresolvedRelation(Seq("spark_catalog", "default", "flint_ppl_test"))
    val filterExpr = EqualTo(
      UnresolvedAttribute("digest"),
      UnresolvedFunction(
        "sha2",
        seq(Literal("Spark"), Literal(256)),
        isDistinct = false))
    val filterPlan = Filter(filterExpr, table)
    val projectList = Seq(UnresolvedAttribute("digest"))
    val expectedPlan = Project(projectList, filterPlan)
    comparePlans(logicalPlan, expectedPlan, checkAnalysis = false)
  }

// Todo
// +---------------------------------------+
// | Below tests are not supported (cast) |
5 changes: 5 additions & 0 deletions ppl-spark-integration/src/main/antlr4/OpenSearchPPLLexer.g4
@@ -280,6 +280,11 @@ RADIANS: 'RADIANS';
SIN: 'SIN';
TAN: 'TAN';

// CRYPTOGRAPHIC FUNCTIONS
MD5: 'MD5';
SHA1: 'SHA1';
SHA2: 'SHA2';

// DATE AND TIME FUNCTIONS
ADDDATE: 'ADDDATE';
ADDTIME: 'ADDTIME';
@@ -508,6 +508,7 @@ evalFunctionName
   | systemFunctionName
   | positionFunctionName
   | coalesceFunctionName
   | cryptographicFunctionName
   ;

functionArgs
@@ -623,6 +624,12 @@ trigonometricFunctionName
   | TAN
   ;

cryptographicFunctionName
   : MD5
   | SHA1
   | SHA2
   ;

dateTimeFunctionName
: ADDDATE
| ADDTIME
@@ -954,6 +961,7 @@ keywordsCanBeId
   | textFunctionName
   | mathematicalFunctionName
   | positionFunctionName
   | cryptographicFunctionName
   // commands
   | SEARCH
   | DESCRIBE
@@ -52,6 +52,11 @@ public enum BuiltinFunctionName {
  SIN(FunctionName.of("sin")),
  TAN(FunctionName.of("tan")),

  /** Cryptographic Functions. */
  MD5(FunctionName.of("md5")),
  SHA1(FunctionName.of("sha1")),
  SHA2(FunctionName.of("sha2")),

  /** Date and Time Functions. */
  ADDDATE(FunctionName.of("adddate")),
  // ADDTIME(FunctionName.of("addtime")),
@@ -13,30 +13,7 @@
import java.util.List;
import java.util.Map;

import static org.opensearch.sql.expression.function.BuiltinFunctionName.ADD;
Gokul-Radhakrishnan marked this conversation as resolved.
import static org.opensearch.sql.expression.function.BuiltinFunctionName.ADDDATE;
import static org.opensearch.sql.expression.function.BuiltinFunctionName.DATEDIFF;
import static org.opensearch.sql.expression.function.BuiltinFunctionName.DAY_OF_MONTH;
import static org.opensearch.sql.expression.function.BuiltinFunctionName.COALESCE;
import static org.opensearch.sql.expression.function.BuiltinFunctionName.SUBTRACT;
import static org.opensearch.sql.expression.function.BuiltinFunctionName.MULTIPLY;
import static org.opensearch.sql.expression.function.BuiltinFunctionName.DIVIDE;
import static org.opensearch.sql.expression.function.BuiltinFunctionName.MODULUS;
import static org.opensearch.sql.expression.function.BuiltinFunctionName.DAY_OF_WEEK;
import static org.opensearch.sql.expression.function.BuiltinFunctionName.DAY_OF_YEAR;
import static org.opensearch.sql.expression.function.BuiltinFunctionName.HOUR_OF_DAY;
import static org.opensearch.sql.expression.function.BuiltinFunctionName.IS_NOT_NULL;
import static org.opensearch.sql.expression.function.BuiltinFunctionName.IS_NULL;
import static org.opensearch.sql.expression.function.BuiltinFunctionName.LENGTH;
import static org.opensearch.sql.expression.function.BuiltinFunctionName.LOCALTIME;
import static org.opensearch.sql.expression.function.BuiltinFunctionName.MINUTE_OF_HOUR;
import static org.opensearch.sql.expression.function.BuiltinFunctionName.MONTH_OF_YEAR;
import static org.opensearch.sql.expression.function.BuiltinFunctionName.SECOND_OF_MINUTE;
import static org.opensearch.sql.expression.function.BuiltinFunctionName.SUBDATE;
import static org.opensearch.sql.expression.function.BuiltinFunctionName.SYSDATE;
import static org.opensearch.sql.expression.function.BuiltinFunctionName.TRIM;
import static org.opensearch.sql.expression.function.BuiltinFunctionName.WEEK;
import static org.opensearch.sql.expression.function.BuiltinFunctionName.WEEK_OF_YEAR;
import static org.opensearch.sql.expression.function.BuiltinFunctionName.*;
Member

Minor: I'd prefer specific static imports instead of `.*`.
To prevent IDEA from auto-merging them, set `Class count to use import with '*'` and `Name count to use static import with '*'` to 99 under Settings -> Editor -> Code Style -> Java.

Contributor Author

Sure

import static org.opensearch.sql.ppl.utils.DataTypeTransformer.seq;
import static scala.Option.empty;

@@ -68,6 +45,10 @@ public interface BuiltinFunctionTranslator {
        .put(DATEDIFF, "datediff")
        .put(LOCALTIME, "localtimestamp")
        .put(SYSDATE, "now")
        // Cryptographic functions
        .put(MD5, "md5")
        .put(SHA1, "sha1")
        .put(SHA2, "sha2")
        // condition functions
        .put(IS_NULL, "isnull")
        .put(IS_NOT_NULL, "isnotnull")
@@ -0,0 +1,69 @@
/*
 * Copyright OpenSearch Contributors
 * SPDX-License-Identifier: Apache-2.0
 */

package org.opensearch.flint.spark.ppl

import org.opensearch.flint.spark.ppl.PlaneUtils.plan
import org.opensearch.sql.ppl.{CatalystPlanContext, CatalystQueryPlanVisitor}
import org.opensearch.sql.ppl.utils.DataTypeTransformer.seq
import org.scalatest.matchers.should.Matchers

import org.apache.spark.SparkFunSuite
import org.apache.spark.sql.catalyst.analysis.{UnresolvedAttribute, UnresolvedFunction, UnresolvedRelation, UnresolvedStar}
import org.apache.spark.sql.catalyst.expressions.{Alias, EqualTo, GreaterThan, GreaterThanOrEqual, LessThan, LessThanOrEqual, Literal, Not}
import org.apache.spark.sql.catalyst.plans.PlanTest
import org.apache.spark.sql.catalyst.plans.logical.{Filter, Project}

class PPLLogicalPlanCryptographicFunctionsTranslatorTestSuite
    extends SparkFunSuite
    with PlanTest
    with LogicalPlanTestUtils
    with Matchers {

  private val planTransformer = new CatalystQueryPlanVisitor()
  private val pplParser = new PPLSyntaxParser()

  test("test md5") {
    val context = new CatalystPlanContext
    val logPlan = planTransformer.visit(plan(pplParser, "source=t a = md5(b)"), context)

    val table = UnresolvedRelation(Seq("t"))
    val filterExpr = EqualTo(
      UnresolvedAttribute("a"),
      UnresolvedFunction("md5", seq(UnresolvedAttribute("b")), isDistinct = false))
    val filterPlan = Filter(filterExpr, table)
    val projectList = Seq(UnresolvedStar(None))
    val expectedPlan = Project(projectList, filterPlan)
    comparePlans(expectedPlan, logPlan, false)
  }

  test("test sha1") {
    val context = new CatalystPlanContext
    val logPlan = planTransformer.visit(plan(pplParser, "source=t a = sha1(b)"), context)

    val table = UnresolvedRelation(Seq("t"))
    val filterExpr = EqualTo(
      UnresolvedAttribute("a"),
      UnresolvedFunction("sha1", seq(UnresolvedAttribute("b")), isDistinct = false))
    val filterPlan = Filter(filterExpr, table)
    val projectList = Seq(UnresolvedStar(None))
    val expectedPlan = Project(projectList, filterPlan)
    comparePlans(expectedPlan, logPlan, false)
  }

  test("test sha2") {
    val context = new CatalystPlanContext
    val logPlan = planTransformer.visit(plan(pplParser, "source=t a = sha2(b,256)"), context)

    val table = UnresolvedRelation(Seq("t"))
    val filterExpr = EqualTo(
      UnresolvedAttribute("a"),
      UnresolvedFunction("sha2", seq(UnresolvedAttribute("b"), Literal(256)), isDistinct = false))
    val filterPlan = Filter(filterExpr, table)
    val projectList = Seq(UnresolvedStar(None))
    val expectedPlan = Project(projectList, filterPlan)
    comparePlans(expectedPlan, logPlan, false)
  }
}