Merge branch 'main' into write-metadata-cache
Signed-off-by: Sean Kao <[email protected]>
seankao-az committed Oct 25, 2024
2 parents 2f58f56 + 7bc0927 commit 7a8e1f3
Showing 45 changed files with 4,670 additions and 83 deletions.
1 change: 1 addition & 0 deletions build.sbt
@@ -89,6 +89,7 @@ lazy val flintCore = (project in file("flint-core"))
"com.amazonaws" % "aws-java-sdk-cloudwatch" % "1.12.593"
exclude("com.fasterxml.jackson.core", "jackson-databind"),
"software.amazon.awssdk" % "auth-crt" % "2.28.10",
"org.projectlombok" % "lombok" % "1.18.30" % "provided",
"org.scalactic" %% "scalactic" % "3.2.15" % "test",
"org.scalatest" %% "scalatest" % "3.2.15" % "test",
"org.scalatest" %% "scalatest-flatspec" % "3.2.15" % "test",
34 changes: 34 additions & 0 deletions docs/ppl-lang/PPL-Example-Commands.md
@@ -33,6 +33,12 @@ _- **Limitation: new field added by eval command with a function cannot be dropp
- `source = table | eval b1 = b + 1 | fields - b1,c` (Field `b1` cannot be dropped due to SPARK-49782)
- `source = table | eval b1 = lower(b) | fields - b1,c` (Field `b1` cannot be dropped due to SPARK-49782)

**Field-Summary**
[See additional command details](ppl-fieldsummary-command.md)
- `source = t | fieldsummary includefields=status_code nulls=false`
- `source = t | fieldsummary includefields= id, status_code, request_path nulls=true`
- `source = t | where status_code != 200 | fieldsummary includefields= status_code nulls=true`

**Nested-Fields**
- `source = catalog.schema.table1, catalog.schema.table2 | fields A.nested1, B.nested1`
- `source = catalog.table | where struct_col2.field1.subfield > 'valueA' | sort int_col | fields int_col, struct_col.field1.subfield, struct_col2.field1.subfield`
@@ -167,6 +173,34 @@ source = table | where ispresent(a) |
- `source = table | stats avg(age) as avg_state_age by country, state | stats avg(avg_state_age) as avg_country_age by country`
- `source = table | stats avg(age) as avg_city_age by country, state, city | eval new_avg_city_age = avg_city_age - 1 | stats avg(new_avg_city_age) as avg_state_age by country, state | where avg_state_age > 18 | stats avg(avg_state_age) as avg_adult_country_age by country`

#### **Event Aggregations**
[See additional command details](ppl-eventstats-command.md)

- `source = table | eventstats avg(a)`
- `source = table | where a < 50 | eventstats avg(c)`
- `source = table | eventstats max(c) by b`
- `source = table | eventstats count(c) by b | head 5`
- `source = table | eventstats stddev_samp(c)`
- `source = table | eventstats stddev_pop(c)`
- `source = table | eventstats percentile(c, 90)`
- `source = table | eventstats percentile_approx(c, 99)`

_- **Limitation: distinct aggregation cannot be used in `eventstats`:**_
- `source = table | eventstats distinct_count(c)` (throws an exception)

**Aggregations With Span**
- `source = table | eventstats count(a) by span(a, 10) as a_span`
- `source = table | eventstats sum(age) by span(age, 5) as age_span | head 2`
- `source = table | eventstats avg(age) by span(age, 20) as age_span, country | sort - age_span | head 2`
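For numeric fields, a span groups rows into fixed-width buckets. The following Python sketch models `eventstats count(a) by span(a, 10) as a_span` under two assumptions that are not taken from the Flint implementation. First, buckets are aligned by flooring, so `a` in [0, 10) falls into bucket 0, [10, 20) into bucket 10, and so on. Second, the per-bucket count is attached to every row under a `count(a)` key. The sample rows are hypothetical.

```python
from collections import Counter

def span_bucket(value, interval):
    # Assumed floor semantics: [0, 10) -> 0, [10, 20) -> 10, ...
    return (value // interval) * interval

def eventstats_count_by_span(rows, field, interval, alias):
    # Count rows per bucket, then attach the bucket and its count to every row.
    # No rows are dropped or merged, unlike `stats`.
    counts = Counter(span_bucket(row[field], interval) for row in rows)
    result = []
    for row in rows:
        bucket = span_bucket(row[field], interval)
        result.append({**row, alias: bucket, f"count({field})": counts[bucket]})
    return result

rows = [{"a": 3}, {"a": 7}, {"a": 12}, {"a": 25}]  # hypothetical sample data
for row in eventstats_count_by_span(rows, "a", 10, "a_span"):
    print(row)
```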

**Aggregations With TimeWindow Span (tumble windowing function)**
- `source = table | eventstats sum(productsAmount) by span(transactionDate, 1d) as age_date | sort age_date`
- `source = table | eventstats sum(productsAmount) by span(transactionDate, 1w) as age_date, productId`

**Aggregations Group by Multiple Times**
- `source = table | eventstats avg(age) as avg_state_age by country, state | eventstats avg(avg_state_age) as avg_country_age by country`
- `source = table | eventstats avg(age) as avg_city_age by country, state, city | eval new_avg_city_age = avg_city_age - 1 | eventstats avg(new_avg_city_age) as avg_state_age by country, state | where avg_state_age > 18 | eventstats avg(avg_state_age) as avg_adult_country_age by country`
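Unlike `stats`, `eventstats` keeps every input row and appends the aggregate for each row's group. This matters when aggregations are chained. The Python sketch below assumes this window-style behavior. It is an illustration, not the Flint/Spark code. It replays the first multi-level example above on hypothetical rows, and `eventstats_avg` is a hypothetical helper.

```python
from collections import defaultdict
from statistics import mean

def eventstats_avg(rows, field, by, alias):
    # Group rows by the `by` fields, average `field` per group, and attach the
    # group's average to every row under `alias`. The row count is unchanged.
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in by)].append(row[field])
    averages = {key: mean(values) for key, values in groups.items()}
    return [{**row, alias: averages[tuple(row[k] for k in by)]} for row in rows]

# Hypothetical sample rows.
rows = [
    {"country": "USA", "state": "CA", "age": 30},
    {"country": "USA", "state": "CA", "age": 40},
    {"country": "USA", "state": "NY", "age": 20},
]

# eventstats avg(age) as avg_state_age by country, state
step1 = eventstats_avg(rows, "age", ["country", "state"], "avg_state_age")
# eventstats avg(avg_state_age) as avg_country_age by country
step2 = eventstats_avg(step1, "avg_state_age", ["country"], "avg_country_age")
for row in step2:
    print(row)
```

Because rows are not collapsed, the second step averages `[35, 35, 20]` and yields 30 for every row. With `stats`, the first step would leave one row per state, so the country average would be `(35 + 20) / 2 = 27.5`.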

#### **Dedup**

[See additional command details](ppl-dedup-command.md)
4 changes: 4 additions & 0 deletions docs/ppl-lang/README.md
@@ -50,6 +50,8 @@ For additional examples see the next [documentation](PPL-Example-Commands.md).

- [`stats command`](ppl-stats-command.md)

- [`eventstats command`](ppl-eventstats-command.md)

- [`where command`](ppl-where-command.md)

- [`head command`](ppl-head-command.md)
@@ -77,6 +79,8 @@ For additional examples see the next [documentation](PPL-Example-Commands.md).

- [`String Functions`](functions/ppl-string.md)

- [`JSON Functions`](functions/ppl-json.md)

- [`Condition Functions`](functions/ppl-condition.md)

- [`Type Conversion Functions`](functions/ppl-conversion.md)
237 changes: 237 additions & 0 deletions docs/ppl-lang/functions/ppl-json.md
@@ -0,0 +1,237 @@
## PPL JSON Functions

### `JSON`

**Description**

`json(value)` Evaluates whether a value can be parsed as JSON. Returns the JSON string if it is valid, and null otherwise.

**Argument type:** STRING/JSON_ARRAY/JSON_OBJECT

**Return type:** STRING

A STRING expression of a valid JSON object format.

Example:

os> source=people | eval `valid_json` = json('[1,2,3,{"f1":1,"f2":[5,6]},4]') | fields valid_json
fetched rows / total rows = 1/1
+---------------------------------+
| valid_json |
+---------------------------------+
| [1,2,3,{"f1":1,"f2":[5,6]},4] |
+---------------------------------+

os> source=people | eval `invalid_json` = json('{"invalid": "json"') | fields invalid_json
fetched rows / total rows = 1/1
+----------------+
| invalid_json |
+----------------+
| null |
+----------------+
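The check performed by `json` can be modeled in Python as "return the input if it parses, otherwise null". This sketch covers only the documented behavior. `ppl_json` is a hypothetical name. Rejecting `NaN`/`Infinity`, which Python's `json` module accepts by default but strict JSON does not, is an assumption about how PPL treats them.

```python
import json
from typing import Optional

def _reject_constant(name: str):
    # json.loads accepts NaN/Infinity by default; strict JSON does not.
    raise ValueError(f"non-standard JSON constant: {name}")

def ppl_json(value: str) -> Optional[str]:
    # Return the input unchanged if it parses as JSON, else None (PPL null).
    try:
        json.loads(value, parse_constant=_reject_constant)
    except (TypeError, ValueError):
        return None
    return value

print(ppl_json('[1,2,3,{"f1":1,"f2":[5,6]},4]'))  # [1,2,3,{"f1":1,"f2":[5,6]},4]
print(ppl_json('{"invalid": "json"'))             # None
```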


### `JSON_OBJECT`

**Description**

`json_object(<key>, <value>[, <key>, <value>]...)` Returns a JSON object built from the given key-value pairs.

**Argument type:**
- A \<key\> must be STRING.
- A \<value\> can be of any data type.

**Return type:** JSON_OBJECT (Spark StructType)

A StructType expression of a valid JSON object.

Example:

os> source=people | eval result = json(json_object('key', 123.45)) | fields result
fetched rows / total rows = 1/1
+------------------+
| result |
+------------------+
| {"key":123.45} |
+------------------+

os> source=people | eval result = json(json_object('outer', json_object('inner', 123.45))) | fields result
fetched rows / total rows = 1/1
+------------------------------+
| result |
+------------------------------+
| {"outer":{"inner":123.45}} |
+------------------------------+
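`json_object` produces a struct, and the examples wrap it in `json(...)` to display it as a JSON string. A rough Python analogue models the struct as a `dict` and the display step as compact serialization. `json_object` and `to_json` below are hypothetical helpers, not a PPL or Spark API.

```python
import json

def json_object(*args):
    # Hypothetical analogue of json_object(<key>, <value>[, <key>, <value>]...).
    if len(args) % 2 != 0:
        raise ValueError("json_object expects key/value pairs")
    keys, values = args[0::2], args[1::2]
    if not all(isinstance(k, str) for k in keys):
        raise TypeError("json_object keys must be strings")
    return dict(zip(keys, values))

def to_json(obj) -> str:
    # Compact serialization, standing in for the json(...) wrapper.
    return json.dumps(obj, separators=(",", ":"))

print(to_json(json_object("key", 123.45)))                          # {"key":123.45}
print(to_json(json_object("outer", json_object("inner", 123.45))))  # {"outer":{"inner":123.45}}
```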


### `JSON_ARRAY`

**Description**

`json_array(<value>...)` Creates a JSON array from a list of values.

**Argument type:**
- A \<value\> can be any kind of value such as string, number, or boolean.

**Return type:** ARRAY (Spark ArrayType)

An array of any supported data type for a valid JSON array.

Example:

os> source=people | eval `json_array` = json_array(1, 2, 0, -1, 1.1, -0.11)
fetched rows / total rows = 1/1
+----------------------------+
| json_array |
+----------------------------+
| 1.0,2.0,0.0,-1.0,1.1,-0.11 |
+----------------------------+

os> source=people | eval `json_array_object` = json(json_object("array", json_array(1, 2, 0, -1, 1.1, -0.11)))
fetched rows / total rows = 1/1
+----------------------------------------+
| json_array_object |
+----------------------------------------+
| {"array":[1.0,2.0,0.0,-1.0,1.1,-0.11]} |
+----------------------------------------+
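In the first example, the integer inputs come back as `1.0`, `2.0`, and so on. A plausible explanation is that the array elements are coerced to one common type, and mixing integers with decimals widens everything to double. The Python sketch below models only that numeric case. `json_array` is a hypothetical helper, and other type combinations are deliberately out of scope.

```python
import json

def json_array(*values):
    # Assumed coercion: when ints and floats are mixed, every element is
    # widened to float (double), matching the 1.0, 2.0, ... output above.
    # Other type mixes are outside the scope of this sketch.
    numeric = all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values)
    if numeric and any(isinstance(v, float) for v in values):
        return [float(v) for v in values]
    return list(values)

arr = json_array(1, 2, 0, -1, 1.1, -0.11)
print(json.dumps({"array": arr}, separators=(",", ":")))
# {"array":[1.0,2.0,0.0,-1.0,1.1,-0.11]}
```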

### `JSON_ARRAY_LENGTH`

**Description**

`json_array_length(jsonArray)` Returns the number of elements in the outermost JSON array.

**Argument type:** STRING/JSON_ARRAY

A STRING expression of a valid JSON array format, or JSON_ARRAY object.

**Return type:** INTEGER

`NULL` is returned for any other valid JSON string (such as a JSON object), for a `NULL` input, or for invalid JSON.

Example:

os> source=people | eval `length1` = json_array_length('[1,2,3,4]'), `length2` = json_array_length('[1,2,3,{"f1":1,"f2":[5,6]},4]'), `not_array` = json_array_length('{"key": 1}')
fetched rows / total rows = 1/1
+-----------+-----------+-------------+
| length1 | length2 | not_array |
+-----------+-----------+-------------+
| 4 | 5 | null |
+-----------+-----------+-------------+

os> source=people | eval `json_array` = json_array_length(json_array(1,2,3,4)), `empty_array` = json_array_length(json_array())
fetched rows / total rows = 1/1
+--------------+---------------+
| json_array | empty_array |
+--------------+---------------+
| 4 | 0 |
+--------------+---------------+
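The rules above fit in a few lines of Python: count only the outermost elements, and return null for non-arrays and invalid input. This models the documented behavior rather than reproducing the actual implementation. Accepting a Python `list` stands in for the JSON_ARRAY argument type.

```python
import json
from typing import Optional, Union

def json_array_length(value: Union[str, list, None]) -> Optional[int]:
    # NULL in, NULL out.
    if value is None:
        return None
    # An already-built array (JSON_ARRAY argument) is counted directly.
    if isinstance(value, list):
        return len(value)
    try:
        parsed = json.loads(value)
    except (TypeError, ValueError):
        return None  # invalid JSON -> null
    # Only the outermost level counts: a nested object or array is one element.
    return len(parsed) if isinstance(parsed, list) else None

print(json_array_length('[1,2,3,4]'))                      # 4
print(json_array_length('[1,2,3,{"f1":1,"f2":[5,6]},4]'))  # 5
print(json_array_length('{"key": 1}'))                     # None
print(json_array_length([]))                               # 0
```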

### `JSON_EXTRACT`

**Description**

`json_extract(jsonStr, path)` Extracts the JSON value at the specified JSON path from a JSON string. Returns null if the input JSON string is invalid.

**Argument type:** STRING, STRING

**Return type:** STRING

A STRING expression of a valid JSON object format.

`NULL` is returned in case of an invalid JSON.

Example:

os> source=people | eval `json_extract('{"a":"b"}', '$.a')` = json_extract('{"a":"b"}', '$.a')
fetched rows / total rows = 1/1
+----------------------------------+
| json_extract('{"a":"b"}', '$.a') |
+----------------------------------+
| b                                |
+----------------------------------+

os> source=people | eval `json_extract('{"a":[{"b":1.0},{"b":2.0}]}', '$.a[1].b')` = json_extract('{"a":[{"b":1.0},{"b":2.0}]}', '$.a[1].b')
fetched rows / total rows = 1/1
+-----------------------------------------------------------+
| json_extract('{"a":[{"b":1.0},{"b":2.0}]}', '$.a[1].b') |
+-----------------------------------------------------------+
| 2.0 |
+-----------------------------------------------------------+

os> source=people | eval `json_extract('{"a":[{"b":1.0},{"b":2.0}]}', '$.a[*].b')` = json_extract('{"a":[{"b":1.0},{"b":2.0}]}', '$.a[*].b')
fetched rows / total rows = 1/1
+-----------------------------------------------------------+
| json_extract('{"a":[{"b":1.0},{"b":2.0}]}', '$.a[*].b') |
+-----------------------------------------------------------+
| [1.0,2.0] |
+-----------------------------------------------------------+

os> source=people | eval `invalid_json` = json_extract('{"invalid": "json"', '$.invalid')
fetched rows / total rows = 1/1
+----------------+
| invalid_json |
+----------------+
| null |
+----------------+
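To make the path semantics concrete, here is a small Python evaluator for a subset of JSONPath (`$`, `.name`, `[index]`, `[*]`). It is a hypothetical sketch that reproduces the examples above, not the engine's implementation. Real JSONPath support is broader, and details such as how a wildcard with a single match is rendered may differ.

```python
import json
import re
from typing import Optional

# Hypothetical helper supporting only a small JSONPath subset:
# $ followed by .name, [index] or [*] steps.
_STEP = re.compile(r"\.([A-Za-z_][A-Za-z0-9_]*)|\[(\d+|\*)\]")

def json_extract(json_str: str, path: str) -> Optional[str]:
    try:
        doc = json.loads(json_str)
    except (TypeError, ValueError):
        return None  # invalid JSON input -> null
    if not path.startswith("$"):
        return None
    nodes, wildcard, pos = [doc], False, 1
    while pos < len(path):
        step = _STEP.match(path, pos)
        if step is None:
            return None  # unsupported or malformed path, e.g. '$a'
        name, index = step.groups()
        matched = []
        for node in nodes:
            if name is not None:
                if isinstance(node, dict) and name in node:
                    matched.append(node[name])
            elif index == "*":
                if isinstance(node, list):
                    matched.extend(node)
            elif isinstance(node, list) and int(index) < len(node):
                matched.append(node[int(index)])
        wildcard = wildcard or index == "*"
        nodes, pos = matched, step.end()
    if not nodes:
        return None
    result = nodes if wildcard else nodes[0]
    # A bare string value is returned as-is ('b'); anything else as compact JSON.
    return result if isinstance(result, str) else json.dumps(result, separators=(",", ":"))

print(json_extract('{"a":"b"}', '$.a'))                               # b
print(json_extract('{"a":[{"b":1.0},{"b":2.0}]}', '$.a[1].b'))        # 2.0
print(json_extract('{"a":[{"b":1.0},{"b":2.0}]}', '$.a[*].b'))        # [1.0,2.0]
```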


### `JSON_KEYS`

**Description**

`json_keys(jsonStr)` Returns all the keys of the outermost JSON object as an array.

**Argument type:** STRING

A STRING expression of a valid JSON object format.

**Return type:** ARRAY[STRING]

`NULL` is returned for any other valid JSON string (such as a JSON array), for an empty string, or for invalid JSON.

Example:

os> source=people | eval `keys` = json_keys('{"f1":"abc","f2":{"f3":"a","f4":"b"}}')
fetched rows / total rows = 1/1
+------------+
| keys |
+------------+
| [f1, f2] |
+------------+

os> source=people | eval `keys` = json_keys('[1,2,3,{"f1":1,"f2":[5,6]},4]')
fetched rows / total rows = 1/1
+--------+
| keys |
+--------+
| null |
+--------+
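The key-listing rules can be modeled in Python as follows. `json_keys` here is a hypothetical stand-in. As described above, it returns the keys of the outermost object only, in document order, and returns null for arrays, empty strings, and invalid JSON.

```python
import json
from typing import List, Optional

def json_keys(json_str: str) -> Optional[List[str]]:
    try:
        parsed = json.loads(json_str)
    except (TypeError, ValueError):
        return None  # invalid JSON or empty string -> null
    # Only the outermost object's keys, in document order; nested keys such as
    # f3/f4 are not included, and non-object JSON (e.g. an array) yields null.
    return list(parsed) if isinstance(parsed, dict) else None

print(json_keys('{"f1":"abc","f2":{"f3":"a","f4":"b"}}'))  # ['f1', 'f2']
print(json_keys('[1,2,3,{"f1":1,"f2":[5,6]},4]'))          # None
```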

### `JSON_VALID`

**Description**

`json_valid(jsonStr)` Evaluates whether a JSON string uses valid JSON syntax and returns TRUE or FALSE.

**Argument type:** STRING

**Return type:** BOOLEAN

Example:

os> source=people | eval `valid_json` = json_valid('[1,2,3,4]'), `invalid_json` = json_valid('{"invalid": "json"') | fields `valid_json`, `invalid_json`
fetched rows / total rows = 1/1
+--------------+----------------+
| valid_json | invalid_json |
+--------------+----------------+
| True | False |
+--------------+----------------+

os> source=accounts | where json_valid('[1,2,3,4]') and isnull(email) | fields account_number, email
fetched rows / total rows = 1/1
+------------------+---------+
| account_number | email |
|------------------+---------|
| 13 | null |
+------------------+---------+
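A Python model of `json_valid`, together with the row filter from the second example, might look like the following. The helper names and sample rows are hypothetical. Treating a non-string input as invalid and rejecting `NaN`/`Infinity` are assumptions, not documented behavior.

```python
import json

def _reject_constant(name: str):
    # Strict JSON has no NaN/Infinity, although Python's json module accepts them.
    raise ValueError(f"non-standard JSON constant: {name}")

def json_valid(text) -> bool:
    # True when `text` is syntactically valid JSON, False otherwise.
    try:
        json.loads(text, parse_constant=_reject_constant)
    except (TypeError, ValueError):
        return False
    return True

# Rough analogue of: where json_valid('[1,2,3,4]') and isnull(email)
accounts = [  # hypothetical rows
    {"account_number": 13, "email": None},
    {"account_number": 1, "email": "user@example.com"},
]
matches = [a for a in accounts if json_valid("[1,2,3,4]") and a["email"] is None]
print(matches)  # [{'account_number': 13, 'email': None}]
```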