update documentation with specifications markdown pages including ppl expressions

Signed-off-by: YANGDB <[email protected]>
YANG-DB committed Oct 4, 2024
1 parent b1791ff commit d7ee664
Showing 4 changed files with 41 additions and 3 deletions.
31 changes: 31 additions & 0 deletions docs/ppl-lang/ppl-dedup-command.md
@@ -124,3 +124,34 @@ PPL query:
- `source = table | dedup 2 a,b keepempty=true | fields a,b,c`
- `source = table | dedup 1 a consecutive=true | fields a,b,c` (Consecutive deduplication is unsupported)

### Limitation:

**Spark Support** (3.4)

To translate a `dedup` command with `allowedDuplication > 1`, such as `| dedup 2 a,b`, into a Spark plan, the command is rewritten as a plan with a Window function (e.g. `row_number`) and a Filter on the generated `_row_number_` column; equivalent Spark SQL sketches follow the plan examples below.

- For `| dedup 2 a, b keepempty=false`

```
DataFrameDropColumns('_row_number_)
+- Filter ('_row_number_ <= 2) // allowed duplication = 2
   +- Window [row_number() windowspecdefinition('a, 'b, 'a ASC NULLS FIRST, 'b ASC NULLS FIRST, specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) AS _row_number_], ['a, 'b], ['a ASC NULLS FIRST, 'b ASC NULLS FIRST]
      +- Filter (isnotnull('a) AND isnotnull('b)) // keepempty=false
         +- Project
            +- UnresolvedRelation
```
- For `| dedup 2 a, b keepempty=true`
```
Union
:- DataFrameDropColumns('_row_number_)
:  +- Filter ('_row_number_ <= 2)
:     +- Window [row_number() windowspecdefinition('a, 'b, 'a ASC NULLS FIRST, 'b ASC NULLS FIRST, specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) AS _row_number_], ['a, 'b], ['a ASC NULLS FIRST, 'b ASC NULLS FIRST]
:        +- Filter (isnotnull('a) AND isnotnull('b))
:           +- Project
:              +- UnresolvedRelation
+- Filter (isnull('a) OR isnull('b))
   +- Project
      +- UnresolvedRelation
```
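
For intuition, here are minimal Spark SQL sketches of the two rewrites (assumptions: a source table `t` with columns `a`, `b`, `c`; `t` and the subquery alias `ranked` are illustrative names, not part of the generated plan):

```sql
-- keepempty=false: rows with null dedup keys are filtered out before ranking
SELECT a, b, c
FROM (
  SELECT a, b, c,
         ROW_NUMBER() OVER (PARTITION BY a, b ORDER BY a, b) AS _row_number_
  FROM t           -- assumed source table
  WHERE a IS NOT NULL AND b IS NOT NULL
) ranked
WHERE _row_number_ <= 2;  -- allowed duplication = 2
```

```sql
-- keepempty=true: null-keyed rows bypass ranking and are unioned back in
-- (the plan's Union keeps duplicates, i.e. UNION ALL)
SELECT a, b, c
FROM (
  SELECT a, b, c,
         ROW_NUMBER() OVER (PARTITION BY a, b ORDER BY a, b) AS _row_number_
  FROM t
  WHERE a IS NOT NULL AND b IS NOT NULL
) ranked
WHERE _row_number_ <= 2
UNION ALL
SELECT a, b, c
FROM t
WHERE a IS NULL OR b IS NULL;
```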

- The `dedup` command with `allowedDuplication > 1` requires Spark version >= 3.4.
4 changes: 3 additions & 1 deletion docs/ppl-lang/ppl-eval-command.md
@@ -105,7 +105,9 @@ eval status_category =
```

### Limitation:
- `eval` with comma-separated expressions requires Spark version >= 3.4

- Overriding an existing field is unsupported; the following queries throw exceptions with "Reference 'a' is ambiguous":

```sql
- `source = table | eval a = 10 | fields a,b,c`
5 changes: 4 additions & 1 deletion docs/ppl-lang/ppl-fields-command.md
@@ -56,13 +56,16 @@ PPL query:
- `source = table | eval b1 = b | fields - b1,c`

### Limitation:
- `fields - list` shows incorrect results for Spark version 3.3; see [issue](https://github.com/opensearch-project/opensearch-spark/pull/732)
- A new field added by the `eval` command with a function cannot be dropped in the current version:

```sql
`source = table | eval b1 = b + 1 | fields - b1,c` (Field `b1` cannot be dropped due to SPARK-49782)
`source = table | eval b1 = lower(b) | fields - b1,c` (Field `b1` cannot be dropped due to SPARK-49782)
```

**Nested-Fields**
- Nested fields show incorrect results for Spark version 3.3; see [issue](https://github.com/opensearch-project/opensearch-spark/issues/739)
```sql
`source = catalog.schema.table1, catalog.schema.table2 | fields A.nested1, B.nested1`
`source = catalog.table | where struct_col2.field1.subfield > 'valueA' | sort int_col | fields int_col, struct_col.field1.subfield, struct_col2.field1.subfield`
4 changes: 3 additions & 1 deletion docs/ppl-lang/ppl-rename-command.md
@@ -47,6 +47,8 @@ PPL query:
+------+---------+

### Limitation:
- The `rename` command requires Spark version >= 3.4

- Overriding an existing field is unsupported:

`source=accounts | grok address '%{NUMBER} %{GREEDYDATA:address}' | fields address`
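
A hypothetical illustration of the unsupported override (table and field names assumed): a query such as `source = table | rename c as a` is rejected when the target field `a` already exists, per the limitation above.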
