Parameterized Queries
jaceklaskowski committed Mar 16, 2024
1 parent cde31de commit 1d73104
Showing 5 changed files with 128 additions and 16 deletions.
24 changes: 22 additions & 2 deletions docs/SparkSession.md
@@ -176,11 +176,14 @@ Internally, `createDataset` first looks up the implicit [ExpressionEncoder](Expr

The expression encoder is then used to map elements (of the input `Seq[T]`) into a collection of [InternalRow](InternalRow.md)s. With the references and rows, `createDataset` returns a Dataset.md[Dataset] with a LocalRelation.md[`LocalRelation` logical query plan].

## <span id="sql"> Executing SQL Queries (SQL Mode)
## Executing SQL Queries (SQL Mode) { #sql }

```scala
sql(
sqlText: String): DataFrame
sql(
sqlText: String,
args: Map[String, Any]): DataFrame
```

`sql` creates a [QueryPlanningTracker](QueryPlanningTracker.md) to [measure](QueryPlanningTracker.md#measurePhase) executing the following in [parsing](QueryPlanningTracker.md#PARSING) phase:
@@ -193,7 +196,24 @@ In the end, `sql` [creates a DataFrame](Dataset.md#ofRows) with the following:
* The `LogicalPlan`
* The `QueryPlanningTracker`
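
As a sketch of the public API (illustrative only; it assumes Spark 3.5+ and a running `SparkSession` named `spark`, and the `:threshold` marker and inline `VALUES` table are made up for the example):

```scala
// Illustrative sketch: assumes a running SparkSession `spark` (Spark 3.5+)
val df = spark.sql(
  "SELECT * FROM VALUES (1), (2), (3) AS t(id) WHERE id < :threshold",
  Map("threshold" -> 3))
df.show()
```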

## <span id="udf"> Accessing UDFRegistration
### sql Private Helper

```scala
sql(
sqlText: String,
args: Map[String, Any],
tracker: QueryPlanningTracker): DataFrame
```

`sql` requests the given [QueryPlanningTracker](QueryPlanningTracker.md) to [measure parsing phase](QueryPlanningTracker.md#measurePhase).

While being measured, `sql` requests the [SessionState](#sessionState) for the [sqlParser](SessionState.md#sqlParser) to [parse](sql/ParserInterface.md#parsePlan) the given `sqlText`.

For non-empty `args`, `sql` creates a [NameParameterizedQuery](logical-operators/NameParameterizedQuery.md) with the parsed logical plan and the `args` (with the values converted to literals).

In the end, `sql` creates a [DataFrame](Dataset.md#ofRows) for the plan produced (and the `QueryPlanningTracker`).
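
The values-to-literals conversion could be sketched as follows (an assumption for illustration, not the exact implementation):

```scala
// Sketch of converting arg values to literal expressions (assumption)
import org.apache.spark.sql.functions.lit

val args: Map[String, Any] = Map("limitA" -> 1)
// lit(v) wraps a Scala value in a Column backed by a Literal expression
val exprArgs = args.transform((_, v) => lit(v).expr)
```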

## Accessing UDFRegistration { #udf }

```scala
udf: UDFRegistration
20 changes: 15 additions & 5 deletions docs/catalyst/TreePattern.md
@@ -2,15 +2,15 @@

`TreePattern`s are part of [TreeNode](TreeNode.md#node-patterns)s.

## <span id="CTE"> CTE
## CTE { #CTE }

Used as a [node pattern](TreeNode.md#nodePatterns):

* [CTERelationDef](../logical-operators/CTERelationDef.md)
* [CTERelationRef](../logical-operators/CTERelationRef.md)
* [WithCTE](../logical-operators/WithCTE.md)

## <span id="DYNAMIC_PRUNING_EXPRESSION"> DYNAMIC_PRUNING_EXPRESSION
## DYNAMIC_PRUNING_EXPRESSION { #DYNAMIC_PRUNING_EXPRESSION }

Used as a [node pattern](TreeNode.md#nodePatterns):

@@ -21,7 +21,7 @@ Used to transform query plans in the following rules:
* [PlanAdaptiveDynamicPruningFilters](../physical-optimizations/PlanAdaptiveDynamicPruningFilters.md)
* [CleanupDynamicPruningFilters](../logical-optimizations/CleanupDynamicPruningFilters.md)

## <span id="EXCHANGE"> EXCHANGE
## EXCHANGE { #EXCHANGE }

Used as a [node pattern](TreeNode.md#nodePatterns):

@@ -31,13 +31,23 @@ Used to transform query plans in the following rules:

* [ReuseExchangeAndSubquery](../physical-optimizations/ReuseExchangeAndSubquery.md)

## <span id="PLAN_EXPRESSION"> PLAN_EXPRESSION
## PARAMETERIZED_QUERY { #PARAMETERIZED_QUERY }

Used as a [node pattern](TreeNode.md#nodePatterns):

* [ParameterizedQuery](../logical-operators/ParameterizedQuery.md)

Used in the following rules:

* [BindParameters](../logical-analysis-rules/BindParameters.md)

## PLAN_EXPRESSION { #PLAN_EXPRESSION }

Used as a [node pattern](TreeNode.md#nodePatterns):

* [PlanExpression](../expressions/PlanExpression.md)

## <span id="UNRESOLVED_HINT"> UNRESOLVED_HINT
## UNRESOLVED_HINT { #UNRESOLVED_HINT }

Used as a [node pattern](TreeNode.md#nodePatterns):

20 changes: 18 additions & 2 deletions docs/logical-operators/NameParameterizedQuery.md
@@ -1,3 +1,19 @@
# NameParameterizedQuery
---
title: NameParameterizedQuery
---

`NameParameterizedQuery` is...FIXME
# NameParameterizedQuery Unary Logical Operator

`NameParameterizedQuery` is a [ParameterizedQuery](ParameterizedQuery.md) logical operator that represents a parameterized query with named parameters.

## Creating Instance

`NameParameterizedQuery` takes the following to be created:

* <span id="child"> Child [LogicalPlan](LogicalPlan.md)
* <span id="args"> Arguments (`Map[String, Expression]`)

`NameParameterizedQuery` is created when:

* `SparkConnectPlanner` is requested to [transformSql](../connect/SparkConnectPlanner.md#transformSql)
* `SparkSession` is requested to [sql](../SparkSession.md#sql)
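
A minimal sketch of creating a `NameParameterizedQuery` directly (internal Catalyst API, for illustration only; the package location and `parsedPlan` are assumptions):

```scala
// Internal API sketch (illustration only)
import org.apache.spark.sql.catalyst.analysis.NameParameterizedQuery
import org.apache.spark.sql.catalyst.expressions.Literal

// `parsedPlan` is assumed to be an already-parsed LogicalPlan
val parameterized = NameParameterizedQuery(parsedPlan, Map("limitA" -> Literal(1)))
```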
34 changes: 32 additions & 2 deletions docs/logical-operators/ParameterizedQuery.md
@@ -1,3 +1,33 @@
# ParameterizedQuery
---
title: ParameterizedQuery
---

`ParameterizedQuery` is...FIXME
# ParameterizedQuery Unary Logical Operators

`ParameterizedQuery` is a marker extension of the `UnresolvedUnaryNode` abstraction for [unary logical operators](#implementations) with the [PARAMETERIZED_QUERY](#nodePatterns) tree node pattern.

## Implementations

* [NameParameterizedQuery](NameParameterizedQuery.md)
* `PosParameterizedQuery`

## Creating Instance

`ParameterizedQuery` takes the following to be created:

* <span id="child"> Child [LogicalPlan](LogicalPlan.md) (_unused_)

!!! note "Abstract Class"
`ParameterizedQuery` is an abstract class and cannot be created directly. It is created indirectly for the [concrete ParameterizedQueries](#implementations).

## Node Patterns { #nodePatterns }

??? note "TreeNode"

```scala
nodePatterns: Seq[TreePattern]
```

`nodePatterns` is part of the [TreeNode](../catalyst/TreeNode.md#nodePatterns) abstraction.

`nodePatterns` is just a single [PARAMETERIZED_QUERY](../catalyst/TreePattern.md#PARAMETERIZED_QUERY).
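
As a sketch, a rule (e.g., [BindParameters](../logical-analysis-rules/BindParameters.md)) can cheaply check a plan for this pattern (`plan` is an assumed `LogicalPlan`):

```scala
// Sketch: tree-pattern check (assumes `plan` is a LogicalPlan)
import org.apache.spark.sql.catalyst.trees.TreePattern.PARAMETERIZED_QUERY

val isParameterized = plan.containsPattern(PARAMETERIZED_QUERY)
```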
46 changes: 41 additions & 5 deletions docs/parameterized-queries/index.md
@@ -1,7 +1,43 @@
---
status: new
---
# Parameterized Queries

# Parameterized Queries :material-new-box:{ title="New in 3.5.0" }
**Parameterized Queries** (_Parameterized SQL_) allow Spark SQL developers to write SQL statements with parameter markers that are bound at execution time to parameter values (literals), by name or by position.

[BindParameters Logical Analysis Rule](../logical-analysis-rules/BindParameters.md)
Parameterized Queries are meant to improve security and reusability, and to help prevent SQL injection attacks for applications that generate SQL at runtime (e.g., based on a user's selections, as is often done via a user interface).

Parameterized Queries support named and positional parameters. The [SQL parser](../SessionState.md#sqlParser) recognizes them by the following markers:

* `:` (colon) followed by a name for named parameters
* `?` (question mark) for positional parameters

=== "Named Parameters"

```sql
WITH a AS (SELECT 1 c)
SELECT *
FROM a
LIMIT :limitA
```

=== "Positional Parameters"

```sql
WITH a AS (SELECT 1 c)
SELECT *
FROM a
LIMIT ?
```

Parameterized Queries are executed using the [SparkSession.sql](../SparkSession.md#sql) operator (marked as experimental).

```scala
sql(
sqlText: String,
args: Map[String, Any]): DataFrame
```
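
As a sketch, the named-parameter query above could be executed as follows (assumes a running `SparkSession` named `spark`):

```scala
// Illustrative only: bind :limitA at execution time
spark.sql(
  "WITH a AS (SELECT 1 c) SELECT * FROM a LIMIT :limitA",
  Map("limitA" -> 1)).show()
```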

The Parameterized Queries feature was introduced in [\[SPARK-41271\] Parameterized SQL]({{ spark.jira }}/SPARK-41271).

## Internals

* [BindParameters Logical Analysis Rule](../logical-analysis-rules/BindParameters.md)
* [ParameterizedQuery](../logical-operators/ParameterizedQuery.md) unary logical operators
