Parameterized Queries
jaceklaskowski committed Mar 16, 2024
1 parent cde31de commit 1d73104
Showing 5 changed files with 128 additions and 16 deletions.
24 changes: 22 additions & 2 deletions docs/SparkSession.md
@@ -176,11 +176,14 @@ Internally, `createDataset` first looks up the implicit [ExpressionEncoder](Expr

The expression encoder is then used to map elements (of the input `Seq[T]`) into a collection of [InternalRow](InternalRow.md)s. With the references and rows, `createDataset` returns a Dataset.md[Dataset] with a LocalRelation.md[`LocalRelation` logical query plan].

## <span id="sql"> Executing SQL Queries (SQL Mode)
## Executing SQL Queries (SQL Mode) { #sql }

```scala
sql(
sqlText: String): DataFrame
sql(
sqlText: String,
args: Map[String, Any]): DataFrame
```

`sql` creates a [QueryPlanningTracker](QueryPlanningTracker.md) to [measure](QueryPlanningTracker.md#measurePhase) executing the following in [parsing](QueryPlanningTracker.md#PARSING) phase:
@@ -193,7 +196,24 @@ In the end, `sql` [creates a DataFrame](Dataset.md#ofRows) with the following:
* The `LogicalPlan`
* The `QueryPlanningTracker`
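
As a sketch of the public API (illustrative only; it assumes Spark 3.5+ and a running `SparkSession` named `spark`, and the `:threshold` marker and inline `VALUES` table are made up for the example):

```scala
// Illustrative sketch: assumes a running SparkSession `spark` (Spark 3.5+)
val df = spark.sql(
  "SELECT * FROM VALUES (1), (2), (3) AS t(id) WHERE id < :threshold",
  Map("threshold" -> 3))
df.show()
```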

## <span id="udf"> Accessing UDFRegistration
### sql Private Helper

```scala
sql(
sqlText: String,
args: Map[String, Any],
tracker: QueryPlanningTracker): DataFrame
```

`sql` requests the given [QueryPlanningTracker](QueryPlanningTracker.md) to [measure parsing phase](QueryPlanningTracker.md#measurePhase).

While being measured, `sql` requests the [SessionState](#sessionState) for the [sqlParser](SessionState.md#sqlParser) to [parse](sql/ParserInterface.md#parsePlan) the given `sqlText`.

For non-empty `args`, `sql` creates a [NameParameterizedQuery](logical-operators/NameParameterizedQuery.md) with the parsed logical plan and the `args` (with the values converted to literals).

In the end, `sql` creates a [DataFrame](Dataset.md#ofRows) for the plan produced (and the `QueryPlanningTracker`).
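
The values-to-literals conversion could be sketched as follows (an assumption for illustration, not the exact implementation):

```scala
// Sketch of converting arg values to literal expressions (assumption)
import org.apache.spark.sql.functions.lit

val args: Map[String, Any] = Map("limitA" -> 1)
// lit(v) wraps a Scala value in a Column backed by a Literal expression
val exprArgs = args.transform((_, v) => lit(v).expr)
```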

## Accessing UDFRegistration { #udf }

```scala
udf: UDFRegistration
20 changes: 15 additions & 5 deletions docs/catalyst/TreePattern.md
@@ -2,15 +2,15 @@

`TreePattern`s are part of [TreeNode](TreeNode.md#node-patterns)s.

## <span id="CTE"> CTE
## CTE { #CTE }

Used as a [node pattern](TreeNode.md#nodePatterns):

* [CTERelationDef](../logical-operators/CTERelationDef.md)
* [CTERelationRef](../logical-operators/CTERelationRef.md)
* [WithCTE](../logical-operators/WithCTE.md)

## <span id="DYNAMIC_PRUNING_EXPRESSION"> DYNAMIC_PRUNING_EXPRESSION
## DYNAMIC_PRUNING_EXPRESSION { #DYNAMIC_PRUNING_EXPRESSION }

Used as a [node pattern](TreeNode.md#nodePatterns):

@@ -21,7 +21,7 @@ Used to transform query plans in the following rules:
* [PlanAdaptiveDynamicPruningFilters](../physical-optimizations/PlanAdaptiveDynamicPruningFilters.md)
* [CleanupDynamicPruningFilters](../logical-optimizations/CleanupDynamicPruningFilters.md)

## <span id="EXCHANGE"> EXCHANGE
## EXCHANGE { #EXCHANGE }

Used as a [node pattern](TreeNode.md#nodePatterns):

@@ -31,13 +31,23 @@ Used to transform query plans in the following rules:

* [ReuseExchangeAndSubquery](../physical-optimizations/ReuseExchangeAndSubquery.md)

## <span id="PLAN_EXPRESSION"> PLAN_EXPRESSION
## PARAMETERIZED_QUERY { #PARAMETERIZED_QUERY }

Used as a [node pattern](TreeNode.md#nodePatterns):

* [ParameterizedQuery](../logical-operators/ParameterizedQuery.md)

Used in the following rules:

* [BindParameters](../logical-analysis-rules/BindParameters.md)

## PLAN_EXPRESSION { #PLAN_EXPRESSION }

Used as a [node pattern](TreeNode.md#nodePatterns):

* [PlanExpression](../expressions/PlanExpression.md)

## <span id="UNRESOLVED_HINT"> UNRESOLVED_HINT
## UNRESOLVED_HINT { #UNRESOLVED_HINT }

Used as a [node pattern](TreeNode.md#nodePatterns):

20 changes: 18 additions & 2 deletions docs/logical-operators/NameParameterizedQuery.md
@@ -1,3 +1,19 @@
# NameParameterizedQuery
---
title: NameParameterizedQuery
---

`NameParameterizedQuery` is...FIXME
# NameParameterizedQuery Unary Logical Operator

`NameParameterizedQuery` is a [ParameterizedQuery](ParameterizedQuery.md) logical operator that represents a parameterized query with named parameters.

## Creating Instance

`NameParameterizedQuery` takes the following to be created:

* <span id="child"> Child [LogicalPlan](LogicalPlan.md)
* <span id="args"> Arguments (`Map[String, Expression]`)

`NameParameterizedQuery` is created when:

* `SparkConnectPlanner` is requested to [transformSql](../connect/SparkConnectPlanner.md#transformSql)
* `SparkSession` is requested to [sql](../SparkSession.md#sql)
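
A minimal sketch of creating a `NameParameterizedQuery` directly (internal Catalyst API, for illustration only; the package location and `parsedPlan` are assumptions):

```scala
// Internal API sketch (illustration only)
import org.apache.spark.sql.catalyst.analysis.NameParameterizedQuery
import org.apache.spark.sql.catalyst.expressions.Literal

// `parsedPlan` is assumed to be an already-parsed LogicalPlan
val parameterized = NameParameterizedQuery(parsedPlan, Map("limitA" -> Literal(1)))
```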
34 changes: 32 additions & 2 deletions docs/logical-operators/ParameterizedQuery.md
@@ -1,3 +1,33 @@
# ParameterizedQuery
---
title: ParameterizedQuery
---

`ParameterizedQuery` is...FIXME
# ParameterizedQuery Unary Logical Operators

`ParameterizedQuery` is a marker extension of the `UnresolvedUnaryNode` abstraction for [unary logical operators](#implementations) with the [PARAMETERIZED_QUERY](#nodePatterns) tree node pattern.

## Implementations

* [NameParameterizedQuery](NameParameterizedQuery.md)
* `PosParameterizedQuery`

## Creating Instance

`ParameterizedQuery` takes the following to be created:

* <span id="child"> Child [LogicalPlan](LogicalPlan.md) (_unused_)

!!! note "Abstract Class"
`ParameterizedQuery` is an abstract class and cannot be created directly. It is created indirectly for the [concrete ParameterizedQueries](#implementations).

## Node Patterns { #nodePatterns }

??? note "TreeNode"

```scala
nodePatterns: Seq[TreePattern]
```

`nodePatterns` is part of the [TreeNode](../catalyst/TreeNode.md#nodePatterns) abstraction.

`nodePatterns` is just a single [PARAMETERIZED_QUERY](../catalyst/TreePattern.md#PARAMETERIZED_QUERY).
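
As a sketch, a rule (e.g., [BindParameters](../logical-analysis-rules/BindParameters.md)) can cheaply check a plan for this pattern (`plan` is an assumed `LogicalPlan`):

```scala
// Sketch: tree-pattern check (assumes `plan` is a LogicalPlan)
import org.apache.spark.sql.catalyst.trees.TreePattern.PARAMETERIZED_QUERY

val isParameterized = plan.containsPattern(PARAMETERIZED_QUERY)
```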
46 changes: 41 additions & 5 deletions docs/parameterized-queries/index.md
@@ -1,7 +1,43 @@
---
status: new
---
# Parameterized Queries

# Parameterized Queries :material-new-box:{ title="New in 3.5.0" }
**Parameterized Queries** (_Parameterized SQL_) allow Spark SQL developers to write SQL statements with parameter markers that are bound at execution time to parameter values (literals), by name or by position.

[BindParameters Logical Analysis Rule](../logical-analysis-rules/BindParameters.md)
Parameterized Queries are meant to improve security and reusability, and to help prevent SQL injection attacks for applications that generate SQL at runtime (e.g., based on a user's selections, as is often done via a user interface).

Parameterized Queries support named and positional parameters. The [SQL parser](../SessionState.md#sqlParser) recognizes them by the following markers:

* `:` (colon) followed by a name for named parameters
* `?` (question mark) for positional parameters

=== "Named Parameters"

```sql
WITH a AS (SELECT 1 c)
SELECT *
FROM a
LIMIT :limitA
```

=== "Positional Parameters"

```sql
WITH a AS (SELECT 1 c)
SELECT *
FROM a
LIMIT ?
```

Parameterized Queries are executed using the [SparkSession.sql](../SparkSession.md#sql) operator (marked as experimental).

```scala
sql(
sqlText: String,
args: Map[String, Any]): DataFrame
```
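
As a sketch, the named-parameter query above could be executed as follows (assumes a running `SparkSession` named `spark`):

```scala
// Illustrative only: bind :limitA at execution time
spark.sql(
  "WITH a AS (SELECT 1 c) SELECT * FROM a LIMIT :limitA",
  Map("limitA" -> 1)).show()
```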

The Parameterized Queries feature was introduced in [\[SPARK-41271\] Parameterized SQL]({{ spark.jira }}/SPARK-41271).

## Internals

* [BindParameters Logical Analysis Rule](../logical-analysis-rules/BindParameters.md)
* [ParameterizedQuery](../logical-operators/ParameterizedQuery.md) unary logical operators
