Because SQL is declarative, the query only tells the DBMS what to compute, but not how to compute it. Thus, the DBMS needs to translate a SQL statement into an executable query plan. But there are different ways to execute each operator in a query plan (e.g., join algorithms) and there will be differences in performance among these plans. The job of the DBMS’s optimizer is to pick an optimal plan for any given query.
There are two high-level strategies for query optimization.
==Heuristics / Rules==
-
Rewrite the query to remove stupid / inefficient things.
-
These techniques may need to examine catalog, but they do not need to examine data.
==Cost-based Search==
- Use a model to estimate the cost of executing a plan.
- Evaluate multiple equivalent plans for a query and pick the one with the lowest cost.
The optimizer generates a mapping of a logical algebra expression to the optimal equivalent physical algebra expression. The logical plan is roughly equivalent to the relational algebra expressions in the query.
Physical operators define a specific execution strategy using an access path for the different operators in the query plan. Physical plans may depend on the physical format of the data that is processed (i.e. sorting, compression).
There does not always exist a one-to-one mapping from logical to physical plans.
Much of query optimization relies on the underlying concept that the high level properties of relational algebra are preserved across equivalent expressions. ==Two relational algebra expressions are equivalent if they generate the same set of tuples==.
This technique of transforming the underlying relational algebra representation of a logical plan is known as ==query rewriting==.
One example of relational algebra equivalence is predicate pushdown, in which a predicate is applied in a different position of the sequence to avoid unnecessary work. Figure 2 shows an example of predicate pushdown.
Transform a logical plan into an equivalent logical plan using pattern matching rules.
==The goal is to increase the likelihood of enumerating the optimal plan in the search==.
Cannot compare plans because there is no cost model but can "direct" a transformation to a preferred side.
Some selection optimizations include:
- ==Perform filters as early as possible (predicate pushdown)==.
- ==Reorder predicates so that the DBMS applies the most selective one first==.
- ==Breakup a complex predicate and pushing it down (split conjunctive predicates)==.
An example of predicate pushdown is shown in Figure 2.
Some projection optimizations include:
- Perform projections as early as possible to create smaller tuples and reduce intermediate results (==projection pushdown==).
- Project out all attributes except the ones requested or requires.
An example of projection pushdown in shown in Figure 3.
Another optimization that a DBMS can use is to remove impossible or unnecessary predicates. In this optimization, the DBMS elides evaluation of predicates whose result does not change per tuple in a table. Bypassing these predicates reduces computation cost. Figure 4 shows two examples of unnecessary predicates.
The ordering of JOIN operations is a key determinant of query performance. Exhaustive enumeration of all possible join orders is inefficient, so join-ordering optimization requires a cost model. However, we can still eliminate unnecessary joins with a heuristic approach to optimization. An example of join elimination is shown in Figure 6.
The DBMS can also optimize nested sub-queries without referencing a cost model. There are two different approaches to this type of optimization:
Re-write the query by ==de-correlating== and / or ==flattening== it. An example of this is shown in Figure 7.
A example with ==Split Conjunctive Predicates==, ==Predicate Pushdown==, ==Replace Cartesian Products with Joins== and ==Projection Pushdown==.