- Correctness
- Readability
- Optimization
- FROM (and JOIN): limits the search space
- WHERE: filters the rows
- GROUP BY: aggregates the data
- HAVING: filters the data after aggregation
- SELECT: grabs the columns and then deduplicates if DISTINCT is used
- UNION: merges the selected data
- ORDER BY: sorts the results
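
A minimal sketch of this order in action, using Python's built-in sqlite3 module. The orders table, its columns, and the data are made up for illustration. The point to notice is that WHERE drops rows before grouping, so HAVING only counts the rows that survived it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (customer TEXT, amount INTEGER);  -- hypothetical table
    INSERT INTO orders VALUES
        ('ann', 5), ('ann', 50), ('ann', 60),
        ('bob', 5), ('bob', 70),
        ('cy', 200);
""")

query = """
    SELECT customer, SUM(amount) AS total  -- 5. pick the output columns
    FROM orders                            -- 1. choose the source rows
    WHERE amount >= 10                     -- 2. drop small orders before grouping
    GROUP BY customer                      -- 3. aggregate per customer
    HAVING COUNT(*) >= 2                   -- 4. keep groups with 2+ remaining orders
    ORDER BY total DESC                    -- 6. sort the final result
"""
rows = conn.execute(query).fetchall()
print(rows)  # → [('ann', 110)]
```

bob has two orders in the table, but one of them is removed by WHERE before the groups are counted, so only ann passes the HAVING filter.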
- Get to know your Data
- Minimise the data as much as possible
- Use LIMIT until you have finalised the query
- Join tables with ON rather than WHERE: the join condition is explicit and can take advantage of a database index
- Use aliases to refer to tables to avoid ambiguity
- Avoid functions in the WHERE clause: the function is run for every row, which prevents the database from using an index.
- Avoid wildcards at the beginning of a string, as they lead to a full table scan.
Ex: Avoid
SELECT column FROM table WHERE col LIKE '%wizar%'
Prefer
SELECT column FROM table WHERE col LIKE 'wizar%'
- Prefer EXISTS to IN. EXISTS can return as soon as a matching row is found, whereas IN may scan the whole subquery result
- SELECT columns, not stars
- Prefer UNION ALL to UNION: the former does not remove duplicates, so it is faster
- Avoid sorting if possible, as it's expensive
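
To see the "avoid functions in WHERE" tip concretely, here is a small sketch using Python's built-in sqlite3 module and SQLite's EXPLAIN QUERY PLAN. The users table and users_email_idx index are hypothetical, and other databases word their plans differently, so treat this as an illustration rather than a benchmark.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT);  -- hypothetical table
    CREATE INDEX users_email_idx ON users (email);
    INSERT INTO users (email) VALUES ('a@example.com'), ('b@example.com');
""")

def plan(sql):
    """Return the planner's one-line description of how it reads the table."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return rows[0][3]  # the fourth column holds the human-readable detail

# Wrapping the column in a function hides it from the index on email.
wrapped = plan("SELECT id FROM users WHERE lower(email) = 'a@example.com'")
# Comparing the bare column lets the planner look the value up in the index.
plain = plan("SELECT id FROM users WHERE email = 'a@example.com'")

print(wrapped)  # a SCAN: every row is visited
print(plain)    # a SEARCH using users_email_idx
```

If the function really is needed, it is usually better to apply it to the constant side of the comparison, or to store the value already normalised.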
If you usually filter by a column, then it should probably be indexed (ex: timestamp, main_event)
You can index only part of the data by adding a WHERE clause to CREATE INDEX (a partial index). Ex: if you want to index only last week's data.
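
A sketch of a partial index, run through Python's built-in sqlite3 module (PostgreSQL accepts the same CREATE INDEX ... WHERE form for a simple case like this). The events table, its columns, and the cutoff date are assumptions for illustration. Note that the index predicate has to be deterministic, so "last week" is written as a literal date rather than computed from the current time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (id INTEGER PRIMARY KEY, main_event TEXT, created_at TEXT);
    INSERT INTO events (main_event, created_at) VALUES
        ('login',  '2024-05-01'),
        ('login',  '2024-06-03'),
        ('logout', '2024-06-04');

    -- Only rows on or after the cutoff are stored in the index.
    -- The cutoff is a literal, so the index must be recreated as time moves on.
    CREATE INDEX recent_events_idx ON events (main_event)
        WHERE created_at >= '2024-06-01';
""")

# Repeating the index predicate in the query tells the planner the index applies.
recent_logins = conn.execute("""
    SELECT id, created_at FROM events
    WHERE created_at >= '2024-06-01' AND main_event = 'login'
""").fetchall()

index_sql = conn.execute(
    "SELECT sql FROM sqlite_master WHERE type = 'index' AND name = 'recent_events_idx'"
).fetchone()[0]

print(recent_logins)  # → [(2, '2024-06-03')]
print(index_sql)
```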
Use a composite index for columns that are typically queried together
Ex: CREATE INDEX full_name_index ON customers (last_name, first_name)
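
Column order matters in a composite index. The sketch below uses Python's built-in sqlite3 module with a made-up customers table (the data and the plan helper are assumptions for illustration). It suggests that the index helps when the leading column, last_name, is in the filter, but not when only first_name is.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, last_name TEXT, first_name TEXT);
    CREATE INDEX full_name_index ON customers (last_name, first_name);
    INSERT INTO customers (last_name, first_name) VALUES
        ('Lovelace', 'Ada'), ('Hopper', 'Grace');
""")

def plan(sql):
    """Return SQLite's one-line description of how the query reads the table."""
    return conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()[0][3]

# Both columns, or just the leading one: the index can be searched.
both = plan("SELECT id FROM customers WHERE last_name = 'Hopper' AND first_name = 'Grace'")
leading = plan("SELECT id FROM customers WHERE last_name = 'Hopper'")
# Only the trailing column: the planner falls back to scanning every entry.
trailing = plan("SELECT id FROM customers WHERE first_name = 'Grace'")

for label, detail in [("both", both), ("leading", leading), ("trailing", trailing)]:
    print(label, "->", detail)
```

So put the column you filter on most often first; if first_name is also filtered on its own, it probably needs a separate index.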
Some databases (PostgreSQL, for example) support EXPLAIN ANALYZE, which runs the query and shows how it was executed, with actual timings
Ex:
EXPLAIN ANALYZE SELECT title, release_year
FROM film
WHERE release_year > 2000;
Output:
Seq Scan on film (cost=0.00..66.50 rows=1000 width=19) (actual time=0.008..0.311 rows=1000 loops=1)
Filter: ((release_year)::integer > 2000)
Planning Time: 0.062 ms
Execution Time: 0.416 ms
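
SQLite has no EXPLAIN ANALYZE, but its EXPLAIN QUERY PLAN gives a similar (timing-free) view. As a rough sketch, the snippet below uses Python's built-in sqlite3 module on a tiny stand-in film table to compare the plan before and after adding an index. The index name film_year_idx and the sample rows are assumptions, not part of the example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE film (title TEXT, release_year INTEGER);
    INSERT INTO film VALUES ('Old Film', 1995), ('New Film', 2006);
""")

query = "SELECT title, release_year FROM film WHERE release_year > 2000"

def plan_details(sql):
    """Collect the human-readable detail column of each plan row."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

before = plan_details(query)  # no index yet: the whole table is scanned
conn.execute("CREATE INDEX film_year_idx ON film (release_year)")
after = plan_details(query)   # the range filter can now use the index

print("before:", before)
print("after: ", after)
```

The "Seq Scan" in the PostgreSQL output plays the same role as SCAN here: it is the hint that an index on the filtered column might be worth trying.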
Use a CTE (a WITH clause) to encapsulate logic, much like creating a view
WITH product_orders AS (
SELECT o.created_at AS order_date,
p.title AS product_title,
(o.subtotal / o.quantity) AS revenue_per_unit
FROM orders AS o
LEFT JOIN products AS p ON o.product_id = p.id
WHERE o.quantity > 0
)
SELECT product_title AS product,
AVG(revenue_per_unit) AS avg_revenue_per_unit,
MAX(revenue_per_unit) AS max_revenue_per_unit,
MIN(revenue_per_unit) AS min_revenue_per_unit
FROM product_orders
WHERE order_date BETWEEN '2019-01-01' AND '2019-12-31'
GROUP BY product
ORDER BY avg_revenue_per_unit DESC
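
For comparison, the same logic can be stored as a view when several queries need it. The sketch below runs in Python's built-in sqlite3 module. The table definitions and sample rows are invented for illustration, and only the average is selected to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE orders (
        product_id INTEGER, created_at TEXT, subtotal REAL, quantity INTEGER
    );
    INSERT INTO products VALUES (1, 'Widget'), (2, 'Gadget');
    INSERT INTO orders VALUES
        (1, '2019-03-01', 20.0, 2),
        (1, '2019-07-15', 30.0, 2),
        (2, '2019-05-20', 80.0, 4),
        (2, '2020-01-05', 99.0, 1);  -- outside 2019, filtered out below

    -- Same body as the CTE, stored under a name so other queries can reuse it.
    CREATE VIEW product_orders AS
    SELECT o.created_at AS order_date,
           p.title AS product_title,
           (o.subtotal / o.quantity) AS revenue_per_unit
    FROM orders AS o
    LEFT JOIN products AS p ON o.product_id = p.id
    WHERE o.quantity > 0;
""")

rows = conn.execute("""
    SELECT product_title AS product,
           AVG(revenue_per_unit) AS avg_revenue_per_unit
    FROM product_orders
    WHERE order_date BETWEEN '2019-01-01' AND '2019-12-31'
    GROUP BY product
    ORDER BY avg_revenue_per_unit DESC
""").fetchall()
print(rows)  # → [('Gadget', 20.0), ('Widget', 12.5)]
```

A CTE lives only for the one statement it is attached to, while a view stays in the schema, so the CTE is the lighter choice for one-off queries.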