SQL (Structured Query Language) is a programming language used for managing relational databases. PostgreSQL is a powerful, open-source relational database management system that supports SQL queries and provides advanced features.
- No previous knowledge is required!
- Create a new database:
CREATE DATABASE db_name;
- Drop/remove an existing database:
DROP DATABASE db_name; -- Can throw an error if it does not exist.
DROP DATABASE IF EXISTS db_name; -- Only drops if it exists (safer version).
DROP DATABASE db_name WITH (FORCE); -- Forces the drop by terminating existing connections to the database (PostgreSQL 13+).
- Create a new table:
CREATE TABLE table_name ( column_1 datatype, column_2 datatype, column_3 datatype, ... );
- Drop/remove an existing table:
DROP TABLE table_name;
- Rename an existing table:
ALTER TABLE current_table_name RENAME TO new_table_name;
- Add, delete, or modify columns in an existing table:
- Add Column:
ALTER TABLE table_name ADD column_name datatype;
- Drop Column:
ALTER TABLE table_name DROP COLUMN column_name;
- Rename Column:
ALTER TABLE table_name RENAME COLUMN old_name TO new_name;
- Change Datatype:
ALTER TABLE table_name ALTER COLUMN column_name TYPE newDatatype;
Official Docs: https://www.postgresql.org/docs/current/datatype.html
- No Decimal:
Numbers without fractional components.
Type | Size | Range |
---|---|---|
smallint | 2 bytes | -32768 to +32767 |
integer | 4 bytes | -2147483648 to +2147483647 |
bigint | 8 bytes | -9223372036854775808 to +9223372036854775807 |
- Decimal:
Numbers with fractional components.
Type | Size (max) | Range |
---|---|---|
decimal() | variable (131072 bytes) | Up to 131072 digits before the decimal point and 16383 digits after |
numeric() | variable (131072 bytes) | Up to 131072 digits before the decimal point and 16383 digits after |
decimal and numeric are the exact same type; there is no difference between the two.
- Useful:
Where precision is critical, such as financial calculations (e.g., currency amounts, interest rates), scientific calculations requiring exact precision, and data that must maintain accuracy through extensive calculations.
- Performance:
Calculations on decimal/numeric values are generally slower than on the other numeric types, depending on their size and the complexity of the arithmetic operations involved.
- Syntax:
NUMERIC(precision, scale): Defines a NUMERIC data type with the specified precision (total number of significant digits) and scale (number of digits after the decimal point).
Example: NUMERIC(3,2) covers [-9.99 to 9.99], and NUMERIC(5,2) covers [-999.99 to 999.99].
NUMERIC(precision): Defines a NUMERIC data type with the specified precision and a default scale of 0.
Example: NUMERIC(3) covers [-999 to 999].
NUMERIC: Any length can be stored, up to the implementation limits of PostgreSQL.
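The precision/scale rule above can be checked with a few lines of arithmetic. This is only a sketch in plain Python (no database involved); the helper name `numeric_range` is made up for illustration, and it assumes the common case `0 <= scale <= precision`.

```python
from decimal import Decimal

def numeric_range(precision, scale=0):
    """Return the (min, max) values that fit in NUMERIC(precision, scale)."""
    if not 0 <= scale <= precision:
        raise ValueError("this sketch only covers 0 <= scale <= precision")
    # The largest value is `precision` nines, with the decimal point
    # shifted `scale` places to the left.
    largest = Decimal(10 ** precision - 1).scaleb(-scale)
    return -largest, largest

print(numeric_range(3, 2))  # (Decimal('-9.99'), Decimal('9.99'))
print(numeric_range(5, 2))  # (Decimal('-999.99'), Decimal('999.99'))
print(numeric_range(3))     # (Decimal('-999'), Decimal('999'))
```

The takeaway: the first number counts all digits, not only the digits before the decimal point.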
- Approximate Numeric:
Inexact, variable-precision floating-point types.
Type | Size | Range | Precision |
---|---|---|---|
real | 4 bytes | about 1E-37 to 1E+37 | at least 6 decimal digits |
double precision | 8 bytes | about 1E-307 to 1E+308 | at least 15 decimal digits |
- Auto Increment:
Commonly used for defining auto-incrementing primary key columns.
Type | Size | Range |
---|---|---|
smallserial | 2 bytes | 1 to 32767 |
serial | 4 bytes | 1 to 2147483647 |
bigserial | 8 bytes | 1 to 9223372036854775807 |
n is a positive integer representing the length limit.
Type | Description |
---|---|
varchar(n) | Variable-length character strings with a maximum length of n characters. |
char(n) | Fixed-length character strings with a length of n characters (blank-padded). |
text | Character strings with no specified maximum length, allowing for the storage of large amounts of text data. |
More details:
If varchar lacks a specifier, it accepts strings of any length (it simply behaves like text).
If char lacks a specifier, it defaults to char(1).
blank-padded: If you have a char(10) field and you store the string "hello" in it, since "hello" is only 5 characters long, the remaining 5 characters will be filled with blank spaces, so the stored value will be "hello     " (with five additional spaces at the end). This ensures that the total length of the stored string is always n.
If your string is longer than the specified length n of a varchar(n) or char(n) field, PostgreSQL raises an error instead of storing it. There are two exceptions: if the excess characters are all spaces, the string is truncated to n characters; and if the value is explicitly cast to varchar(n) or char(n), it is silently truncated to n characters.
Type | Size | Description |
---|---|---|
boolean | 1 byte | true or false |
The input function for type boolean accepts these values:
- True: true, yes, on, 1
- False: false, no, off, 0
- Null: null (indicates no value)
In SQL, constraints are rules or restrictions applied to columns in a table. They enforce data integrity by defining certain conditions that must be met for the data in the table to be valid.
NOT NULL: Ensures that a column does not accept null values.
Syntax:
- When creating a table:
CREATE TABLE table_name ( column_1 datatype NOT NULL, ... );
- When adding to an existing table:
ALTER TABLE table_name ALTER COLUMN column_1 SET NOT NULL;
When adding NOT NULL by altering, the column must not already contain null values. To fix existing nulls, do one of the following:
- Delete each row that contains a null value for that column.
- Update each such row to something other than null:
UPDATE table_name SET column_1 = 'some valid value' WHERE column_1 IS NULL;
Unset/Drop:
ALTER TABLE table_name
ALTER COLUMN column_1 DROP NOT NULL;
DEFAULT: Defines the default value to use when no value is provided.
Syntax:
- When creating a table:
CREATE TABLE table_name ( column_1 datatype DEFAULT 'any value', ... );
- When adding to an existing table:
ALTER TABLE table_name ALTER COLUMN column_1 SET DEFAULT 'any value';
The default is used only when the column is omitted from the INSERT (or when the DEFAULT keyword is given as its value). If DEFAULT is set on a NOT NULL column and you explicitly insert a null value, the default does not kick in and the insert fails with a NOT NULL error.
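To see how DEFAULT and NOT NULL interact, here is a small runnable sketch. It uses Python's built-in sqlite3 module as a stand-in so that no server is needed; the `users` table and its columns are made up for illustration, and the same two inserts are expected to behave the same way in PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, status TEXT NOT NULL DEFAULT 'active')")

# Column omitted: the default value is used.
conn.execute("INSERT INTO users (name) VALUES ('ann')")

# Explicit NULL: the default is NOT applied, so the NOT NULL constraint fails.
try:
    conn.execute("INSERT INTO users (name, status) VALUES ('bob', NULL)")
    explicit_null_rejected = False
except sqlite3.IntegrityError:
    explicit_null_rejected = True

rows = conn.execute("SELECT name, status FROM users").fetchall()
print(rows)                    # [('ann', 'active')]
print(explicit_null_rejected)  # True
```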
Unset/Drop:
ALTER TABLE table_name
ALTER COLUMN column_1 DROP DEFAULT;
UNIQUE: Ensures that all values in a column (or combination of columns) are unique, except for null values.
Syntax:
- When creating a table:
- Single column uniqueness:
CREATE TABLE table_name ( column_1 datatype UNIQUE, column_2 datatype UNIQUE, ... );
- Group column uniqueness:
The combination of column values must be unique (not each column independently).
CREATE TABLE table_name ( column_1 datatype, column_2 datatype, ... UNIQUE(column_1, column_2) );
Example: (each row represents subsequent values entered)
- column_1: 5 and column_2: 10 (okay)
- column_1: 5 and column_2: 11 (okay)
- column_1: 5 and column_2: 10 (not okay, as we already have this 5:10 combination)
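The group-uniqueness example can be replayed as a runnable sketch. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL; the table name `pairs` and the helper `try_insert` are made up for illustration. The last two inserts also show the "except for null values" rule: rows containing NULL never count as duplicates (this is also PostgreSQL's default behavior).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pairs (column_1 INTEGER, column_2 INTEGER, UNIQUE (column_1, column_2))"
)

def try_insert(a, b):
    """Return True if the row was accepted, False if it violated UNIQUE."""
    try:
        conn.execute("INSERT INTO pairs VALUES (?, ?)", (a, b))
        return True
    except sqlite3.IntegrityError:
        return False

results = [try_insert(5, 10), try_insert(5, 11), try_insert(5, 10)]
null_results = [try_insert(7, None), try_insert(7, None)]

print(results)       # [True, True, False]
print(null_results)  # [True, True]
```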
- When adding to an existing table:
- Single column uniqueness:
ALTER TABLE table_name ADD CONSTRAINT constraint_name UNIQUE (column_1);
- Group column uniqueness:
ALTER TABLE table_name ADD CONSTRAINT constraint_name UNIQUE (column_1, column_2);
When adding UNIQUE by altering, ensure that the column already contains unique values. To resolve duplicates, either:
- Delete all repeated rows for those columns.
- Update all repeated rows for those columns to make them unique.
Unset/Drop:
Default constraint name format: <table>_<column>_key or <table>_<column_1>_<column_2>_key (group)
ALTER TABLE table_name
DROP CONSTRAINT constraint_name;
CHECK: Allows you to specify a condition that each value must satisfy before data is inserted or updated. Null values usually pass, because a comparison with null evaluates to null rather than false.
Syntax:
- When creating a table:
CREATE TABLE table_name ( column_1 datatype CHECK (condition), ... );
- When adding to an existing table:
ALTER TABLE table_name ADD CONSTRAINT constraint_name CHECK (condition);
When adding CHECK by altering, ensure that the existing column values already satisfy the check condition. To resolve violations, update or delete the offending rows.
Unset/Drop:
Default constraint name format: <table>_<column>_check
ALTER TABLE table_name
DROP CONSTRAINT constraint_name;
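A quick way to see what a CHECK constraint accepts, including the null case, is the sketch below. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL; the `products` table and the `accepted` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT, price INTEGER CHECK (price > 0))")

def accepted(price):
    """Return True if a row with this price passes the CHECK constraint."""
    try:
        conn.execute("INSERT INTO products VALUES ('item', ?)", (price,))
        return True
    except sqlite3.IntegrityError:
        return False

# NULL > 0 evaluates to NULL (not false), so the NULL row is accepted.
results = {price: accepted(price) for price in (10, -1, None)}
print(results)  # {10: True, -1: False, None: True}
```

If null prices should be rejected too, combine CHECK with NOT NULL on the same column.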
A primary key indicates that a column uniquely identifies each row in a table.
- Primary keys must have unique values and cannot contain NULL values.
- A table can have only one primary key.
- Adding a primary key will automatically create an index on the column.
Syntax:
CREATE TABLE table_name (
column_1 datatype PRIMARY KEY,
...
);
Unset/Drop:
Default constraint name format:
<table>_pkey
ALTER TABLE table_name
DROP CONSTRAINT constraint_name;
A foreign key enforces referential integrity between two database tables, preventing actions that would break their relationship.
- It's a field in one table referencing the primary key in another.
- A table can have more than one foreign key constraint.
- The table with the foreign key is the child table, while the one with the primary key is the parent table.
Syntax:
- When creating a table:
CREATE TABLE child_table ( ... child_column datatype REFERENCES parent_table (parent_column) <Rule (optional)>, ... );
- When adding to an existing table:
ALTER TABLE child_table ADD CONSTRAINT constraint_name FOREIGN KEY (child_column) REFERENCES parent_table (parent_column) <Rule (optional)>;
Unset/Drop:
Default constraint name format:
<table>_<column>_fkey
ALTER TABLE table_name
DROP CONSTRAINT constraint_name;
Rules:
- ON DELETE:
This optional component defines the action to be taken when a referenced row in the parent table is deleted.
Rule | Result |
---|---|
ON DELETE CASCADE | Deletes all dependent child rows automatically when a parent row is deleted. |
ON DELETE SET NULL | Sets foreign key columns in the child table to NULL when a parent row is deleted. |
ON DELETE RESTRICT | Prevents parent row deletion if dependent child rows exist. |
ON DELETE NO ACTION | Default behavior, similar to ON DELETE RESTRICT. |
Additionally, you cannot drop the parent table while you have dependent child table(s).
- ON UPDATE:
Similar to the ON DELETE rules, but triggered when a referenced value in the parent table is updated instead of deleted. For example, if you update a primary key value in the parent table while having ON UPDATE CASCADE, all corresponding foreign key values in the child table are automatically updated to match the new values.
Insertion Scenarios:
What happens when we insert into the child table.
Scenarios | Result |
---|---|
Inserting a new row with a valid foreign key value | Ok |
Inserting a new row with a NULL foreign key | Ok (optional relationship) |
Inserting a new row with a non-existent foreign key value | Error |
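The three insertion scenarios, plus ON DELETE CASCADE, can be tried out with the sketch below. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL. Note the `PRAGMA` line is SQLite-specific (SQLite needs foreign keys switched on; PostgreSQL always enforces them), and the column names `id` and `parent_id` are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only; not needed in PostgreSQL
conn.execute("CREATE TABLE parent_table (id INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE child_table ("
    " id INTEGER PRIMARY KEY,"
    " parent_id INTEGER REFERENCES parent_table (id) ON DELETE CASCADE)"
)
conn.execute("INSERT INTO parent_table (id) VALUES (1)")

def insert_child(parent_id):
    """Try to insert a child row and report the outcome."""
    try:
        conn.execute("INSERT INTO child_table (parent_id) VALUES (?)", (parent_id,))
        return "Ok"
    except sqlite3.IntegrityError:
        return "Error"

scenarios = {
    "valid": insert_child(1),      # parent row 1 exists
    "null": insert_child(None),    # optional relationship
    "missing": insert_child(99),   # no parent row 99
}
print(scenarios)  # {'valid': 'Ok', 'null': 'Ok', 'missing': 'Error'}

# Deleting the parent cascades to the child that referenced it;
# the child with a NULL foreign key is left untouched.
conn.execute("DELETE FROM parent_table WHERE id = 1")
remaining = conn.execute("SELECT parent_id FROM child_table").fetchall()
print(remaining)  # [(None,)]
```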
The INSERT statement is used to add new rows (data) into a table.
Syntax:
INSERT INTO table_name (column1, column2, column3, ...)
VALUES (value1, value2, value3, ...);
The order of the values must match the order of the columns.
Some other variants:
- If you're inserting data into all columns and the values you're inserting correspond to the columns in the same order, you don't need to explicitly specify the column names:
INSERT INTO table_name VALUES (value1, value2, ...);
- If you want to insert multiple rows in a single statement:
INSERT INTO table_name (column1, column2, ...) VALUES (value1_1, value1_2, ...), (value2_1, value2_2, ...), ... (valueN_1, valueN_2, ...);
The SELECT statement is used to retrieve data from one or more tables in a database.
- Select all columns from a table:
SELECT * FROM table_name;
Note: Using * to select all columns is generally considered bad practice. It's recommended to explicitly specify the columns you need whenever possible.
- Select specific columns from a table:
SELECT column1, column2 FROM table_name;
- Select unique values from a specific column in a table:
SELECT DISTINCT column1 FROM table_name;
Extra:
- You can use the RETURNING keyword to return affected rows from any of the INSERT, UPDATE, or DELETE statements.
Examples:
UPDATE employees SET age = 99 WHERE id = 2 RETURNING *;
DELETE FROM employees WHERE id = 4 RETURNING name, salary;
The WHERE clause is used to filter rows from a table based on specified conditions.
It allows you to selectively retrieve, update, or delete rows that meet certain criteria.
You can filter rows based on various conditions such as equality, comparison operators, pattern matching, and logical operators.
Syntax:
SELECT * FROM table_name
WHERE <condition>;
Example:
SELECT *
FROM employees
WHERE department = 'Sales';
The UPDATE statement is used to modify existing records (data) in a table.
Syntax:
UPDATE table_name
SET column1 = new_value1, column2 = new_value2, ...
WHERE <condition>;
This statement updates the values of specified columns in existing rows that meet the specified condition.
Warning
If you omit the condition, it will update all rows.
The DELETE statement is used to remove existing records (data) from a table.
Syntax:
DELETE FROM table_name
WHERE <condition>;
This statement removes rows from the specified table that meet the specified condition.
Warning
If you omit the WHERE clause, it will delete all rows from the table.
However, if you really do want to delete all rows from a table, DELETE without a WHERE clause is not the efficient way to do it; use TRUNCATE instead.
Deleting rows may also trigger cascading deletes if foreign key constraints are set up with cascading delete actions.
- DELETE USING: Allows you to reference other tables in a delete operation, making it possible to delete rows from one table based on a condition involving another related table.
Example:
DELETE FROM orders USING customers WHERE orders.customer_id = customers.customer_id AND customers.country = 'USA';
TRUNCATE: Deletes all data from one or more tables.
It is faster than DELETE because it does not scan the tables or remove rows one by one.
Syntax:
TRUNCATE TABLE table_name1, table_name2, ... <Options>;
Options:
- CONTINUE IDENTITY (default): Do not change the values of sequences.
- RESTART IDENTITY: Automatically restart sequences owned by columns of the truncated tables.
- RESTRICT (default): Refuse to truncate if any of the (parent) tables have foreign-key references from (child) tables that are not listed in the command.
- CASCADE: Also truncate all (child) tables that have foreign-key references to any of the listed (parent) tables.
TRUNCATE is transaction-safe in PostgreSQL: it is rolled back if the surrounding transaction does not commit.
ORDER BY: Sorts the rows returned by a query based on one or more columns.
SELECT * FROM employees ORDER BY salary ASC;
- ASC: for ascending (the default).
- DESC: for descending.
LIMIT: Limits the number of rows returned by a query.
This query will retrieve the top 3 highest-paid employees:
SELECT * FROM employees ORDER BY salary DESC LIMIT 3;
To select one random row from a table:
SELECT * FROM employees ORDER BY RANDOM() LIMIT 1;
OFFSET: Specifies how many rows to skip from the beginning of the result set before starting to return rows.
This query will skip the youngest 3 employees:
SELECT * FROM employees ORDER BY age OFFSET 3;
Important
Always use ORDER BY together with LIMIT and OFFSET to get consistent results; without it, the database engine does not guarantee any specific row order.
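A common use of LIMIT and OFFSET together is pagination. The sketch below is one possible way to do it, using Python's built-in sqlite3 module as a stand-in for PostgreSQL; the sample data and the `page` helper are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("ann", 41), ("bob", 25), ("cy", 33), ("dee", 29), ("eve", 52)],
)

def page(page_number, page_size):
    """Return one page of employees, youngest first (page_number starts at 1)."""
    offset = (page_number - 1) * page_size
    return conn.execute(
        # `name` is a tie-breaker so rows with equal ages keep a stable order.
        "SELECT name, age FROM employees ORDER BY age, name LIMIT ? OFFSET ?",
        (page_size, offset),
    ).fetchall()

print(page(1, 2))  # [('bob', 25), ('dee', 29)]
print(page(2, 2))  # [('cy', 33), ('ann', 41)]
print(page(3, 2))  # [('eve', 52)]
```

Without the ORDER BY, the same row could show up on two pages or on none.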
Operators and functions are used with the SELECT statement and the WHERE clause to filter and manipulate the data.
- Logical Operators:
Operator | Description |
---|---|
AND | Returns true if both conditions separated by AND are true. Otherwise, it returns false. |
OR | Returns true if at least one of the conditions separated by OR is true. Otherwise, it returns false. |
NOT | Returns true if the following condition is false, and returns false if the following condition is true. It negates the result of the condition. |
- Mathematical Operators:
Operator | Description | Example |
---|---|---|
+ | Addition. | 2 + 3 → 5 |
- | Subtraction. | 2 - 3 → -1 |
* | Multiplication. | 2 * 3 → 6 |
/ | Division. | 10 / 2 → 5 |
% | Remainder. | 5 % 4 → 1 |
^ | Exponentiation. | 2 ^ 3 → 8 |
\|/ | Square root. | \|/ 25 → 5 |
\|\|/ | Cube root. | \|\|/ 64 → 4 |
@ | Absolute value. | @ -5 → 5 |
SELECT title, price * units_sold AS revenue FROM books;
- Comparison Operators:
Operator | Description |
---|---|
= | Equal. |
!= | Not equal. |
<> | Not equal. |
< | Less than. |
<= | Less than or equal to. |
> | Greater than. |
>= | Greater than or equal to. |
SELECT name, salary FROM employees WHERE age >= 30;
- Comparison Predicates: Docs ↗
BETWEEN: Simplifies range tests. a BETWEEN x AND y is equivalent to a >= x AND a <= y
SELECT * FROM employees WHERE salary BETWEEN 1100 AND 1700;
IS: Used for making specific comparisons: IS NULL, IS NOT NULL, IS TRUE, IS FALSE, IS NOT TRUE, IS NOT FALSE.
SELECT * FROM employees WHERE bonus_salary IS NULL;
- Array Comparisons: Docs ↗
IN: Checks if a value matches any value in a specified list or subquery. expression IN (value1, value2, ...) is equivalent to expression = value1 OR expression = value2 OR ...
SELECT * FROM employees WHERE name IN ('John', 'Jane', ...);
ANY/SOME: Compares a single value to the elements of an array using the given operator, returning true if the comparison holds for at least one element.
Syntax: expression operator SOME (array expression)
SELECT * FROM employees WHERE 350 > ANY (salary_history);
ALL: Compares a single value to the elements of an array using the given operator, returning true if the comparison holds for every element.
SELECT * FROM employees WHERE 500 < ALL (salary_history);
They all return a Boolean (true/false) value.
- LIKE (pattern matching):
Pattern matching allows you to search for patterns in strings, similar to regular expressions but with a simpler syntax.
Syntax:
SELECT * FROM table_name WHERE column_name LIKE '<pattern>';
Pattern:
- %: Matches any sequence of characters, including none.
- _: Matches any single character.
Examples:
- 'abc%': Any string that starts with abc.
- '%xyz': Any string that ends with xyz.
- '_a%': Any string where the second character is a.
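To make the two wildcards concrete, the sketch below translates a LIKE pattern into a regular expression in plain Python. It is only an illustration of the matching rules, not how PostgreSQL implements LIKE; the helper names `like_to_regex` and `like` are made up, and escape characters are deliberately ignored.

```python
import re

def like_to_regex(pattern):
    """Translate a LIKE pattern into an equivalent compiled regular expression."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")           # any sequence of characters, including none
        elif ch == "_":
            parts.append(".")            # exactly one character
        else:
            parts.append(re.escape(ch))  # everything else matches literally
    return re.compile("".join(parts), re.DOTALL)

def like(value, pattern):
    # LIKE has to match the whole string, hence fullmatch.
    return like_to_regex(pattern).fullmatch(value) is not None

print(like("abcdef", "abc%"))  # True
print(like("wxyz", "%xyz"))    # True
print(like("cat", "_a%"))      # True
print(like("a", "_a%"))        # False
```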
- Mathematical Functions. Docs ↗
- String Functions. Docs ↗
- Conditional Expressions. Docs ↗
CASE: Allows you to perform conditional logic within SQL queries. It evaluates a list of conditions and returns the result for the first condition that is true.
SELECT name, age, salary, CASE lvl WHEN 'A' THEN 'Manager' WHEN 'B' THEN 'Senior Dev' WHEN 'C' THEN 'Dev' ELSE 'Intern' END AS seniority FROM employees;
or
SELECT name, age, salary, CASE WHEN salary >= 5000 THEN 'Rich' WHEN salary >= 2500 THEN 'Average' ELSE 'Poor' END AS wealth FROM employees;
COALESCE: Returns the first non-null expression from a list of expressions. It is often used to provide a default value when the input value is null.
SELECT COALESCE(bonus_salary, 0) AS bonus_salary FROM employees;
GREATEST: Takes multiple input values and returns the largest one among them.
SELECT GREATEST(23, 4, 128, 2000) AS max_value;
LEAST: Takes multiple input values and returns the smallest one among them.
SELECT LEAST(23, 4, 128, 2000) AS min_value;
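Here is a runnable sketch of CASE and COALESCE working together. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL, with made-up sample rows. Note that a salary of 6000 satisfies both WHEN conditions, and only the first one wins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, salary INTEGER, bonus_salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("ann", 6000, 500), ("bob", 3000, None), ("cy", 1000, None)],
)

rows = conn.execute(
    """
    SELECT name,
           CASE
               WHEN salary >= 5000 THEN 'Rich'
               WHEN salary >= 2500 THEN 'Average'
               ELSE 'Poor'
           END AS wealth,
           COALESCE(bonus_salary, 0) AS bonus_salary  -- NULL bonus becomes 0
    FROM employees
    ORDER BY salary DESC
    """
).fetchall()

print(rows)  # [('ann', 'Rich', 500), ('bob', 'Average', 0), ('cy', 'Poor', 0)]
```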
The GROUP BY clause is used to group rows that have the same values into summary rows.
- You can only select columns that are being grouped.
SELECT age FROM employees GROUP BY age;
- To select columns that are not being grouped, you have to use aggregate functions.
Aggregate functions are functions that perform a calculation on a set of values and return a single value.
These functions are typically used with the GROUP BY clause to perform operations across multiple rows and generate summary results.
Function | Description |
---|---|
COUNT(column_name) | Counts the rows in a group where column_name is not null. |
SUM(column_name) | Calculates the sum of values in a group. |
AVG(column_name) | Computes the average of values in a group. |
MIN(column_name) | Finds the minimum value in a group. |
MAX(column_name) | Finds the maximum value in a group. |
Examples:
- Selects the maximum salary for each distinct age:
SELECT MAX(salary), age FROM employees GROUP BY age;
Note
We can also call the functions directly. For example, to select the overall maximum salary:
SELECT MAX(salary) FROM employees;
However, we still can't select other ungrouped columns because when using aggregate functions, SQL requires that all selected columns either be included in the GROUP BY clause or be part of an aggregate function.
- Counts the total number of rows in the "employees" table:
SELECT COUNT(*) FROM employees;
We are using * here because COUNT(column_name) ignores rows where that column is NULL, which would undercount the rows.
- To see how many employees share the same age:
SELECT COUNT(*), age FROM employees GROUP BY age;
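The difference between COUNT(*) and COUNT(column_name) is easy to verify with a few rows. The sketch below uses Python's built-in sqlite3 module as a stand-in for PostgreSQL; the sample data is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, age INTEGER, bonus_salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("ann", 30, 100), ("bob", 30, None), ("cy", 41, None)],
)

# COUNT(*) counts rows; COUNT(bonus_salary) skips rows where the bonus is NULL.
total_rows, with_bonus = conn.execute(
    "SELECT COUNT(*), COUNT(bonus_salary) FROM employees"
).fetchone()
print(total_rows, with_bonus)  # 3 1

# How many employees share the same age.
per_age = conn.execute(
    "SELECT age, COUNT(*) FROM employees GROUP BY age ORDER BY age"
).fetchall()
print(per_age)  # [(30, 2), (41, 1)]
```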
The HAVING clause is used to filter the results of a GROUP BY clause based on specified conditions.
While the WHERE clause filters individual rows before they are grouped, the HAVING clause filters the groups after they have been formed by the GROUP BY operation.
Examples:
- This query selects the departments where the average salary is greater than $2,500:
SELECT department, AVG(salary) as avg_salary FROM employees GROUP BY department HAVING AVG(salary) > 2500;
- This query first filters the employees who were hired on or after January 1, 2023, and then selects the departments where the average salary of these employees is greater than $2,500:
SELECT department, AVG(salary) as avg_salary FROM employees WHERE hire_date >= '2023-01-01' GROUP BY department HAVING AVG(salary) > 2500;
- This query selects the departments with fewer than 5 employees:
SELECT department, COUNT(*) as num_employees FROM employees GROUP BY department HAVING COUNT(*) < 5;
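The sketch below shows how the same HAVING condition gives different results depending on what WHERE filtered out first. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL; the rows are made up, and `hire_date` is stored as ISO text here (in PostgreSQL it would normally be a `date` column).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER, hire_date TEXT)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?)",
    [
        ("ann", "Sales", 4000, "2022-05-01"),
        ("bob", "Sales", 2000, "2023-03-01"),
        ("cy", "Eng", 3000, "2023-02-01"),
        ("dee", "Eng", 2800, "2023-06-01"),
    ],
)

# HAVING only: both departments average above 2500.
all_hires = conn.execute(
    "SELECT department, AVG(salary) FROM employees "
    "GROUP BY department HAVING AVG(salary) > 2500 ORDER BY department"
).fetchall()

# WHERE runs first and removes ann, so the Sales average drops to 2000.
recent_hires = conn.execute(
    "SELECT department, AVG(salary) FROM employees "
    "WHERE hire_date >= '2023-01-01' "
    "GROUP BY department HAVING AVG(salary) > 2500 ORDER BY department"
).fetchall()

print([dept for dept, _ in all_hires])     # ['Eng', 'Sales']
print([dept for dept, _ in recent_hires])  # ['Eng']
```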
Joins in SQL are operations used to combine rows from two or more tables based on a related column.
Syntax:
SELECT *
FROM table1
JOIN table2 ON table1.column_name = table2.column_name;
- If the column names you want to select collide, qualify them with the table name to differentiate them.
SELECT url, photos.id, employees.id FROM photos JOIN employees ON photos.employee_id = employees.id;
To prettify the output, give the columns aliases:
SELECT url, photos.id AS photos_id, employees.id AS employees_id FROM photos JOIN employees ON photos.employee_id = employees.id;
- If you don't specify a join type explicitly, the default join type is an INNER JOIN.
INNER JOIN: Returns only the rows that have a match in both tables.
SELECT * FROM photos INNER JOIN employees ON photos.employee_id = employees.id;
LEFT JOIN: Returns all rows from the left table (table1), and the matched rows from the right table (table2).
If there is no match, the result is NULL on the right side.
SELECT * FROM photos LEFT JOIN employees ON photos.employee_id = employees.id;
RIGHT JOIN: Returns all rows from the right table (table2), and the matched rows from the left table (table1).
If there is no match, the result is NULL on the left side.
SELECT * FROM photos RIGHT JOIN employees ON photos.employee_id = employees.id;
FULL JOIN: Returns all rows from both tables, matched where possible.
The result is NULL on the side where there is no match.
SELECT * FROM photos FULL JOIN employees ON photos.employee_id = employees.id;
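To compare the join types on real rows, here is a small sketch with one photo that has no employee. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL, with made-up data. Only INNER and LEFT joins are run, because older SQLite versions lack RIGHT and FULL joins; those behave as the mirror image and the union of the cases shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE photos (id INTEGER PRIMARY KEY, url TEXT, employee_id INTEGER)")
conn.executemany("INSERT INTO employees VALUES (?, ?)", [(1, "ann"), (2, "bob")])
conn.executemany(
    "INSERT INTO photos VALUES (?, ?, ?)",
    [(10, "a.png", 1), (11, "b.png", None)],  # b.png belongs to nobody
)

# INNER JOIN: only photos that have a matching employee.
inner = conn.execute(
    "SELECT photos.url, employees.name FROM photos "
    "INNER JOIN employees ON photos.employee_id = employees.id "
    "ORDER BY photos.id"
).fetchall()

# LEFT JOIN: every photo; the employee side is NULL when there is no match.
left = conn.execute(
    "SELECT photos.url, employees.name FROM photos "
    "LEFT JOIN employees ON photos.employee_id = employees.id "
    "ORDER BY photos.id"
).fetchall()

print(inner)  # [('a.png', 'ann')]
print(left)   # [('a.png', 'ann'), ('b.png', None)]
```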
Self-join is a special type of join operation where a table is joined with itself.
This can be useful when you have a table with hierarchical data or when you need to compare rows within the same table.
Example:
- To find the id of every date whose temperature is higher than that of the previous date (yesterday). Leetcode ↗
Weather table:
Column Name | Type |
---|---|
id | int |
recordDate | date |
temperature | int |
SELECT w1.id FROM Weather AS w1, Weather AS w2 WHERE w1.temperature > w2.temperature AND w1.recordDate - w2.recordDate = 1;
Indexes in a database are special lookup tables that the database search engine can use to speed up data retrieval. They allow the database to find rows faster without scanning the entire table sequentially.
Index types:
Index types determine how data is stored and organized internally, which directly impacts performance and suitability for specific operations.
- B-Tree Index: (this is the default when no index type is specified)
It is a self-balancing tree that stores data in sorted order, making searches efficient: O(log n). The structure of this index is ideal for comparison-based queries, such as range queries and operations involving =, <, >, <=, or >=.
- Hash Index:
A hash index uses a hash table to store indexed values. It maps each value to a fixed-size hash code, enabling fast lookups for equality comparisons (=).
There are more: https://www.postgresql.org/docs/current/indexes-types.html#INDEXES-TYPES.
Good to know:
- Indexes take extra space.
The database engine requires additional storage for the data structure that holds the indexed values.
- Indexes can slow down write, update, and delete operations.
With every write, update, or delete action, the index must be updated to maintain its optimized structure.
- Some constraints automatically create indexes:
Constraints like PRIMARY KEY and UNIQUE automatically generate indexes to enforce their rules efficiently. So, no need to create indexes for the columns with these constraints.
Commands:
- Create an index:
- With specifying an index name:
CREATE INDEX index_name ON table_name (column_name);
To define the index type, use the USING <index_type> syntax. Ex:
CREATE INDEX ON table_name USING HASH (column_name);
- Without specifying an index name:
This will create an index with the default naming convention. Ex: <table_name>_<column_name>_idx.
CREATE INDEX ON table_name (column_name);
- Partial index:
A partial index only includes a subset of the rows in a table, based on a condition (a WHERE clause). This can make the index smaller and cheaper to maintain, because rows that do not satisfy the condition are never indexed.
A partial index is useful when you frequently query a specific subset of data, such as active records or a particular date range, and indexing the entire table would be inefficient.
CREATE INDEX ON table_name (column_name) WHERE <condition>;
- Multicolumn index:
The index name will be something like: <table_name>_<column_1>_<column_2>_idx.
CREATE INDEX ON table_name (column_1, column_2, ...);
Multicolumn indexes are useful when your queries use multiple columns together in conditions.
For example:
SELECT * FROM users WHERE name = 'John' AND age = 30;
Note: A multicolumn index like (column_1, column_2, ...) is ordered by the leftmost column (column_1) first. So, it helps little with queries that filter only by column_2 unless column_1 is also included in the filter. This is because the entries are sorted by column_1 first and only then by column_2, so filtering on column_2 alone requires scanning the entire index.
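The leftmost-column rule for multicolumn indexes can be pictured with a sorted list. The sketch below is plain Python, not PostgreSQL: a real B-Tree is a tree rather than a list, but its entries follow the same (column_1, column_2) ordering. The data and the helper names `find_by_name` and `find_by_age` are made up for illustration.

```python
from bisect import bisect_left, bisect_right

# The "index": (name, age) entries kept sorted by name first, then by age.
index = sorted([("John", 30), ("Alice", 30), ("John", 25), ("Bob", 41), ("Alice", 22)])

def find_by_name(name):
    """Leftmost column: binary search narrows down to one contiguous slice."""
    lo = bisect_left(index, (name,))
    hi = bisect_right(index, (name, float("inf")))
    return index[lo:hi]

def find_by_age(age):
    """Second column only: matches are scattered, so every entry is checked."""
    return [entry for entry in index if entry[1] == age]

print(find_by_name("John"))  # [('John', 25), ('John', 30)]
print(find_by_age(30))       # [('Alice', 30), ('John', 30)]
```

Entries for one name sit next to each other, while entries for one age are spread across the whole list. That is why the order of columns in a multicolumn index should follow the columns your queries filter on most.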
- Remove an index:
DROP INDEX index_name;
- List all indexes:
SELECT indexname, indexdef FROM pg_indexes WHERE tablename = '<table_name>';
- The official PostgreSQL documentation: https://www.postgresql.org/docs/current/sql.html
- Step by step PostgreSQL tutorial: https://www.postgresqltutorial.com
- Database and SQL Roadmap (all in one): https://www.databasestar.com/sql-roadmap/