Skip to content

Latest commit

 

History

History
424 lines (270 loc) · 16.8 KB

sql.md

File metadata and controls

424 lines (270 loc) · 16.8 KB

SQL Reference

Data Types

The following data types are supported:

  • BOOLEAN (BOOL): logical truth values, i.e. true and false.
  • FLOAT (DOUBLE): 64-bit signed floating point numbers, using IEEE 754 binary64 encoding. Supports magnitudes of 10⁻³⁰⁷ to 10³⁰⁸ with 53-bit precision (~15 significant figures), as well as the special values infinity and NaN.
  • INTEGER (INT): 64-bit signed integer numbers with a range of ±2⁶³-1.
  • STRING (TEXT, VARCHAR): UTF-8 encoded strings.

In addition, the special NULL value is used for an unknown value, following the rules of three-valued logic.

Numeric types are not interchangable; a float value (even without a fractional part) cannot be stored in an integer column and vice-versa.

SQL Syntax

Keywords

Keywords are reserved words with special meaning in SQL statements. They are case-insensitive, and must be quoted with " to be used as identifiers. The complete list is:

AS, ASC, AND, BEGIN, BOOL, BOOLEAN, BY, COMMIT, CREATE, CROSS, DEFAULT,DELETE, DESC, DOUBLE, DROP, EXISTS, EXPLAIN, FALSE, FLOAT, FROM, GROUP, HAVING, IF, INDEX, INFINITY, INNER, INSERT, INT, INTEGER, INTO, IS, JOIN, KEY, LEFT, LIKE, LIMIT, NAN, NOT, NULL, OF, OFFSET, ON, ONLY, OR, ORDER, OUTER, PRIMARY, READ, REFERENCES, RIGHT, ROLLBACK, SELECT, SET, STRING, SYSTEM, TABLE, TEXT, TIME, TRANSACTION, TRUE, UNIQUE, UPDATE, VALUES, VARCHAR, WHERE, WRITE

Identifiers

Identifiers are names for database objects such as tables and columns. Unless quoted with ", they must begin with a Unicode letter followed by any combination of letters, numbers, and _, and cannot be reserved keywords. "" can be used to escape a double quote character. They are always converted to lowercase.

Constants

Named constants

The following keywords evaluate to constants:

  • FALSE: the boolean false value.
  • INFINITY: the floating-point value for infinity.
  • NAN: the floating-point value for NaN (not a number).
  • NULL: an unknown value.
  • TRUE: the boolean true value.

String literals

String literals are surrounded by single quotes ', and can contain any valid UTF-8 character. Single quotes must be escaped by an additional single quote, i.e. '', no other escape sequences are supported. For example:

'A string with ''quotes'' and emojis 😀'

Numeric literals

Sequences of digits 0-9 are parsed as a 64-bit signed integer. Numbers with decimal points or in scientific notation are parsed as 64-bit floating point numbers. The following pattern is supported:

999[.[999]][e[+-]999]

The - prefix operator can be used to take negative numbers.

Expressions

Expressions can be used wherever a value is expected, e.g. as SELECT columns nd INSERT values. They are made up of constants, a column references, an operator invocations, and a function calls.

Column references can either be unqualified, e.g. name, or prefixed with the relation identifier separated by ., e.g. person.name. Unqualified identifiers must be unambiguous.

SQL Operators

Logical operators

Logical operators apply standard logic operations on boolean operands.

  • AND: the logical conjunction, e.g. TRUE AND TRUE yields TRUE.
  • OR: the logical disjunction, e.g. TRUE OR FALSE yields TRUE.
  • NOT: the logical negation, e.g. NOT TRUE yields FALSE.

The complete truth tables are:

AND TRUE FALSE NULL
TRUE TRUE FALSE NULL
FALSE FALSE FALSE FALSE
NULL NULL FALSE NULL
OR TRUE FALSE NULL
TRUE TRUE TRUE TRUE
FALSE TRUE FALSE NULL
NULL TRUE NULL NULL
NOT
TRUE FALSE
FALSE TRUE
NULL NULL

Comparison operators

Comparison operators compare values of the same data type, and return TRUE if the comparison holds or FALSE otherwise. INTEGER and FLOAT values are interchangeable. STRING comparisons use the string's byte values, i.e. case-sensitive with 'B' < 'a' due to their UTF-8 code points. FALSE is considered lesser than TRUE. Comparison with NULL always yields NULL (even NULL = NULL).

Binary operators:

  • =: equality, e.g. 1 = 1 yields TRUE.
  • !=: inequality, e.g. 1 != 2 yields TRUE.
  • >: greater than, e.g. 2 > 1 yields TRUE.
  • >=: greater than or equal, e.g. 1 >= 1 yields TRUE.
  • <: lesser than, e.g. 1 < 2 yields TRUE.
  • <=: lesser than or equal, e.g. 1 <= 1 yields TRUE.

Unary operators:

  • IS NULL: checks if the value is NULL, e.g. NULL IS NULL yields TRUE.
  • IS NOT NULL: checks if the value is not NULL, e.g. TRUE IS NOT NULL yields TRUE.
  • IS NAN: checks if the value is a float NAN, e.g. NAN IS NAN yields TRUE. Errors on non-float datatypes, except NULL which yields NULL.
  • IS NOT NAN: checks if the value is not a float NAN, e.g. 3.14 IS NOT NAN yields TRUE.

Mathematical operators

Mathematical operators apply standard math operations on numeric (INTEGER or FLOAT) operands. If either operand is a FLOAT, both operands are converted to FLOAT and the result is a FLOAT. If either operand is NULL, the result is NULL. The special values INFINITY and NAN are handled according to the IEEE 754 spec.

For INTEGER operands, failure conditions such as overflow and division by zero yield an error. For FLOAT operands, these return INFINITY or NAN as appropriate.

Binary operators:

  • +: addition, e.g. 1 + 2 yields 3.
  • -: subtraction, e.g. 3 - 2 yields 1.
  • *: multiplication, e.g. 3 * 2 yields 6.
  • /: division, e.g. 6 / 2 yields 3.
  • ^: exponentiation, e.g. 2 ^ 4 yields 16.
  • %: remainder, e.g. 8 % 3 yields 2. Unlike modulo, the result has the sign of the dividend.

Unary operators:

  • + (prefix): identity, e.g. +1 yields 1.
  • - (prefix): negation, e.g. - -2 yields 2.
  • ! (postfix): factorial, e.g. 5! yields 15.

String operators

String operators operate on string operands.

  • LIKE: compares a string with the given pattern, using % as multi-character wildcard and _ as single-character wildcard, returning TRUE if the string matches the pattern - e.g. 'abc' LIKE 'a%' yields TRUE.

Operator precedence

The operator precedence (order of operations) is as follows:

Precedence Operator Associativity
10 +, - (prefix) Right
9 ! (postfix) Left
8 ^ Right
7 *, /, % Left
6 +, - Left
5 >, >=, <, <= Left
4 =, !=, LIKE, IS Left
3 NOT Right
2 AND Left
1 OR Left

Precedence can be overridden by wrapping an expression in parentheses, e.g. (1 + 2) * 3.

Functions

  • sqrt(expr): returns the square root of a numerical argument.

Aggregate functions

Aggregate function aggregate an expression across all rows, optionally grouped into buckets given by GROUP BY, and results can be filtered via HAVING.

  • AVG(expr): returns the average of numerical values.

  • COUNT(expr): returns the number of rows for which expr evaluates to a non-NULL value. COUNT(*) can be used to count all rows.

  • MAX(expr): returns the maximum value, according to the datatype's ordering.

  • MIN(expr): returns the minimum value, according to the datatype's ordering.

  • SUM(expr): returns the sum of numerical values.

SQL Statements

BEGIN

Starts a new transaction.

BEGIN [ TRANSACTION ] [ READ ONLY | READ WRITE ] [ AS OF SYSTEM TIME txn_id ]
  • txn_id: A past transaction ID to run a read-only transaction for, for time-travel queries.

COMMIT

Commits an active transaction.

CREATE TABLE

Creates a new table.

CREATE TABLE table_name (
    [ column_name data_type [ column_constraint [ ... ] ]  [ INDEX ] [, ... ] ]
)

where column_constraint is:

{ NOT NULL | NULL | PRIMARY KEY | DEFAULT expr | REFERENCES ref_table | UNIQUE }
  • table_name: The name of the table. Must be a valid identifier. Errors if a table with this name already exists.

  • column_name: The name of the column. Must be a valid identifier, and unique within the table.

  • data_type: The data type of the column, see data types for valid types.

  • NOT NULL: The column may not contain NULL values.

  • NULL: The column may contain NULL values. This is the default.

  • PRIMARY KEY: The column should act as a primary key, i.e. the main row identifier. A table must have exactly one primary key column, and it must be unique and non-nullable.

  • DEFAULTexpr: Specifies a default value for the column when INSERT statements do not give a value. expr can be any constant expression of an appropriate data type, e.g. 'abc' or 1 + 2 * 3. For nullable columns, the default value is NULL unless specified otherwise.

  • REFERENCESref_table: The column is a foreign key to ref_table's primary key, enforcing referential integrity.

  • UNIQUE: The column may only contain unique (distinct) values. NULL values are not considered equal, thus a UNIQUE column which allows NULL may contain multiple NULL values. PRIMARY KEY columns are implicitly UNIQUE.

  • INDEX: Create an index for the column.

Example

CREATE TABLE movie (
    id INTEGER PRIMARY KEY,
    title STRING NOT NULL,
    release_year INTEGER INDEX,
    imdb_id STRING INDEX UNIQUE,
    bluray BOOLEAN NOT NULL DEFAULT TRUE
)

DELETE

Deletes rows in a table.

DELETE FROM table_name
    [ WHERE predicate ]

Deletes rows where predicate evaluates to TRUE, or all rows if no WHERE clause is given.

  • table_name: the table to delete from. Errors if it does not exist.

  • predicate: an expression which determines which rows to delete by evaluting to TRUE. Must evaluate to a BOOLEAN or NULL, otherwise an error is returned.

Example

DELETE FROM movie
WHERE release_year < 2000 AND bluray = FALSE

DROP TABLE

Deletes a table and all contained data. Errors if the table does not exist, unless IF EXISTS is given.

DROP TABLE [ IF EXISTS ] table_name
  • table_name: the table to delete.

EXPLAIN

Outputs the execution plan for the given statement.

EXPLAIN [ statement ]

INSERT

Inserts rows into a table.

INSERT INTO table_name
    [ ( column_name [, ... ] ) ]
    VALUES ( expression [, ... ] ) [, ... ]

If column names are given, an identical number of values must be given. If no column names are given, values must be given in the table's column order. Omitted columns will get a default value if specified, otherwise an error will be returned.

  • table_name: the table to insert into. Errors if it does not exist.

  • column_name: a column to insert into in the given table. Errors if it does not exist.

  • expression: an expression to insert into the corresponding column. Must be a constant expression, i.e. it cannot refer to table columns.

Example

INSERT INTO movie
    (id, title, release_year)
VALUES
    (1, 'Sicario', 2015),
    (2, 'Stalker', 1979),
    (3, 'Her', 2013)

ROLLBACK

Rolls back an active transaction.

SELECT

Selects rows from a table.

SELECT [ * | expression [ [ AS ] output_name [, ...] ] ]
    [ FROM from_item [, ...] ]
    [ WHERE predicate ]
    [ GROUP BY group_expr [, ...] ]
    [ HAVING having_expr ]
    [ ORDER BY order_expr [ ASC | DESC ] [, ...] ]
    [ LIMIT count ]
    [ OFFSET start ]

where from_item is one of:

table_name [ [ AS ] alias ]
from_item join_type from_item [ ON join_predicate ]

where join_type is one of:

CROSS JOIN
[ INNER ] JOIN
LEFT [ OUTER ] JOIN
RIGHT [ OUTER ] JOIN

Fetches rows or expressions, either from table table_name (if given) or generated.

  • expression: expression to fetch (can be a simple column name).

  • output_name: output column identifier, defaults to column name (if single column) otherwise nothing (displayed as ?).

  • table_name: table to fetch rows from.

  • alias: table alias.

  • predicate: only return rows for which this expression evaluates to TRUE.

  • group_expr: an expression to group aggregates by. Non-aggregate SELECT expressions must either reference a column given in group_expr, be idential with a group_expr, or have an output_name that is referenced by a group_expr column.

  • having_expr: only return aggregate results for which this expression evaluates to TRUE.

  • order_expr: order rows by this expression (can be a simple column name).

  • count: maximum number of rows to return. Must be a constant integer expression.

  • start: number of rows to skip. Must be a constant integer expression.

  • join_predicate: only return rows for which this expression evaluates to TRUE.

Join types:

  • CROSS JOIN: returns the Carthesian product of the joined tables. Does not accept a join predicate (ON clause).

  • INNER JOIN: returns the rows of the tables' Carthesian product for which join_predicate evaluates to TRUE.

  • LEFT OUTER JOIN: returns the rows joined on the join_predicate, or for any rows in the left table that does not have a match in the right table a single row is returned with the right table's columns set to NULL.

  • RIGHT OUTER JOIN: the same as a LEFT OUTER JOIN but with the left and right tables switched.

Example

SELECT id, title, 2020 - released AS age
FROM movies
WHERE released >= 2000 AND ultrahd
ORDER BY released DESC, title ASC
LIMIT 10
OFFSET 10

UPDATE

Updates rows in a table.

UPDATE table_name
    SET column_name = expression | DEFAULT [, ... ]
    [ WHERE predicate ]

Updates columns given by column_name to the corresponding expression for all rows where predicate evaluates to TRUE. If no WHERE clause is given, all rows are updated.

  • table_name: the table to update. Errors if it does not exist.

  • column_name: a column to update. Errors if it does not exist.

  • expression: an expression whose evaluated value will be set for the corresponding column and row. Expressions can refer to column values, and must evaluate to the same datatype as the updated column. Using DEFAULT will set the column's default value, if any.

  • predicate: an expression which determines which rows to update by evaluting to TRUE. Must evaluate to a BOOLEAN or NULL, otherwise an error is returned.

Example

UPDATE movie
SET bluray = TRUE
WHERE release_year >= 2000 AND bluray = FALSE

Transactions

toyDB supports ACID transactions using MVCC-based snapshot isolation, protecting from the following anomalies: dirty writes, dirty reads, lost updates, fuzzy reads, read skew, and phantom reads. However, write skew anomalies are possible since serializable snapshot isolation is not implemented.

A new transaction is started with BEGIN, and ended with either COMMIT (atomically writing all changes) or ROLLBACK (discarding all changes). If any conflicts occur between concurrent transactions, the lowest transaction ID wins and the others will fail with a serialization error and must retry.

All past data is versioned and retained, and can be queried as of a given transaction ID via BEGIN TRANSACTION READ ONLY AS OF SYSTEM TIME <txn_id>.

A transaction is still valid for use if a contained statement returns an error. It is up to the client to take appropriate action.