From b2f14c8e4de4a48abaeecc30b452e2aa3683da53 Mon Sep 17 00:00:00 2001
From: David Wicinas <93669463+dwicinas@users.noreply.github.com>
Date: Mon, 18 Sep 2023 11:47:04 -0400
Subject: [PATCH] draft of revised Character types topic

---
 .../02_data_types/02_character_types.mdx | 37 +++----------------------------------
 1 file changed, 6 insertions(+), 31 deletions(-)

diff --git a/product_docs/docs/epas/15/reference/sql_reference/02_data_types/02_character_types.mdx b/product_docs/docs/epas/15/reference/sql_reference/02_data_types/02_character_types.mdx
index f856dc9f212..40b7176c389 100644
--- a/product_docs/docs/epas/15/reference/sql_reference/02_data_types/02_character_types.mdx
+++ b/product_docs/docs/epas/15/reference/sql_reference/02_data_types/02_character_types.mdx
@@ -25,29 +25,28 @@ source:
 | `VARCHAR2[(n)]` | | ✅ | Alias for `CHARACTER VARYING` |
-
 ## Overview
-SQL defines two primary character types: `CHARACTER VARYING(n)` and `CHARACTER(n)`, where `n` is a positive integer. These types can store strings up to `n` characters in length. An attempt to assign a value that exceeds the length of `n` results in an error, unless the excess characters are all spaces. In this case, the string is truncated to the maximum length. If the string to be stored is shorter than the declared length, values of type `CHARACTER` will be space-padded; values of type `CHARACTER VARYING` will simply store the shorter string.
+SQL defines two primary character types: `CHARACTER VARYING(n)` and `CHARACTER(n)`, where `n` is a positive integer. These types can store strings up to `n` characters in length. For `CHARACTER`, if you don't specify a value for `n`, `n` defaults to `1`. Assigning a value longer than `n` characters results in an error unless the excess characters are all spaces, in which case the string is truncated to the maximum length.
If the string to be stored is shorter than `n`, values of type `CHARACTER` are space-padded to the specified width (`n`) and are stored and displayed that way; values of type `CHARACTER VARYING` simply store the shorter string.
-If one explicitly casts a value to character varying(n) or character(n), then an over-length value will be truncated to n characters without raising an error.
+If you explicitly cast a value to `CHARACTER VARYING(n)` or `CHARACTER(n)`, an over-length value is truncated to `n` characters without raising an error.
-The notations `VARCHAR(n)` and `CHAR(n)` are aliases for `CHARACTER VARYING(n)` and `CHARACTER(n)`, respectively. If specified, the length must be greater than zero and cannot exceed 10485760. `CHARACTER` without length specifier is equivalent to `CHARACTER(1)`. If `CHARACTER VARYING` is used without a length specifier, the type accepts strings of any size. The latter is a PostgreSQL extension.
+The notations `VARCHAR(n)` and `CHAR(n)` are aliases for `CHARACTER VARYING(n)` and `CHARACTER(n)`, respectively. If specified, the length must be greater than zero and can't exceed 10485760. `CHARACTER` without a length specifier is equivalent to `CHARACTER(1)`. If `CHARACTER VARYING` is used without a length specifier, the type accepts strings of any size. The latter is a PostgreSQL extension.
 In addition, PostgreSQL provides the `TEXT` type, which stores strings of any length. Although the type `TEXT` is not in the SQL standard, several other SQL database management systems have it as well.
-Values of type `CHARACTER` are physically padded with spaces to the specified width n, and are stored and displayed that way. However, trailing spaces are treated as semantically insignificant and disregarded when comparing two values of type `CHARACTER`.
In collations where whitespace is significant, this behavior can produce unexpected results; for example `SELECT 'a '::CHAR(2) collate "C" < E'a\n'::CHAR(2)` returns true, even though C locale would consider a space to be greater than a newline. Trailing spaces are removed when converting a CHARACTER value to one of the other string types. Note that trailing spaces are semantically significant in `CHARACTER VARYING` and `TEXT` values, and when using pattern matching, that is `LIKE` and regular expressions.
+Values of type `CHARACTER` are physically padded with spaces to the specified width `n` and are stored and displayed that way. However, trailing spaces are treated as semantically insignificant and are disregarded when comparing two values of type `CHARACTER`. In collations where whitespace is significant, this behavior can produce unexpected results. For example, `SELECT 'a '::CHAR(2) collate "C" < E'a\n'::CHAR(2)` returns true, even though the C locale considers a space to be greater than a newline. Trailing spaces are removed when converting a `CHARACTER` value to one of the other string types. Trailing spaces are semantically significant in `CHARACTER VARYING` and `TEXT` values and when using pattern matching, that is, `LIKE` and regular expressions.
 The characters that can be stored in any of these data types are determined by the database character set, which is selected when the database is created. Regardless of the specific character set, the character with code zero (sometimes called `NUL`) can't be stored. For more information refer to [Character Set Support](https://www.postgresql.org/docs/current/multibyte.html).
-The storage requirement for a short string (up to 126 bytes) is 1 byte plus the actual string, which includes the space padding in the case of `CHARACTER`. Longer strings have 4 bytes of overhead instead of 1. Long strings are compressed by the system automatically, so the physical requirement on disk might be less.
Very long values are also stored in background tables so that they do not interfere with rapid access to shorter column values. In any case, the longest possible character string that can be stored is about 1 GB. (The maximum value that will be allowed for n in the data type declaration is less than that. It wouldn't be useful to change this because with multibyte character encodings the number of characters and bytes can be quite different. If you desire to store long strings with no specific upper limit, use `TEXT` or `CHARACTER VARYING` without a length specifier, rather than making up an arbitrary length limit.)
+The storage requirement for a short string (up to 126 bytes) is 1 byte plus the actual string, which includes the space padding in the case of `CHARACTER`. Longer strings have 4 bytes of overhead instead of 1. Long strings are compressed by the system automatically, so the physical requirement on disk might be less. Very long values are also stored in background tables so that they don't interfere with rapid access to shorter column values. In any case, the longest possible character string that can be stored is about 1 GB. The maximum value allowed for `n` in the data type declaration is less than that. It wouldn't be useful to change this because, with multibyte character encodings, the number of characters and bytes can be quite different. To store long strings with no specific upper limit, use `TEXT` or `CHARACTER VARYING` without a length specifier rather than making up an arbitrary length limit.
 The database character set determines the character set used to store textual values.
 !!!Tip
 There is no performance difference among these three types, apart from increased storage space when using the blank-padded type, and a few extra CPU cycles to check the length when storing into a length-constrained column.
While `CHARACTER(n)` has performance advantages in some other database systems, there is no such advantage in PostgreSQL; in fact CHARACTER(n) is usually the slowest of the three because of its additional storage costs. In most situations `TEXT` or `CHARACTER VARYING` should be used instead.
-Refer to Section [String Constants](https://www.postgresql.org/docs/current/sql-syntax-lexical.html#SQL-SYNTAX-STRINGS) for information about the syntax of string literals, and [Functions and Operators](https://www.postgresql.org/docs/current/functions.html) for information about available operators and functions.
+Refer to the [PostgreSQL documentation on string constants](https://www.postgresql.org/docs/current/sql-syntax-lexical.html#SQL-SYNTAX-STRINGS) for information about the syntax of string literals and to the documentation on [functions and operators](https://www.postgresql.org/docs/current/functions.html) for information about available operators and functions.
 !!! Example: Using the character types
@@ -86,30 +85,6 @@ Table 8.5. Special Character Types
 | "CHAR" | 1 byte | single-byte internal type |
 | `NAME` | 64 bytes | internal type for object names |
-========================================= - `CHAR` - If you don't specify a value for `n`, `n` defaults to `1`. If the string to assign is shorter than `n`, values of type `CHAR` are space-padded to the specified width (`n`) and are stored and displayed that way. - Padding spaces are treated as semantically insignificant. That is, trailing spaces are disregarded when comparing two values of type `CHAR`, and they are removed when converting a `CHAR` value to one of the other string types. - If you explicitly cast an over-length value to a `CHAR(n)` type, the value is truncated to `n` characters without raising an error (as specified by the SQL standard).
- -```sql -VARCHAR, VARCHAR2, NVARCHAR and NVARCHAR2 -``` - - If the string to assign is shorter than `n`, values of type `VARCHAR`, `VARCHAR2`, `NVARCHAR`, and `NVARCHAR2` store the shorter string without padding. - -!!! Note -The trailing spaces are semantically significant in `VARCHAR` values. -!!! - - If you explicitly cast a value to a `VARCHAR` type, an over-length value is truncated to `n` characters without raising an error (as specified by the SQL standard). - -`CLOB` - You can store a large character string in a `CLOB` type. `CLOB` is semantically equivalent to `VARCHAR2` except no length limit is specified. Generally, use a `CLOB` type if you don't know the maximum string length. The longest possible character string that you can store in a `CLOB` type is about 1 GB.
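
The padding and comparison rules in the revised overview can be checked directly against a server. The following query is a minimal sketch, assuming a PostgreSQL-compatible server such as EDB Postgres Advanced Server with default settings and a single-byte or UTF-8 encoding; the literal values are arbitrary. It contrasts `CHARACTER`, which pads and ignores trailing spaces in comparisons, with `CHARACTER VARYING`, which keeps them.

```sql
-- 'ok' is stored as 'ok  ' (padded to width 4) when cast to CHARACTER(4).
SELECT
    -- Trailing spaces are insignificant when comparing CHARACTER values: true
    'ok'::CHARACTER(4) = 'ok  '::CHARACTER(4)                 AS char_equal,
    -- Trailing spaces are significant in CHARACTER VARYING: false
    'ok'::CHARACTER VARYING(4) = 'ok  '::CHARACTER VARYING(4) AS varchar_equal,
    -- char_length ignores the padding: 2
    char_length('ok'::CHARACTER(4))                           AS char_len,
    -- octet_length counts the stored padding: 4
    octet_length('ok'::CHARACTER(4))                          AS stored_bytes,
    -- Converting to TEXT strips the padding: 'ok|'
    'ok'::CHARACTER(4)::TEXT || '|'                           AS as_text;
```

The `octet_length` result reflects the storage note above: for `CHARACTER`, the stored string includes the space padding.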
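
The difference between an explicit cast and an assignment can be sketched the same way. This assumes a PostgreSQL-compatible server; `char_demo` is a hypothetical temporary table used only for illustration. An explicit cast truncates silently, while an assignment truncates only when the excess characters are all spaces.

```sql
-- An explicit cast truncates an over-length value without raising an error.
SELECT 'abcdef'::CHARACTER(3)         AS char_cast,     -- 'abc'
       'abcdef'::CHARACTER VARYING(3) AS varchar_cast;  -- 'abc'

-- Hypothetical demo table for the assignment rule.
CREATE TEMP TABLE char_demo (v CHARACTER VARYING(3));

-- The excess characters are all spaces, so the value is truncated to 'ab '.
INSERT INTO char_demo VALUES ('ab    ');
SELECT v, length(v) FROM char_demo;  -- length is 3: the kept trailing space counts

-- The excess characters aren't spaces, so this assignment fails with
-- "value too long for type character varying(3)":
-- INSERT INTO char_demo VALUES ('abcdef');
```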