diff --git a/product_docs/docs/pgd/5/appusage.mdx b/product_docs/docs/pgd/5/appusage.mdx deleted file mode 100644 index 7c54778a8cb..00000000000 --- a/product_docs/docs/pgd/5/appusage.mdx +++ /dev/null @@ -1,349 +0,0 @@ ---- -title: Application use -redirects: - - ../bdr/appusage ---- - -Learn about the PGD application from a user perspective. - -## Application behavior - -PGD supports replicating changes made on one node to other nodes. - -PGD, by default, replicates all changes from INSERT, UPDATE, DELETE -and TRUNCATE operations from the source node to other nodes. Only the final changes -are sent, after all triggers and rules are processed. For example, -`INSERT ... ON CONFLICT UPDATE` sends either an insert or an update, -depending on what occurred on the origin. If an update or delete affects -zero rows, then no changes are sent. - -You can replicate INSERT without any preconditions. - -For updates and deletes to replicate on other nodes, PGD must be able to -identify the unique rows affected. PGD requires that a table have either -a PRIMARY KEY defined, a UNIQUE constraint, or an explicit REPLICA IDENTITY -defined on specific columns. If one of those isn't defined, a warning is -generated, and later updates or deletes are explicitly blocked. -If REPLICA IDENTITY FULL is defined for a table, then a unique index isn't required. -In that case, updates and deletes are allowed and use the first non-unique -index that's live, valid, not deferred, and doesn't have expressions or WHERE -clauses. Otherwise, a sequential scan is used. - -You can use TRUNCATE even without a defined replication identity. -Replication of TRUNCATE commands is supported, but take care -when truncating groups of tables connected by foreign keys. When replicating -a truncate action, the subscriber truncates the same group of tables that -was truncated on the origin, either explicitly specified or implicitly -collected by CASCADE, except in cases where replication sets are defined. 
-See [Replication sets](repsets) for further details and examples. -This works correctly if all affected tables are part of the same -subscription. But if some tables to truncate on the subscriber have -foreign-key links to tables that aren't part of the same (or any) -replication set, then applying the truncate action on the -subscriber fails. - -Row-level locks taken implicitly by INSERT, UPDATE, and DELETE commands are -replicated as the changes are made. -Table-level locks taken implicitly by INSERT, UPDATE, DELETE, and TRUNCATE -commands are also replicated. -Explicit row-level locking (`SELECT ... FOR UPDATE/FOR SHARE`) by user sessions -isn't replicated, nor are advisory locks. Information stored by transactions -running in SERIALIZABLE mode isn't replicated to other nodes. The -transaction isolation level of SERIALIAZABLE is supported, but transactions -aren't serialized across nodes in the presence of concurrent -transactions on multiple nodes. - -If DML is executed on multiple nodes concurrently, then potential conflicts -might occur if executing with asynchronous replication. You must -either handle these or avoid them. Various avoidance mechanisms are possible, -discussed in [Conflicts](consistency/conflicts). - -Sequences need special handling, described in [Sequences](sequences). - -Binary data in BYTEA columns is replicated normally, allowing "blobs" of data -up to 1 GB. Use of the PostgreSQL "large object" facility isn't -supported in PGD. - -Rules execute only on the origin node so aren't executed during apply, -even if they're enabled for replicas. - -Replication is possible only from base tables to base tables. That is, the -tables on the source and target on the subscription side must be -tables, not views, materialized views, or foreign tables. Attempts to -replicate tables other than base tables result in an error. 
-DML changes that are made through updatable views are resolved to -base tables on the origin and then applied to the same base table name on -the target. - -PGD supports partitioned tables transparently, meaning that you can add a partitioned -table to a replication set and -changes that involve any of the partitions are replicated downstream. - -By default, triggers execute only on the origin node. For example, an INSERT -trigger executes on the origin node and is ignored when you apply the change on -the target node. You can specify for triggers to execute on both the origin -node at execution time and on the target when it's replicated ("apply time") -by using `ALTER TABLE ... ENABLE ALWAYS TRIGGER`. Or, use the `REPLICA` option -to execute only at apply time: `ALTER TABLE ... ENABLE REPLICA TRIGGER`. - -Some types of trigger aren't executed on apply, even if they exist on a -table and are currently enabled. Trigger types not executed are: - -- Statement-level triggers (`FOR EACH STATEMENT`) -- Per-column UPDATE triggers (`UPDATE OF column_name [, ...]`) - -PGD replication apply uses the system-level default search_path. Replica -triggers, stream triggers, and index expression functions can assume -other search_path settings that then fail when they execute on apply. -To prevent this from occurring, use any of these techniques: - -- Resolve object references clearly using either only the default search_path. -- Always use fully qualified references to objects, e.g., schema.objectname. -- Set the search path for a function using `ALTER FUNCTION ... SET search_path = ...` for the functions affected. - -PGD assumes that there are no issues related to text or other -collatable datatypes, i.e., all collations in use are available on all -nodes, and the default collation is the same on all nodes. 
Replicating -changes uses equality searches to locate Replica Identity values, so this -does't have any effect except where unique indexes are explicitly defined -with nonmatching collation qualifiers. Row filters might be affected -by differences in collations if collatable expressions were used. - -PGD handling of very long "toasted" data in PostgreSQL is transparent to -the user. The TOAST "chunkid" values likely differ between -the same row on different nodes, but that doesn't cause any problems. - -PGD can't work correctly if Replica Identity columns are marked as external. - -PostgreSQL allows CHECK() constraints that contain volatile functions. Since -PGD re-executes CHECK() constraints on apply, any subsequent re-execution that -doesn't return the same result as before causes data divergence. - -PGD doesn't restrict the use of foreign keys. Cascading FKs are allowed. - -## Nonreplicated statements - -None of the following user commands are replicated by PGD, so their effects -occur on the local/origin node only: - -- Cursor operations (DECLARE, CLOSE, FETCH) -- Execution commands (DO, CALL, PREPARE, EXECUTE, EXPLAIN) -- Session management (DEALLOCATE, DISCARD, LOAD) -- Parameter commands (SET, SHOW) -- Constraint manipulation (SET CONSTRAINTS) -- Locking commands (LOCK) -- Table maintenance commands (VACUUM, ANALYZE, CLUSTER, REINDEX) -- Async operations (NOTIFY, LISTEN, UNLISTEN) - -Since the `NOTIFY` SQL command and the `pg_notify()` functions -aren't replicated, notifications aren't reliable in case of failover. -This means that notifications can easily be lost at failover if a -transaction is committed just when the server crashes. -Applications running `LISTEN` might miss notifications in case of failover. - -This is true in standard PostgreSQL replication, and PGD doesn't -yet improve on this. CAMO and Eager Replication options don't -allow the `NOTIFY` SQL command or the `pg_notify()` function. 
- -## DML and DDL replication - -PGD doesn't replicate the DML statement. It replicates the changes -caused by the DML statement. For example, an UPDATE that changed -two rows replicates two changes, whereas a DELETE that didn't -remove any rows doesn't replicate anything. This means that the results -of executing volatile statements are replicated, ensuring there's no -divergence between nodes as might occur with statement-based replication. - -DDL replication works differently to DML. For DDL, PGD replicates the -statement, which then executes on all nodes. So a `DROP TABLE IF EXISTS` -might not replicate anything on the local node, but the statement is -still sent to other nodes for execution if DDL replication is enabled. -Full details are covered in [DDL replication](ddl). - -PGD works to ensure that intermixed DML and DDL -statements work correctly, even in the same transaction. - -## Replicating between different release levels - -PGD is designed to replicate between nodes that have different major -versions of PostgreSQL. This feature is designed to allow major -version upgrades without downtime. - -PGD is also designed to replicate between nodes that have different -versions of PGD software. This feature is designed to allow version -upgrades and maintenance without downtime. - -However, while it's possible to join a node with a major version in -a cluster, you can't add a node with a minor version if the cluster -uses a newer protocol version. Doing so returns an error. - -Both of these features might be affected by specific restrictions. -See [Release notes](/pgd/latest/rel_notes/) for any known incompatibilities. - -## Replicating between nodes with differences - -By default, DDL is automatically sent to all nodes. You can control this manually, as described in [DDL replication](ddl), and you can use it to create differences between database schemas across nodes. 
-PGD is designed to allow replication to continue even with minor -differences between nodes. These features are designed to allow -application schema migration without downtime or to allow logical -standby nodes for reporting or testing. - -Currently, replication requires the same table name on all nodes. A future -feature might allow a mapping between different table names. - -It's possible to replicate between tables with dissimilar partitioning -definitions, such as a source that's a normal table replicating to a -partitioned table, including support for updates that change partitions -on the target. It can be faster if the partitioning definition is the -same on the source and target since dynamic partition routing doesn't need to execute at apply time. -For details, see [Replication sets](repsets). - -By default, all columns are replicated. - -PGD replicates data columns based on the column name. If a column -has the same name but a different datatype, PGD attempt to cast from the source -type to the target type, if casts were defined that allow that. - -PGD supports replicating between tables that have a different number of columns. - -If the target has missing columns from the source, then PGD raises -a `target_column_missing` conflict, for which the default conflict resolver -is `ignore_if_null`. This throws an error if a non-NULL value arrives. -Alternatively, you can also configure a node with a conflict resolver of `ignore`. -This setting doesn't throw an error but silently ignores any additional -columns. - -If the target has additional columns not seen in the source record, then PGD -raises a `source_column_missing` conflict, for which the default conflict resolver -is `use_default_value`. Replication proceeds if the additional columns -have a default, either NULL (if nullable) or a default expression. It -throws an error and halts replication if not. 
- -Transform triggers can also be used on tables to provide default values -or alter the incoming data in various ways before apply. - -If the source and the target have different constraints, then -replication is attempted, but it might fail if the rows from -source can't be applied to the target. Row filters can help here. - -Replicating data from one schema to a more relaxed schema won't cause failures. -Replicating data from a schema to a more restrictive schema can be a source of -potential failures. -The right way to solve this is to place a constraint on the more relaxed side, -so bad data can't be entered. That way, no bad data ever arrives -by replication, so it never fails the transform into the more restrictive -schema. For example, if one schema has a column of type TEXT and another schema -defines the same column as XML, add a CHECK constraint onto the TEXT column -to enforce that the text is XML. - -You can define a table with different indexes on each node. By default, the -index definitions are replicated. See [DDL replication](ddl) to -specify how to create an index on only a subset of nodes or just locally. - -Storage parameters, such as `fillfactor` and `toast_tuple_target`, can differ -between nodes for a table without problems. An exception to that is that the -value of a table's storage parameter `user_catalog_table` must be identical -on all nodes. - -A table being replicated must be owned by the same user/role on each node. -See [Security and roles](security) for further discussion. - -Roles can have different passwords for connection on each node, although -by default changes to roles are replicated to each node. See -[DDL replication](ddl) to specify how to alter a role password on only a -subset of nodes or locally. - -## Comparison between nodes with differences - -LiveCompare is a tool for data comparison on a database, against PGD and -non-PGD nodes. It needs a minimum of two connections to compare against -and reach a final result. 
- -Since LiveCompare 1.3, you can configure with `all_bdr_nodes` set. This setting -saves you from clarifying all the relevant DSNs for each separate node in the -cluster. An EDB Postgres Distributed cluster has N amount of nodes with connection information, but -it's only the initial and output connection that LiveCompare 1.3+ needs -to complete its job. Setting `logical_replication_mode` states how all the -nodes are communicating. - -All the configuration is done in a `.ini` file named `bdrLC.ini`, for example. -Find templates for this configuration file in -`/etc/2ndq-livecompare/`. - -While LiveCompare executes, you see N+1 progress bars, N being -the number of processes. Once all the tables are sourced, a time displays -as the transactions per second (tps) was measured. This continues to -count the time, giving you an estimate and then a total execution time at the end. - -This tool offers a lot of customization and filters, such as tables, schemas, and -replication_sets. LiveCompare can use stop-start without losing context -information, so it can run at convenient times. After the comparison, a -summary and a DML script are generated so you can review it. Apply -the DML to fix any differences found. - -## General rules for applications - -PGD uses replica identity values to identify the rows to -change. -Applications can cause difficulties if they insert, delete, and then later -reuse the same unique identifiers. -This is known as the [ABA problem](https://en.wikipedia.org/wiki/ABA_problem). PGD can't know whether the rows are the -current row, the last row, or much older rows. - -Similarly, since PGD uses table names to identify the table against which -changes are replayed, a similar ABA problem exists with applications that -create, drop, and then later reuse the same object names. - -These issues give rise to some simple rules for applications to follow: - -- Use unique identifiers for rows (INSERT). -- Avoid modifying unique identifiers (UPDATE). 
-- Avoid reusing deleted unique identifiers. -- Avoid reusing dropped object names. - -In the general case, breaking those rules can lead to data anomalies and -divergence. Applications can break those rules as long as certain conditions -are met, but use caution: while anomalies are unlikely, they aren't -impossible. For example, you can reuse a row value as long as the DELETE was replayed on all nodes, including down nodes. This might normally occur in -less than a second but can take days if a severe issue occurred -on one node that prevented it from restarting correctly. - -## Timing considerations and synchronous replication - -Being asynchronous by default, peer nodes might lag behind, making it -possible for a client connected to multiple PGD nodes or switching -between them to read stale data. - -A [queue wait -function](/pgd/latest/reference/functions/#bdrwait_for_apply_queue) is provided -for clients or proxies to prevent such stale reads. - -The synchronous replication features of Postgres are available to PGD -as well. In addition, PGD provides multiple variants for more synchronous -replication. See -[Durability and performance options](durability) for an overview and comparison of all variants available and -its different modes. - -## Use of table access methods (TAMs) in PGD - -PGD 5.0 supports two table access methods released with EDB Postgres 15.0. -These two table access methods have been certified and allowed in PGD 5.0: - - * Auto cluster - * Ref data - -Any other TAM is restricted until certified by EDB. -If you are planning to use any of the table access method on a table, -you need to configure that TAM on each participating node in the -PGD cluster. -To configure auto cluster or ref data TAM, follow these steps on each node: -1. Update `postgresql.conf` to specify TAMs `autocluster` or `refdata` for the - `shared_preload_libraries` parameter. -1. 
Restart the server and execute `CREATE EXTENSION autocluster;` or - `CREATE EXTENSION refdata;`. - -After you create the extension, you can use TAM to create a table using -`CREATE TABLE test USING autocluster;` or `CREATE TABLE test USING refdata;`. -This replicates to all the PGD nodes. -For more information on these table access methods, see [`CREATE TABLE`](/epas/latest/epas_compat_sql/36_create_table/). - diff --git a/product_docs/docs/pgd/5/appusage/behavior.mdx b/product_docs/docs/pgd/5/appusage/behavior.mdx new file mode 100644 index 00000000000..fbefa0d093f --- /dev/null +++ b/product_docs/docs/pgd/5/appusage/behavior.mdx @@ -0,0 +1,140 @@ +--- +title: Application behavior +navTitle: Application behavior +--- + +Much of PGD's replication behavior is transparent to applications. Understanding how it +achieves that and the elements that aren't transparent is important to successfully developing +an application that works well with PGD. + +### Replication behavior + +PGD supports replicating changes made on one node to other nodes. + +PGD, by default, replicates all changes from INSERT, UPDATE, DELETE, and TRUNCATE +operations from the source node to other nodes. Only the final changes are sent, +after all triggers and rules are processed. For example, `INSERT ... ON CONFLICT +UPDATE` sends either an insert or an update, depending on what occurred on the +origin. If an update or delete affects zero rows, then no changes are sent. + +You can replicate INSERT without any preconditions. + +For updates and deletes to replicate on other nodes, PGD must be able to +identify the unique rows affected. PGD requires that a table have either a +PRIMARY KEY defined, a UNIQUE constraint, or an explicit REPLICA IDENTITY +defined on specific columns. If one of those isn't defined, a warning is +generated, and later updates or deletes are explicitly blocked. If REPLICA +IDENTITY FULL is defined for a table, then a unique index isn't required. 
In +that case, updates and deletes are allowed and use the first non-unique index +that's live, valid, not deferred, and doesn't have expressions or WHERE clauses. +Otherwise, a sequential scan is used. + +### Truncate + +You can use TRUNCATE even without a defined replication identity. Replication of +TRUNCATE commands is supported, but take care when truncating groups of tables +connected by foreign keys. When replicating a truncate action, the subscriber +truncates the same group of tables that was truncated on the origin, either +explicitly specified or implicitly collected by CASCADE, except in cases where +replication sets are defined. See [Replication sets](../repsets) for +details and examples. This works correctly if all affected tables are part of +the same subscription. But if some tables to truncate on the subscriber have +foreign-key links to tables that aren't part of the same (or any) replication +set, then applying the truncate action on the subscriber fails. + +### Row-level locks + +Row-level locks taken implicitly by INSERT, UPDATE, and DELETE commands are +replicated as the changes are made. Table-level locks taken implicitly by +INSERT, UPDATE, DELETE, and TRUNCATE commands are also replicated. Explicit +row-level locking (`SELECT ... FOR UPDATE/FOR SHARE`) by user sessions isn't +replicated, nor are advisory locks. Information stored by transactions running +in SERIALIZABLE mode isn't replicated to other nodes. The transaction isolation +level of SERIALIZABLE is supported, but transactions aren't serialized across +nodes in the presence of concurrent transactions on multiple nodes. + +If DML is executed on multiple nodes concurrently, then potential conflicts +might occur if executing with asynchronous replication. You must either handle +these or avoid them. Various avoidance mechanisms are possible, discussed in +[Conflicts](../consistency/conflicts). 
+ +### Sequences + +Sequences need special handling, described in [Sequences](../sequences). This is +because in a cluster, sequences must be global to avoid nodes creating +conflicting values. Global sequences are available with global locking to ensure +integrity. + +### Binary objects + +Binary data in BYTEA columns is replicated normally, allowing "blobs" of data up +to 1 GB. Use of the PostgreSQL "large object" facility isn't supported in PGD. + +### Rules + +Rules execute only on the origin node so aren't executed during apply, +even if they're enabled for replicas. + +### Base tables only + +Replication is possible only from base tables to base tables. That is, the +tables on the source and target on the subscription side must be tables, not +views, materialized views, or foreign tables. Attempts to replicate tables other +than base tables result in an error. DML changes that are made through updatable +views are resolved to base tables on the origin and then applied to the same +base table name on the target. + +### Partitioned tables + +PGD supports partitioned tables transparently, meaning that you can add a +partitioned table to a replication set and changes that involve any of the +partitions are replicated downstream. + +### Triggers + +By default, triggers execute only on the origin node. For example, an INSERT +trigger executes on the origin node and is ignored when you apply the change on +the target node. You can specify for triggers to execute on both the origin node +at execution time and on the target when it's replicated ("apply time") by using +`ALTER TABLE ... ENABLE ALWAYS TRIGGER`. Or, use the `REPLICA` option to execute +only at apply time: `ALTER TABLE ... ENABLE REPLICA TRIGGER`. + +Some types of trigger aren't executed on apply, even if they exist on a +table and are currently enabled. 
Trigger types not executed are: + +- Statement-level triggers (`FOR EACH STATEMENT`) +- Per-column UPDATE triggers (`UPDATE OF column_name [, ...]`) + +PGD replication apply uses the system-level default search_path. Replica +triggers, stream triggers, and index expression functions can assume other +search_path settings that then fail when they execute on apply. To prevent this +from occurring, use any of these techniques: + +- Resolve object references clearly using only the default search_path. +- Always use fully qualified references to objects, for example, `schema.objectname`. +- Set the search path for a function using `ALTER FUNCTION ... SET search_path + = ...` for the functions affected. + +PGD assumes that there are no issues related to text or other collatable +datatypes, that is, all collations in use are available on all nodes, and the +default collation is the same on all nodes. Replicating changes uses equality +searches to locate Replica Identity values, so this doesn't have any effect +except where unique indexes are explicitly defined with nonmatching collation +qualifiers. Row filters might be affected by differences in collations if +collatable expressions were used. + +### Toast + +PGD handling of very long "toasted" data in PostgreSQL is transparent to the +user. The TOAST "chunkid" values likely differ between the same row on different +nodes, but that doesn't cause any problems. + +### Other restrictions + +PGD can't work correctly if Replica Identity columns are marked as external. + +PostgreSQL allows CHECK() constraints that contain volatile functions. Since PGD +reexecutes CHECK() constraints on apply, any subsequent reexecution that +doesn't return the same result as before causes data divergence. + +PGD doesn't restrict the use of foreign keys. Cascading FKs are allowed. 
diff --git a/product_docs/docs/pgd/5/appusage/dml-ddl.mdx b/product_docs/docs/pgd/5/appusage/dml-ddl.mdx new file mode 100644 index 00000000000..b754d80577c --- /dev/null +++ b/product_docs/docs/pgd/5/appusage/dml-ddl.mdx @@ -0,0 +1,59 @@ +--- +title: DML and DDL replication and nonreplication +navTitle: DML and DDL replication +--- + +The two major classes of SQL statement are DML and DDL. + +* DML is the data manipulation language and is concerned with the SQL statements that modify the data stored in tables. It includes UPDATE, DELETE, and INSERT. + +* DDL is the data definition language and is concerned with the SQL statements that modify how the data is stored. It includes CREATE, ALTER, and DROP. + +PGD handles each class differently. + +## DML replication + +PGD doesn't replicate the DML statement. It replicates the changes caused by the +DML statement. For example, an UPDATE that changed two rows replicates two +changes, whereas a DELETE that didn't remove any rows doesn't replicate +anything. This means that the results of executing volatile statements are +replicated, ensuring there's no divergence between nodes as might occur with +statement-based replication. + +## DDL replication + +DDL replication works differently from DML. For DDL, PGD replicates the statement, +which then executes on all nodes. So a `DROP TABLE IF EXISTS` might not +replicate anything on the local node, but the statement is still sent to other +nodes for execution if DDL replication is enabled. For details, see +[DDL replication](../ddl). + +PGD works to ensure that intermixed DML and DDL statements work correctly, even +in the same transaction. + +## Nonreplicated statements + +Outside of those two classes are SQL commands that PGD, by design, doesn't +replicate.
None of the following user commands are replicated by PGD, so their +effects occur on the local/origin node only: + +- Cursor operations (DECLARE, CLOSE, FETCH) +- Execution commands (DO, CALL, PREPARE, EXECUTE, EXPLAIN) +- Session management (DEALLOCATE, DISCARD, LOAD) +- Parameter commands (SET, SHOW) +- Constraint manipulation (SET CONSTRAINTS) +- Locking commands (LOCK) +- Table maintenance commands (VACUUM, ANALYZE, CLUSTER, REINDEX) +- Async operations (NOTIFY, LISTEN, UNLISTEN) + +Since the `NOTIFY` SQL command and the `pg_notify()` functions aren't +replicated, notifications aren't reliable in case of failover. This means that +notifications can easily be lost at failover if a transaction is committed just +when the server crashes. Applications running `LISTEN` might miss notifications +in case of failover. + +This is true in standard PostgreSQL replication, and PGD doesn't yet improve on +this. + +CAMO and Eager Replication options don't allow the `NOTIFY` SQL command or the +`pg_notify()` function. \ No newline at end of file diff --git a/product_docs/docs/pgd/5/appusage/index.mdx b/product_docs/docs/pgd/5/appusage/index.mdx new file mode 100644 index 00000000000..ca9f8971ac6 --- /dev/null +++ b/product_docs/docs/pgd/5/appusage/index.mdx @@ -0,0 +1,32 @@ +--- +title: Application use +redirects: + - ../bdr/appusage +navigation: +- behavior +- dml-ddl +- nodes-with-differences +- rules +- timing +- table-access-methods +--- + +Developing an application with PGD is mostly the same as working with any PostgreSQL database. What's different, though, is that you need to be aware of how your application interacts with replication. You need to know how PGD behaves with applications, the SQL that is and isn't replicated, how different nodes are handled, and other important information. 
+ +* [Application behavior](behavior) looks at how PGD replication appears to an application, such as: + - The commands that are replicated + - The commands that run locally + - When row-level locks are acquired + - How and where triggers fire + - Large objects + - Toast + +* [DML and DDL replication](dml-ddl) shows the differences between the two classes of SQL statements and how PGD handles replicating them. It also looks at the commands PGD doesn't replicate at all. + +* [Nodes with differences](nodes-with-differences) examines how PGD works with configurations where there are differing table structures and schemas on replicated nodes. Also covered is how to compare between such nodes with LiveCompare and how differences in PostgreSQL versions running on nodes can be handled. + +* [Application rules](rules) offers some general rules for applications to avoid data anomalies. + +* [Timing considerations](timing) shows how the asynchronous/synchronous replication might affect an application's view of data and notes functions to mitigate stale reads. + +* [Table access methods](table-access-methods) (TAMs) notes the TAMs available with PGD and how to enable them. diff --git a/product_docs/docs/pgd/5/appusage/nodes-with-differences.mdx b/product_docs/docs/pgd/5/appusage/nodes-with-differences.mdx new file mode 100644 index 00000000000..2cbfe9f6a12 --- /dev/null +++ b/product_docs/docs/pgd/5/appusage/nodes-with-differences.mdx @@ -0,0 +1,122 @@ +--- +title: Nodes with differences +navTitle: Nodes with differences +--- + +## Replicating between nodes with differences + +By default, DDL is sent to all nodes. You can control this behavior, +as described in [DDL replication](../ddl), and you can use it to create +differences between database schemas across nodes. PGD is designed to allow +replication to continue even with minor differences between nodes.
These +features are designed to allow application schema migration without downtime or +to allow logical standby nodes for reporting or testing. + +Currently, replication requires the same table name on all nodes. A future +feature might allow a mapping between different table names. + +It's possible to replicate between tables with dissimilar partitioning +definitions, such as a source that's a normal table replicating to a partitioned +table, including support for updates that change partitions on the target. It +can be faster if the partitioning definition is the same on the source and +target since dynamic partition routing doesn't need to execute at apply time. +For details, see [Replication sets](../repsets). + +By default, all columns are replicated. + +PGD replicates data columns based on the column name. If a column has the same +name but a different data type, PGD attempts to cast from the source type to the +target type, if casts were defined that allow that. + +PGD supports replicating between tables that have a different number of columns. + +If the target has missing columns from the source, then PGD raises a +`target_column_missing` conflict, for which the default conflict resolver is +`ignore_if_null`. This throws an error if a non-NULL value arrives. +Alternatively, you can also configure a node with a conflict resolver of +`ignore`. This setting doesn't throw an error but silently ignores any +additional columns. + +If the target has additional columns not seen in the source record, then PGD +raises a `source_column_missing` conflict, for which the default conflict +resolver is `use_default_value`. Replication proceeds if the additional columns +have a default, either NULL (if nullable) or a default expression. If not, it throws an +error and halts replication. + +Transform triggers can also be used on tables to provide default values or alter +the incoming data in various ways before apply. 
+ +If the source and the target have different constraints, then replication is +attempted, but it might fail if the rows from the source can't be applied to the +target. Row filters can help here. + +Replicating data from one schema to a more relaxed schema doesn't cause failures. +Replicating data from a schema to a more restrictive schema can be a source of +potential failures. The right way to solve this is to place a constraint on the +more relaxed side, so bad data can't be entered. That way, no bad data ever +arrives by replication, so it never fails the transform into the more +restrictive schema. For example, if one schema has a column of type TEXT and +another schema defines the same column as XML, add a CHECK constraint onto the +TEXT column to enforce that the text is XML. + +You can define a table with different indexes on each node. By default, the +index definitions are replicated. To learn how +to create an index on only a subset of nodes or just locally, see [DDL replication](../ddl). + +Storage parameters, such as `fillfactor` and `toast_tuple_target`, can differ +between nodes for a table without problems. An exception to that behavior is that the +value of a table's storage parameter `user_catalog_table` must be identical on +all nodes. + +A table being replicated must be owned by the same user/role on each node. See +[Security and roles](../security) for details. + +Roles can have different passwords for connection on each node, although by +default changes to roles are replicated to each node. See [DDL +replication](../ddl) to specify how to alter a role password on only a subset of +nodes or locally. + +## Comparison between nodes with differences + +LiveCompare is a tool for comparing data in a database across PGD and non-PGD +nodes. It needs a minimum of two connections to compare against and reach a +final result. + +Starting with LiveCompare 1.3, you can configure with `all_bdr_nodes` set.
This setting +saves you from specifying the DSN for each separate node in the +cluster. An EDB Postgres Distributed cluster has N nodes with +connection information, but LiveCompare 1.3 and later needs only the initial and +output connections to complete its job. Setting `logical_replication_mode` +specifies how the nodes communicate. + +All the configuration is done in a `.ini` file named `bdrLC.ini`, for example. +Find templates for this configuration file in `/etc/2ndq-livecompare/`. + +While LiveCompare executes, you see N+1 progress bars, N being the number of +processes. Once all the tables are sourced, a time displays along with the measured +transactions per second (tps). The timer continues to count, giving you an +estimate and then a total execution time at the end. + +This tool offers many customization options and filters, such as tables, schemas, +and replication_sets. LiveCompare can be stopped and restarted without losing context +information, so it can run at convenient times. After the comparison, a summary +and a DML script are generated so you can review them. Apply the DML to fix any +differences found. + +## Replicating between different release levels + +Another difference between nodes that you might encounter is +different major versions of PostgreSQL on the nodes. PGD is designed to +replicate between different major release versions. This feature is designed to +allow major version upgrades without downtime. + +PGD is also designed to replicate between nodes that have different versions of +PGD software. This feature is designed to allow version upgrades and maintenance +without downtime. + +However, while it's possible to join a node with a major version in a cluster, +you can't add a node with a minor version if the cluster uses a newer protocol +version. Doing so returns an error. + +Both of these features might be affected by specific restrictions.
See [Release +notes](../rel_notes/) for any known incompatibilities. \ No newline at end of file diff --git a/product_docs/docs/pgd/5/appusage/rules.mdx b/product_docs/docs/pgd/5/appusage/rules.mdx new file mode 100644 index 00000000000..52e3cc7ca50 --- /dev/null +++ b/product_docs/docs/pgd/5/appusage/rules.mdx @@ -0,0 +1,33 @@ +--- +title: General rules for applications +navTitle: Application rules +--- + +## Background + +PGD uses replica identity values to identify the rows to change. Applications +can cause difficulties if they insert, delete, and then later reuse the same +unique identifiers. This is known as the [ABA +problem](https://en.wikipedia.org/wiki/ABA_problem). PGD can't know whether the +rows are the current row, the last row, or much older rows. + +Similarly, since PGD uses table names to identify the table against which +changes are replayed, a similar ABA problem exists with applications that +create, drop, and then later reuse the same object names. + +## Rules for applications + +These issues give rise to some simple rules for applications to follow: + +- Use unique identifiers for rows (INSERT). +- Avoid modifying unique identifiers (UPDATE). +- Avoid reusing deleted unique identifiers. +- Avoid reusing dropped object names. + +In the general case, breaking those rules can lead to data anomalies and +divergence. Applications can break those rules as long as certain conditions are +met. However, use caution: while anomalies are unlikely, they aren't impossible. For +example, you can reuse a row value as long as the DELETE was replayed on all +nodes, including down nodes. This might normally occur in less than a second but +can take days if a severe issue occurred on one node that prevented it from +restarting correctly. 
diff --git a/product_docs/docs/pgd/5/appusage/table-access-methods.mdx b/product_docs/docs/pgd/5/appusage/table-access-methods.mdx new file mode 100644 index 00000000000..d7139b06879 --- /dev/null +++ b/product_docs/docs/pgd/5/appusage/table-access-methods.mdx @@ -0,0 +1,25 @@ +--- +title: Use of table access methods (TAMs) in PGD +navTitle: Table access methods +--- + +PGD 5.0 supports two table access methods (TAMs) released with EDB Postgres 15.0. These +two TAMs were certified and are allowed in PGD 5.0: + + * Auto cluster + * Ref data + +Any other TAM is restricted until certified by EDB. If you're planning to use +either of these TAMs on a table, you need to configure that TAM on +each participating node in the PGD cluster. To configure the auto cluster or ref +data TAM, on each node: + +1. Update `postgresql.conf` to specify the TAM `autocluster` or `refdata` for the + `shared_preload_libraries` parameter. +1. Restart the server and execute `CREATE EXTENSION autocluster;` or + `CREATE EXTENSION refdata;`. + +After you create the extension, you can use the TAM to create a table using `CREATE +TABLE test USING autocluster;` or `CREATE TABLE test USING refdata;`. These commands +replicate to all the PGD nodes. For more information on these table access +methods, see [`CREATE TABLE`](/epas/latest/epas_compat_sql/36_create_table/). diff --git a/product_docs/docs/pgd/5/appusage/timing.mdx b/product_docs/docs/pgd/5/appusage/timing.mdx new file mode 100644 index 00000000000..fd8ef8f2e01 --- /dev/null +++ b/product_docs/docs/pgd/5/appusage/timing.mdx @@ -0,0 +1,17 @@ +--- +title: Timing considerations and synchronous replication +navTitle: Timing considerations +--- + +Because PGD replication is asynchronous by default, peer nodes might lag behind. This behavior makes it +possible for a client connected to multiple PGD nodes or switching +between them to read stale data.
+ +A [queue wait +function](/pgd/latest/reference/functions/#bdrwait_for_apply_queue) is provided +for clients or proxies to prevent such stale reads. + +The synchronous replication features of Postgres are available to PGD as well. +In addition, PGD provides multiple variants for more synchronous replication. +See [Durability and performance options](durability) for an overview and +comparison of all variants available and their different modes.