Merge pull request #3391 from EnterpriseDB/release/2022-11-29
Release: 2022-11-29
drothery-edb authored Nov 29, 2022
2 parents 1acb1dd + 559794e commit bf30eb2
Showing 21 changed files with 1,223 additions and 3 deletions.
22 changes: 22 additions & 0 deletions advocacy_docs/pg_extensions/advanced_storage_pack/configuring.mdx
@@ -0,0 +1,22 @@
---
title: Configuring Advanced Storage Pack
navTitle: Configuring
---

Place the extension module implementing the custom TAM in `shared_preload_libraries` so that it loads early during Postgres startup. This step ensures that the extension is available before the first access to a table based on the given TAM. For example, update the parameter in `postgresql.conf`, replacing `<extension_name>` with `autocluster` or `refdata`:

```ini
shared_preload_libraries = '$libdir/<extension_name>'
```
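
After the restart, it can help to confirm that the server picked up the new value. The following is a minimal sketch using only standard Postgres facilities (`SHOW` and the `pg_settings` view). It doesn't rely on anything specific to Advanced Storage Pack:

```sql
-- Show the value the running server is using.
SHOW shared_preload_libraries;

-- pending_restart is true if the configuration file was changed
-- but the server hasn't been restarted yet.
SELECT name, setting, pending_restart
FROM pg_settings
WHERE name = 'shared_preload_libraries';
```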

After restarting the server, execute the SQL command to create the extension. This command creates the extension only in the database you're connected to, so you must rerun it in each database where the extension is used:

```sql
CREATE EXTENSION <extension_name>;
```
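
To check which databases already have the extension, you can query the standard `pg_extension` catalog in each one. This sketch assumes the extension names are `autocluster` and `refdata`, as suggested by the configuration example above:

```sql
-- Lists the Advanced Storage Pack extensions created in the current database.
-- The extension names are assumed from the shared_preload_libraries example.
SELECT extname, extversion
FROM pg_extension
WHERE extname IN ('autocluster', 'refdata');
```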

Within databases where the extension has been created, tables can be created to use the TAM which the extension provides:

```sql
CREATE TABLE mytable USING <extension_name>;
```
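
To confirm which access method a table uses, you can consult the system catalogs. The following sketch assumes Postgres 12 or later, where `pg_class.relam` records the table access method, and uses the `mytable` name from the example above:

```sql
-- List the table access methods available in this database.
SELECT amname
FROM pg_am
WHERE amtype = 't';

-- Show the access method used by a specific table.
SELECT c.relname, a.amname
FROM pg_class AS c
JOIN pg_am AS a ON a.oid = c.relam
WHERE c.relname = 'mytable';
```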
37 changes: 37 additions & 0 deletions advocacy_docs/pg_extensions/advanced_storage_pack/index.mdx
@@ -0,0 +1,37 @@
---
title: EDB Advanced Storage Pack
navigation:
- rel_notes
- installing
- configuring
---

EDB Advanced Storage Pack provides advanced storage options for PostgreSQL databases in the form of Table Access Method (TAM) extensions. These storage options can enhance the performance and reliability of databases without requiring application changes.

For tables whose access patterns are known in advance, a targeted TAM that makes different trade-offs may be preferable. For instance, if a given table in an application is INSERT-only and the rows never receive any updates, using a specialized TAM for this table that has INSERT-specific optimizations could be considered.

EnterpriseDB offers two TAMs in the Advanced Storage Pack:

## Autocluster

The Autocluster TAM provides faster access to clustered data by keeping track of the last inserted row for any value in a side-table. New rows can then be added to the same data blocks as previous rows, keeping the data clustered, which reduces access time to related data. This feature is achieved by maintaining rows with the same key values clustered together so that an index scan for a specific key can find all the rows close together and doesn't need to retrieve as many table pages to satisfy the query.

## Refdata

The Refdata TAM is optimized for mostly-static data, which contains occasional INSERTs and very few DELETEs and UPDATEs. For database schemas that utilize foreign keys to reference data, this TAM can provide performance gains of 5-10% and increased scalability. This feature is achieved by taking an exclusive lock on the reference table whenever it is modified, blocking out concurrent modifications by any other session as well as modifications to tables which reference the table. For example:

```sql
CREATE TABLE department (
    department_id SERIAL PRIMARY KEY,
    department_name TEXT
) USING refdata;

CREATE TABLE employee (
    ...
    -- The referencing column needs a data type; INTEGER matches SERIAL.
    department_id INTEGER NOT NULL REFERENCES department(department_id)
);
```

The `employee` table is a standard heap table; only the `department` table uses the `refdata` TAM. Inserts and updates on the `employee` table don't take row-level locks on the `department` table. This saves query time, avoids the need to update the rows in the `department` table, and avoids writing the referred-to `department` rows to disk and to the write-ahead log.

If updates to the `department` table are frequent, using the Refdata TAM isn't advisable, because concurrent modifications to it and to the `employee` table are then blocked. If changes to the `department` table are infrequent, faster changes to the `employee` table and reduced write-ahead log traffic may well be worth this cost.
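
One way to observe this trade-off is to inspect table-level locks while a modification of the reference table is in progress. The following is a rough sketch that uses the standard `pg_locks` view and two separate sessions. The `'Research'` value is hypothetical, and the lock modes reported depend on the Refdata implementation, so none are shown here:

```sql
-- Session 1: modify the reference table inside an open transaction.
BEGIN;
UPDATE department SET department_name = 'Research' WHERE department_id = 1;

-- Session 2: list table-level locks held or awaited on department.
SELECT l.pid, l.mode, l.granted
FROM pg_locks AS l
WHERE l.locktype = 'relation'
  AND l.relation = 'department'::regclass;

-- Session 1: end the transaction to release its locks.
COMMIT;
```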
79 changes: 79 additions & 0 deletions advocacy_docs/pg_extensions/advanced_storage_pack/installing.mdx
@@ -0,0 +1,79 @@
---
title: Installing Advanced Storage Pack
navTitle: Installing
---

The Advanced Storage Pack is supported on the same platforms as the Postgres distribution you are using. Support for Advanced Storage Pack starts with Postgres 11. For details, see:
- [EDB Postgres Advanced Server Product Compatibility](https://www.enterprisedb.com/platform-compatibility#epas)

- [PostgreSQL Product Compatibility](https://www.enterprisedb.com/resources/platform-compatibility#pg)

- [EDB Postgres Distributed (includes EDB Postgres Extended)](https://www.enterprisedb.com/resources/platform-compatibility#bdr)

## Prerequisites

Before you begin the installation process:

- Install Postgres:

    - [Installing EDB Postgres Advanced Server](/epas/latest/epas_inst_linux/installing_epas_using_edb_repository/)

    - [Installing PostgreSQL](https://www.postgresql.org/download/)

    - [Installing EDB Postgres Distributed (includes EDB Postgres Extended)](https://www.enterprisedb.com/docs/pgd/latest/deployments/tpaexec/)

- Set up the repository.

    Setting up the repository is a one-time task. If you have already set up your repository, you don't need to perform this step.

    To set up the repository, go to [EDB repositories](https://www.enterprisedb.com/repos-downloads) and follow the instructions provided there.


## Install the package

The syntax for the RPM package install command is:

```shell
sudo <package-manager> -y install edb-<postgres><postgres_version>-advanced-storage-pack<major_version>-<full_version>
```

And the syntax for the Debian package install command is:

```shell
sudo <package-manager> -y install edb-<postgres><postgres_version>-advanced-storage-pack-<major_version>-<full_version>
```

where:
- `<package-manager>` is the package manager used with your operating system:

| Package manager | Operating system |
| --------------- | -------------------------------- |
| dnf | RHEL 8 and derivatives |
| yum | RHEL 7 and derivatives, CentOS 7 |
| zypper | SLES |
| apt-get | Debian and derivatives |

- `<postgres>` is the distribution of Postgres you are using:

| Postgres distribution | Value |
| ---------------------------- | ---------- |
| PostgreSQL | pg |
| EDB Postgres Advanced Server | as |
| EDB Postgres Extended | pgextended |

- `<postgres_version>` is the version of Postgres you are using.

- `<major_version>` is the major version of the extension you are installing.

- `<full_version>` is the full version of the extension you are installing.

For example, to install Advanced Storage Pack 1.0.0 for EDB Postgres Advanced Server 14 on a RHEL 8 platform:

```shell
sudo dnf -y install edb-as14-advanced-storage-pack1-1.0.0
```

And to install Advanced Storage Pack 1.0.0 for PostgreSQL 15 on a Debian 11 platform:

```shell
sudo apt-get -y install edb-pg15-advanced-storage-pack-1-1.0.0
```
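
After installing the package, you can check from a database session that the server sees the extension files. This sketch uses the standard `pg_available_extensions` view. The extension names `autocluster` and `refdata` are assumed from the configuration instructions:

```sql
-- installed_version is NULL until CREATE EXTENSION has been run
-- in the current database.
SELECT name, default_version, installed_version
FROM pg_available_extensions
WHERE name IN ('autocluster', 'refdata');
```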
@@ -0,0 +1,10 @@
---
title: Release notes for Advanced Storage Pack version 1.0.0
navTitle: "Version 1.0.0"
---

This release of Advanced Storage Pack includes:

| Type | Description |
| ------- | -------------------------------------------------------------------------- |
| Feature | This is the initial release and includes the Refdata and Autocluster TAMs. |
@@ -0,0 +1,18 @@
---
title: Advanced Storage Pack release notes
navTitle: "Release notes"
indexCards: none
---
The Advanced Storage Pack documentation describes the latest version of Advanced Storage Pack,
including minor releases and patches. These release notes
cover what was new in each release. For new functionality introduced
in a minor or patch release, there are also indicators in the content
about the release that introduced the feature.

| Version | Release Date |
| --------------------------- | ------------ |
| [1.0.0](asp_1.0.0_rel_notes) | 2022 Nov 30 |




201 changes: 201 additions & 0 deletions advocacy_docs/pg_extensions/advanced_storage_pack/using.mdx
@@ -0,0 +1,201 @@
---
title: Using EDB Advanced Storage Pack
navTitle: Using
---

The following scenarios show where the EDB Advanced Storage Pack TAMs are useful.

## Refdata example

A scenario where Refdata would be useful is when creating a reference table of all the New York Stock Exchange (NYSE) stock symbols and their corporate names. This data is expected to change very rarely and be referenced frequently from a table tracking all stock trades for the entire market (like in the [Advanced Autocluster example](#advanced-autocluster-example)), so Refdata can be used instead of heap to increase performance.

```sql
CREATE SEQUENCE nyse_symbol_id_seq;
CREATE TABLE nyse_symbol (
nyse_symbol_id INTEGER NOT NULL PRIMARY KEY DEFAULT NEXTVAL('nyse_symbol_id_seq'),
symbol TEXT NOT NULL,
name TEXT NOT NULL
) USING refdata;
```
## Autocluster example

A scenario where Autocluster would be useful is Internet of Things (IoT) data, which is typically append-only and inserted as many rows that relate to each other. When using heap instead of Autocluster, Postgres can't cluster these related rows together, so access to the set of rows touches many data blocks and can be slow and I/O heavy.

The following example is for a set of IoT thermostats that report a house's temperature and temperature setting every 60 seconds:

```sql
CREATE TABLE iot (
    thermostat_id bigint,
    recordtime timestamp,
    measured_temperature float4,
    temperature_setting float4
) USING autocluster;
```

Using Autocluster, rows with the same `thermostat_id` are clustered together and are easier to access:

```sql
CREATE INDEX ON iot USING btree(thermostat_id);
SELECT autocluster.autocluster(
rel := 'iot'::regclass,
cols := '{1}',
max_objects := 10000
);
```

!!! Note
The `cols` parameter should match the number of columns specified in `USING btree()`. In this case, only `thermostat_id` is listed so the value is `{1}`.
!!!

Populate the table with the `thermostat_id` and `recordtime` data:

```sql
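-- recordtime is a timestamp, so each value must be a quoted literal that includes a date.
-- The date used here is illustrative only.
INSERT INTO iot (thermostat_id, recordtime) VALUES (456, '2022-11-18 12:01');
INSERT INTO iot (thermostat_id, recordtime) VALUES (8945, '2022-11-18 04:55');
INSERT INTO iot (thermostat_id, recordtime) VALUES (456, '2022-11-18 15:32');
INSERT INTO iot (thermostat_id, recordtime) VALUES (6785, '2022-11-18 01:36');
INSERT INTO iot (thermostat_id, recordtime) VALUES (456, '2022-11-18 19:25');
INSERT INTO iot (thermostat_id, recordtime) VALUES (5678, '2022-11-18 03:44');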
```

When you select the data from the `iot` table, you can see from the `ctid` location that the rows with the same `thermostat_id` were clustered together:

```sql
SELECT ctid, thermostat_id, recordtime FROM iot;
__OUTPUT__
 ctid  | thermostat_id | recordtime
-------+---------------+------------
 (0,1) |           456 | 12:01
 (2,2) |          8945 | 04:55
 (0,2) |           456 | 15:32
 (3,2) |          6785 | 01:36
 (0,3) |           456 | 19:25
 (2,5) |          5678 | 03:44
(6 rows)
```
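
To summarize the clustering rather than read it off individual rows, you can count how many distinct data pages hold the rows for each key. The following sketch relies on a common Postgres idiom, not an Advanced Storage Pack feature: casting `ctid` through `text` to `point` exposes the block number as the first coordinate.

```sql
-- A ctid is (block number, row offset). The first coordinate of the
-- point cast is the block number, so counting distinct values gives
-- the number of pages that hold rows for each thermostat.
SELECT thermostat_id,
       count(*) AS row_count,
       count(DISTINCT (ctid::text::point)[0]::bigint) AS pages_used
FROM iot
GROUP BY thermostat_id
ORDER BY thermostat_id;
```

A lower `pages_used` value relative to `row_count` indicates that the rows for that key are packed more tightly.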

## Advanced Autocluster example

This is a more advanced way of using Autocluster than the previous example. It involves referencing the NYSE table from the [Refdata example](#refdata-example) and clustering together the rows based on the stock symbol, making it easier to find the most recent trades for a given stock.

Start with the NYSE table from the Refdata example:

```sql
CREATE SEQUENCE nyse_symbol_id_seq;
CREATE TABLE nyse_symbol (
nyse_symbol_id INTEGER NOT NULL PRIMARY KEY DEFAULT NEXTVAL('nyse_symbol_id_seq'),
symbol TEXT NOT NULL,
name TEXT NOT NULL
) USING refdata;
```

Then, create a frequently updated table containing NYSE trades, referencing the mostly static stock symbols in the Refdata table. Cluster the rows on the stock symbol to make it easier to look up the last x trades for a given stock:

```sql
CREATE TABLE nyse_trade (
nyse_symbol_id INTEGER NOT NULL REFERENCES nyse_symbol(nyse_symbol_id),
trade_time TIMESTAMP NOT NULL DEFAULT NOW(),
trade_price FLOAT8 NOT NULL CHECK(trade_price >= 0.0),
trade_volume BIGINT NOT NULL CHECK(trade_volume >= 1)
) USING autocluster;
CREATE INDEX ON nyse_trade USING BTREE(nyse_symbol_id);
SELECT autocluster.autocluster(
rel := 'nyse_trade'::regclass,
cols := '{1}',
max_objects := 3000
);
__OUTPUT__
 autocluster
-------------

(1 row)
```

Create a view to facilitate inserting by symbol name rather than id:

```sql
CREATE VIEW nyse_trade_symbol AS
SELECT ns.symbol, nt.trade_time, nt.trade_price, nt.trade_volume
FROM nyse_symbol ns
JOIN nyse_trade nt
ON ns.nyse_symbol_id = nt.nyse_symbol_id;
CREATE RULE stock_insert AS ON INSERT TO nyse_trade_symbol
DO INSTEAD INSERT INTO nyse_trade
(SELECT ns.nyse_symbol_id, NEW.trade_time, NEW.trade_price, NEW.trade_volume
FROM nyse_symbol ns
WHERE ns.symbol = NEW.symbol
);
```

For more information on creating a view, see the [PostgreSQL documentation](https://www.postgresql.org/docs/current/sql-createview.html).

Pre-populate the static data (shortened for brevity):

```sql
INSERT INTO nyse_symbol (symbol, name) VALUES
('A', 'Agilent Technologies'),
('AA', 'Alcoa Corp'),
('AAC', 'Ares Acquisition Corp Cl A'),
('AAIC', 'Arlington Asset Investment Corp'),
('AAIN', 'Arlington Asset Investment Corp 6.000%'),
('AAN', 'Aarons Holdings Company'),
('AAP', 'Advance Auto Parts Inc'),
('AAQC', 'Accelerate Acquisition Corp Cl A'),
('ZTR', 'Virtus Total Return Fund Inc'),
('ZTS', 'Zoetis Inc Cl A'),
('ZUO', 'Zuora Inc'),
('ZVIA', 'Zevia Pbc Cl A'),
('ZWS', 'Zurn Elkay Water Solutions Corp'),
('ZYME', 'Zymeworks Inc');
ANALYZE nyse_symbol;
```

Insert stock trades over a given time range on Friday, November 18, 2022 (shortened for brevity):

```sql
\timing
INSERT INTO nyse_trade_symbol VALUES ('NSC', 'Fri Nov 18 09:51:32 2022', 248.100000, 98778);
Time: 32.349 ms
INSERT INTO nyse_trade_symbol VALUES ('BOE', 'Fri Nov 18 09:51:32 2022', 9.640000, 72973);
Time: 1.055 ms
INSERT INTO nyse_trade_symbol VALUES ('LOMA', 'Fri Nov 18 09:51:32 2022', 6.180000, 41632);
Time: 0.927 ms
INSERT INTO nyse_trade_symbol VALUES ('LXP', 'Fri Nov 18 09:51:32 2022', 10.670000, 85768);
Time: 0.941 ms
INSERT INTO nyse_trade_symbol VALUES ('ABBV', 'Fri Nov 18 09:51:32 2022', 155.000000, 46842);
Time: 0.916 ms
INSERT INTO nyse_trade_symbol VALUES ('AGD', 'Fri Nov 18 09:51:32 2022', 9.360000, 90684);
Time: 0.669 ms
INSERT INTO nyse_trade_symbol VALUES ('PAGS', 'Fri Nov 18 11:14:31 2022', 12.985270, 34734);
Time: 0.849 ms
INSERT INTO nyse_trade_symbol VALUES ('KTF', 'Fri Nov 18 11:14:31 2022', 8.435753, 73719);
Time: 0.679 ms
INSERT INTO nyse_trade_symbol VALUES ('AES', 'Fri Nov 18 11:14:31 2022', 28.072732, 549);
Time: 0.667 ms
INSERT INTO nyse_trade_symbol VALUES ('LIN', 'Fri Nov 18 11:14:31 2022', 334.617829, 39838);
Time: 0.665 ms
INSERT INTO nyse_trade_symbol VALUES ('DTB', 'Fri Nov 18 11:14:31 2022', 18.679245, 55863);
Time: 0.680 ms
ANALYZE nyse_trade;
Time: 73.832 ms
```

Select the ctid from the data for a given stock symbol to see in the output how it has been clustered together:

```sql
SELECT ctid, * FROM nyse_trade WHERE nyse_symbol_id = 1000 ORDER BY trade_time DESC LIMIT 10;
__OUTPUT__
ctid | nyse_symbol_id | trade_time | trade_price | trade_volume
-----------+----------------+--------------------------+-------------+--------------
(729,71) | 1000 | Fri Nov 18 11:13:51 2022 | 11.265938 | 72662
(729,22) | 1000 | Fri Nov 18 11:08:39 2022 | 11.262747 | 50897
(729,20) | 1000 | Fri Nov 18 11:08:30 2022 | 11.267203 | 37120
(729,9) | 1000 | Fri Nov 18 11:07:21 2022 | 11.269852 | 792
(729,6) | 1000 | Fri Nov 18 11:07:02 2022 | 11.268067 | 46221
(632,123) | 1000 | Fri Nov 18 11:04:46 2022 | 11.272623 | 97874
(632,118) | 1000 | Fri Nov 18 11:04:28 2022 | 11.271794 | 65579
(632,14) | 1000 | Fri Nov 18 10:55:45 2022 | 11.268543 | 8557
(632,2) | 1000 | Fri Nov 18 10:54:45 2022 | 11.26414 | 94078
(506,126) | 1000 | Fri Nov 18 10:54:01 2022 | 11.264657 | 89641
(10 rows)
```
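
To see how many data pages such a lookup touches, you can run the same query under `EXPLAIN` with the standard `ANALYZE` and `BUFFERS` options. This is a sketch only. The plan and buffer counts depend on your data and configuration, so no output is shown:

```sql
-- The "Buffers:" lines in the plan report how many pages were read
-- to satisfy the query. Note that ANALYZE actually executes the query.
EXPLAIN (ANALYZE, BUFFERS)
SELECT ctid, *
FROM nyse_trade
WHERE nyse_symbol_id = 1000
ORDER BY trade_time DESC
LIMIT 10;
```

Comparing the buffer counts for the same query against an otherwise identical heap table is one way to gauge the benefit of clustering for your workload.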

2 comments on commit bf30eb2

@github-actions
🎉 Published on https://edb-docs.netlify.app as production
🚀 Deployed on https://63862c6d2ae1132ce117fa63--edb-docs.netlify.app