Skip to content

Commit

Permalink
Citus integration (patroni#2504)
Browse files Browse the repository at this point in the history
Citus cluster (coordinator and workers) will be stored in DCS as a fleet of Patroni logically grouped together:
```
/service/batman/
/service/batman/0/
/service/batman/0/initialize
/service/batman/0/leader
/service/batman/0/members/
/service/batman/0/members/m1
/service/batman/0/members/m2
/service/batman/
/service/batman/1/
/service/batman/1/initialize
/service/batman/1/leader
/service/batman/1/members/
/service/batman/1/members/m1
/service/batman/1/members/m2
...
```

Where 0 is a Citus group for coordinator and 1, 2, etc are worker groups.

Such hierarchy allows reading the entire Citus cluster with a single call to DCS (except Zookeeper).

The get_cluster() method will be reading the entire Citus cluster on the coordinator because it needs to discover workers. For the worker cluster it will be reading the subtree of its own group.

Besides that we introduce a new method  get_citus_coordinator(). It will be used only by worker clusters.

Since there is no hierarchical structures on K8s we will use the citus group suffix on all objects that Patroni creates.
E.g.
```
batman-0-leader  # the leader config map for the coordinator
batman-0-config  # the config map holding initialize, config, and history "keys"
...
batman-1-leader  # the leader config map for worker group 1
batman-1-config
...
```

Citus integration is enabled from patroni.yaml:
```yaml
citus:
  database: citus
  group: 0  # 0 is for coordinator, 1, 2, etc are for workers
```

If enabled, Patroni will create the database, citus extension in it, and INSERTs INTO `pg_dist_authinfo` information required for Citus nodes to communicate between each other, i.e. 'password', 'sslcert', 'sslkey' for superuser if they are defined in the Patroni configuration file.

When the new Citus coordinator/worker is bootstrapped, Patroni adds `synchronous_mode: on` to the `bootstrap.dcs` section.

Besides that, Patroni takes over management of some Postgres GUCs:
- `shared_preload_libraries` - Patroni ensures that the "citus" is added to the first place
- `max_prepared_transactions` - if not set or set to 0, Patroni changes the value to `max_connections*2`
- wal_level - automatically set to logical. It is used by Citus to move/split shards. Under the hood Citus is creating/removing replication slots and they are automatically added by Patroni to the `ignore_slots` configuration to avoid accidental removal.

The coordinator primary actively discovers worker primary nodes and registers/updates them in the `pg_dist_node` table using
citus_add_node() and citus_update_node() functions.

Patroni running on the coordinator provides the new REST API endpoint: `POST /citus`. It is used by workers to facilitate controlled switchovers and restarts of worker primaries.
When the worker primary needs to shut down Postgres because of restart or switchover, it calls the `POST /citus` endpoint on the coordinator and the Patroni on the coordinator starts a transaction and calls `citus_update_node(nodeid, 'host-demoted', port)` in order to pause client connections that work with the given worker.
Once the new leader is elected or postgres started back, they perform another call to the `POST/citus` endpoint, that does another `citus_update_node()` call with actual hostname and port and commits a transaction. After transaction is committed, coordinator reestablishes connections to the worker node and client connections are unblocked.
If clients don't run long transaction the operation finishes without client visible errors, but only a short latency spike.

All operations on the `pg_dist_node` are serialized by Patroni on the coordinator. It allows to have more control and ROLLBACK transaction in progress if its lifetime exceeding a certain threshold and there are other worker nodes should be updated.
  • Loading branch information
CyberDem0n authored Jan 24, 2023
1 parent 3161f31 commit 4872ac5
Show file tree
Hide file tree
Showing 54 changed files with 3,590 additions and 603 deletions.
173 changes: 173 additions & 0 deletions Dockerfile.citus
Original file line number Diff line number Diff line change
@@ -0,0 +1,173 @@
## This Dockerfile is meant to aid in the building and debugging patroni whilst developing on your local machine
## It has all the necessary components to play/debug with a single node appliance, running etcd
ARG PG_MAJOR=15
ARG COMPRESS=false
ARG PGHOME=/home/postgres
ARG PGDATA=$PGHOME/data
ARG LC_ALL=C.UTF-8
ARG LANG=C.UTF-8

FROM postgres:$PG_MAJOR as builder

ARG PGHOME
ARG PGDATA
ARG LC_ALL
ARG LANG

ENV ETCDVERSION=3.3.13 CONFDVERSION=0.16.0

RUN set -ex \
&& export DEBIAN_FRONTEND=noninteractive \
&& echo 'APT::Install-Recommends "0";\nAPT::Install-Suggests "0";' > /etc/apt/apt.conf.d/01norecommend \
&& apt-get update -y \
# postgres:10 is based on debian, which has the patroni package. We will install all required dependencies
&& apt-cache depends patroni | sed -n -e 's/.*Depends: \(python3-.\+\)$/\1/p' \
| grep -Ev '^python3-(sphinx|etcd|consul|kazoo|kubernetes)' \
| xargs apt-get install -y vim curl less jq locales haproxy sudo \
python3-etcd python3-kazoo python3-pip busybox \
net-tools iputils-ping --fix-missing \
&& curl https://install.citusdata.com/community/deb.sh | bash \
&& apt-get -y install postgresql-$PG_MAJOR-citus-11.1 \
&& pip3 install dumb-init \
\
# Cleanup all locales but en_US.UTF-8
&& find /usr/share/i18n/charmaps/ -type f ! -name UTF-8.gz -delete \
&& find /usr/share/i18n/locales/ -type f ! -name en_US ! -name en_GB ! -name i18n* ! -name iso14651_t1 ! -name iso14651_t1_common ! -name 'translit_*' -delete \
&& echo 'en_US.UTF-8 UTF-8' > /usr/share/i18n/SUPPORTED \
\
# Make sure we have a en_US.UTF-8 locale available
&& localedef -i en_US -c -f UTF-8 -A /usr/share/locale/locale.alias en_US.UTF-8 \
\
# haproxy dummy config
&& echo 'global\n stats socket /run/haproxy/admin.sock mode 660 level admin' > /etc/haproxy/haproxy.cfg \
\
# vim config
&& echo 'syntax on\nfiletype plugin indent on\nset mouse-=a\nautocmd FileType yaml setlocal ts=2 sts=2 sw=2 expandtab' > /etc/vim/vimrc.local \
\
# Prepare postgres/patroni/haproxy environment
&& mkdir -p $PGHOME/.config/patroni /patroni /run/haproxy \
&& ln -s ../../postgres0.yml $PGHOME/.config/patroni/patronictl.yaml \
&& ln -s /patronictl.py /usr/local/bin/patronictl \
&& sed -i "s|/var/lib/postgresql.*|$PGHOME:/bin/bash|" /etc/passwd \
&& chown -R postgres:postgres /var/log \
\
# Download etcd
&& curl -sL https://github.com/coreos/etcd/releases/download/v${ETCDVERSION}/etcd-v${ETCDVERSION}-linux-$(dpkg --print-architecture).tar.gz \
| tar xz -C /usr/local/bin --strip=1 --wildcards --no-anchored etcd etcdctl \
\
# Download confd
&& curl -sL https://github.com/kelseyhightower/confd/releases/download/v${CONFDVERSION}/confd-${CONFDVERSION}-linux-$(dpkg --print-architecture) \
> /usr/local/bin/confd && chmod +x /usr/local/bin/confd \
# Prepare client cert for HAProxy
&& cat /etc/ssl/private/ssl-cert-snakeoil.key /etc/ssl/certs/ssl-cert-snakeoil.pem > /etc/ssl/private/ssl-cert-snakeoil.crt \
\
# Clean up all useless packages and some files
&& apt-get purge -y --allow-remove-essential python3-pip gzip bzip2 util-linux e2fsprogs \
libmagic1 bsdmainutils login ncurses-bin libmagic-mgc e2fslibs bsdutils \
exim4-config gnupg-agent dirmngr libpython2.7-stdlib libpython2.7-minimal \
&& apt-get autoremove -y \
&& apt-get clean -y \
&& rm -rf /var/lib/apt/lists/* \
/root/.cache \
/var/cache/debconf/* \
/etc/rc?.d \
/etc/systemd \
/docker-entrypoint* \
/sbin/pam* \
/sbin/swap* \
/sbin/unix* \
/usr/local/bin/gosu \
/usr/sbin/[acgipr]* \
/usr/sbin/*user* \
/usr/share/doc* \
/usr/share/man \
/usr/share/info \
/usr/share/i18n/locales/translit_hangul \
/usr/share/locale/?? \
/usr/share/locale/??_?? \
/usr/share/postgresql/*/man \
/usr/share/postgresql-common/pg_wrapper \
/usr/share/vim/vim80/doc \
/usr/share/vim/vim80/lang \
/usr/share/vim/vim80/tutor \
# /var/lib/dpkg/info/* \
&& find /usr/bin -xtype l -delete \
&& find /var/log -type f -exec truncate --size 0 {} \; \
&& find /usr/lib/python3/dist-packages -name '*test*' | xargs rm -fr \
&& find /lib/$(uname -m)-linux-gnu/security -type f ! -name pam_env.so ! -name pam_permit.so ! -name pam_unix.so -delete

# perform compression if it is necessary
ARG COMPRESS
RUN if [ "$COMPRESS" = "true" ]; then \
set -ex \
# Allow certain sudo commands from postgres
&& echo 'postgres ALL=(ALL) NOPASSWD: /bin/tar xpJf /a.tar.xz -C /, /bin/rm /a.tar.xz, /bin/ln -snf dash /bin/sh' >> /etc/sudoers \
&& ln -snf busybox /bin/sh \
&& arch=$(uname -m) \
&& darch=$(uname -m | sed 's/_/-/') \
&& files="/bin/sh /usr/bin/sudo /usr/lib/sudo/sudoers.so /lib/$arch-linux-gnu/security/pam_*.so" \
&& libs="$(ldd $files | awk '{print $3;}' | grep '^/' | sort -u) /lib/ld-linux-$darch.so.* /lib/$arch-linux-gnu/ld-linux-$darch.so.* /lib/$arch-linux-gnu/libnsl.so.* /lib/$arch-linux-gnu/libnss_compat.so.* /lib/$arch-linux-gnu/libnss_files.so.*" \
&& (echo /var/run $files $libs | tr ' ' '\n' && realpath $files $libs) | sort -u | sed 's/^\///' > /exclude \
&& find /etc/alternatives -xtype l -delete \
&& save_dirs="usr lib var bin sbin etc/ssl etc/init.d etc/alternatives etc/apt" \
&& XZ_OPT=-e9v tar -X /exclude -cpJf a.tar.xz $save_dirs \
# we call "cat /exclude" to avoid including files from the $save_dirs that are also among
# the exceptions listed in the /exclude, as "uniq -u" eliminates all non-unique lines.
# By calling "cat /exclude" a second time we guarantee that there will be at least two lines
# for each exception and therefore they will be excluded from the output passed to 'rm'.
&& /bin/busybox sh -c "(find $save_dirs -not -type d && cat /exclude /exclude && echo exclude) | sort | uniq -u | xargs /bin/busybox rm" \
&& /bin/busybox --install -s \
&& /bin/busybox sh -c "find $save_dirs -type d -depth -exec rmdir -p {} \; 2> /dev/null"; \
else \
/bin/busybox --install -s; \
fi

FROM scratch
COPY --from=builder / /

LABEL maintainer="Alexander Kukushkin <[email protected]>"

ARG PG_MAJOR
ARG COMPRESS
ARG PGHOME
ARG PGDATA
ARG LC_ALL
ARG LANG

ARG PGBIN=/usr/lib/postgresql/$PG_MAJOR/bin

ENV LC_ALL=$LC_ALL LANG=$LANG EDITOR=/usr/bin/editor
ENV PGDATA=$PGDATA PATH=$PATH:$PGBIN

COPY patroni /patroni/
COPY extras/confd/conf.d/haproxy.toml /etc/confd/conf.d/
COPY extras/confd/templates/haproxy-citus.tmpl /etc/confd/templates/haproxy.tmpl
COPY patroni*.py docker/entrypoint.sh /
COPY postgres?.yml $PGHOME/

WORKDIR $PGHOME

RUN sed -i 's/env python/&3/' /patroni*.py \
# "fix" patroni configs
&& sed -i 's/^\( connect_address:\| - host\)/#&/' postgres?.yml \
&& sed -i 's/^ listen: 127.0.0.1/ listen: 0.0.0.0/' postgres?.yml \
&& sed -i "s|^\( data_dir: \).*|\1$PGDATA|" postgres?.yml \
&& sed -i "s|^#\( bin_dir: \).*|\1$PGBIN|" postgres?.yml \
&& sed -i 's/^ - encoding: UTF8/ - locale: en_US.UTF-8\n&/' postgres?.yml \
&& sed -i 's/^scope:/log:\n loggers:\n patroni.postgresql.citus: DEBUG\n#&/' postgres?.yml \
&& sed -i 's/^\(name\|etcd\| host\| authentication\| pg_hba\| parameters\):/#&/' postgres?.yml \
&& sed -i 's/^ \(replication\|superuser\|rewind\|unix_socket_directories\|\(\( \)\{0,1\}\(username\|password\)\)\):/#&/' postgres?.yml \
&& sed -i 's/^postgresql:/&\n basebackup:\n checkpoint: fast/' postgres?.yml \
&& sed -i 's|^ parameters:| pg_hba:\n - local all all trust\n - hostssl replication all all md5 clientcert=verify-ca\n - hostssl all all all md5 clientcert=verify-ca\n&\n max_connections: 100\n shared_buffers: 16MB\n ssl: "on"\n ssl_ca_file: /etc/ssl/certs/ssl-cert-snakeoil.pem\n ssl_cert_file: /etc/ssl/certs/ssl-cert-snakeoil.pem\n ssl_key_file: /etc/ssl/private/ssl-cert-snakeoil.key\n citus.node_conninfo: "sslrootcert=/etc/ssl/certs/ssl-cert-snakeoil.pem sslkey=/etc/ssl/private/ssl-cert-snakeoil.key sslcert=/etc/ssl/certs/ssl-cert-snakeoil.pem sslmode=verify-ca"|' postgres?.yml \
&& sed -i 's/^#\(ctl\| certfile\| keyfile\)/\1/' postgres?.yml \
&& sed -i 's|^# cafile: .*$| verify_client: required\n cafile: /etc/ssl/certs/ssl-cert-snakeoil.pem|' postgres?.yml \
&& sed -i 's|^# cacert: .*$| cacert: /etc/ssl/certs/ssl-cert-snakeoil.pem|' postgres?.yml \
&& sed -i 's/^# insecure: .*/ insecure: on/' postgres?.yml \
# client cert for HAProxy to access Patroni REST API
&& if [ "$COMPRESS" = "true" ]; then chmod u+s /usr/bin/sudo; fi \
&& chmod +s /bin/ping \
&& chown -R postgres:postgres $PGHOME /run /etc/haproxy

USER postgres

ENTRYPOINT ["/bin/sh", "/entrypoint.sh"]
2 changes: 2 additions & 0 deletions README.rst
Original file line number Diff line number Diff line change
Expand Up @@ -14,6 +14,8 @@ We call Patroni a "template" because it is far from being a one-size-fits-all or

Currently supported PostgreSQL versions: 9.3 to 15.

**Note to Citus users**: Starting from 3.0 Patroni nicely integrates with `Citus <https://www.citusdata.com>`. Please check `Citus support <https://github.com/zalando/patroni/blob/master/docs/citus.rst>`__ page for more information.

**Note to Kubernetes users**: Patroni can run natively on top of Kubernetes. Take a look at the `Kubernetes <https://github.com/zalando/patroni/blob/master/docs/kubernetes.rst>`__ chapter of the Patroni documentation.

.. contents::
Expand Down
139 changes: 139 additions & 0 deletions docker-compose-citus.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,139 @@
# docker compose file for running a Citus cluster
# with 3-node etcd v3 cluster as the DCS and one haproxy node.
# The Citus cluster has a coordinator (3 nodes)
# and two worker clusters (2 nodes).
#
# Before starting it up you need to build the docker image:
# $ docker build -f Dockerfile.citus -t patroni-citus .
# The cluster could be started as:
# $ docker-compose -f docker-compose-citus.yml up -d
# You can read more about it in the:
# https://github.com/zalando/patroni/blob/master/docker/README.md#citus-cluster
version: "2"

networks:
demo:

services:
etcd1: &etcd
image: patroni-citus
networks: [ demo ]
environment:
ETCDCTL_API: 3
ETCD_LISTEN_PEER_URLS: http://0.0.0.0:2380
ETCD_LISTEN_CLIENT_URLS: http://0.0.0.0:2379
ETCD_INITIAL_CLUSTER: etcd1=http://etcd1:2380,etcd2=http://etcd2:2380,etcd3=http://etcd3:2380
ETCD_INITIAL_CLUSTER_STATE: new
ETCD_INITIAL_CLUSTER_TOKEN: tutorial
container_name: demo-etcd1
hostname: etcd1
command: etcd -name etcd1 -initial-advertise-peer-urls http://etcd1:2380

etcd2:
<<: *etcd
container_name: demo-etcd2
hostname: etcd2
command: etcd -name etcd2 -initial-advertise-peer-urls http://etcd2:2380

etcd3:
<<: *etcd
container_name: demo-etcd3
hostname: etcd3
command: etcd -name etcd3 -initial-advertise-peer-urls http://etcd3:2380

haproxy:
image: patroni-citus
networks: [ demo ]
env_file: docker/patroni.env
hostname: haproxy
container_name: demo-haproxy
ports:
- "5000:5000" # Access to the coorinator primary
- "5001:5001" # Load-balancing across workers primaries
command: haproxy
environment: &haproxy_env
ETCDCTL_API: 3
ETCDCTL_ENDPOINTS: http://etcd1:2379,http://etcd2:2379,http://etcd3:2379
PATRONI_ETCD3_HOSTS: "'etcd1:2379','etcd2:2379','etcd3:2379'"
PATRONI_SCOPE: demo
PATRONI_CITUS_GROUP: 0
PATRONI_CITUS_DATABASE: citus
PGSSLMODE: verify-ca
PGSSLKEY: /etc/ssl/private/ssl-cert-snakeoil.key
PGSSLCERT: /etc/ssl/certs/ssl-cert-snakeoil.pem
PGSSLROOTCERT: /etc/ssl/certs/ssl-cert-snakeoil.pem

coord1:
image: patroni-citus
networks: [ demo ]
env_file: docker/patroni.env
hostname: coord1
container_name: demo-coord1
environment: &coord_env
<<: *haproxy_env
PATRONI_NAME: coord1
PATRONI_CITUS_GROUP: 0

coord2:
image: patroni-citus
networks: [ demo ]
env_file: docker/patroni.env
hostname: coord2
container_name: demo-coord2
environment:
<<: *coord_env
PATRONI_NAME: coord2

coord3:
image: patroni-citus
networks: [ demo ]
env_file: docker/patroni.env
hostname: coord3
container_name: demo-coord3
environment:
<<: *coord_env
PATRONI_NAME: coord3


work1-1:
image: patroni-citus
networks: [ demo ]
env_file: docker/patroni.env
hostname: work1-1
container_name: demo-work1-1
environment: &work1_env
<<: *haproxy_env
PATRONI_NAME: work1-1
PATRONI_CITUS_GROUP: 1

work1-2:
image: patroni-citus
networks: [ demo ]
env_file: docker/patroni.env
hostname: work1-2
container_name: demo-work1-2
environment:
<<: *work1_env
PATRONI_NAME: work1-2


work2-1:
image: patroni-citus
networks: [ demo ]
env_file: docker/patroni.env
hostname: work2-1
container_name: demo-work2-1
environment: &work2_env
<<: *haproxy_env
PATRONI_NAME: work2-1
PATRONI_CITUS_GROUP: 2

work2-2:
image: patroni-citus
networks: [ demo ]
env_file: docker/patroni.env
hostname: work2-2
container_name: demo-work2-2
environment:
<<: *work2_env
PATRONI_NAME: work2-2
7 changes: 7 additions & 0 deletions docker-compose.yml
Original file line number Diff line number Diff line change
@@ -1,5 +1,12 @@
# docker compose file for running a 3-node PostgreSQL cluster
# with 3-node etcd cluster as the DCS and one haproxy node
#
# requires a patroni image build from the Dockerfile:
# $ docker build -t patroni .
# The cluster could be started as:
# $ docker-compose up -d
# You can read more about it in the:
# https://github.com/zalando/patroni/blob/master/docker/README.md
version: "2"

networks:
Expand Down
Loading

0 comments on commit 4872ac5

Please sign in to comment.