forked from patroni/patroni
Citus cluster (coordinator and workers) will be stored in DCS as a fleet of Patroni clusters logically grouped together:

```
/service/batman/
/service/batman/0/
/service/batman/0/initialize
/service/batman/0/leader
/service/batman/0/members/
/service/batman/0/members/m1
/service/batman/0/members/m2
/service/batman/
/service/batman/1/
/service/batman/1/initialize
/service/batman/1/leader
/service/batman/1/members/
/service/batman/1/members/m1
/service/batman/1/members/m2
...
```

where 0 is the Citus group for the coordinator and 1, 2, etc. are worker groups.

Such a hierarchy allows reading the entire Citus cluster with a single call to DCS (except Zookeeper).

The get_cluster() method will read the entire Citus cluster on the coordinator because it needs to discover workers. For a worker cluster it will read only the subtree of its own group.

Besides that, we introduce a new method, get_citus_coordinator(). It will be used only by worker clusters.

Since there are no hierarchical structures on K8s, we will use the Citus group suffix on all objects that Patroni creates. E.g.

```
batman-0-leader  # the leader config map for the coordinator
batman-0-config  # the config map holding initialize, config, and history "keys"
...
batman-1-leader  # the leader config map for worker group 1
batman-1-config
...
```

Citus integration is enabled from patroni.yaml:

```yaml
citus:
  database: citus
  group: 0  # 0 is for coordinator, 1, 2, etc are for workers
```

If enabled, Patroni will create the database, create the citus extension in it, and INSERT INTO `pg_dist_authinfo` the information required for Citus nodes to communicate with each other, i.e. 'password', 'sslcert', 'sslkey' for the superuser, if they are defined in the Patroni configuration file.

When a new Citus coordinator/worker is bootstrapped, Patroni adds `synchronous_mode: on` to the `bootstrap.dcs` section.

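
The naming scheme described above can be sketched in a few lines of Python. This is only an illustration of the layout, not Patroni's implementation: the helper names `dcs_group_prefix`, `dcs_read_prefix` and `k8s_object_name` are hypothetical, and the `/service` namespace and `batman` scope are taken from the example listing.

```python
def dcs_group_prefix(scope: str, group: int, namespace: str = "/service") -> str:
    """Key prefix under which one Citus group (one Patroni cluster) lives."""
    return f"{namespace}/{scope}/{group}/"


def dcs_read_prefix(scope: str, group: int, namespace: str = "/service") -> str:
    """Prefix a node would read, following the rule in the commit message.

    The coordinator (group 0) reads the whole Citus cluster because it has to
    discover workers; a worker reads only the subtree of its own group.
    """
    if group == 0:
        return f"{namespace}/{scope}/"
    return dcs_group_prefix(scope, group, namespace)


def k8s_object_name(scope: str, group: int, suffix: str) -> str:
    """Flat object name for K8s, where no hierarchy is available."""
    return f"{scope}-{group}-{suffix}"


coordinator_reads = dcs_read_prefix("batman", 0)
worker_reads = dcs_read_prefix("batman", 1)
worker_leader_key = dcs_group_prefix("batman", 1) + "leader"
worker_leader_object = k8s_object_name("batman", 1, "leader")
```

With these assumptions, a worker in group 1 reads `/service/batman/1/`, while the coordinator reads `/service/batman/` and sees every group in one pass.
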
Besides that, Patroni takes over management of some Postgres GUCs:
- `shared_preload_libraries` - Patroni ensures that "citus" is in the first place
- `max_prepared_transactions` - if not set or set to 0, Patroni changes the value to `max_connections*2`
- `wal_level` - automatically set to `logical`. It is used by Citus to move/split shards. Under the hood Citus creates/removes replication slots, and Patroni automatically adds them to the `ignore_slots` configuration to avoid accidental removal.

The coordinator primary actively discovers worker primary nodes and registers/updates them in the `pg_dist_node` table using the citus_add_node() and citus_update_node() functions.

Patroni running on the coordinator provides a new REST API endpoint: `POST /citus`. It is used by workers to facilitate controlled switchovers and restarts of worker primaries.

When a worker primary needs to shut down Postgres because of a restart or switchover, it calls the `POST /citus` endpoint on the coordinator. Patroni on the coordinator then starts a transaction and calls `citus_update_node(nodeid, 'host-demoted', port)` in order to pause client connections that work with the given worker.

Once the new leader is elected or Postgres is started back, the worker performs another call to the `POST /citus` endpoint, which does another `citus_update_node()` call with the actual hostname and port and commits the transaction. After the transaction is committed, the coordinator reestablishes connections to the worker node and client connections are unblocked. If clients don't run long transactions, the operation finishes without client-visible errors, with only a short latency spike.

All operations on `pg_dist_node` are serialized by Patroni on the coordinator. This gives Patroni more control and lets it ROLLBACK a transaction in progress if its lifetime exceeds a certain threshold and other worker nodes need to be updated.
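
The switchover sequence described above can be modeled roughly as follows. This is a sketch under stated assumptions, not Patroni's code: the class `CoordinatorSketch` and its method names are hypothetical, the payload of `POST /citus` is not shown in the commit message and is not modeled, SQL statements are only recorded rather than executed, and `'host-demoted'` is read here as the worker hostname with a `-demoted` suffix.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CoordinatorSketch:
    """Hypothetical coordinator-side handler for the two POST /citus calls."""

    statements: List[str] = field(default_factory=list)  # SQL that would be sent
    in_transaction: bool = False

    def before_demote(self, nodeid: int, host: str, port: int) -> None:
        # First call: the worker primary is about to stop. Open a transaction
        # and point the node at a placeholder host to pause client connections.
        self.statements.append("BEGIN")
        self.statements.append(
            f"SELECT citus_update_node({nodeid}, '{host}-demoted', {port})")
        self.in_transaction = True

    def after_promote(self, nodeid: int, host: str, port: int) -> None:
        # Second call: register the actual host and port, then commit.
        # Assumption: if the earlier transaction was rolled back, start a new one.
        if not self.in_transaction:
            self.statements.append("BEGIN")
        self.statements.append(
            f"SELECT citus_update_node({nodeid}, '{host}', {port})")
        self.statements.append("COMMIT")
        self.in_transaction = False

    def rollback_if_stale(self, age_seconds: float, threshold_seconds: float,
                          others_waiting: bool) -> bool:
        # Give up on a long-running transaction only when it is older than the
        # threshold and other worker nodes are waiting to be updated.
        if self.in_transaction and age_seconds > threshold_seconds and others_waiting:
            self.statements.append("ROLLBACK")
            self.in_transaction = False
            return True
        return False


coordinator = CoordinatorSketch()
coordinator.before_demote(2, "work1-1", 5432)   # worker primary shuts down
coordinator.after_promote(2, "work1-2", 5432)   # new leader has been elected
```

Keeping both `citus_update_node()` calls inside one transaction is what holds client connections back until the new primary is known; the rollback path exists so that one slow worker cannot block updates for the others.
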
1 parent 3161f31, commit 4872ac5
Showing 54 changed files with 3,590 additions and 603 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,173 @@ | ||
## This Dockerfile is meant to aid in building and debugging Patroni whilst developing on your local machine | |
## It has all the necessary components to play/debug with a single node appliance, running etcd | ||
ARG PG_MAJOR=15 | ||
ARG COMPRESS=false | ||
ARG PGHOME=/home/postgres | ||
ARG PGDATA=$PGHOME/data | ||
ARG LC_ALL=C.UTF-8 | ||
ARG LANG=C.UTF-8 | ||
|
||
FROM postgres:$PG_MAJOR as builder | ||
|
||
ARG PGHOME | ||
ARG PGDATA | ||
ARG LC_ALL | ||
ARG LANG | ||
|
||
ENV ETCDVERSION=3.3.13 CONFDVERSION=0.16.0 | ||
|
||
RUN set -ex \ | ||
&& export DEBIAN_FRONTEND=noninteractive \ | ||
&& echo 'APT::Install-Recommends "0";\nAPT::Install-Suggests "0";' > /etc/apt/apt.conf.d/01norecommend \ | ||
&& apt-get update -y \ | ||
# postgres:10 is based on debian, which has the patroni package. We will install all required dependencies | ||
&& apt-cache depends patroni | sed -n -e 's/.*Depends: \(python3-.\+\)$/\1/p' \ | ||
| grep -Ev '^python3-(sphinx|etcd|consul|kazoo|kubernetes)' \ | ||
| xargs apt-get install -y vim curl less jq locales haproxy sudo \ | ||
python3-etcd python3-kazoo python3-pip busybox \ | ||
net-tools iputils-ping --fix-missing \ | ||
&& curl https://install.citusdata.com/community/deb.sh | bash \ | ||
&& apt-get -y install postgresql-$PG_MAJOR-citus-11.1 \ | ||
&& pip3 install dumb-init \ | ||
\ | ||
# Cleanup all locales but en_US.UTF-8 | ||
&& find /usr/share/i18n/charmaps/ -type f ! -name UTF-8.gz -delete \ | ||
&& find /usr/share/i18n/locales/ -type f ! -name en_US ! -name en_GB ! -name i18n* ! -name iso14651_t1 ! -name iso14651_t1_common ! -name 'translit_*' -delete \ | ||
&& echo 'en_US.UTF-8 UTF-8' > /usr/share/i18n/SUPPORTED \ | ||
\ | ||
# Make sure we have a en_US.UTF-8 locale available | ||
&& localedef -i en_US -c -f UTF-8 -A /usr/share/locale/locale.alias en_US.UTF-8 \ | ||
\ | ||
# haproxy dummy config | ||
&& echo 'global\n stats socket /run/haproxy/admin.sock mode 660 level admin' > /etc/haproxy/haproxy.cfg \ | ||
\ | ||
# vim config | ||
&& echo 'syntax on\nfiletype plugin indent on\nset mouse-=a\nautocmd FileType yaml setlocal ts=2 sts=2 sw=2 expandtab' > /etc/vim/vimrc.local \ | ||
\ | ||
# Prepare postgres/patroni/haproxy environment | ||
&& mkdir -p $PGHOME/.config/patroni /patroni /run/haproxy \ | ||
&& ln -s ../../postgres0.yml $PGHOME/.config/patroni/patronictl.yaml \ | ||
&& ln -s /patronictl.py /usr/local/bin/patronictl \ | ||
&& sed -i "s|/var/lib/postgresql.*|$PGHOME:/bin/bash|" /etc/passwd \ | ||
&& chown -R postgres:postgres /var/log \ | ||
\ | ||
# Download etcd | ||
&& curl -sL https://github.com/coreos/etcd/releases/download/v${ETCDVERSION}/etcd-v${ETCDVERSION}-linux-$(dpkg --print-architecture).tar.gz \ | ||
| tar xz -C /usr/local/bin --strip=1 --wildcards --no-anchored etcd etcdctl \ | ||
\ | ||
# Download confd | ||
&& curl -sL https://github.com/kelseyhightower/confd/releases/download/v${CONFDVERSION}/confd-${CONFDVERSION}-linux-$(dpkg --print-architecture) \ | ||
> /usr/local/bin/confd && chmod +x /usr/local/bin/confd \ | ||
# Prepare client cert for HAProxy | ||
&& cat /etc/ssl/private/ssl-cert-snakeoil.key /etc/ssl/certs/ssl-cert-snakeoil.pem > /etc/ssl/private/ssl-cert-snakeoil.crt \ | ||
\ | ||
# Clean up all useless packages and some files | ||
&& apt-get purge -y --allow-remove-essential python3-pip gzip bzip2 util-linux e2fsprogs \ | ||
libmagic1 bsdmainutils login ncurses-bin libmagic-mgc e2fslibs bsdutils \ | ||
exim4-config gnupg-agent dirmngr libpython2.7-stdlib libpython2.7-minimal \ | ||
&& apt-get autoremove -y \ | ||
&& apt-get clean -y \ | ||
&& rm -rf /var/lib/apt/lists/* \ | ||
/root/.cache \ | ||
/var/cache/debconf/* \ | ||
/etc/rc?.d \ | ||
/etc/systemd \ | ||
/docker-entrypoint* \ | ||
/sbin/pam* \ | ||
/sbin/swap* \ | ||
/sbin/unix* \ | ||
/usr/local/bin/gosu \ | ||
/usr/sbin/[acgipr]* \ | ||
/usr/sbin/*user* \ | ||
/usr/share/doc* \ | ||
/usr/share/man \ | ||
/usr/share/info \ | ||
/usr/share/i18n/locales/translit_hangul \ | ||
/usr/share/locale/?? \ | ||
/usr/share/locale/??_?? \ | ||
/usr/share/postgresql/*/man \ | ||
/usr/share/postgresql-common/pg_wrapper \ | ||
/usr/share/vim/vim80/doc \ | ||
/usr/share/vim/vim80/lang \ | ||
/usr/share/vim/vim80/tutor \ | ||
# /var/lib/dpkg/info/* \ | ||
&& find /usr/bin -xtype l -delete \ | ||
&& find /var/log -type f -exec truncate --size 0 {} \; \ | ||
&& find /usr/lib/python3/dist-packages -name '*test*' | xargs rm -fr \ | ||
&& find /lib/$(uname -m)-linux-gnu/security -type f ! -name pam_env.so ! -name pam_permit.so ! -name pam_unix.so -delete | ||
|
||
# perform compression if it is necessary | ||
ARG COMPRESS | ||
RUN if [ "$COMPRESS" = "true" ]; then \ | ||
set -ex \ | ||
# Allow certain sudo commands from postgres | ||
&& echo 'postgres ALL=(ALL) NOPASSWD: /bin/tar xpJf /a.tar.xz -C /, /bin/rm /a.tar.xz, /bin/ln -snf dash /bin/sh' >> /etc/sudoers \ | ||
&& ln -snf busybox /bin/sh \ | ||
&& arch=$(uname -m) \ | ||
&& darch=$(uname -m | sed 's/_/-/') \ | ||
&& files="/bin/sh /usr/bin/sudo /usr/lib/sudo/sudoers.so /lib/$arch-linux-gnu/security/pam_*.so" \ | ||
&& libs="$(ldd $files | awk '{print $3;}' | grep '^/' | sort -u) /lib/ld-linux-$darch.so.* /lib/$arch-linux-gnu/ld-linux-$darch.so.* /lib/$arch-linux-gnu/libnsl.so.* /lib/$arch-linux-gnu/libnss_compat.so.* /lib/$arch-linux-gnu/libnss_files.so.*" \ | ||
&& (echo /var/run $files $libs | tr ' ' '\n' && realpath $files $libs) | sort -u | sed 's/^\///' > /exclude \ | ||
&& find /etc/alternatives -xtype l -delete \ | ||
&& save_dirs="usr lib var bin sbin etc/ssl etc/init.d etc/alternatives etc/apt" \ | ||
&& XZ_OPT=-e9v tar -X /exclude -cpJf a.tar.xz $save_dirs \ | ||
# we call "cat /exclude" to avoid including files from the $save_dirs that are also among | ||
# the exceptions listed in the /exclude, as "uniq -u" eliminates all non-unique lines. | ||
# By calling "cat /exclude" a second time we guarantee that there will be at least two lines | ||
# for each exception and therefore they will be excluded from the output passed to 'rm'. | ||
&& /bin/busybox sh -c "(find $save_dirs -not -type d && cat /exclude /exclude && echo exclude) | sort | uniq -u | xargs /bin/busybox rm" \ | ||
&& /bin/busybox --install -s \ | ||
&& /bin/busybox sh -c "find $save_dirs -type d -depth -exec rmdir -p {} \; 2> /dev/null"; \ | ||
else \ | ||
/bin/busybox --install -s; \ | ||
fi | ||
|
||
FROM scratch | ||
COPY --from=builder / / | ||
|
||
LABEL maintainer="Alexander Kukushkin <[email protected]>" | ||
|
||
ARG PG_MAJOR | ||
ARG COMPRESS | ||
ARG PGHOME | ||
ARG PGDATA | ||
ARG LC_ALL | ||
ARG LANG | ||
|
||
ARG PGBIN=/usr/lib/postgresql/$PG_MAJOR/bin | ||
|
||
ENV LC_ALL=$LC_ALL LANG=$LANG EDITOR=/usr/bin/editor | ||
ENV PGDATA=$PGDATA PATH=$PATH:$PGBIN | ||
|
||
COPY patroni /patroni/ | ||
COPY extras/confd/conf.d/haproxy.toml /etc/confd/conf.d/ | ||
COPY extras/confd/templates/haproxy-citus.tmpl /etc/confd/templates/haproxy.tmpl | ||
COPY patroni*.py docker/entrypoint.sh / | ||
COPY postgres?.yml $PGHOME/ | ||
|
||
WORKDIR $PGHOME | ||
|
||
RUN sed -i 's/env python/&3/' /patroni*.py \ | ||
# "fix" patroni configs | ||
&& sed -i 's/^\( connect_address:\| - host\)/#&/' postgres?.yml \ | ||
&& sed -i 's/^ listen: 127.0.0.1/ listen: 0.0.0.0/' postgres?.yml \ | ||
&& sed -i "s|^\( data_dir: \).*|\1$PGDATA|" postgres?.yml \ | ||
&& sed -i "s|^#\( bin_dir: \).*|\1$PGBIN|" postgres?.yml \ | ||
&& sed -i 's/^ - encoding: UTF8/ - locale: en_US.UTF-8\n&/' postgres?.yml \ | ||
&& sed -i 's/^scope:/log:\n loggers:\n patroni.postgresql.citus: DEBUG\n#&/' postgres?.yml \ | ||
&& sed -i 's/^\(name\|etcd\| host\| authentication\| pg_hba\| parameters\):/#&/' postgres?.yml \ | ||
&& sed -i 's/^ \(replication\|superuser\|rewind\|unix_socket_directories\|\(\( \)\{0,1\}\(username\|password\)\)\):/#&/' postgres?.yml \ | ||
&& sed -i 's/^postgresql:/&\n basebackup:\n checkpoint: fast/' postgres?.yml \ | ||
&& sed -i 's|^ parameters:| pg_hba:\n - local all all trust\n - hostssl replication all all md5 clientcert=verify-ca\n - hostssl all all all md5 clientcert=verify-ca\n&\n max_connections: 100\n shared_buffers: 16MB\n ssl: "on"\n ssl_ca_file: /etc/ssl/certs/ssl-cert-snakeoil.pem\n ssl_cert_file: /etc/ssl/certs/ssl-cert-snakeoil.pem\n ssl_key_file: /etc/ssl/private/ssl-cert-snakeoil.key\n citus.node_conninfo: "sslrootcert=/etc/ssl/certs/ssl-cert-snakeoil.pem sslkey=/etc/ssl/private/ssl-cert-snakeoil.key sslcert=/etc/ssl/certs/ssl-cert-snakeoil.pem sslmode=verify-ca"|' postgres?.yml \ | ||
&& sed -i 's/^#\(ctl\| certfile\| keyfile\)/\1/' postgres?.yml \ | ||
&& sed -i 's|^# cafile: .*$| verify_client: required\n cafile: /etc/ssl/certs/ssl-cert-snakeoil.pem|' postgres?.yml \ | ||
&& sed -i 's|^# cacert: .*$| cacert: /etc/ssl/certs/ssl-cert-snakeoil.pem|' postgres?.yml \ | ||
&& sed -i 's/^# insecure: .*/ insecure: on/' postgres?.yml \ | ||
# client cert for HAProxy to access Patroni REST API | ||
&& if [ "$COMPRESS" = "true" ]; then chmod u+s /usr/bin/sudo; fi \ | ||
&& chmod +s /bin/ping \ | ||
&& chown -R postgres:postgres $PGHOME /run /etc/haproxy | ||
|
||
USER postgres | ||
|
||
ENTRYPOINT ["/bin/sh", "/entrypoint.sh"] |
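
The comment in the compression step of the Dockerfile above explains why `/exclude` is passed to `cat` twice before `sort | uniq -u`. A minimal Python sketch of that reasoning follows; the helper `uniq_u` is a hypothetical stand-in for `sort | uniq -u`, and the paths are made up for illustration.

```python
from collections import Counter
from typing import Iterable, List


def uniq_u(lines: Iterable[str]) -> List[str]:
    """Mimic `sort | uniq -u`: keep only lines that occur exactly once."""
    counts = Counter(lines)
    return sorted(line for line, n in counts.items() if n == 1)


# Files found under the saved directories (illustrative values).
found = ["bin/sh", "usr/bin/sudo", "usr/bin/vim"]
# Exceptions that must survive; "var/run" is not among the found files.
exclude = ["bin/sh", "usr/bin/sudo", "var/run"]

# Listing the exceptions once is not enough: an exception that was not found
# occurs exactly once and would be handed to `rm`.
with_single_exclude = uniq_u(found + exclude)

# Listing them twice guarantees every exception occurs at least twice,
# so none of them can pass through `uniq -u`.
with_double_exclude = uniq_u(found + exclude + exclude)
```

Here `with_single_exclude` still contains `var/run`, while `with_double_exclude` contains only `usr/bin/vim`, the one file that is safe to delete.
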
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,139 @@ | ||
# docker compose file for running a Citus cluster | ||
# with 3-node etcd v3 cluster as the DCS and one haproxy node. | ||
# The Citus cluster has a coordinator (3 nodes) | ||
# and two worker clusters (2 nodes each). | |
# | ||
# Before starting it up you need to build the docker image: | ||
# $ docker build -f Dockerfile.citus -t patroni-citus . | ||
# The cluster could be started as: | ||
# $ docker-compose -f docker-compose-citus.yml up -d | ||
# You can read more about it in: | |
# https://github.com/zalando/patroni/blob/master/docker/README.md#citus-cluster | ||
version: "2" | ||
|
||
networks: | ||
demo: | ||
|
||
services: | ||
etcd1: &etcd | ||
image: patroni-citus | ||
networks: [ demo ] | ||
environment: | ||
ETCDCTL_API: 3 | ||
ETCD_LISTEN_PEER_URLS: http://0.0.0.0:2380 | ||
ETCD_LISTEN_CLIENT_URLS: http://0.0.0.0:2379 | ||
ETCD_INITIAL_CLUSTER: etcd1=http://etcd1:2380,etcd2=http://etcd2:2380,etcd3=http://etcd3:2380 | ||
ETCD_INITIAL_CLUSTER_STATE: new | ||
ETCD_INITIAL_CLUSTER_TOKEN: tutorial | ||
container_name: demo-etcd1 | ||
hostname: etcd1 | ||
command: etcd -name etcd1 -initial-advertise-peer-urls http://etcd1:2380 | ||
|
||
etcd2: | ||
<<: *etcd | ||
container_name: demo-etcd2 | ||
hostname: etcd2 | ||
command: etcd -name etcd2 -initial-advertise-peer-urls http://etcd2:2380 | ||
|
||
etcd3: | ||
<<: *etcd | ||
container_name: demo-etcd3 | ||
hostname: etcd3 | ||
command: etcd -name etcd3 -initial-advertise-peer-urls http://etcd3:2380 | ||
|
||
haproxy: | ||
image: patroni-citus | ||
networks: [ demo ] | ||
env_file: docker/patroni.env | ||
hostname: haproxy | ||
container_name: demo-haproxy | ||
ports: | ||
- "5000:5000" # Access to the coordinator primary | |
- "5001:5001" # Load-balancing across worker primaries | |
command: haproxy | ||
environment: &haproxy_env | ||
ETCDCTL_API: 3 | ||
ETCDCTL_ENDPOINTS: http://etcd1:2379,http://etcd2:2379,http://etcd3:2379 | ||
PATRONI_ETCD3_HOSTS: "'etcd1:2379','etcd2:2379','etcd3:2379'" | ||
PATRONI_SCOPE: demo | ||
PATRONI_CITUS_GROUP: 0 | ||
PATRONI_CITUS_DATABASE: citus | ||
PGSSLMODE: verify-ca | ||
PGSSLKEY: /etc/ssl/private/ssl-cert-snakeoil.key | ||
PGSSLCERT: /etc/ssl/certs/ssl-cert-snakeoil.pem | ||
PGSSLROOTCERT: /etc/ssl/certs/ssl-cert-snakeoil.pem | ||
|
||
coord1: | ||
image: patroni-citus | ||
networks: [ demo ] | ||
env_file: docker/patroni.env | ||
hostname: coord1 | ||
container_name: demo-coord1 | ||
environment: &coord_env | ||
<<: *haproxy_env | ||
PATRONI_NAME: coord1 | ||
PATRONI_CITUS_GROUP: 0 | ||
|
||
coord2: | ||
image: patroni-citus | ||
networks: [ demo ] | ||
env_file: docker/patroni.env | ||
hostname: coord2 | ||
container_name: demo-coord2 | ||
environment: | ||
<<: *coord_env | ||
PATRONI_NAME: coord2 | ||
|
||
coord3: | ||
image: patroni-citus | ||
networks: [ demo ] | ||
env_file: docker/patroni.env | ||
hostname: coord3 | ||
container_name: demo-coord3 | ||
environment: | ||
<<: *coord_env | ||
PATRONI_NAME: coord3 | ||
|
||
|
||
work1-1: | ||
image: patroni-citus | ||
networks: [ demo ] | ||
env_file: docker/patroni.env | ||
hostname: work1-1 | ||
container_name: demo-work1-1 | ||
environment: &work1_env | ||
<<: *haproxy_env | ||
PATRONI_NAME: work1-1 | ||
PATRONI_CITUS_GROUP: 1 | ||
|
||
work1-2: | ||
image: patroni-citus | ||
networks: [ demo ] | ||
env_file: docker/patroni.env | ||
hostname: work1-2 | ||
container_name: demo-work1-2 | ||
environment: | ||
<<: *work1_env | ||
PATRONI_NAME: work1-2 | ||
|
||
|
||
work2-1: | ||
image: patroni-citus | ||
networks: [ demo ] | ||
env_file: docker/patroni.env | ||
hostname: work2-1 | ||
container_name: demo-work2-1 | ||
environment: &work2_env | ||
<<: *haproxy_env | ||
PATRONI_NAME: work2-1 | ||
PATRONI_CITUS_GROUP: 2 | ||
|
||
work2-2: | ||
image: patroni-citus | ||
networks: [ demo ] | ||
env_file: docker/patroni.env | ||
hostname: work2-2 | ||
container_name: demo-work2-2 | ||
environment: | ||
<<: *work2_env | ||
PATRONI_NAME: work2-2 |
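
The compose file above builds each node's environment by chaining YAML anchors and merge keys (`<<: *haproxy_env`, `<<: *work1_env`). Keys written explicitly in a mapping take precedence over merged ones, which is how the workers replace `PATRONI_CITUS_GROUP: 0` inherited from the haproxy environment. A rough Python equivalent using plain dicts is shown below; the `merge` helper is hypothetical and only a subset of the variables is reproduced.

```python
from typing import Dict, Union

Env = Dict[str, Union[str, int]]


def merge(base: Env, overrides: Env) -> Env:
    """Emulate `<<: *base` followed by explicit keys: explicit keys win."""
    return {**base, **overrides}


# Subset of the &haproxy_env mapping from the compose file.
haproxy_env: Env = {
    "ETCDCTL_API": 3,
    "PATRONI_SCOPE": "demo",
    "PATRONI_CITUS_GROUP": 0,
    "PATRONI_CITUS_DATABASE": "citus",
}

# &coord_env and &work1_env extend haproxy_env; work1-2 extends work1_env.
coord_env = merge(haproxy_env, {"PATRONI_NAME": "coord1", "PATRONI_CITUS_GROUP": 0})
work1_env = merge(haproxy_env, {"PATRONI_NAME": "work1-1", "PATRONI_CITUS_GROUP": 1})
work1_2_env = merge(work1_env, {"PATRONI_NAME": "work1-2"})
```

Every node ends up with the same scope and database, while the group and name are the only values that differ between coordinator and worker containers.
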