From 8e96ed6d255e97b1a46b51d95a32e1bd155a9514 Mon Sep 17 00:00:00 2001
From: Siyuan Zhang
Date: Mon, 13 May 2024 15:29:17 -0700
Subject: [PATCH] Add instructions for 3.5->3.4 downgrade.

Signed-off-by: Siyuan Zhang
---
 content/en/docs/v3.5/downgrades/_index.md     |   5 +
 .../en/docs/v3.5/downgrades/downgrade_3_5.md  | 276 ++++++++++++++++++
 .../docs/v3.5/downgrades/downgrading-etcd.md  |  12 +
 .../en/docs/v3.5/upgrades/upgrading-etcd.md   |   1 +
 .../en/docs/v3.6/upgrades/upgrading-etcd.md   |   1 +
 5 files changed, 295 insertions(+)
 create mode 100644 content/en/docs/v3.5/downgrades/_index.md
 create mode 100644 content/en/docs/v3.5/downgrades/downgrade_3_5.md
 create mode 100644 content/en/docs/v3.5/downgrades/downgrading-etcd.md

diff --git a/content/en/docs/v3.5/downgrades/_index.md b/content/en/docs/v3.5/downgrades/_index.md
new file mode 100644
index 000000000..7b4b9597d
--- /dev/null
+++ b/content/en/docs/v3.5/downgrades/_index.md
@@ -0,0 +1,5 @@
+---
+title: Downgrading
+weight: 6000
+description: Downgrading etcd clusters and applications
+---
diff --git a/content/en/docs/v3.5/downgrades/downgrade_3_5.md b/content/en/docs/v3.5/downgrades/downgrade_3_5.md
new file mode 100644
index 000000000..beed2ddf6
--- /dev/null
+++ b/content/en/docs/v3.5/downgrades/downgrade_3_5.md
@@ -0,0 +1,276 @@
+---
+title: Downgrade etcd from 3.5 to 3.4
+weight: 6650
+description: Processes, checklists, and notes on downgrading etcd from 3.5 to 3.4
+---
+
+In the general case, downgrading from etcd 3.5 to 3.4 can be a zero-downtime, rolling downgrade:
+ - one by one, stop the etcd 3.5 processes and replace them with etcd 3.4 processes
+ - after starting any 3.4 processes, new features in 3.5 are no longer available to the cluster
+
+Before [starting a downgrade](#downgrade-procedure), read through the rest of this guide to prepare.
+
+### Downgrade checklists
+
+**NOTE:** If your cluster has auth enabled, a rolling downgrade from 3.5 is not supported because 3.5 [changed the format of WAL entries related to auth](https://github.com/etcd-io/etcd/pull/11943). You can follow the [authentication instructions](../../op-guide/authentication/rbac/) to disable auth and delete all users first.
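+
+As a minimal sketch of that preparation step; the endpoint, the `root` credentials, and the `app1` user below are placeholders, not values from your cluster:
+
+```bash
+# list existing users so you know which ones to delete
+etcdctl --endpoints=localhost:2379 --user root:rootpw user list
+
+# turn authentication off cluster-wide; requires root credentials
+etcdctl --endpoints=localhost:2379 --user root:rootpw auth disable
+
+# with auth disabled, delete each remaining user
+etcdctl --endpoints=localhost:2379 user delete app1
+```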
+
+Highlighted breaking changes from 3.5 to 3.4:
+
+#### Difference in flags
+
+If you are using any of the following flags in your 3.5 configuration, make sure to remove, rename, or change the default value of the flag before downgrading to 3.4.
+
+**NOTE:** The diff below is based on v3.5.14 and v3.4.33. The actual diff depends on your patch versions; check with `diff <(etcd-3.5/bin/etcd -h | grep \\-\\-) <(etcd-3.4/bin/etcd -h | grep \\-\\-)` first.
+
+```diff
+# flags not available in 3.4
+-etcd --socket-reuse-port
+-etcd --socket-reuse-address
+-etcd --raft-read-timeout
+-etcd --raft-write-timeout
+-etcd --v2-deprecation
+-etcd --client-cert-file
+-etcd --client-key-file
+-etcd --peer-client-cert-file
+-etcd --peer-client-key-file
+-etcd --self-signed-cert-validity
+-etcd --enable-log-rotation --log-rotation-config-json=some.json
+-etcd --experimental-enable-distributed-tracing --experimental-distributed-tracing-address='localhost:4317' --experimental-distributed-tracing-service-name='etcd' --experimental-distributed-tracing-instance-id='' --experimental-distributed-tracing-sampling-rate='0'
+-etcd --experimental-compact-hash-check-enabled --experimental-compact-hash-check-time='1m'
+-etcd --experimental-downgrade-check-time
+-etcd --experimental-memory-mlock
+-etcd --experimental-txn-mode-write-with-shared-buffer
+-etcd --experimental-bootstrap-defrag-threshold-megabytes
+-etcd --experimental-stop-grpc-service-on-defrag
+
+# same flags with different names
+-etcd --backend-bbolt-freelist-type=map
++etcd --experimental-backend-bbolt-freelist-type=array
+
+# same flags with different defaults
+-etcd --pre-vote=true
++etcd --pre-vote=false
+
+-etcd --logger=zap
++etcd --logger=capnslog
+```
+
+#### `etcd --logger zap`
+
+3.5 defaults to `--logger=zap`, while 3.4 defaults to `--logger=capnslog`. To keep using `zap` after the downgrade, specify it explicitly.
+
+```diff
++etcd --logger=zap --log-outputs=stderr
+
++# to write logs to stderr and a.log file at the same time
++etcd --logger=zap --log-outputs=stderr,a.log
+```
+
+#### Difference in Prometheus metrics
+
+```diff
+# metrics not available in 3.4
+-etcd_debugging_mvcc_db_compaction_last
+```
+
+### Server downgrade checklists
+
+#### Downgrade requirements
+
+To ensure a smooth rolling downgrade, the running cluster must be healthy. Check the health of the cluster by using the `etcdctl endpoint health` command before proceeding.
+
+The 3.4 version to downgrade to must be >= 3.4.32.
+
+#### Preparation
+
+Always test the services relying on etcd in a staging environment before deploying the downgrade to the production environment.
+
+Before beginning, [download the snapshot backup](../../op-guide/maintenance/#snapshot-backup). Should something go wrong with the downgrade, this backup makes it possible to [roll back](#rollback) to the existing etcd version. Please note that the `snapshot` command only backs up the v3 data. For v2 data, see [backing up v2 datastore](/docs/v2.3/admin_guide#backing-up-the-datastore).
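+
+As a rough sketch, a snapshot backup can be taken from one healthy endpoint; the endpoint and the `backup.db` output path are placeholders:
+
+```bash
+# save a point-in-time snapshot of the v3 keyspace
+etcdctl --endpoints=localhost:2379 snapshot save backup.db
+
+# sanity-check the snapshot file before relying on it
+etcdutl snapshot status backup.db -w table
+```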
+
+Before beginning, download the latest release of etcd 3.4 and make sure its version is >= 3.4.32.
+
+#### Mixed versions
+
+While downgrading, an etcd cluster supports mixed versions of etcd members, and operates with the protocol of the lowest common version. The cluster is considered downgraded once any of its members is downgraded to version 3.4. Internally, etcd members negotiate with each other to determine the overall cluster version, which controls the reported version and the supported features.
+
+#### Limitations
+
+If the cluster is serving a v2 data set larger than 50MB, each newly downgraded member may take up to two minutes to catch up with the existing cluster. Check the size of a recent snapshot to estimate the total data size. In other words, it is safest to wait two minutes between downgrading each member.
+
+Note: a cluster that only has v3 data and no v2 data is not subject to this limitation.
+
+For a much larger total data size, 100MB or more, this one-time process might take even more time. Administrators of very large etcd clusters of this magnitude can feel free to contact the [etcd team][etcd-contact] before downgrading, and we'll be happy to provide advice on the procedure.
+
+#### Rollback
+
+If any member has been downgraded to 3.4, the cluster version will be downgraded to 3.4, and operations will be "3.4"-compatible. To roll back, follow the [Upgrade etcd from 3.4 to 3.5](../../upgrades/upgrade_3_5/) instructions.
+
+Please [download the snapshot backup](../../op-guide/maintenance/#snapshot-backup) beforehand to make rolling back possible even after the cluster has been completely downgraded.
+
+### Downgrade procedure
+
+This example shows how to downgrade a 3-member 3.5 etcd cluster running on a local machine.
+
+#### Step 1: check downgrade requirements
+
+Is the cluster healthy and running 3.5.x?
+
+```bash
+etcdctl --endpoints=localhost:2379,localhost:22379,localhost:32379 endpoint health
+```
+
+When a member is restarted with the 3.4 binary, expect server logs similar to the following as the cluster version changes:
+
+> `{"level":"info","ts":"2024-05-13T21:05:43.981445Z","caller":"membership/cluster.go:561","msg":"set initial cluster version","cluster-id":"ef37ad9dc622a7c4","local-member-id":"8211f1d0f64f3269","cluster-version":"3.0"}`
+
+> `{"level":"info","ts":"2024-05-13T21:05:43.982188Z","caller":"api/capability.go:77","msg":"enabled capabilities for version","cluster-version":"3.0"}`
+
+> `{"level":"info","ts":"2024-05-13T21:05:43.982312Z","caller":"membership/cluster.go:549","msg":"updated cluster version","cluster-id":"ef37ad9dc622a7c4","local-member-id":"8211f1d0f64f3269","from":"3.0","to":"3.5"}`
+
+> `{"level":"info","ts":"2024-05-13T21:05:43.982376Z","caller":"api/capability.go:77","msg":"enabled capabilities for version","cluster-version":"3.5"}`
+
+> `{"level":"info","ts":"2024-05-13T21:05:44.000672Z","caller":"etcdserver/server.go:2152","msg":"published local member to cluster through raft","local-member-id":"8211f1d0f64f3269","local-member-attributes":"{Name:infra1 ClientURLs:[http://127.0.0.1:2379]}","request-path":"/0/members/8211f1d0f64f3269/attributes","cluster-id":"ef37ad9dc622a7c4","publish-timeout":"7s"}`
+
+> `{"level":"info","ts":"2024-05-13T21:05:46.452631Z","caller":"membership/cluster.go:549","msg":"updated cluster version","cluster-id":"ef37ad9dc622a7c4","local-member-id":"8211f1d0f64f3269","from":"3.5","to":"3.4"}`
+
+Verify that each member, and then the entire cluster, becomes healthy with the new 3.4 etcd binary:
+
+```bash
+etcdctl endpoint health --endpoints=localhost:2379,localhost:22379,localhost:32379
+```
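+
+As an optional extra check (a sketch, assuming the same local endpoints as above), the reported versions can be inspected directly:
+
+```bash
+# each member should report a 3.4.x server version and a 3.4.0 cluster version
+curl http://localhost:2379/version
+
+# or inspect all endpoints at once; the VERSION column should read 3.4.x
+etcdctl endpoint status --endpoints=localhost:2379,localhost:22379,localhost:32379 -w table
+```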