Ensure strict failover/switchover definition difference (patroni#2784)
- Don't set leader in the failover key from patronictl failover
- Show a warning and execute a switchover if the leader option is provided to the patronictl failover command
- Be more precise in the log messages
- Allow failover to an async candidate in sync mode
- Check if the candidate is the same as the leader specified in the API
- Fix and extend some tests
- Add documentation
hughcapet authored Sep 12, 2023
1 parent 3c24c33 commit b31a4d5
Showing 8 changed files with 673 additions and 262 deletions.
2 changes: 1 addition & 1 deletion docs/pause.rst
@@ -19,7 +19,7 @@ When Patroni runs in a paused mode, it does not change the state of PostgreSQL,

- For the Postgres primary with the leader lock Patroni updates the lock. If the node with the leader lock stops being the primary (i.e. is demoted manually), Patroni will release the lock instead of promoting the node back.

- Manual unscheduled restart, reinitialize and manual failover are allowed. Manual failover is only allowed if the node to failover to is specified. In the paused mode, manual failover does not require a running primary node.
- Manual unscheduled restart, manual unscheduled failover/switchover and reinitialize are allowed. No scheduled action is allowed. Manual switchover is only allowed if the node to switch over to is specified.

- If 'parallel' primaries are detected by Patroni, it emits a warning, but does not demote the primary without the leader lock.

102 changes: 87 additions & 15 deletions docs/rest_api.rst
@@ -560,39 +560,111 @@ The above call removes ``postgresql.parameters.max_connections`` from the dynami
Switchover and failover endpoints
---------------------------------

``POST /switchover`` or ``POST /failover``. These endpoints are very similar to each other. There are a couple of minor differences though:
.. _switchover_api:

1. The failover endpoint allows to perform a manual failover when there are no healthy nodes, but at the same time it will not allow you to schedule a switchover.
Switchover
^^^^^^^^^^

2. The switchover endpoint is the opposite. It works only when the cluster is healthy (there is a leader) and allows to schedule a switchover at a given time.
The ``/switchover`` endpoint works only when the cluster is healthy (there is a leader). It also allows scheduling a switchover at a given time.

When calling the ``/switchover`` endpoint a candidate can be specified but is not required, in contrast to the ``/failover`` endpoint. If a candidate is not provided, all the eligible nodes of the cluster will participate in the leader race after the leader has stepped down.

In the JSON body of the ``POST`` request you must specify at least the ``leader`` or ``candidate`` fields and optionally the ``scheduled_at`` field if you want to schedule a switchover at a specific time.
In the JSON body of the ``POST`` request you must specify the ``leader`` field. The ``candidate`` and the ``scheduled_at`` fields are optional and can be used to schedule a switchover at a specific time.

Depending on the situation, requests might return different HTTP status codes and bodies. Status code **200** is returned when the switchover or failover successfully completed. If the switchover was successfully scheduled, Patroni will return HTTP status code **202**. In case something went wrong, the error status code (one of **400**, **412**, or **503**) will be returned with some details in the response body.

Example: perform a failover to the specific node:
``DELETE /switchover`` can be used to delete the currently scheduled switchover.
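
**Example:** delete the currently scheduled switchover (a minimal sketch; the response text is illustrative and may differ between Patroni versions)

.. code-block:: bash

    $ curl -s http://localhost:8008/switchover -XDELETE
    scheduled switchover deleted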

**Example:** perform a switchover to any healthy standby

.. code-block:: bash

    $ curl -s http://localhost:8008/switchover -XPOST -d '{"leader":"postgresql1"}'
    Successfully switched over to "postgresql2"

**Example:** perform a switchover to a specific node

.. code-block:: bash

    $ curl -s http://localhost:8008/switchover -XPOST -d \
        '{"leader":"postgresql1","candidate":"postgresql2"}'
    Successfully switched over to "postgresql2"

**Example:** schedule a switchover from the leader to any other healthy standby in the cluster at a specific time.

.. code-block:: bash

    $ curl -s http://localhost:8009/failover -XPOST -d '{"candidate":"postgresql1"}'
    Successfully failed over to "postgresql1"

    $ curl -s http://localhost:8008/switchover -XPOST -d \
        '{"leader":"postgresql0","scheduled_at":"2019-09-24T12:00+00"}'
    Switchover scheduled

Example: schedule a switchover from the leader to any other healthy replica in the cluster at a specific time:
Failover
^^^^^^^^

The ``/failover`` endpoint can be used to perform a manual failover when there are no healthy nodes (e.g. to an asynchronous standby when none of the synchronous standbys is healthy enough to promote). However, the cluster is not required to be leaderless - a failover can also be run against a healthy cluster.

In the JSON body of the ``POST`` request you must specify the ``candidate`` field. If the ``leader`` field is specified, a switchover is triggered instead.

**Example:**

.. code-block:: bash

    $ curl -s http://localhost:8008/switchover -XPOST -d \
        '{"leader":"postgresql0","scheduled_at":"2019-09-24T12:00+00"}'
    Switchover scheduled

    $ curl -s http://localhost:8008/failover -XPOST -d '{"candidate":"postgresql1"}'
    Successfully failed over to "postgresql1"

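**Example:** as described above, a failover request that also contains the ``leader`` field is executed as a switchover (Patroni logs a warning on the server side). This is a sketch of the expected exchange; the exact response text may differ.

.. code-block:: bash

    $ curl -s http://localhost:8008/failover -XPOST -d \
        '{"leader":"postgresql0","candidate":"postgresql1"}'
    Successfully switched over to "postgresql1"
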
.. warning::
    :ref:`Be very careful <failover_healthcheck>` when using this endpoint, as this can cause data loss in certain situations. In most cases, :ref:`the switchover endpoint <switchover_api>` satisfies the administrator's needs.


``POST /switchover`` and ``POST /failover`` endpoints are used by ``patronictl switchover`` and ``patronictl failover``, respectively.

``DELETE /switchover`` is used by ``patronictl flush <cluster-name> switchover``.
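
The corresponding ``patronictl`` invocations might look like the following (a sketch that assumes a cluster named ``batman``; the cluster and member names are placeholders):

.. code-block:: bash

    # switchover to a specific candidate
    $ patronictl switchover batman --leader postgresql0 --candidate postgresql1

    # manual failover to a specific candidate
    $ patronictl failover batman --candidate postgresql1

    # remove a scheduled switchover
    $ patronictl flush batman switchover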

.. list-table:: Failover/Switchover comparison
   :widths: 25 25 25
   :header-rows: 1

   * -
     - Failover
     - Switchover
   * - Requires leader specified
     - no
     - yes
   * - Requires candidate specified
     - yes
     - no
   * - Can be run in pause
     - yes
     - yes (only to a specific candidate)
   * - Can be scheduled
     - no
     - yes (if not in pause)
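
For example, trying to schedule a failover is rejected, because only a switchover can be scheduled. The sketch below assumes the message text introduced by this commit; the exact wording and status code may vary.

.. code-block:: bash

    $ curl -s http://localhost:8008/failover -XPOST -d \
        '{"candidate":"postgresql1","scheduled_at":"2019-09-24T12:00+00"}'
    Failover can't be scheduled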

.. _failover_healthcheck:

Healthy standby
^^^^^^^^^^^^^^^

There are several checks that a cluster member must pass to be able to participate in the leader race during a switchover or to become the new leader as a failover/switchover candidate:

Depending on the situation the request might finish with a different HTTP status code and body. The status code **200** is returned when the switchover or failover successfully completed. If the switchover was successfully scheduled, Patroni will return HTTP status code **202**. In case something went wrong, the error status code (one of **400**, **412** or **503**) will be returned with some details in the response body. For more information please check the source code of ``patroni/api.py:do_POST_failover()`` method.
- be reachable via the Patroni API;
- not have the ``nofailover`` tag set to ``true``;
- have a fully functional watchdog (if it is required by the configuration);
- in case of a switchover in a healthy cluster or an automatic failover, not exceed the maximum replication lag (``maximum_lag_on_failover`` :ref:`configuration parameter <dynamic_configuration>`);
- in case of a switchover in a healthy cluster or an automatic failover, not have a timeline number smaller than the cluster timeline if the ``check_timeline`` :ref:`configuration parameter <dynamic_configuration>` is set to ``true``;
- in :ref:`synchronous mode <synchronous_mode>`:

- ``DELETE /switchover``: delete the scheduled switchover
  - In case of a switchover (both with and without a candidate): be listed in the ``/sync`` key members;
  - For a failover in both healthy and unhealthy clusters, this check is omitted.

The ``POST /switchover`` and ``POST /failover`` endpoints are used by ``patronictl switchover`` and ``patronictl failover``, respectively.
The ``DELETE /switchover`` endpoint is used by ``patronictl flush <cluster-name> switchover``.
.. warning::
    In case of a manual failover in a cluster without a leader, a candidate will be allowed to promote even if:

    - it is not in the ``/sync`` key members when synchronous mode is enabled;
    - its lag exceeds the maximum replication lag allowed;
    - its timeline number is smaller than the last known cluster timeline.
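
**Example:** in a cluster that still has a leader, a candidate that fails these checks is rejected with an error status such as **412**. The response below is a hypothetical exchange; the exact message text depends on which check failed and on the Patroni version.

.. code-block:: bash

    $ curl -s http://localhost:8008/failover -XPOST -d '{"candidate":"postgresql2"}'
    failover is not possible: no good candidates have been found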


Restart endpoint
24 changes: 16 additions & 8 deletions patroni/api.py
@@ -1074,7 +1074,8 @@ def do_POST_failover(self, action: str = 'failover') -> None:
* ``412``: if operation is not possible;
* ``503``: if unable to register the operation to the DCS;
* HTTP status returned by :func:`parse_schedule`, if any error was observed while parsing the schedule;
* HTTP status returned by :func:`poll_failover_result` if the operation has been processed immediately.
* HTTP status returned by :func:`poll_failover_result` if the operation has been processed immediately;
* ``400``: if none of the above applies.
.. note::
If unable to parse the request body, then the request is silently discarded.
@@ -1101,15 +1102,22 @@ def do_POST_failover(self, action: str = 'failover') -> None:
data = 'Switchover could be performed only from a specific leader'

if not data and scheduled_at:
if not leader:
data = 'Scheduled {0} is possible only from a specific leader'.format(action)
if not data and global_config.is_paused:
data = "Can't schedule {0} in the paused state".format(action)
if not data:
if action == 'failover':
data = "Failover can't be scheduled"
elif global_config.is_paused:
data = "Can't schedule switchover in the paused state"
else:
(status_code, data, scheduled_at) = self.parse_schedule(scheduled_at, action)

if not data and global_config.is_paused and not candidate:
data = action.title() + ' is possible only to a specific candidate in a paused state'
data = 'Switchover is possible only to a specific candidate in a paused state'

if action == 'failover' and leader:
logger.warning('received failover request with leader specified - performing switchover instead')
action = 'switchover'

if not data and leader == candidate:
data = 'Switchover target and source are the same'

if not data and not scheduled_at:
data = self.is_failover_possible(cluster, leader, candidate, action)
@@ -1126,7 +1134,7 @@ def do_POST_failover(self, action: str = 'failover') -> None:
status_code, data = self.poll_failover_result(cluster.leader and cluster.leader.name,
candidate, action)
else:
data = 'failed to write {0} key into DCS'.format(action)
data = 'failed to write failover key into DCS'
status_code = 503
# pyright thinks ``status_code`` can be ``None`` because ``parse_schedule`` call may return ``None``. However,
# if that's the case, ``status_code`` will be overwritten somewhere between ``parse_schedule`` and
72 changes: 45 additions & 27 deletions patroni/ctl.py
@@ -46,6 +46,7 @@
except ImportError: # pragma: no cover
from cdiff import markup_to_pager, PatchStream # pyright: ignore [reportMissingModuleSource]

from .config import Config, get_global_config
from .dcs import get_dcs as _get_dcs, AbstractDCS, Cluster, Member
from .exceptions import PatroniException
from .postgresql.misc import postgres_version_to_int
@@ -225,8 +226,6 @@ def load_config(path: str, dcs_url: Optional[str]) -> Dict[str, Any]:
:raises:
:class:`PatroniCtlException`: if *path* does not exist or is not readable.
"""
from patroni.config import Config

if not (os.path.exists(path) and os.access(path, os.R_OK)):
if path != CONFIG_FILE_PATH: # bail if non-default config location specified but file not found / readable
raise PatroniCtlException('Provided config file {0} not existing or no read rights.'
@@ -1013,7 +1012,6 @@ def reload(obj: Dict[str, Any], cluster_name: str, member_names: List[str],
if r.status == 200:
click.echo('No changes to apply on member {0}'.format(member.name))
elif r.status == 202:
from patroni.config import get_global_config
config = get_global_config(cluster)
click.echo('Reload request received for member {0} and will be processed within {1} seconds'.format(
member.name, config.get('loop_wait') or dcs.loop_wait)
@@ -1095,7 +1093,6 @@ def restart(obj: Dict[str, Any], cluster_name: str, group: Optional[int], member
content['postgres_version'] = version

if scheduled_at:
from patroni.config import get_global_config
if get_global_config(cluster).is_paused:
raise PatroniCtlException("Can't schedule restart in the paused state")
content['schedule'] = scheduled_at.isoformat()
@@ -1223,19 +1220,22 @@ def _do_failover_or_switchover(obj: Dict[str, Any], action: str, cluster_name: s
dcs = get_dcs(obj, cluster_name, group)
cluster = dcs.get_cluster()

if action == 'switchover' and (cluster.leader is None or not cluster.leader.name):
raise PatroniCtlException('This cluster has no leader')
global_config = get_global_config(cluster)

if leader is None:
if force or action == 'failover':
leader = cluster.leader and cluster.leader.name
else:
from patroni.config import get_global_config
prompt = 'Standby Leader' if get_global_config(cluster).is_standby_cluster else 'Primary'
leader = click.prompt(prompt, type=str, default=(cluster.leader and cluster.leader.member.name))
# leader has to be defined for switchover only
if action == 'switchover':
if cluster.leader is None or not cluster.leader.name:
raise PatroniCtlException('This cluster has no leader')

if leader is not None and cluster.leader and cluster.leader.member.name != leader:
raise PatroniCtlException('Member {0} is not the leader of cluster {1}'.format(leader, cluster_name))
if leader is None:
if force:
leader = cluster.leader.name
else:
prompt = 'Standby Leader' if global_config.is_standby_cluster else 'Primary'
leader = click.prompt(prompt, type=str, default=(cluster.leader and cluster.leader.name))

if cluster.leader.name != leader:
raise PatroniCtlException(f'Member {leader} is not the leader of cluster {cluster_name}')

# excluding members with nofailover tag
candidate_names = [str(m.name) for m in cluster.members if m.name != leader and not m.nofailover]
@@ -1255,7 +1255,16 @@ def _do_failover_or_switchover(obj: Dict[str, Any], action: str, cluster_name: s
raise PatroniCtlException(action.title() + ' target and source are the same.')

if candidate and candidate not in candidate_names:
raise PatroniCtlException('Member {0} does not exist in cluster {1}'.format(candidate, cluster_name))
raise PatroniCtlException(
f'Member {candidate} does not exist in cluster {cluster_name} or is tagged as nofailover')

if all((not force,
action == 'failover',
global_config.is_synchronous_mode,
not cluster.sync.is_empty,
not cluster.sync.matches(candidate, True))):
if not click.confirm(f'Are you sure you want to failover to the asynchronous node {candidate}'):
raise PatroniCtlException('Aborting ' + action)

scheduled_at_str = None
scheduled_at = None
@@ -1268,25 +1277,29 @@ def _do_failover_or_switchover(obj: Dict[str, Any], action: str, cluster_name: s

scheduled_at = parse_scheduled(scheduled)
if scheduled_at:
from patroni.config import get_global_config
if get_global_config(cluster).is_paused:
if global_config.is_paused:
raise PatroniCtlException("Can't schedule switchover in the paused state")
scheduled_at_str = scheduled_at.isoformat()

failover_value = {'leader': leader, 'candidate': candidate, 'scheduled_at': scheduled_at_str}
failover_value = {'candidate': candidate}
if action == 'switchover':
failover_value['leader'] = leader
if scheduled_at_str:
failover_value['scheduled_at'] = scheduled_at_str

logging.debug(failover_value)

# By now we have established that the leader exists and the candidate exists
if not force:
demote_msg = ', demoting current leader ' + leader if leader else ''
demote_msg = f', demoting current leader {cluster.leader.name}' if cluster.leader else ''
if scheduled_at_str:
if not click.confirm('Are you sure you want to schedule {0} of cluster {1} at {2}{3}?'
.format(action, cluster_name, scheduled_at_str, demote_msg)):
# only switchover can be scheduled
if not click.confirm(f'Are you sure you want to schedule switchover of cluster '
f'{cluster_name} at {scheduled_at_str}{demote_msg}?'):
# action as a var to catch a regression in the tests
raise PatroniCtlException('Aborting scheduled ' + action)
else:
if not click.confirm('Are you sure you want to {0} cluster {1}{2}?'
.format(action, cluster_name, demote_msg)):
if not click.confirm(f'Are you sure you want to {action} cluster {cluster_name}{demote_msg}?'):
raise PatroniCtlException('Aborting ' + action)

r = None
@@ -1332,6 +1345,8 @@ def failover(obj: Dict[str, Any], cluster_name: str, group: Optional[int],
.. note::
If *leader* is given perform a switchover instead of a failover.
This behavior is deprecated. ``--leader`` option support will be
removed in the next major release.
.. seealso::
Refer to :func:`_do_failover_or_switchover` for details.
@@ -1345,7 +1360,12 @@ def failover(obj: Dict[str, Any], cluster_name: str, group: Optional[int],
:param candidate: name of a standby member to be promoted. Nodes that are tagged with ``nofailover`` cannot be used.
:param force: perform the failover or switchover without asking for confirmations.
"""
action = 'switchover' if leader else 'failover'
action = 'failover'
if leader:
action = 'switchover'
click.echo(click.style(
'Supplying a leader name using this command is deprecated and will be removed in a future version of'
' Patroni, change your scripts to use `switchover` instead.\nExecuting switchover!', fg='red'))
_do_failover_or_switchover(obj, action, cluster_name, group, leader, candidate, force)


Expand Down Expand Up @@ -1718,7 +1738,6 @@ def wait_until_pause_is_applied(dcs: AbstractDCS, paused: bool, old_cluster: Clu
:param old_cluster: original cluster information before pause or unpause has been requested. Used to report which
nodes are still pending to have ``pause`` equal *paused* at a given point in time.
"""
from patroni.config import get_global_config
config = get_global_config(old_cluster)

click.echo("'{0}' request sent, waiting until it is recognized by all nodes".format(paused and 'pause' or 'resume'))
@@ -1756,7 +1775,6 @@ def toggle_pause(config: Dict[str, Any], cluster_name: str, group: Optional[int]
* ``pause`` state is already *paused*; or
* cluster contains no accessible members.
"""
from patroni.config import get_global_config
dcs = get_dcs(config, cluster_name, group)
cluster = dcs.get_cluster()
if get_global_config(cluster).is_paused == paused: