Update the multi-ha docs (#3510)

* draft
timescale · Oct 18, 2024 · a6cb09e
1 parent 89a8d30
commit a6cb09e
Showing 1 changed file with 31 additions and 24 deletions.
diff --git a/use-timescale/ha-replicas/high-availability.md b/use-timescale/ha-replicas/high-availability.md
@@ -13,52 +13,59 @@ cloud_ui:
 # Manage high availability
 
 For Timescale Cloud Service with very low tolerance for downtime, Timescale Cloud offers 
-High Availability(HA) replicas. HA replicas significantly reduce the risk of downtime and data loss due to 
+High Availability (HA) replicas. HA replicas significantly reduce the risk of downtime and data loss due to 
 system failure, and enable services to avoid downtime during routine maintenance.
 
 This page shows you how to choose the best high availability option for your Timescale Cloud Service.  
 
 ## What is HA replication?
 
-HA replicas are exact, up-to-date copies of your database that automatically take over operations as your primary data 
-node if the original primary data node becomes unavailable. HA replicas are synchronous or asynchronous hot standbys 
-hosted in multiple AWS availability zones(AZ) that use streaming replication to minimize the chance of data loss during 
-failover. That is, the primary node streams its write-ahead log (WAL) to the replicas.
+HA replicas are exact, up-to-date copies of your database hosted in multiple AWS availability zones (AZ) within the same region as your primary node. They automatically take over operations if the original primary data node becomes unavailable. The primary node streams its write-ahead log (WAL) to the replicas to minimize the chances of data loss during failover. 
+
+HA replicas can be synchronous or asynchronous.
+
+- Synchronous: the primary commits its next write once the replica confirms that the previous write is complete. There is no lag between the primary and the replica. They are in the same state at all times. This is preferable if you need the highest level of data integrity. However, waiting for confirmation slows ingest on the primary.
+
+- Asynchronous: the primary commits its next write without waiting for the replica to confirm that the previous write is complete. Asynchronous HA replicas often lag behind the primary, in both time and data. This is preferable if you need the shortest primary ingest time.
+
+![Sync and async replication](https://assets.timescale.com/docs/images/sync_async_replication_draft.png)
 
 HA replicas have separate unique addresses that you can use to serve read-only requests in parallel to your 
-primary data node. When your primary data node fails, Timescale Cloud automatically _fails over_ to 
-a HA replica. During failover, the read-only address is unavailable while Timescale Cloud automatically create 
-a new HA replica. The time to make this replica depends on several factors, including the size of your data.
-You
+primary data node. When your primary data node fails, Timescale Cloud automatically fails over to 
+an HA replica within 30 seconds. During failover, the read-only address is unavailable while Timescale Cloud automatically creates a new HA replica. The time to make this replica depends on several factors, including the size of your data.
 
 Operations such as upgrading your Timescale Cloud Service to a new major or minor version may necessitate 
 a service restart. Restarts are run during the [maintenance window][upgrade]. To avoid any downtime, each data
 node is updated in turn. That is, while the primary data node is updated, a replica is promoted to primary. 
 After the primary is updated and online, the same maintenance is performed on the HA replicas.
 
-To ensure that all Timescale Cloud Services have minimum downtown and data loss in the most common
+To ensure that all Timescale Cloud Services have minimum downtime and data loss in the most common
 failure scenarios and during maintenance, [rapid recovery][rapid-recovery] is enabled by default for all services.
 
 ## Choose an HA strategy
 
 The following HA configurations are available in Timescale Cloud:
 
 - **Non-production**: no replica, best for developer environments.
-- **High Availability**: a single replica in a separate AWS availability zone. The High availability optimized mode is
-    good for both price sensitive customers and those who care most about failover speed and performance.
-    Async replication provides faster  write speeds and improved performance for apps with less stringent
-    consistency requirements.
-
-- **Highest Availability**: two readable replicas in separate AWS availability zones. Available replication modes are:
-  - *Optimized* - two asynchronous replicas: transactions are considered complete without waiting for the replicas to 
-    confirm. Async replication provides faster write speeds and improved performance for apps with less stringent
-    consistency requirements. When you access a HA read endpoint Timescale Cloud load balances across the replicas.
-  - *High data integrity* - one synchronous replica and one asynchronous replica: A synchronous replica is guaranteed to 
-    always be in the exact same state as the primary, minimizing failover time and ensuring no data loss. However, 
-    synchronous replicas reduce ingest performance and do not provide a replica endpoint 
-
-    Synchronous replication ensures the highest level of data consistency and safety. 
 
+- **High availability**: a single async replica in a different AWS availability zone from your primary. Provides high availability with cost efficiency. Best for production apps. 
+
+- **Highest availability**: two replicas in different AWS availability zones from your primary. Available replication modes are:
+
+  - **High performance** - two async replicas. Provides the highest level of availability with two AZs and the ability to query the HA system. Best for absolutely critical apps.
+  - **High data integrity** - one sync replica and one async replica. The sync replica is identical to the primary at all times. Best for apps that can tolerate no data loss.
+
+The following table summarizes the differences between these HA configurations:
+
+|| High availability <br/> (1 async) | High performance <br/> (2 async) | High data integrity <br/> (1 sync + 1 async) | 
+|-------|----------|------------|-----|
+|Write flow |The primary streams its WAL to the async replica, which may lag slightly behind the primary, providing a 99.9% uptime SLA. |The primary streams its writes to both async replicas, providing a 99.9+% uptime SLA.|The primary streams its writes to the sync and async replicas. The async replica is never ahead of the sync one.|
+|Additional read replica|Recommended. Reads from the HA replica may cause availability and lag issues. |Not needed. You can still read from an HA replica even if one of them is down. Configure an additional read replica only if your read use case is significantly different from your write use case.|Highly recommended. If you run heavy queries on a sync replica, it may fall behind the primary. Specifically, if it takes too long for the replica to confirm a transaction, the next transaction is canceled.|
+|Choosing the replica to read from manually| Not applicable. |Not available. Queries are load-balanced against all available HA replicas. |Not available. Queries are load-balanced against all available HA replicas.|
+| Sync replication | Only async replicas are supported in this configuration. |Only async replicas are supported in this configuration. | Supported.|
+| Failover flow | <ul><li>If the primary fails, the replica becomes the primary while a new node is created, with only seconds of downtime.</li><li>If the replica fails, a new async replica is created without impacting the primary. If you read from the async HA replica, those reads fail until the new replica is available.</li></ul> |<ul><li>If the primary fails, one of the replicas becomes the primary while a new node is created, with the other one still available for reads.</li><li>If a replica fails, a new async replica is created in another AZ, without impacting the primary. The newly created replica is behind the primary and the original replica while it catches up.</li></ul>|<ul><li>If the primary fails, the sync replica becomes the primary while a new node is created, with the async one still available for reads.</li><li>If the async replica fails, a new async replica is created. Heavy reads on the sync replica may delay the ingest time of the primary while a new async replica is created. Data integrity remains high but primary ingest performance may degrade.</li><li>If the sync replica fails, the async replica becomes the sync one, and a new async replica is created. The primary may experience some ingest performance degradation during this time.</li></ul>|
+| Cost composition | Primary + async (2x) |Primary + 2 async (3x)|Primary + 1 async + 1 sync (3x)|
+| Tier | Performance, Scale, and Enterprise  |Scale and Enterprise|Scale and Enterprise|
 
 The `High` and `Highest` HA strategies are available with the [Scale and the Enterprise][pricing-plans] pricing plans.
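
The following sketch is not part of this commit. It is a minimal toy model, in Python, of the sync and async behaviour described in the diff: a sync replica confirms each write before the primary commits it, while an async replica may lag until it catches up. The `Primary` and `Replica` classes and their methods are hypothetical names chosen for illustration and do not correspond to any Timescale Cloud or PostgreSQL API.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical stand-in for an HA replica that replays streamed records."""
    applied: list = field(default_factory=list)
    pending: list = field(default_factory=list)

    def receive(self, record):
        # The record has arrived over the stream but is not applied yet.
        self.pending.append(record)

    def apply_pending(self):
        # Replay everything received so far.
        self.applied.extend(self.pending)
        self.pending.clear()


@dataclass
class Primary:
    """Hypothetical primary that streams each write to a single replica."""
    replica: Replica
    synchronous: bool
    committed: list = field(default_factory=list)

    def write(self, record):
        self.replica.receive(record)
        if self.synchronous:
            # Sync mode: the replica confirms the write before the commit.
            self.replica.apply_pending()
        # Async mode: commit immediately; the replica catches up later.
        self.committed.append(record)

    def lag(self):
        # Number of committed records the replica has not applied yet.
        return len(self.committed) - len(self.replica.applied)


sync_primary = Primary(Replica(), synchronous=True)
async_primary = Primary(Replica(), synchronous=False)
for record in ["w1", "w2", "w3"]:
    sync_primary.write(record)
    async_primary.write(record)

print(sync_primary.lag())   # 0: the sync replica is never behind
print(async_primary.lag())  # 3: the async replica has not caught up yet
async_primary.replica.apply_pending()
print(async_primary.lag())  # 0: caught up after replaying pending records
```

The model ignores networking, WAL details, and failover. It only shows why a sync replica trades ingest speed for zero lag, and why an async replica can be behind the primary at the moment of a failure.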