Backport to branch(3.11) : Update requirements and recommendations (#…

…1594) Co-authored-by: Josh Wong <[email protected]>
scalar-labs · Mar 11, 2024 · a436db5
1 parent 1f6fcca
commit a436db5

Showing 2 changed files with 116 additions and 24 deletions.
diff --git a/docs/requirements.md b/docs/requirements.md
@@ -1,51 +1,143 @@
 # Requirements and Recommendations for the Underlying Databases of ScalarDB
 
-This document explains the requirements and recommendations in the underlying databases of ScalarDB to make ScalarDB applications work correctly.
+This document explains the requirements and recommendations for the underlying databases of ScalarDB so that ScalarDB applications work correctly and efficiently.
 
-## Common requirements
+## Requirements
 
-This section describes common requirements for the underlying databases when using ScalarDB.
+ScalarDB requires each underlying database to provide certain capabilities to run transactions and analytics on the databases. This document explains the general requirements and how to configure each database to achieve the requirements.
 
-### Privileges to access the underlying databases
+### General requirements
 
-ScalarDB operates the underlying databases not only for CRUD operations but also for performing operations like creating or altering schemas, tables, or indexes. Thus, ScalarDB basically requires a fully privileged account to access the underlying databases.
+#### Transactions
+{:.no_toc}
+ScalarDB requires each underlying database to provide at least the following capabilities to run transactions on the databases:
 
-## Cassandra or Cassandra-compatible database requirements
+- Linearizable read and conditional mutations (write and delete) on a single database record.
+- Durability of written database records.
+- Ability to store arbitrary data besides application data in each database record.
 
-The following are requirements to make ScalarDB on Cassandra or Cassandra-compatible databases work properly and for storage operations with `LINEARIZABLE` to provide linearizability and for transaction operations with `SERIALIZABLE` to provide strict serializability.
+#### Analytics
+{:.no_toc}
+ScalarDB requires each underlying database to provide the following capability to run analytics on the databases:
 
-### Ensure durability in Cassandra
+- Ability to return only committed records.
 
-In **cassandra.yaml**, you must change `commitlog_sync` from the default `periodic` to `batch` or `group` to ensure durability in Cassandra.
+{% capture notice--info %}
+**Note**
+
+ScalarDB not only performs CRUD operations on the underlying databases but also creates or alters schemas, tables, and indexes, so the database accounts that ScalarDB uses need sufficient privileges. ScalarDB basically requires a fully privileged account to access the underlying databases.
+{% endcapture %}
+
+<div class="notice--info">{{ notice--info | markdownify }}</div>
+
+### How to configure databases to achieve the general requirements
+
+Select your database for details on how to configure it to achieve the general requirements.
+
+<div id="tabset-1">
+<div class="tab">
+  <button class="tablinks" onclick="openTab(event, 'JDBC_databases', 'tabset-1')" id="defaultOpen-1">JDBC databases</button>
+  <button class="tablinks" onclick="openTab(event, 'DynamoDB', 'tabset-1')">DynamoDB</button>
+  <button class="tablinks" onclick="openTab(event, 'Cosmos_DB_for_NoSQL', 'tabset-1')">Cosmos DB for NoSQL</button>
+  <button class="tablinks" onclick="openTab(event, 'Cassandra', 'tabset-1')">Cassandra</button>
+</div>
+
+<div id="JDBC_databases" class="tabcontent" markdown="1">
+
+#### Transactions
+{:.no_toc}
+- Use a single primary server or synchronized multi-primary servers for all operations (no read operations on read replicas that are asynchronously replicated from a primary database).
+- Use read-committed or stricter isolation levels.
+
+#### Analytics
+{:.no_toc}
+- Use read-committed or stricter isolation levels.
 
-ScalarDB provides only the atomicity and isolation properties of ACID and requests the underlying databases to provide durability. Although you can specify `periodic`, we do not recommend doing so unless you know exactly what you are doing.
+</div>
 
-### Confirm that the Cassandra-compatible database supports lightweight transactions (LWTs)
+<div id="DynamoDB" class="tabcontent" markdown="1">
 
-You must use a Cassandra-compatible database that supports LWTs.
+#### Transactions
+{:.no_toc}
+- Use a single primary region for all operations. (No read and write operations on global tables in non-primary regions.)
+  - DynamoDB has no concept of a primary region, so you must designate a primary region yourself.
 
-ScalarDB does not work on some Cassandra-compatible databases that do not support LWTs, such as [Amazon Keyspaces](https://aws.amazon.com/keyspaces/). This is because the Consensus Commit transaction manager relies on the linearizable operations of underlying databases to make transactions serializable.
+#### Analytics
+{:.no_toc}
+- Not applicable. DynamoDB always returns committed records, so there are no DynamoDB-specific requirements.
 
-## CosmosDB database requirements
+</div>
 
-In your Azure CosmosDB account, you must set the **default consistency level** to **Strong**.
+<div id="Cosmos_DB_for_NoSQL" class="tabcontent" markdown="1">
 
-Consensus Commit, the ScalarDB transaction protocol, requires linearizable reads. By setting the **default consistency level** to **Strong**, CosmosDB can guarantee linearizability.
+#### Transactions
+{:.no_toc}
+- Use a single primary region for all operations with `Strong` or `Bounded Staleness` consistency.
 
-For instructions on how to configure this setting, see the official documentation at [Configure the default consistency level](https://learn.microsoft.com/en-us/azure/cosmos-db/nosql/how-to-manage-consistency#configure-the-default-consistency-level).
+#### Analytics
+{:.no_toc}
+- Not applicable. Cosmos DB always returns committed records, so there are no Cosmos DB–specific requirements.
 
-## JDBC database recommendations
+</div>
 
-In ScalarDB on JDBC databases, you can't choose a consistency level (`LINEARIZABLE`, `SEQUENTIAL` or `EVENTUAL`) in your code by using the `Operation.withConsistency()` method. In addition, the consistency level depends on the setup of your JDBC database.
+<div id="Cassandra" class="tabcontent" markdown="1">
 
-For example, if you have asynchronous read replicas in your setup and perform read operations against them, the consistency will be eventual because you can read stale data from the read replicas. On the other hand, if you perform all operations against a single master instance, the consistency will be linearizable.
+#### Transactions
+{:.no_toc}
+- Use a single primary cluster for all operations (no read or write operations in non-primary clusters).
+- Use `batch` or `group` for `commitlog_sync`.
+- If you're using Cassandra-compatible databases, those databases must properly support lightweight transactions (LWT).
 
-With this in mind, you must perform all operations or transactions against a single master instance so that you can achieve linearizability and avoid worrying about consistency issues in your application. In other words, ScalarDB does not support read replicas. 
+#### Analytics
+{:.no_toc}
+- Not applicable. Cassandra always returns committed records, so there are no Cassandra-specific requirements.
+
+</div>
+</div>
+
+## Recommendations
+
+Properly configuring each underlying database of ScalarDB for high performance and high availability is recommended. The following recommendations include some knobs and configurations to update.
 
 {% capture notice--info %}
 **Note**
 
-You can still use a read replica as a backup and standby even when following this guideline.
+From the perspective of the underlying databases, ScalarDB is an application, so you may also want to tune other knobs and configurations that are commonly used to improve efficiency.
 {% endcapture %}
+<div class="notice--info">{{ notice--info | markdownify }}</div>
+
+<div id="tabset-2">
+<div class="tab">
+  <button class="tablinks" onclick="openTab(event, 'JDBC_databases2', 'tabset-2')" id="defaultOpen-2">JDBC databases</button>
+  <button class="tablinks" onclick="openTab(event, 'DynamoDB2', 'tabset-2')">DynamoDB</button>
+  <button class="tablinks" onclick="openTab(event, 'Cosmos_DB_for_NoSQL2', 'tabset-2')">Cosmos DB for NoSQL</button>
+  <button class="tablinks" onclick="openTab(event, 'Cassandra2', 'tabset-2')">Cassandra</button>
+</div>
+
+<div id="JDBC_databases2" class="tabcontent" markdown="1">
+- Use read-committed isolation for better performance.
+- Follow the performance optimization best practices for each database. For example, increasing the buffer size (for example, `shared_buffers` in PostgreSQL) and increasing the number of connections (for example, `max_connections` in PostgreSQL) are usually recommended for better performance.
+</div>
 
+<div id="DynamoDB2" class="tabcontent" markdown="1">
+- Increase the number of read capacity units (RCUs) and write capacity units (WCUs) for high throughput.
+- Enable point-in-time recovery (PITR).
+
+{% capture notice--info %}
+**Note**
+
+Since DynamoDB stores data in multiple availability zones by default, you don’t need to adjust any configurations to improve availability.
+{% endcapture %}
 <div class="notice--info">{{ notice--info | markdownify }}</div>
+</div>
+
+<div id="Cosmos_DB_for_NoSQL2" class="tabcontent" markdown="1">
+- Increase the number of Request Units (RUs) for high throughput.
+- Enable point-in-time restore (PITR).
+- Enable availability zones.
+</div>
+
+<div id="Cassandra2" class="tabcontent" markdown="1">
+- Increase `concurrent_reads` and `concurrent_writes` for high throughput. For details, see the official Cassandra documentation about [`concurrent_writes`](https://cassandra.apache.org/doc/stable/cassandra/configuration/cass_yaml_file.html#concurrent_writes).
+</div>
+</div>
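
The Cassandra requirement in the `docs/requirements.md` diff above (use `batch` or `group` for `commitlog_sync`, where the removed text names `periodic` as the default) could be checked before deployment with a sketch like the following. This is a minimal illustration, not part of ScalarDB: the function names are hypothetical, and the line-based parsing is a simplifying assumption that only looks at a top-level `commitlog_sync:` key instead of using a full YAML parser.

```python
# Hypothetical helper names; not part of ScalarDB or Cassandra.
DURABLE_MODES = {"batch", "group"}


def commitlog_sync_mode(yaml_text: str) -> str:
    """Return the top-level commitlog_sync value, or "periodic" if it is not set."""
    for raw_line in yaml_text.splitlines():
        # Drop trailing comments; fully commented-out lines become empty.
        line = raw_line.split("#", 1)[0].rstrip()
        # Match only the exact top-level key, not related keys such as
        # commitlog_sync_period_in_ms.
        if line.startswith("commitlog_sync:"):
            value = line.split(":", 1)[1].strip().strip("'\"")
            if value:
                return value
    # Assumption taken from the removed text in the diff: the default is periodic.
    return "periodic"


def meets_durability_requirement(yaml_text: str) -> bool:
    """True if commitlog_sync is set to one of the modes the document requires."""
    return commitlog_sync_mode(yaml_text) in DURABLE_MODES
```

For example, `meets_durability_requirement(open("cassandra.yaml").read())` would report whether a given configuration file satisfies the requirement, assuming the file keeps `commitlog_sync` as a plain top-level key.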
diff --git a/docs/scalardb-supported-databases.md b/docs/scalardb-supported-databases.md
@@ -18,7 +18,7 @@ ScalarDB supports the following databases and their versions.
 {% capture notice--info %}
 **Note**
 
-For requirements when using Cassandra or Cassandra-compatible databases, see [Cassandra or Cassandra-compatible database requirements](requirements.md#cassandra-or-cassandra-compatible-database-requirements).
+For requirements when using Cassandra or Cassandra-compatible databases, see [How to configure databases to achieve the general requirements](requirements.md#how-to-configure-databases-to-achieve-the-general-requirements).
 {% endcapture %}
 
 <div class="notice--info">{{ notice--info | markdownify }}</div>
@@ -48,7 +48,7 @@ For requirements when using Cassandra or Cassandra-compatible databases, see [Ca
 {% capture notice--info %}
 **Note**
 
-For recommendations when using JDBC databases, see [JDBC database recommendations](requirements.md#jdbc-database-recommendations).
+For recommendations when using JDBC databases, see [Recommendations](requirements.md#recommendations).
 {% endcapture %}
 
 <div class="notice--info">{{ notice--info | markdownify }}</div>
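
The link changes in `docs/scalardb-supported-databases.md` follow from the heading renames in `docs/requirements.md`, because each anchor is derived from its heading text. The sketch below shows that derivation under a simplified slug rule (lowercase, drop punctuation, turn spaces into hyphens). The function name is hypothetical, and real Markdown renderers may differ in edge cases such as duplicate headings or non-ASCII characters.

```python
import re


def heading_anchor(heading: str) -> str:
    """Derive a link anchor from a heading using a simplified slug rule (assumption)."""
    text = heading.strip().lower()
    # Keep only lowercase letters, digits, spaces, and hyphens.
    text = re.sub(r"[^a-z0-9 \-]", "", text)
    return text.replace(" ", "-")


# Headings added by this commit and the anchors the updated links point to.
new_headings = [
    "How to configure databases to achieve the general requirements",
    "Recommendations",
]
for heading in new_headings:
    print(f"requirements.md#{heading_anchor(heading)}")
```

A check like this can help catch links that still point at removed headings, such as the old `#jdbc-database-recommendations` anchor.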