Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add cross-partition scan options #1294

Merged
merged 6 commits into from
Nov 24, 2023
Merged

Conversation

jnmt
Copy link
Contributor

@jnmt jnmt commented Nov 16, 2023

Description

This PR introduces cross-partition scan options and the related errors and warnings so that developers can know ScanAll's behavior is different from the regular Scan and make sure to use it explicitly (at their own risk for NoSQL databases).

Related issues and/or PRs

Changes made

  • Add three options in DatabaseConfig
    • cross-partition scan: ScanAll without filtering and ordering
    • cross-partition scan with filtering: ScanAll with conjunctions (where clause)
    • cross-partition scan with ordering: ScanAll with ordering
    • To enable the last two, you need to enable the first one
    • Why these three? This granularity will help users understand the support features in their databases, especially when we implement filtering for the NoSQL cross-partition scan.
  • Add warnings when you enable cross-partition scan with serializable isolation
  • Change the following for each storage
    • Add config check for NoSQL databases if cross-partition scan filtering and ordering are both disabled
    • Add scan operation check if cross-partition scan options are appropriately enabled (in OperationChecker)
    • Make the storage name public in the config class to refer to it in a single place because some storage names are required for checking the config and warnings

Checklist

  • I have commented my code, particularly in hard-to-understand areas.
  • I have updated the documentation to reflect the changes. (Will be done in another PR)
  • Any remaining open issues linked to this PR are documented and up-to-date (Jira, GitHub, etc.).
  • Tests (unit, integration, etc.) have been added for the changes.
  • My changes generate no new warnings.
  • Any dependent changes in other PRs have been merged and published.

Additional notes (optional)

  • We still have a lot of the word RelationalScan in tests, but keep them as is for now
  • See inline comments for other notes

Release notes

Added cross-partition scan options

Copy link
Contributor Author

@jnmt jnmt Nov 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We no longer need this test class because it only tests if the scan is relational or not, and now we test it in OperationChecker instead. Ditto for other storage.

}
} else {
return buildSelectQueryForScan(scan, tableMetadata);
return buildSelectQuery((ScanAll) scan, tableMetadata);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After introducing the cross-partition scan concept, I was convinced that we don't have to distinguish ScanAll and RelationalScan anymore. This is a kind of refactoring, but I included this since it is related to re-organizing relational scan terminology.

@@ -90,6 +98,18 @@ protected void load() {
activeTransactionManagementExpirationTimeMillis =
getLong(getProperties(), ACTIVE_TRANSACTION_MANAGEMENT_EXPIRATION_TIME_MILLIS, -1);
defaultNamespaceName = getString(getProperties(), DEFAULT_NAMESPACE_NAME, null);
crossPartitionScanEnabled = getBoolean(getProperties(), CROSS_PARTITION_SCAN, false);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default value is false in v4.0.0 or later, but it should be true in v3.x for backward compatibility and warned when using it with SERIALIZABLE. I will handle this after this PR is merged to the master.


private void warnCrossPartitionScan(String storageName) {
logger.warn(
"Enabling cross-partition scan for {} in production is not recommended "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[trivial] I imagined what the message would be like (e.g., "Enabling cross-partition scan for cosmos in production ..."), and I think adding quotations might be a bit better.

Suggested change
"Enabling cross-partition scan for {} in production is not recommended "
"Enabling cross-partition scan for '{}' in production is not recommended "

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 7bffcbb. Thank you!

if (config.isCrossPartitionScanFilteringEnabled()
|| config.isCrossPartitionScanOrderingEnabled()) {
throw new IllegalArgumentException(
"Cross-partition scan with filtering or ordering is not supported in Cassandra");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is related to https://github.com/scalar-labs/scalardb/pull/1294/files#diff-660ca76c9b64d94000e0854e2a18fdc5e6a69c028a74b8cacfa4ef733d4604c2L111, but sorry, I forgot why these types of scan were prohibitted. Could you tell me that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the question. Basically, it's hard to push down all kinds of filtering and ordering to NoSQL databases. But for filtering, we can support ScalarDB-side filtering in the near future since we already have a logic for evaluating all types of our predicates.

[skip ci]
@jnmt jnmt requested a review from komamitsu November 20, 2023 02:00
Copy link
Collaborator

@brfrn169 brfrn169 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a minor comment, but LGTM! Thank you!

@@ -81,12 +83,14 @@ public class JdbcConfig {
public JdbcConfig(DatabaseConfig databaseConfig) {
String storage = databaseConfig.getStorage();
String transactionManager = databaseConfig.getTransactionManager();
if (!"jdbc".equals(storage) && !"jdbc".equals(transactionManager)) {
if (!storage.equals(STORAGE_NAME) && !transactionManager.equals(STORAGE_NAME)) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, when we set the transactionManager config to jdbc, the storage config is not required. So the storage config can be null. So we need to use the STORAGE_NAME.equals(...) style to avoid NullPointerExceptiom:

Suggested change
if (!storage.equals(STORAGE_NAME) && !transactionManager.equals(STORAGE_NAME)) {
if (!STORAGE_NAME.equals(storage) && !STORAGE_NAME.equals(transactionManager)) {

Or we also use Objects.equals() as well. Either is fine for me. Thanks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@brfrn169 Thank you for pointing it out. I think cassandra is always set to the storage by default when we create DatabaseConfig, it will never be null. If I understand it correctly, is the assertion better?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, you are right. I don't remember why we used that reverse style ("jdbc".equals(storage)) originally. It looks like we have default values for storage and transactionManager, so we don't need to care about NullPointerException. So let's keep it as it is. Thanks.

Copy link
Contributor

@komamitsu komamitsu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! 👍

Copy link
Contributor

@feeblefakie feeblefakie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thank you!

Copy link
Contributor

@kota2and3kan kota2and3kan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thank you!
(Sorry for my late review... 🙇‍♀️)

@jnmt
Copy link
Contributor Author

jnmt commented Nov 24, 2023

@Torch3333 I will merge this, but PTAL when you are back.

@jnmt jnmt merged commit 12eafa2 into master Nov 24, 2023
23 checks passed
@jnmt jnmt deleted the add-cross-partition-scan-option branch November 24, 2023 03:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants