Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add RemoteDirectory instance to Store as a secondary directory #3285

Conversation

sachinpkale
Copy link
Member

@sachinpkale sachinpkale commented May 11, 2022

Signed-off-by: Sachin Kale [email protected]

Description

Issues Resolved

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

this(shardId, indexSettings, directory, null, shardLock, OnClose.EMPTY);
}

public Store(ShardId shardId, IndexSettings indexSettings, Directory directory, Directory secondaryDirectory, ShardLock shardLock) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 extra constructor methods due to secondaryDirectory. Should we use a builder instead?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we use a builder instead?

Yes, and the builder should validate the arguments and throw a RuntimeException if the REMOTE_STORE feature flag is not set.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the latest commit, I have removed extra parameter to Store. So, this is not required anymore.

@sachinpkale
Copy link
Member Author

Fixing spotless checks.

@sachinpkale sachinpkale force-pushed the feature/durability-enhancements branch from 399f1df to bcfd328 Compare May 11, 2022 11:58
@sachinpkale sachinpkale reopened this May 11, 2022
@opensearch-ci-bot
Copy link
Collaborator

❌   Gradle Check failure 399f1dff56b54dc3010b29e46b523e29c478ed9d
Log 5230

Reports 5230

Directory remoteDirectory = null;
if (this.indexSettings.isRemoteStoreEnabled()) {
try {
Repository repository = repositoriesService.repository("dragon-stone");
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This repository name is hardcoded for now. It would be made configurable as a part of #3286

@opensearch-ci-bot
Copy link
Collaborator

✅   Gradle Check success ce5f6cd
Log 5231

Reports 5231

@sachinpkale
Copy link
Member Author

Please refer this comment to understand why we went with having secondaryDirectory in the same Store instead of having a separate Store instance.

Comment on lines 215 to 220
if (secondaryDirectory != null) {
ByteSizeCachingDirectory sizeCachingSecondaryDir = new ByteSizeCachingDirectory(secondaryDirectory, refreshInterval);
this.secondaryDirectory = new StoreDirectory(sizeCachingSecondaryDir, Loggers.getLogger("index.store.deletes", shardId));
} else {
this.secondaryDirectory = null;
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should we have a Directory factory instead instead of always having two directories always. Primary and secondary directory sounds confusing as well

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is your view on having a separate store instance with RemoteDirectory?

Copy link
Collaborator

@reta reta May 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The presence of branching logic with primary / secondary is confusing, please consider subclassing the Store (fe CachedStore) to have a clean separation between regular Store and Store backed by a cache / secondary.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I think more on the comments from @Bukhtawar and @eirsep, we may need to have a CompositeDirectory which will have 2 directories (we can create compositions for multiple directories if required). CompositeDirectory will be abstract and its implementations can define how to handle operations on the encapsulated directories. Implementation can be parallel or sequential.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Going forward I do feel a need to have an abstraction at a Store level, espl with #2900 where we might extend search functionality directly on a remote store. Then we have to perform StoreRecovery operations when the shard data is lost, integrity, checksums which is what the store abstraction provides. How about a CompositeStore with each Store based out of their corresponding Directory implementation and IndexShard interfacing with this CompositeStore

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I think more, going either with CompositeStore or CompositeDirectory would be too early. I was able to get a quick implementation out with CompositeDirectory but it would require most of the RemoteDirectory methods to be implemented before we can actually start using it. Using these abstractions will not avoid having a reconciliation mechanism between local and remote directory. I made some changes in the latest commit and Store is not changed in the latest change. Please take a look.

remoteDirectory = remoteDirectoryFactory.newDirectory(this.indexSettings, path, repository);
} catch (RepositoryMissingException e) {
throw new IllegalArgumentException(
"Repository should be created before creating index with remote_store enabled setting",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is repository created by user or by system?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As of now, it is created by user. Once we make the name configurable as a part of #3286, we can check if the repository can be created during bootstrap.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, you're saying as of now it will have to be created by user but he can only name it as "dragon-stone"? That doesn't seem correct right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is hard-coded in this commit. As part of Remote Store V1, we are tracking meta issue: #2992. One of the tasks in the meta takes care of making repository name configurable. #3286

@@ -174,6 +174,7 @@ public class Store extends AbstractIndexShardComponent implements Closeable, Ref

private final AtomicBoolean isClosed = new AtomicBoolean(false);
private final StoreDirectory directory;
private final StoreDirectory secondaryDirectory;
Copy link
Member

@eirsep eirsep May 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can store not have to worry about how many directories are there and just have one StoreDirectory which is extended by a Remote+Local store Directory wrapper?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is secondaryDirectory just a cache? It looks like it, so why not to call it as such?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please check the latest changes. I have removed reference to secondaryDirectory from the Store.

Copy link
Collaborator

@nknize nknize left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to incorporate the REMOTE_STORE feature flag and throw a RuntimeException if the system is trying to bootstrap the remote directories with the feature flag disabled. Bypassing the feature flag here is trappy.

this(shardId, indexSettings, directory, null, shardLock, OnClose.EMPTY);
}

public Store(ShardId shardId, IndexSettings indexSettings, Directory directory, Directory secondaryDirectory, ShardLock shardLock) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we use a builder instead?

Yes, and the builder should validate the arguments and throw a RuntimeException if the REMOTE_STORE feature flag is not set.

import java.util.Collections;

import static org.mockito.ArgumentMatchers.any;
import static org.mockito.Mockito.*;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No wildcard imports

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replaced with individual imports. I will check if we can add this check to spotless

@sachinpkale sachinpkale force-pushed the feature/durability-enhancements branch from ce5f6cd to d13a660 Compare May 13, 2022 19:55
@sachinpkale
Copy link
Member Author

This needs to incorporate the REMOTE_STORE feature flag and throw a RuntimeException if the system is trying to bootstrap the remote directories with the feature flag disabled. Bypassing the feature flag here is trappy.

Yes, while creating instance of RemoteDirectory, I am checking if remote_store setting is enabled or not. As the setting itself is not active if feature flag is not set, we don't need explicit feature flag check, right?

@sachinpkale sachinpkale force-pushed the feature/durability-enhancements branch from d13a660 to 207941d Compare May 13, 2022 20:11

@Override
public void afterRefresh(boolean didRefresh) throws IOException {
// ToDo Add implementation
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actual implementation will be added in the next commit

@opensearch-ci-bot
Copy link
Collaborator

✅   Gradle Check success d13a660cf4f666d36f222c3b54f2b2f00f5b953b
Log 5313

Reports 5313

@sachinpkale sachinpkale requested review from Bukhtawar and nknize May 13, 2022 20:28
@opensearch-ci-bot
Copy link
Collaborator

❌   Gradle Check failure 207941d
Log 5317

Reports 5317

@sachinpkale
Copy link
Member Author

yamlRestTest failed which is a flaky test.

@Bukhtawar
Copy link
Collaborator

start gradle check

@opensearch-ci-bot
Copy link
Collaborator

✅   Gradle Check success 207941d
Log 5353

Reports 5353

Comment on lines +22 to +32
public class RemoteDirectoryFactory implements IndexStorePlugin.RemoteDirectoryFactory {

@Override
public Directory newDirectory(IndexSettings indexSettings, ShardPath path, Repository repository) throws IOException {
assert repository instanceof BlobStoreRepository : "repository should be instance of BlobStoreRepository";
BlobPath blobPath = new BlobPath();
blobPath = blobPath.add(indexSettings.getIndex().getName()).add(String.valueOf(path.getShardId().getId()));
BlobContainer blobContainer = ((BlobStoreRepository) repository).blobStore().blobContainer(blobPath);
return new RemoteDirectory(blobContainer);
}
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the factory supposed to vend out another variation based on the inputs? If not do we need a factory?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, there will be a directory instance per shard as path will be different per directory.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wondering if that qualifies to be a factory f.e look at FsDirectoryFactory which creates different a HybridDirectory or NIOFSDirectory based on certain attributes

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. You mean, if it is only one variant each time, we do not need a factory, right?
I don't have very strong inclination towards having a factory. It is mostly to keep things similar to what we have with FS based factories. It hides the complexity of understanding all the metadata and creating BlobContainer. Without factory, the code would be in the constructor of RemoteDirectory.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we call it a Builder instead?

Comment on lines +3107 to +3112
final List<ReferenceManager.RefreshListener> internalRefreshListener;
if (remoteStoreRefreshListener != null && shardRouting.primary()) {
internalRefreshListener = Arrays.asList(new RefreshMetricUpdater(refreshMetric), remoteStoreRefreshListener);
} else {
internalRefreshListener = Collections.singletonList(new RefreshMetricUpdater(refreshMetric));
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets push this down to IndexShard and pass a NoopRemoteStoreRefreshListener as needed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not get this. You mean push it down to EngineConfig?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I meant IndexServices that create IndexShard

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this to avoid null check? As RemoteStoreRefreshListener is injected into IndexShard, tests can have the NoopRemoteStoreRefreshListener if does not want to actually upload the data.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am fine with what we have now, but if the checks grow at multiple place we might want to give this a revisit

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is identical to the listener for segrep - here. What about building a list of additional listeners in IndexService and pass directly to IndexShard?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Bukhtawar Agree, adding a task for the same.
@mch2 Yes, once we merge with SegRep code, we can abstract it out.

@Bukhtawar Bukhtawar merged commit 4408e6b into opensearch-project:feature/durability-enhancements May 16, 2022
sachinpkale added a commit to sachinpkale/OpenSearch that referenced this pull request May 27, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants