HotToWarmTieringService changes to tier shards #14891

Open
wants to merge 1 commit into
base: main
from

Conversation

neetikasinghal
Contributor

@neetikasinghal neetikasinghal commented Jul 23, 2024

Description

Related Issues

#14545
#13980

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Contributor

❌ Gradle check result for 2e9f80d: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Contributor

❌ Gradle check result for 052d551: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Contributor

❌ Gradle check result for 0813dac: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@neetikasinghal neetikasinghal added backport 2.x Backport to 2.x branch v2.16.0 Issues and PRs related to version 2.16.0 release labels Jul 23, 2024
Contributor

❌ Gradle check result for 41f986c: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Contributor

✅ Gradle check result for 8975e97: SUCCESS

@neetikasinghal neetikasinghal force-pushed the tiering-service branch 2 times, most recently from d7bffe7 to 4774394 Compare July 26, 2024 23:39
Contributor

✅ Gradle check result for d7bffe7: SUCCESS

Contributor

❌ Gradle check result for 4774394: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Collaborator

@jed326 jed326 left a comment

Needs a changelog entry but otherwise LGTM

@ExperimentalApi
public class TieringRequestContext {
    private final ActionListener<HotToWarmTieringResponse> actionListener;
    private final Map<Index, IndexTieringInfo> indexTieringStatusMap;
Collaborator

How about keeping just two sets, for accepted and completed indices, and one map for failedIndices? That way each index stays in the data structure for its state, and you don't have to filter every time you need the indices in a specific state.
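
For illustration, the suggestion above could look roughly like the following. This is only a sketch under stated assumptions: `SetBasedTieringContext` and its method names are hypothetical, index names are plain `String`s instead of the real `Index` type, and the failure reason is assumed to be a simple message.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for TieringRequestContext (not the class from this PR).
final class SetBasedTieringContext {
    // One container per state: lookups by state need no filtering.
    private final Set<String> acceptedIndices = new HashSet<>();
    private final Set<String> completedIndices = new HashSet<>();
    // Assumption: a failed index maps to a human-readable failure reason.
    private final Map<String, String> failedIndices = new HashMap<>();

    SetBasedTieringContext(Set<String> accepted) {
        acceptedIndices.addAll(accepted);
    }

    // Moves an index from the accepted set to the completed set.
    void markCompleted(String index) {
        if (acceptedIndices.remove(index)) {
            completedIndices.add(index);
        }
    }

    // Moves an index from the accepted set to the failed map.
    void markFailed(String index, String reason) {
        if (acceptedIndices.remove(index)) {
            failedIndices.put(index, reason);
        }
    }

    Set<String> getAcceptedIndices() {
        return Collections.unmodifiableSet(acceptedIndices);
    }

    Set<String> getCompletedIndices() {
        return Collections.unmodifiableSet(completedIndices);
    }

    Map<String, String> getFailedIndices() {
        return Collections.unmodifiableMap(failedIndices);
    }

    // The request is done once nothing is left in the accepted set.
    boolean isRequestProcessingComplete() {
        return acceptedIndices.isEmpty();
    }
}
```

The trade-off in this shape is that every state change is a move between containers, so the mark methods must keep the containers mutually exclusive.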

Contributor Author

@neetikasinghal neetikasinghal Aug 7, 2024

I thought about that as well. However, the current approach is cleaner in that entries need not be moved from the accepted set to the completed/failed indices; each entry is in exactly one tiering state (keeping only one source of truth).
Also, if we later plan to extend the tiering states, TieringRequestContext can easily be extended for them. If another transition state is introduced for another type of tiering, the sets approach would need another set, whereas with the current approach we just need to add a state to IndexTieringState.
I would like to keep it as is unless you have a strong opinion here.
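
As a rough sketch of the single-map approach described in this reply (not the code from this PR): `MapBasedTieringContext`, `filterIndicesByState`, and the state names other than `TIERED` are assumptions made for illustration, and index names are plain `String`s instead of the real `Index` type.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical stand-in for TieringRequestContext backed by a single map.
final class MapBasedTieringContext {
    // Assumption: only TIERED appears in the PR snippets; the other names are illustrative.
    enum IndexTieringState { PENDING_COMPLETION, TIERED, FAILED }

    // Each index appears exactly once, so its state has one source of truth.
    private final Map<String, IndexTieringState> indexTieringStatusMap = new HashMap<>();

    MapBasedTieringContext(Set<String> acceptedIndices) {
        for (String index : acceptedIndices) {
            indexTieringStatusMap.put(index, IndexTieringState.PENDING_COMPLETION);
        }
    }

    // Map.replace only updates indices that are already tracked.
    void markTiered(String index) {
        indexTieringStatusMap.replace(index, IndexTieringState.TIERED);
    }

    void markFailed(String index) {
        indexTieringStatusMap.replace(index, IndexTieringState.FAILED);
    }

    // Linear scan over the tracked indices; cheap when few indices tier at once.
    Set<String> filterIndicesByState(IndexTieringState state) {
        return indexTieringStatusMap.entrySet()
            .stream()
            .filter(entry -> entry.getValue() == state)
            .map(Map.Entry::getKey)
            .collect(Collectors.toSet());
    }

    // Complete once no index is still waiting to be tiered.
    boolean isRequestProcessingComplete() {
        return !indexTieringStatusMap.containsValue(IndexTieringState.PENDING_COMPLETION);
    }
}
```

Adding a new tiering state in this shape means adding one enum constant, at the cost of a scan whenever the indices in a given state are needed.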

Collaborator

> However, the current approach is cleaner in that entries need not be moved from the accepted set to the completed/failed indices; each entry is in exactly one tiering state (keeping only one source of truth).

I don't see an issue with moving from one set to another. Each data structure provides an easy way to get the indices in that state, which is what most of the calls from H2WTieringService need, hence the suggestion. We still have a single source of truth: the TieringRequestContext object, which encapsulates these different data structures to maintain indices in different states instead of a single map.

> Also, if we later plan to extend the tiering states, TieringRequestContext can easily be extended for them. If another transition state is introduced for another type of tiering, the sets approach would need another set, whereas with the current approach we just need to add a state to IndexTieringState.

Agree on this, but I don't see any other tiering state at the moment. Also, TieringRequestContext is tied to HotToWarmMigration, so if we have to reuse it or introduce a new state for a different tiering type, refactoring will be needed anyway. We can always think about a better mechanism when the use cases for other tiering types are known.

Contributor Author

@neetikasinghal neetikasinghal Aug 7, 2024

> > However, the current approach is cleaner in that entries need not be moved from the accepted set to the completed/failed indices; each entry is in exactly one tiering state (keeping only one source of truth).
>
> I don't see an issue with moving from one set to another. Each data structure provides an easy way to get the indices in that state, which is what most of the calls from H2WTieringService need, hence the suggestion. We still have a single source of truth: the TieringRequestContext object, which encapsulates these different data structures to maintain indices in different states instead of a single map.

I agree that for a given request we have a single source of truth, which is TieringRequestContext. However, to figure out the state of an index (accepted/completed/failed), we would have different sources of truth.
Given that a limited number of indices would undergo tiering at a given time, I see the filtering operation as effectively constant-time. What other concern do you see with the current implementation?
Also, with the sets approach we would need 3 sets and one map here (accepted, successful, completed, failed), compared to the single map maintained in the current implementation.

> > Also, if we later plan to extend the tiering states, TieringRequestContext can easily be extended for them. If another transition state is introduced for another type of tiering, the sets approach would need another set, whereas with the current approach we just need to add a state to IndexTieringState.
>
> Agree on this, but I don't see any other tiering state at the moment. Also, TieringRequestContext is tied to HotToWarmMigration, so if we have to reuse it or introduce a new state for a different tiering type, refactoring will be needed anyway. We can always think about a better mechanism when the use cases for other tiering types are known.

Makes sense.

Contributor

github-actions bot commented Aug 7, 2024

✅ Gradle check result for 7da1aa6: SUCCESS


codecov bot commented Aug 7, 2024

Codecov Report

Attention: Patch coverage is 27.80488% with 148 lines in your changes missing coverage. Please review.

Project coverage is 71.86%. Comparing base (97c1bf0) to head (d99f55f).
Report is 333 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...earch/indices/tiering/HotToWarmTieringService.java | 20.58% | 106 Missing and 2 partials ⚠️ |
| ...n/admin/indices/tiering/TieringRequestContext.java | 0.00% | 24 Missing ⚠️ |
| ...dices/tiering/TransportHotToWarmTieringAction.java | 46.15% | 7 Missing ⚠️ |
| ...ices/tiering/TieringUpdateClusterStateRequest.java | 0.00% | 6 Missing ⚠️ |
| ...rch/action/admin/indices/tiering/TieringUtils.java | 83.33% | 1 Missing ⚠️ |
| ...org/opensearch/cluster/metadata/IndexMetadata.java | 66.66% | 1 Missing ⚠️ |
| ...earch/indices/tiering/TieringRequestValidator.java | 0.00% | 1 Missing ⚠️ |

Additional details and impacted files
@@             Coverage Diff              @@
##               main   #14891      +/-   ##
============================================
+ Coverage     71.74%   71.86%   +0.12%     
- Complexity    62904    62963      +59     
============================================
  Files          5178     5182       +4     
  Lines        295167   295359     +192     
  Branches      42679    42701      +22     
============================================
+ Hits         211774   212268     +494     
+ Misses        66011    65663     -348     
- Partials      17382    17428      +46     


Collaborator

@sohami sohami left a comment

Can we also look into increasing the code coverage? It seems pretty low right now. Let's aim to keep it above 80%.

final TieringUpdateClusterStateRequest updateClusterStateRequest = new TieringUpdateClusterStateRequest(
    tieringValidationResult.getRejectedIndices(),
    request.waitForCompletion()
).ackTimeout(request.timeout())
Collaborator

Where is this ackTimeout used?

Contributor Author

Comment on lines 122 to 126
    }

    public void markTiered() {
        this.state = IndexTieringState.TIERED;
    }
Collaborator

This looks like a state machine. Do we need to validate that the transitions happen in the correct sequence?
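
One possible way to validate transitions, sketched under assumptions rather than taken from this PR: `ValidatedIndexTieringInfo`, the `State` enum, and the allowed-transition table are all hypothetical, and only the `TIERED` state and the `markTiered()` name come from the snippet above.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical variant of IndexTieringInfo that rejects out-of-order transitions.
final class ValidatedIndexTieringInfo {
    // Assumption: PENDING_COMPLETION is the initial state; TIERED and FAILED are terminal.
    enum State {
        PENDING_COMPLETION,
        TIERED,
        FAILED;

        // States that may legally follow this one.
        Set<State> allowedNext() {
            if (this == PENDING_COMPLETION) {
                return EnumSet.of(TIERED, FAILED);
            }
            return EnumSet.noneOf(State.class);
        }
    }

    private State state = State.PENDING_COMPLETION;

    State getState() {
        return state;
    }

    void markTiered() {
        transitionTo(State.TIERED);
    }

    void markFailed() {
        transitionTo(State.FAILED);
    }

    // Single choke point for state changes, so the check cannot be bypassed.
    private void transitionTo(State next) {
        if (!state.allowedNext().contains(next)) {
            throw new IllegalStateException("Invalid tiering transition: " + state + " -> " + next);
        }
        state = next;
    }
}
```

With this shape, calling `markFailed()` after `markTiered()` throws `IllegalStateException` instead of silently overwriting the state. Whether throwing is the right reaction, as opposed to logging and ignoring, is a design choice left open here.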

Contributor

github-actions bot commented Aug 8, 2024

✅ Gradle check result for d99f55f: SUCCESS

void processTieringRequestContexts(final ClusterState clusterState) {
    final Map<Index, TieringRequestContext> tieredIndices = new HashMap<>();
    for (TieringRequestContext tieringRequestContext : tieringRequestContexts) {
        if (tieringRequestContext.isRequestProcessingComplete()) {
Contributor Author

This check handles the cases where indices have failed as part of the request.

@opensearch-trigger-bot
Contributor

This PR is stalled because it has been open for 30 days with no activity.

@opensearch-trigger-bot opensearch-trigger-bot bot added stalled Issues that have stalled and removed stalled Issues that have stalled labels Sep 9, 2024
@opensearch-trigger-bot
Contributor

This PR is stalled because it has been open for 30 days with no activity.

@opensearch-trigger-bot opensearch-trigger-bot bot added the stalled Issues that have stalled label Oct 13, 2024
@dbwiddis
Member

@neetikasinghal Are you still working on this? Looks like we need a few merge conflicts addressed. Otherwise is it ready for review?

@opensearch-trigger-bot opensearch-trigger-bot bot removed the stalled Issues that have stalled label Oct 20, 2024
Labels
backport 2.x Backport to 2.x branch backport 2.16 release v2.16.0 Issues and PRs related to version 2.16.0
6 participants