Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Perform unreferenced file cleanup for any operation failure and mute flaky test #12128

Closed
wants to merge 2 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -85,6 +85,7 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
- Update supported version for max_shard_size parameter in Shrink API ([#11439](https://github.com/opensearch-project/OpenSearch/pull/11439))
- Fix typo in API annotation check message ([11836](https://github.com/opensearch-project/OpenSearch/pull/11836))
- Update supported version for must_exist parameter in update aliases API ([#11872](https://github.com/opensearch-project/OpenSearch/pull/11872))
- Fix unreferenced file cleanup on any operations failure ([#12054](https://github.com/opensearch-project/OpenSearch/issues/12054))
- [Bug] Check phase name before SearchRequestOperationsListener onPhaseStart ([#12035](https://github.com/opensearch-project/OpenSearch/pull/12035))

### Security
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -219,6 +219,7 @@ public void testIndexCreateBlockNotAppliedWhenAnyNodesBelowHighWatermark() throw
assertFalse(state.blocks().hasGlobalBlockWithId(Metadata.CLUSTER_CREATE_INDEX_BLOCK.id()));
}

@AwaitsFix(bugUrl = "https://github.com/opensearch-project/OpenSearch/issues/6445")
public void testIndexCreateBlockIsRemovedWhenAnyNodesNotExceedHighWatermarkWithAutoReleaseEnabled() throws Exception {
final Settings settings = Settings.builder()
.put(DiskThresholdSettings.CLUSTER_ROUTING_ALLOCATION_DISK_THRESHOLD_ENABLED_SETTING.getKey(), false)
Expand Down
17 changes: 8 additions & 9 deletions server/src/main/java/org/opensearch/index/engine/Engine.java
Original file line number Diff line number Diff line change
Expand Up @@ -1317,11 +1317,11 @@ public void failEngine(String reason, @Nullable Exception failure) {
}
}

// If cleanup of unreferenced flag is enabled and force merge or regular merge failed due to IOException,
// clean all unreferenced files on best effort basis created during failed merge and reset the
// shard state back to last Lucene Commit.
if (shouldCleanupUnreferencedFiles() && isMergeFailureDueToIOException(failure, reason)) {
logger.info("Cleaning up unreferenced files as merge failed due to: {}", reason);
// If cleanup of unreferenced file flag is enabled and any operation failed due to IOException,
// clean up all unreferenced files on best effort basis created during failed merge and reset the
// shard state back to last Lucene Commit
if (shouldCleanupUnreferencedFiles() && isOperationFailureDueToIOException(failure)) {
logger.info("Cleaning up unreferenced files created during failed merge due to: {}", reason);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be warning?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We would be doing extra clean ups on may be other shards as well due to this . Please evaluate if we can avoid the same and do clean up only on the responsible shard instead

Comment on lines +1323 to +1324
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to log if the exception isn't IOE

cleanUpUnreferencedFiles();
}

Expand Down Expand Up @@ -1365,10 +1365,9 @@ private void cleanUpUnreferencedFiles() {
}
}

/** Check whether the merge failure happened due to IOException. */
private boolean isMergeFailureDueToIOException(Exception failure, String reason) {
return (reason.equals(FORCE_MERGE) || reason.equals(MERGE_FAILED))
&& ExceptionsHelper.unwrap(failure, IOException.class) instanceof IOException;
/** Check whether an operation failed due to IOException. */
private boolean isOperationFailureDueToIOException(Exception failure) {
return ExceptionsHelper.unwrap(failure, IOException.class) instanceof IOException;
}

/** Check whether the engine should be failed */
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -184,6 +184,7 @@ public void testLoggingOnHungIO() throws Exception {
}
}

@AwaitsFix(bugUrl = "https://github.com/opensearch-project/OpenSearch/issues/7557")
public void testFailsHealthOnHungIOBeyondHealthyTimeout() throws Exception {
long healthyTimeoutThreshold = randomLongBetween(500, 1000);
long refreshInterval = randomLongBetween(500, 1000);
Expand Down
Loading