Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix flaky integ tests #487

Closed
wants to merge 2 commits into from
Closed
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -49,6 +49,7 @@
import org.opensearch.core.xcontent.XContentBuilder;
import org.opensearch.index.query.QueryBuilder;
import org.opensearch.knn.index.SpaceType;
import org.opensearch.ml.common.model.MLModelState;
import org.opensearch.neuralsearch.OpenSearchSecureRestTestCase;
import org.opensearch.neuralsearch.util.NeuralSearchClusterUtil;
import org.opensearch.test.ClusterServiceUtils;
Expand Down Expand Up @@ -610,8 +611,8 @@ protected void deleteModel(String modelId) {
ImmutableList.of(new BasicHeader(HttpHeaders.USER_AGENT, DEFAULT_USER_AGENT))
);

// after model undeploy returns, the max interval to update model status is 3s in ml-commons CronJob.
Thread.sleep(3000);
// wait for model undeploy to complete
pollForModelState(modelId, MLModelState.UNDEPLOYED, 3000, 5);

makeRequest(
client(),
Expand All @@ -623,6 +624,16 @@ protected void deleteModel(String modelId) {
);
}

@SneakyThrows
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can remove SneakyThrows from here and add the exception in method signature.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

protected void pollForModelState(String modelId, MLModelState expectedModelState, int intervalMs, int maxAttempts) {
for (int i = 0; i < maxAttempts; i++) {
Thread.sleep(intervalMs);
if (expectedModelState.equals(getModelState(modelId))) {
return;
}
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To better improve the visibility around why tests are failing, I would say at the end of for loop if the model is not in the correct state we should fail the tests, proper messaging for the failure.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed

}

public boolean isUpdateClusterSettings() {
return true;
}
Expand Down Expand Up @@ -698,7 +709,7 @@ protected void deleteSearchPipeline(final String pipelineId) {
}

/**
* Find all modesl that are currently deployed in the cluster
* Find all models that are currently deployed in the cluster
* @return set of model ids
*/
@SneakyThrows
Expand Down Expand Up @@ -733,11 +744,33 @@ protected Set<String> findDeployedModels() {
List<Map<String, Object>> innerHitsMap = (List<Map<String, Object>>) hits.get("hits");
return innerHitsMap.stream()
.map(hit -> (Map<String, Object>) hit.get("_source"))
.filter(hitsMap -> !Objects.isNull(hitsMap) && hitsMap.containsKey("model_id"))
.filter(
hitsMap -> !Objects.isNull(hitsMap)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Objects.notNull()

&& hitsMap.containsKey("model_id")
&& MLModelState.DEPLOYED.equals(getModelState(hitsMap.get("model_id").toString()))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think MLModelState is an ENUM, so you can use == to compare 2 enums which will ensure that compile time check fields which are getting equated.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if no deployed models are found what is the expectation from the tests?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If not deployed model was found, the test will fail in the assertion here: https://github.com/opensearch-project/neural-search/blob/main/src/test/java/org/opensearch/neuralsearch/common/BaseNeuralSearchIT.java#L748
As I mentioned below, there seems to be a latency between task complete and model_state update to DEPLOYED. So I added a poll for model state there as well to avoid the issue.

)
.map(hitsMap -> (String) hitsMap.get("model_id"))
.collect(Collectors.toSet());
}

@SneakyThrows
protected MLModelState getModelState(String modelId) {
Response getModelResponse = makeRequest(
client(),
"GET",
String.format(LOCALE, "/_plugins/_ml/models/%s", modelId),
null,
toHttpEntity(""),
ImmutableList.of(new BasicHeader(HttpHeaders.USER_AGENT, DEFAULT_USER_AGENT))
);
Map<String, Object> getModelResponseJson = XContentHelper.convertToMap(
XContentType.JSON.xContent(),
EntityUtils.toString(getModelResponse.getEntity()),
false
);
return MLModelState.valueOf((String) getModelResponseJson.get("model_state"));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we also check if "model_state" key is present in response? I can image in case of service or network error it's possible, we can assert on this

}

/**
* Get the id for model currently deployed in the cluster. If there are no models deployed or it's more than 1 model
* fail on assertion
Expand Down
Loading