You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
OpenSearch indexes have eventual read consistency. In general, indexed documents might not be available in search until the next refresh. On an OpenSearch cluster, including RefreshPolicy.IMMEDIATE in an IndexRequest will ensure that at least the primary shard is immediately searchable, but will acknowledge the request prior to copying to any replicas. Replication strategies such as Segment Replication don't enforce strong consistency (other than get-by-ID which always goes to the primary shard). In OpenSearch Serverless, there is a by-design refresh interval of 10 seconds that is not configurable, and documentation indicates a newly indexed document may not be searchable for up to 15 seconds.
Deleting a Connector invokes a search of the Model index to ensure that the connector is not being used:
searchHits.length + " models are still using this connector, please delete or update the models first!"
);
There is a similar check when updating a connector.
While indexing a new Remote Model does use RefreshPolicy.IMMEDIATE, there is still an edge case where a search might hit a replica shard that hasn't yet had the document updated. While likely rare in cluster-based OpenSearch this situation is much more likely in a serverless implementation.
Access-control checks such as this should not rely on search, but should use methods with strong consistency such as get-by-id.
What solution would you like?
Add a field to the Connector to track the remote model(s) which are registered to it. This field could either be a delimited string, or a list of strings. When a connector is first created, this can be set to an empty list / empty string. Registering a remote model would append the model ID to the field, and unregistering would remove it.
Backwards Compatibility is a concern: in case a Connector was already created and did not have this field, the null value of the field would indicate that it had never been initialized, and a search would still need to be conducted (once) to populate the field. After that, no further searches would be required.
End result: at most one search (for backwards compatibility) will be conducted, but in most cases the deletion logic will not require an additional search.
What alternatives have you considered?
Creating a separate index to track connector-model dependencies. (This would likely be the right thing to do in a RDB world where JOINs are a thing.). While effective, this adds a lot more complexity and isn't much different than the proposed solution.
Do you have any additional context?
There are similar searches involved in model/model-group deletions and updates, and with wildcard deletion of models that require a search for hidden models; these may require different approaches and the scope of this feature request is limited to connectors.
The text was updated successfully, but these errors were encountered:
Is your feature request related to a problem?
OpenSearch indexes have eventual read consistency. In general, indexed documents might not be available in search until the next refresh. On an OpenSearch cluster, including
RefreshPolicy.IMMEDIATE
in anIndexRequest
will ensure that at least the primary shard is immediately searchable, but will acknowledge the request prior to copying to any replicas. Replication strategies such as Segment Replication don't enforce strong consistency (other than get-by-ID which always goes to the primary shard). In OpenSearch Serverless, there is a by-design refresh interval of 10 seconds that is not configurable, and documentation indicates a newly indexed document may not be searchable for up to 15 seconds.Deleting a Connector invokes a search of the Model index to ensure that the connector is not being used:
ml-commons/plugin/src/main/java/org/opensearch/ml/action/connector/DeleteConnectorTransportAction.java
Lines 78 to 86 in 93d0429
There is a similar check when updating a connector.
While indexing a new Remote Model does use
RefreshPolicy.IMMEDIATE
, there is still an edge case where a search might hit a replica shard that hasn't yet had the document updated. While likely rare in cluster-based OpenSearch this situation is much more likely in a serverless implementation.Access-control checks such as this should not rely on search, but should use methods with strong consistency such as get-by-id.
What solution would you like?
Add a field to the
Connector
to track the remote model(s) which are registered to it. This field could either be a delimited string, or a list of strings. When a connector is first created, this can be set to an empty list / empty string. Registering a remote model would append the model ID to the field, and unregistering would remove it.Backwards Compatibility is a concern: in case a Connector was already created and did not have this field, the
null
value of the field would indicate that it had never been initialized, and a search would still need to be conducted (once) to populate the field. After that, no further searches would be required.End result: at most one search (for backwards compatibility) will be conducted, but in most cases the deletion logic will not require an additional search.
What alternatives have you considered?
Creating a separate index to track connector-model dependencies. (This would likely be the right thing to do in a RDB world where JOINs are a thing.). While effective, this adds a lot more complexity and isn't much different than the proposed solution.
Do you have any additional context?
There are similar searches involved in model/model-group deletions and updates, and with wildcard deletion of models that require a search for hidden models; these may require different approaches and the scope of this feature request is limited to connectors.
The text was updated successfully, but these errors were encountered: