
[Enhancement] Optimize FieldDataCache removal flow #13862

Open
sgup432 opened this issue May 28, 2024 · 5 comments
Labels
enhancement, Search:Performance

Comments

@sgup432
Contributor

sgup432 commented May 28, 2024

Is your feature request related to a problem? Please describe

FieldDataCache is a node-level cache and uses a composite key (indexFieldCache + shardId + IndexReader.CacheKey) to uniquely identify an item. IndexFieldCache further contains the fieldName and index-level details.

As of today, any item in the fieldDataCache is removed in a blocking/synchronous manner. This happens in the below scenarios:

  1. Invalidation: During a refresh of any indexShard, the desired key is immediately invalidated in a synchronous manner here.
  2. Index deletion: During removal of an index, all related indexShard fields are removed, also in a synchronous manner. See this. Here we iterate over ALL cache keys, check whether each key belongs to this index, and then delete it, which is highly inefficient. A simplified sketch of this pattern follows this list.
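
To make the cost concrete, here is a minimal illustration of that index-deletion path, using simplified stand-in classes rather than the actual IndicesFieldDataCache code: the node-level cache is keyed by a composite key, and clearing a single index still walks every key in the cache on the calling thread.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Simplified stand-in (not the actual IndicesFieldDataCache classes) illustrating the
// current removal pattern: clearing one index iterates over ALL composite keys.
class FieldDataCacheSketch {
    // Hypothetical composite key: index name + shard id + reader cache key.
    record Key(String index, int shardId, Object readerCacheKey) {}

    private final Map<Key, Object> cache = new ConcurrentHashMap<>();

    // Current behavior: O(total cache size) per index removal, and it runs synchronously
    // on whatever thread calls it (today, the cluster applier thread); removal listeners
    // that update stats also fire inline for every removed entry.
    void clearIndex(String index) {
        for (Key key : cache.keySet()) {
            if (key.index().equals(index)) {
                cache.remove(key);
            }
        }
    }
}
```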

Problem

We have already had issues where, during index removal, a data node dropped out of the cluster because a lot of time/CPU was spent clearing the fieldDataCache.
Scenario (observed in production): The cluster manager node sends a cluster state update task to a data node on index removal. The data node starts clearing the fieldData cache on the same clusterStateApplier (clusterApplierService#updateTask) thread, taking a long time (due to the large cache size and the inefficient traversal of all keys) and is unable to acknowledge back to the cluster manager node. This eventually resulted in the data node being removed from the cluster.

Sample hot thread dump observed

100.4% (501.8ms out of 500ms) cpu usage by thread 'opensearch[46a4b13b820a8bcf60ac8f1de15cee14][clusterApplierService#updateTask][T#1]'
     10/10 snapshots sharing following 22 elements
       app//org.opensearch.indices.fielddata.cache.IndicesFieldDataCache$IndexFieldCache.clear(IndicesFieldDataCache.java:219)
       app//org.opensearch.index.fielddata.IndexFieldDataService.clear(IndexFieldDataService.java:107)
       app//org.opensearch.index.fielddata.IndexFieldDataService.close(IndexFieldDataService.java:179)
       app//org.opensearch.core.internal.io.IOUtils.close(IOUtils.java:87)
       app//org.opensearch.core.internal.io.IOUtils.close(IOUtils.java:129)
       app//org.opensearch.core.internal.io.IOUtils.close(IOUtils.java:79)
       app//org.opensearch.index.IndexService.close(IndexService.java:362)
       app//org.opensearch.indices.IndicesService.removeIndex(IndicesService.java:887)
       app//org.opensearch.indices.cluster.IndicesClusterStateService.removeIndices(IndicesClusterStateService.java:410)
       app//org.opensearch.indices.cluster.IndicesClusterStateService.applyClusterState(IndicesClusterStateService.java:255)
       app//org.opensearch.cluster.service.ClusterApplierService.callClusterStateAppliers(ClusterApplierService.java:591)
       app//org.opensearch.cluster.service.ClusterApplierService.callClusterStateAppliers(ClusterApplierService.java:578)
       app//org.opensearch.cluster.service.ClusterApplierService.applyChanges(ClusterApplierService.java:546)
       app//org.opensearch.cluster.service.ClusterApplierService.runTask(ClusterApplierService.java:469)
       app//org.opensearch.cluster.service.ClusterApplierService.access$000(ClusterApplierService.java:81)
       app//org.opensearch.cluster.service.ClusterApplierService$UpdateTask.run(ClusterApplierService.java:180)

Describe the solution you'd like

As a solution, I suggest we do the following:

  1. Cache clear/invalidation logic should be done in an async manner, like we do in RequestCache. Here, on any index removal, we would hold all such stale indices in a list and eventually clean them up on a separate thread at some interval X (a rough sketch follows this list).
  2. During any item removal in the cache, multiple removal listeners are invoked in a sync manner to update stats. We should also try to do this in an async manner on a different thread.
  3. We should avoid going through all the entries in the cache even though we need to delete entries for only one or a few indices. This can be done either by deleting specific entries for an index by calling invalidateAll for a set of keys, or by delaying this until a later point where we want to delete multiple indices in a single pass.
  4. Consider integrating FieldDataCache with TieredCache (heap + disk)/DiskCache, considering the current default onHeapCache size is unlimited and only controlled by the field data circuit breaker. This would free up a lot of heap.
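
A rough sketch of points 1 and 3 (class and method names here are hypothetical, not an existing OpenSearch API): the cluster applier thread only records the stale index name, and a periodic background task sweeps the cache once for all stale indices accumulated since the last run.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical cleaner, loosely modeled on the periodic cleanup idea used for the request cache.
class AsyncFieldDataCacheCleaner {
    private final Set<String> staleIndices = ConcurrentHashMap.newKeySet();
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    private final Consumer<Set<String>> bulkClear; // e.g. invalidates all keys belonging to the given indices

    AsyncFieldDataCacheCleaner(Consumer<Set<String>> bulkClear, long intervalSeconds) {
        this.bulkClear = bulkClear;
        scheduler.scheduleAtFixedRate(this::sweep, intervalSeconds, intervalSeconds, TimeUnit.SECONDS);
    }

    // Called from the cluster applier thread on index removal: O(1), never blocks.
    void markIndexStale(String index) {
        staleIndices.add(index);
    }

    // Runs on the background thread: a single traversal of the cache covers every stale
    // index accumulated so far, instead of one full traversal per deleted index.
    private void sweep() {
        Set<String> toClean = Set.copyOf(staleIndices);
        if (toClean.isEmpty()) {
            return;
        }
        staleIndices.removeAll(toClean);
        try {
            bulkClear.accept(toClean);
        } catch (Exception e) {
            // swallow: a failing sweep must not cancel future runs of scheduleAtFixedRate,
            // and cleanup failures no longer block cluster state acknowledgement
        }
    }
}
```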

Related component

Search:Performance

Describe alternatives you've considered

No response

Additional context

No response

sgup432 added the enhancement and untriaged labels on May 28, 2024
@jainankitk
Collaborator

jainankitk commented May 29, 2024

@sgup432 - Thank you for describing the issue and potential solution. A few considerations:

  • Currently, the field data cache cleanup happens on the cluster applier thread. This prevents the data node from acknowledging the new cluster state version in case of failure. While this does seem a bit extreme and undesirable, I am not sure how we would handle failures in the case of async cleanup.
  • I am not completely convinced about the off-heap benefit for the field data cache, as its purpose is slightly different from the request/query cache. It is possible, though, that the page cache mitigates the latency of getting ordinal values from the field data cache during aggregation. But this is lower priority for now, IMO.
  • I would probably prioritize 3. over 1. & 2. when it comes to execution, since 3. provides a benefit irrespective of the sync/async mechanism.

@jainankitk
Collaborator

@sohami @msfroh @Bukhtawar - Is there any specific reason for field data cache cleanup happening on the cluster applier thread?

@sgup432
Contributor Author

sgup432 commented May 29, 2024

Currently, the field data cache cleanup happens on the cluster applier thread. This prevents the data node from acknowledging the new cluster state version in case of failure. While this does seem a bit extreme and undesirable, I am not sure how we would handle failures in the case of async cleanup.

Yeah, I thought about that as well. I didn't see any reason why we should fail the data node's cluster state update logic during index removal in case the underlying fieldDataCache cleanup fails. Considering the index was already removed from the node, the worst case would be stale entries lying in the cache if we just swallowed the fieldDataCache cleanup exception.

So I think doing this in an async manner shouldn't be a problem as such, IMO. But I would like to hear other opinions as well, as I might be wrong in my understanding.

Plus, it seems the field data cache cleanup is unlikely to throw any exception, considering the internal cache.invalidate appears safe even if a key is not present (a small sketch of this below).
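
A tiny illustration of that last point, using a simplified stand-in cache rather than the internal OpenSearch cache class: invalidating keys for an index that has already been removed is effectively a no-op, so the cleanup can be fire-and-forget from the applier's perspective.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Simplified stand-in cache keyed by "index/shard/readerKey" strings (hypothetical layout).
class QuietCleanupSketch {
    private final ConcurrentMap<String, Object> cache = new ConcurrentHashMap<>();

    // Removing keys that are already absent (or matching nothing) is simply a no-op.
    void clearIndexQuietly(String index) {
        try {
            cache.keySet().removeIf(key -> key.startsWith(index + "/"));
        } catch (RuntimeException e) {
            // log and swallow: the index is already gone from the node; worst case is
            // stale entries remaining in the cache until a later cleanup pass
        }
    }
}
```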

@Bukhtawar
Collaborator

@sohami @msfroh @Bukhtawar - Is there any specific reason for field data cache cleanup happening on the cluster applier thread?

Any mutating operation to the cluster state, including index creation/deletion, needs to be applied to the local data structures of the node applying the state. Generally these appliers are processed sequentially and in a blocking manner, to ensure that all local structures are successfully refreshed before a cluster state commit acknowledgement can be sent back.
That said, I agree certain operations like field data cache eviction aren't super-critical and don't require the node to hold back the acknowledgement. Also see #9996, where an index deletion cleanup took long enough to unlink that it caused node drops.

I think we need to look at this problem holistically and start with a mechanism like soft-eviction or soft-deletes, which just marks the entry as deleted or stale while the actual clean-up happens in the background.
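
A hedged sketch of that soft-eviction idea (hypothetical names, not an existing OpenSearch API): the applier thread only marks an index as deleted, reads treat marked entries as misses, and a background sweep physically removes them later.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical soft-evicting cache: deletion is a cheap marker write on the applier
// thread; physical removal happens later, off the cluster applier thread.
class SoftEvictingFieldDataCache {
    record Key(String index, int shardId, Object readerCacheKey) {}

    private final Map<Key, Object> cache = new ConcurrentHashMap<>();
    private final Set<String> deletedIndices = ConcurrentHashMap.newKeySet();

    // "Soft delete" on the applier thread: O(1) regardless of cache size.
    void markIndexDeleted(String index) {
        deletedIndices.add(index);
    }

    // Reads treat entries of soft-deleted indices as cache misses.
    Object get(Key key) {
        return deletedIndices.contains(key.index()) ? null : cache.get(key);
    }

    // Background sweep performs the actual O(total cache size) cleanup.
    void sweep() {
        cache.keySet().removeIf(key -> deletedIndices.contains(key.index()));
        // Note: clearing deletedIndices here needs care if an index with the same
        // name can be recreated while the sweep is running.
    }
}
```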

@peternied
Member

[Triage - attendees 1 2 3 4 5]
@sgup432 Thanks for creating this issue, looking forward to a pull request to improve OpenSearch.

Projects
Status: Later (6 months plus)
Development

No branches or pull requests

4 participants