Store size APIs should be updated to reflect when all data is not local #7332

andrross opened this issue Apr 29, 2023 · 1 comment
andrross commented Apr 29, 2023

For more context, see #6528 (comment)

Also partially related to #7033

Existing APIs that report on "store size" will need to be expanded to account for the fact that data isn't all resident in local storage. It is still useful to know the total size of indexes and shards, but a new dimension will be needed to report the size of the data resident in local storage. This includes APIs like _cat/indices, _cat/shards, _cat/segments. Ultimately the goal here is to give users visibility into the resources being consumed by an index or shard.

tlfeng commented May 30, 2023

There are several possible REST API changes mentioned in issue #6528:

  1. Based on the comments from @sohami in [RFC] Add Search to Remote-backed Storage #6528 (comment) and [RFC] Add Search to Remote-backed Storage #6528 (comment), the following REST API changes are proposed.
  • In the _cat/indices, _cat/shards, and _cat/segments APIs:
    Add a filter that lets users view a summary of all the remote indices or the local indices respectively. The usage of the filter may imitate the “node filters” used in the nodes APIs (https://opensearch.org/docs/latest/api-reference/nodes-apis/index/#node-filters).

  • In the _cat/allocation API:
    Add an optional column to display the remote store space addressable by that node, in addition to the local storage space.

  2. Based on the issue description from @andrross above, the local file size is requested to be added.

These are different demands, so they need to be treated separately.


Below are my thoughts on the requirement mentioned by @andrross in this issue: revealing the local file size usage in the REST APIs.
Guideline: implement a way to get the cache file size for each shard and segment, regardless of the REST API changes.

There are two possible directions for calculating the file size:

  1. Calculate the physical size of the cache file on disk

Because the cache files are stored in a fixed location, and each segment file of each shard is directly identifiable from the file path, the requirement can be satisfied by getting the file size of each file or directory directly.

The list of all file cache paths can be collected through the method "List collectFileCacheDataPath(NodePath fileCacheNodePath)", where the file directory structure is also stated: <file cache path>/<index uuid>/<shard id>/....

An example to illustrate what the cache folder looks like:
Below are the cache files created for an index backed by a remote snapshot.

/data/nodes/0/cache/BDmFhJzBQVCWo1OVHdkgBg/0/RemoteLocalStore/_bz.cfs.0 (the file directory is omitted for other files)
_bz.cfs.1
_bz.cfs.5
_bz.cfs.6
_f3.fdm.0
_f3.fdt.0
_f3.fdt.74
_f3.fdx.0
_f3.kdd.12
_f3.kdi.0
_f3.kdm.0
_f3.nvd.0
_f3.nvm.0
_f3_Lucene90_0.doc.0
_f3_Lucene90_0.doc.7
_f3_Lucene90_0.dvd.0
_f3_Lucene90_0.dvd.24
_f3_Lucene90_0.dvm.0
_f3_Lucene90_0.pos.1
_f3_Lucene90_0.tim.0
_f3_Lucene90_0.tim.5
_f3_Lucene90_0.tip.0
_f3_Lucene90_0.tmd.0
_ue.cfe.0
_ue.cfs.0
_ue.cfs.2

The folder BDmFhJzBQVCWo1OVHdkgBg is the "index uuid", the folder 0 that follows is the "shard id", and all the files in the folder RemoteLocalStore are "segment" files, where _bz, _f3, and _ue are "segment names".
Getting the file size of each segment, shard, or index is then straightforward (see the sketch after this list).

  2. Calculate the logical size of the cache file
    Because the cache file is downloaded in blocks with a known size (1466350), it is possible to get the cache size of each segment or shard by tracking all the cache operations.
    However, I don't think this approach is reasonable, for the following reasons:
  • Calculating the file size for each segment after every cache update operation, or tracking all cache operations, is cumbersome and would be a burden on system resources.
  • The existing way of calculating segment and shard sizes uses the physical file size on disk.
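A minimal sketch of direction 1, assuming shardCachePath points at a shard's cache folder as resolved via collectFileCacheDataPath (the folder layout shown earlier): summing the on-disk size of every regular file under that folder gives the physical cache size of the shard.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

public final class ShardCacheSize {
    // Sums the on-disk size of every regular file under the shard's cache
    // folder, e.g. <file cache path>/<index uuid>/<shard id>/...
    static long physicalSize(Path shardCachePath) throws IOException {
        try (Stream<Path> files = Files.walk(shardCachePath)) {
            return files.filter(Files::isRegularFile)
                .mapToLong(p -> {
                    try {
                        return Files.size(p);
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                })
                .sum();
        }
    }
}
```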

There are two possible directions for getting the statistics into the REST APIs:

  1. Calculate the required file size when the REST API is called, also known as calculating "on the fly".
    Pros: The time spent collecting file sizes can be small, since the REST API for local file size is usually called infrequently.
    Cons: Results in more latency in the REST API response.
  2. Calculate all segment, shard, and index file sizes periodically and store them in variables.
    Pros: Faster REST API response.
    Cons: Periodically calculating file sizes would be a burden on system resources.

My opinion is to calculate the required file size on demand (when the REST API is called); given the trade-offs above, there is little doubt about choosing this approach.

Additional context:

  1. The existing code to calculate shard size: https://github.com/opensearch-project/OpenSearch/blob/2.7.0/server/src/main/java/org/opensearch/index/store/ByteSizeCachingDirectory.java#L74, which was introduced in commit elastic/elasticsearch@80062fb
  2. The existing code to calculate segment size: https://github.com/opensearch-project/OpenSearch/blob/2.7.0/server/src/main/java/org/opensearch/index/engine/Engine.java#L986, which was introduced in commit elastic/elasticsearch@4df83dc. You may wonder why segment size is implemented in a different way from shard size, even though shards consist of segments. The author explained it in the PR for that commit, and there is also a TODO in the code to move the implementation:

"It seems to me that a better place for this would be under StoreStats, because Store already queries the effective size of all physical files and caches the result... but had no idea/luck on how to grab a SegmentInfo for uncommitted segments (only does show some disk usage after forcing a _flush)."

Implementation

Cache size calculation for each shard and its segments
There is an existing method loadFileCachePath to get the file cache path from a NodeEnvironment and a ShardId.
Iterate over all the files in the folder at that path; each file size can be obtained with the native method File.length().
During the iteration, the cache size for each segment in the node can also be collected. Because all the files in the shard cache folder are segment files whose names start with the segment name, such as "_0" or "_a1", grouping by the prefix of the file name yields the total cache size for each segment.
A map can be used to store the sizes of segments by their names.
A sample code I wrote can be seen here.
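A minimal sketch of that grouping, assuming shardCacheDir points at the folder containing the segment files (RemoteLocalStore in the example above), where the part of each file name before the first dot is the segment name:

```java
import java.io.File;
import java.util.HashMap;
import java.util.Map;

public final class SegmentCacheSizes {
    // Groups cached files by segment name (the "_f3" in "_f3.fdt.0") and
    // accumulates each segment's total cached size via File.length().
    static Map<String, Long> bySegment(File shardCacheDir) {
        Map<String, Long> sizes = new HashMap<>();
        File[] files = shardCacheDir.listFiles(File::isFile);
        if (files == null) {
            return sizes; // not a directory, or an I/O error occurred
        }
        for (File f : files) {
            String name = f.getName();
            int dot = name.indexOf('.');
            String segment = dot > 0 ? name.substring(0, dot) : name;
            sizes.merge(segment, f.length(), Long::sum);
        }
        return sizes;
    }
}
```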

The variable to store the local cache size for each shard
The shard statistics used by the _cat/shards and _cat/indices APIs are collected by the method TransportIndicesStatsAction.shardOperation(). It returns a ShardStats object, and I think the file cache size should also be included in ShardStats.
The shard size is stored in a CommonStats object within ShardStats. CommonStats contains all the shard statistics that are accumulated into composite index statistics through IndexStats.
Because the file cache size for each shard also needs to be accumulated to get the total file cache size for an index, I think a new object needs to be created in the class CommonStats to store the file cache size for each shard or even each segment.
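A minimal sketch of what such a new stats object might look like; FileCacheStats is a hypothetical name, and the Writeable plumbing follows the usual pattern of the other stats classes held by CommonStats (exact packages may differ between versions):

```java
import java.io.IOException;

import org.opensearch.common.io.stream.StreamInput;
import org.opensearch.common.io.stream.StreamOutput;
import org.opensearch.common.io.stream.Writeable;

// Hypothetical holder for per-shard file cache usage, accumulable into index totals.
public class FileCacheStats implements Writeable {
    private long cacheSizeInBytes;

    public FileCacheStats(long cacheSizeInBytes) {
        this.cacheSizeInBytes = cacheSizeInBytes;
    }

    public FileCacheStats(StreamInput in) throws IOException {
        this.cacheSizeInBytes = in.readVLong();
    }

    @Override
    public void writeTo(StreamOutput out) throws IOException {
        out.writeVLong(cacheSizeInBytes);
    }

    // Accumulate another shard's stats, mirroring how CommonStats sums into IndexStats.
    public void add(FileCacheStats other) {
        if (other != null) {
            this.cacheSizeInBytes += other.cacheSizeInBytes;
        }
    }

    public long getCacheSizeInBytes() {
        return cacheSizeInBytes;
    }
}
```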

The variable to store the local cache size for each segment
Collecting the segment statistics used by _cat/segments, however, is different from collecting shard statistics. The segment information for each shard used by the _cat/segments API is collected by the method TransportIndicesSegmentsAction.shardOperation(). It returns a ShardSegments object, and then each segment object is traversed to get its statistics when building the REST API response in RestSegmentsAction. I think the file cache size should be stored in the class ShardSegments.

The location of the code to calculate the file cache size for each shard and segment
As described above, obtaining the file cache size needs the file path of the cache, which requires a NodeEnvironment object, so the class to hold the new cache calculation code should have an instance of NodeEnvironment. Besides, both of the methods TransportIndicesStatsAction.shardOperation() and TransportIndicesSegmentsAction.shardOperation(), which collect the shard and segment statistics for the cat API responses, use the IndexService object. Luckily, the class IndexService can access NodeEnvironment, so I propose to put the new code for calculating the file cache size for each shard and segment in the class IndexService.
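A rough sketch of how this could hang together in IndexService; how NodeEnvironment is reached and the exact signature of loadFileCachePath are assumptions based on its description above, and SegmentCacheSizes is the helper sketched earlier:

```java
// Sketch of the proposed placement inside IndexService.
public Map<String, Long> segmentFileCacheSizes(ShardId shardId) {
    // loadFileCachePath is the existing helper mentioned above; its exact
    // signature and the nodeEnvironment accessor are assumptions here.
    Path shardCachePath = nodeEnvironment.loadFileCachePath(shardId);
    return SegmentCacheSizes.bySegment(shardCachePath.toFile());
}
```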
