Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature Request] Capability to remove _recovery_source per field #13490

Closed
navneet1v opened this issue May 1, 2024 · 7 comments · Fixed by #13590
Closed

[Feature Request] Capability to remove _recovery_source per field #13490

navneet1v opened this issue May 1, 2024 · 7 comments · Fixed by #13590
Assignees
Labels
enhancement Enhancement or improvement to existing feature or request Indexing:Performance

Comments

@navneet1v
Copy link
Contributor

navneet1v commented May 1, 2024

Is your feature request related to a problem? Please describe

While doing the indexing all the fields which are getting ingested in Opensearch are stored as _source. If user requires they can disable the _source per field or completely for all the fields. But if user does this, the _recovery_source gets added(ref), which gets removed later on.

So overall the whole payload will still be used as a StoredField and impacts the indexing time. The impact on indexing time is high if one of the field is a vector field. In my experiments with 768D 1M dataset I can see a 50% reduction in indexing latency at p90 level.

Benchmarking Results

Below are the benchmarking results. Refer Appendix A for the flame graphs

Cluster Configuration

Data Nodes r5.2xlarge
Data Nodes Count 3
Dataset cohere-768D
Number of Vectors 1M
Tool OSB
Indexing Clients 1
Number of Shards 3
Index Thread Qty 1
Engine Nmslib
EF Construction 256
EF Search 256

Baseline Results

Metric Task Value Unit
Cumulative indexing time of primary shards 6.3266 min
Min cumulative indexing time across primary shards 0 min
Median cumulative indexing time across primary shards 1.99598 min
Max cumulative indexing time across primary shards 2.27533 min
Cumulative indexing throttle time of primary shards 0 min
Min cumulative indexing throttle time across primary shards 0 min
Median cumulative indexing throttle time across primary shards 0 min
Max cumulative indexing throttle time across primary shards 0 min
Cumulative merge time of primary shards 45.0456 min
Cumulative merge count of primary shards 7
Min cumulative merge time across primary shards 0 min
Median cumulative merge time across primary shards 14.4361 min
Max cumulative merge time across primary shards 15.8919 min
Cumulative merge throttle time of primary shards 0.168133 min
Min cumulative merge throttle time across primary shards 0 min
Median cumulative merge throttle time across primary shards 0.0523167 min
Max cumulative merge throttle time across primary shards 0.0632667 min
Cumulative refresh time of primary shards 6.0824 min
Cumulative refresh count of primary shards 144
Min cumulative refresh time across primary shards 0 min
Median cumulative refresh time across primary shards 1.90423 min
Max cumulative refresh time across primary shards 2.14323 min
Cumulative flush time of primary shards 23.2522 min
Cumulative flush count of primary shards 32
Min cumulative flush time across primary shards 0 min
Median cumulative flush time across primary shards 7.4859 min
Max cumulative flush time across primary shards 8.08212 min
Total Young Gen GC time 0.927 s
Total Young Gen GC count 39
Total Old Gen GC time 0 s
Total Old Gen GC count 0
Store size 12.4152 GB
Translog size 5.6345e-07 GB
Min Throughput custom-vector-bulk 885.98 docs/s
Mean Throughput custom-vector-bulk 1017.74 docs/s
Median Throughput custom-vector-bulk 1007.87 docs/s
Max Throughput custom-vector-bulk 1082.07 docs/s
50th percentile latency custom-vector-bulk 44.5227 ms
90th percentile latency custom-vector-bulk 59.4799 ms
99th percentile latency custom-vector-bulk 185.057 ms
99.9th percentile latency custom-vector-bulk 335.08 ms
99.99th percentile latency custom-vector-bulk 490.821 ms
100th percentile latency custom-vector-bulk 545.323 ms

Removing Recovering Source and _source

POC code: navneet1v@6c5896a

Metric Task Value Unit
Cumulative indexing time of primary shards 5.49067 min
Min cumulative indexing time across primary shards 1.71528 min
Median cumulative indexing time across primary shards 1.77818 min
Max cumulative indexing time across primary shards 1.9972 min
Cumulative indexing throttle time of primary shards 0 min
Min cumulative indexing throttle time across primary shards 0 min
Median cumulative indexing throttle time across primary shards 0 min
Max cumulative indexing throttle time across primary shards 0 min
Cumulative merge time of primary shards 0.3878 min
Cumulative merge count of primary shards 3
Min cumulative merge time across primary shards 0.118133 min
Median cumulative merge time across primary shards 0.133017 min
Max cumulative merge time across primary shards 0.13665 min
Cumulative merge throttle time of primary shards 0 min
Min cumulative merge throttle time across primary shards 0 min
Median cumulative merge throttle time across primary shards 0 min
Max cumulative merge throttle time across primary shards 0 min
Cumulative refresh time of primary shards 13.38 min
Cumulative refresh count of primary shards 86
Min cumulative refresh time across primary shards 4.0509 min
Median cumulative refresh time across primary shards 4.11963 min
Max cumulative refresh time across primary shards 5.20948 min
Cumulative flush time of primary shards 31.9997 min
Cumulative flush count of primary shards 27
Min cumulative flush time across primary shards 10.0862 min
Median cumulative flush time across primary shards 10.2099 min
Max cumulative flush time across primary shards 11.7035 min
Store size 11.7547 GB
Translog size 1.19775 GB
Min Throughput custom-vector-bulk 932.63 docs/s
Mean Throughput custom-vector-bulk 1133.94 docs/s
Median Throughput custom-vector-bulk 1131.69 docs/s
Max Throughput custom-vector-bulk 1167.02 docs/s
50th percentile latency custom-vector-bulk 39.0611 ms
90th percentile latency custom-vector-bulk 47.5916 ms
99th percentile latency custom-vector-bulk 83.2863 ms
99.9th percentile latency custom-vector-bulk 177.238 ms
99.99th percentile latency custom-vector-bulk 288.076 ms
100th percentile latency custom-vector-bulk 319.616 ms
error rate custom-vector-bulk 0 %

Describe the solution you'd like

Just like _source where we can specify what fields are included/excluded in _source or completely disable _source, I was thinking to have same capability for _recovery_source. This will ensure that users can remove their fields from _recovery source if required.

Related component

Indexing:Performance

Describe alternatives you've considered

In terms of alternative there is no alternative to disable the _recovery_source for an index.

FAQ

Q1: If a user needs vector field how they can retrieve the vector field? also Recovery source and _source is used for other purpose like update by query, disaster recovery etc etc, how we are going to support that?

In k-NN repo we are working on a PR which will ensure that we read k-NN vector field from doc values. Ref: opensearch-project/k-NN#1571. For other fields that require such capabilities can be added in core in incremental fashion if needed.

Appendix A

Flame graph when _source/_recovery_source is getting stored during indexing for vector fields.

320730815-7752c503-1b97-42e7-885b-0dc505e7a43f

Flame graph when _source and _recovery_source is not getting stored.

320730873-877fced6-6e67-4199-b939-d3ef48570a94

@navneet1v navneet1v added enhancement Enhancement or improvement to existing feature or request untriaged labels May 1, 2024
@navneet1v navneet1v self-assigned this May 1, 2024
@navneet1v
Copy link
Contributor Author

navneet1v commented May 1, 2024

I can work on building this feature.

@msfroh, @reta , @dblock

@navneet1v
Copy link
Contributor Author

Updated the flame graphs in the appendix section of the issue.

@mgodwan
Copy link
Member

mgodwan commented May 2, 2024

Thanks @navneet1v for sharing these insights.

  1. What is the compression ratio we see for our benchmark corpus documents for stored fields when compared to the raw document data?
  2. I'm wondering if the inherent problem here is the compression algorithm (looking at the flame graphs). Have you already experimented with disabling compression/trying zstd on stored fields to see what kind of behaviour they show?
  3. _recovery_source was introduced to support ops based recovery through Lucene after soft deletes were enabled so that we can rely on Lucene changes snapshot. How will you continue to support this with _recovery_source disabled?

@mgodwan
Copy link
Member

mgodwan commented May 2, 2024

Just like _source where we can specify what fields are included/excluded in _source or completely disable _source, I was thinking to have same capability for _recovery_source. This will ensure that users can remove their fields from _recovery source if required.

Today, in case original source differs from source being written (due to including/excluding any specific fields), recovery source is always written using original source to ensure ops based recovery. Won't this change divert from the intent it was serving?

@navneet1v
Copy link
Contributor Author

navneet1v commented May 2, 2024

@mgodwan I have ans this question in the FAQ section of the above issue. For vector field we are working on a PR which will allow the vector field values to be read from doc values and put in _source response. This will ensure that recovery is possible after a crash.

Plus given that _recovery_source gets deleted after a certain point of time recovery source was never a full proof way to recover from crashes.

@mgodwan
Copy link
Member

mgodwan commented May 13, 2024

Plus given that _recovery_source gets deleted after a certain point of time recovery source was never a full proof way to recover from crashes.

If I understand the feature correctly, I don't think this was the use case for it ever.
It is used in case of primary-replica syncing (similar to how translog had been used in the past). Please evaluate how it will continue to function after this change to disable the _recovery_source completely as since 2.x, it is the default way for syncing replicas with primary.

@navneet1v
Copy link
Contributor Author

Please evaluate how it will continue to function after this change to disable the _recovery_source completely as since 2.x, it is the default way for syncing replicas with primary.

Is there a way I can check this? any IT or anything else that can help me here.

@mgodwan

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Enhancement or improvement to existing feature or request Indexing:Performance
Projects
Status: Done
Development

Successfully merging a pull request may close this issue.

2 participants