[BUG] Poor scaling within a single node while running vector search workload #531

Closed
layavadi opened this issue May 10, 2024 · 17 comments
Labels
question Further information is requested

Comments

@layavadi

Describe the bug
While running the OpenSearch Benchmark tool with the vectorsearch workload, it was noticed that having multiple shards or multiple segments results in poorer performance than 1 shard / 1 segment. CPU utilisation with multiple shards (single segment per shard) is much higher than with 1 shard / 1 segment, but performance (response time and throughput) was worse than the 1 shard / 1 segment config.

To Reproduce
A single r6i.xlarge data node, with data node pods deployed on Kubernetes through the OpenSearch Helm chart. Each data node has the following settings:
opensearchJavaOpts: "-Xms6G -Xmx6G"
resources:
  requests:
    cpu: "3000m"
    memory: "8Gi"
service:
  type: LoadBalancer
persistence:
  size: 51Gi

And the benchmark parameter file is as follows (using the lucene engine with l2 space type for vectors):
{
  "target_index_name": "target_index",
  "target_field_name": "target_field",
  "target_index_body": "indices/lucene-index.json",
  "target_index_primary_shards": 1,
  "target_index_dimension": 768,
  "target_index_space_type": "l2",
  "target_index_bulk_size": 100,
  "target_index_bulk_index_data_set_format": "hdf5",
  "target_index_bulk_index_data_set_corpus": "cohere-1m",
  "target_index_bulk_indexing_clients": 10,
  "target_index_max_num_segments": 1,
  "hnsw_ef_search": 256,
  "hnsw_ef_construction": 256,
  "query_k": 100,
  "query_body": {
    "docvalue_fields": ["_id"],
    "stored_fields": "none"
  },
  "query_data_set_format": "hdf5",
  "query_data_set_corpus": "cohere-1m",
  "query_count": 10000,
  "search_clients": 2
}

opensearch-benchmark execute-test --target-hosts $ENDPOINT --workload vectorsearch --workload-params ${PARAMS_FILE} --pipeline benchmark-only --kill-running-processes --client-options=basic_auth_user:admin,basic_auth_password:Clouder@4213,verify_certs:false

Expected behavior
It is expected to scale linearly as we add more clients with more shards. But 1 shard / 1 segment only consumes 1 core, compared to 3 shards / 1 segment which consumes more than 3 cores, yet the latter results in poorer performance than 1 shard / 1 segment.

Logs
If applicable, add logs to help explain your problem.

More Context (please complete the following information):

  • Workload: cohere-1M
  • Service: OpenSearch
  • Version: 2.12

Additional context
Add any other context about the problem here.

@layavadi layavadi added bug Something isn't working untriaged labels May 10, 2024
@IanHoang
Collaborator

Hi @layavadi, to get better clarity on this, are you seeing similar issues when running tests on workloads other than vectorsearch (such as NYC Taxis, PMC, or http_logs)?

@layavadi
Author

layavadi commented May 14, 2024 via email

@gkamat
Collaborator

gkamat commented May 15, 2024

There are multiple facets to this issue:

  • Issues in this repository pertain only to OSB. If OSB is scaling properly for the 1-shard, 1-segment case and simply changing the backend configuration results in a performance change, you should open an issue on the server side with the vector search team.
  • The 3 shard case probably has 3 segments in total, i.e., each shard with 1 segment, rather than 1 segment in total. Is that not the case?
  • How much data has been ingested in these tests? Does the 3-shard setup have more vectors?
  • Is there any deleted data?
  • Is one of the shards hot?
  • Have you run any query profiles?

Most likely, this item should be followed up with the vector search team as indicated above. Please close this issue if that is the case.
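
For reference, a minimal Python sketch (hypothetical endpoint and credentials, using the standard _cat APIs) of how the data-volume, deleted-docs, and per-shard distribution questions above can be checked:

# Minimal sketch (hypothetical endpoint/credentials) of checking the questions
# above with the standard _cat APIs: docs ingested, deleted docs, and how
# documents and segments are spread across shards.
import requests

HOST = "https://localhost:9200"   # assumption: replace with your cluster endpoint
AUTH = ("admin", "<password>")    # assumption: same basic auth used for OSB
INDEX = "target_index"

for path in (
    f"_cat/indices/{INDEX}?v&h=index,pri,docs.count,docs.deleted,store.size",
    f"_cat/shards/{INDEX}?v&h=index,shard,prirep,docs,store,node",
    f"_cat/segments/{INDEX}?v&h=index,shard,segment,docs.count,size",
):
    # verify=False only because the reproduction above uses verify_certs:false
    print(requests.get(f"{HOST}/{path}", auth=AUTH, verify=False).text)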

@layavadi
Author

Hi @layavadi, to get better clarity on this, are you seeing similar issues when running tests on workloads other than vectorsearch (such as NYC Taxis, PMC, or http_logs)?

@IanHoang Is there a way to run only "search" operations in NYC Taxis? Do I have to modify the default.json to make it happen?

@layavadi
Author

There are multiple facets to this issue:

  • Issues in this repository pertain only to OSB. If OSB is scaling properly for the 1-shard, 1-segment case and simply changing the backend configuration results in a performance change, you should open an issue on the server side with the vector search team.
  • The 3 shard case probably has 3 segments in total, i.e., each shard with 1 segment, rather than 1 segment in total. Is that not the case?
  • How much data has been ingested in these tests? Does the 3-shard setup have more vectors?
  • Is there any deleted data?
  • Is one of the shards hot?
  • Have you run any query profiles?

Most likely, this item should be followed up with the vector search team as indicated above. Please close this issue if that is the case.

  • I have informed the vector search team about the performance. I just want to make sure that on the OSB side it can push the queries without serialising the clients. Also, is there a way to run an NYC Taxis search-only workload? vectorsearch provides that.

  • Yes, the 3-shard case has 3 segments, 1 for each shard.

  • Cohere 1M data is what is used to populate the 3 shards. We haven't changed the insertion volume; it was kept constant between the various configurations.

  • No. The workload is loaded once, and then the benchmark is run with the search-only option.

  • As per the metrics, no; queries are equally distributed between the shards.

  • I am currently running the cluster in EKS. Performance Analyzer is not supported yet in a Kubernetes environment. Any suggestion here will help.

@layavadi
Author

@gkamat I did try with NYC Taxis. With just 1 client I was able to saturate the CPU, so it is not an issue with the benchmark. As you mentioned, it is with vectorsearch itself.

@layavadi
Author

@VijayanB Looks like both nmslib and faiss have the same issue.

@VijayanB
Member

VijayanB commented Jun 4, 2024

@navneet1v Have you seen this pattern in your experiments?

@navneet1v

@layavadi when we are talking about throughput, is this indexing throughput or search throughput?

@layavadi
Author

layavadi commented Jun 5, 2024 via email

@navneet1v

It is expected to scale linearly as we add more clients with more shards. But 1 shard / 1 segment only consumes 1 core, compared to 3 shards / 1 segment which consumes more than 3 cores, yet the latter results in poorer performance than 1 shard / 1 segment.

@layavadi if this is search, then what you are getting is expected for a single node, where 1 shard / 1 segment will get more throughput than 3 shards / 1 segment per shard. Below are some of the things to keep in mind here:

  1. When you have 1 shard, OpenSearch has an optimization built in which reduces the number of round trips and does query and fetch in a single call (Code Ref). This is true for all indices in OpenSearch.
  2. With 1 node and more shards, more CPUs (1 per shard) are utilized, and it is 1 JVM now handling more queries (1 per shard); hence more GC cycles will happen, which will also lead to lower throughput.
  3. Another thing to note here is that with more shards, each shard returns its own size number of results, and then at the coordinator node we need to pick the top results; some CPU cycles are also lost there (see the sketch after this list).
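
For illustration, a minimal Python sketch (not OpenSearch's actual implementation) of the coordinator-side merge described in point 3: each shard returns its own top-k hits, and the coordinating node has to re-merge them into a global top-k, which is extra work the single-shard case avoids.

# Minimal sketch: merging per-shard top-k hit lists into a global top-k.
# With 1 shard the shard-local top-k is already the answer (and the query and
# fetch phases can be combined into a single round trip); with 3 shards the
# coordinator receives 3 * k candidates and must merge them.
import heapq

def merge_top_k(per_shard_hits, k):
    # per_shard_hits: list of per-shard lists of (score, doc_id), sorted by score desc
    merged = heapq.merge(*per_shard_hits, key=lambda hit: hit[0], reverse=True)
    return [hit for _, hit in zip(range(k), merged)]

# Same six documents, either in one shard or spread across three shards.
one_shard = [[(0.91, "d1"), (0.89, "d4"), (0.88, "d2"), (0.80, "d3"), (0.70, "d5"), (0.60, "d6")]]
three_shards = [
    [(0.91, "d1"), (0.80, "d3")],
    [(0.89, "d4"), (0.70, "d5")],
    [(0.88, "d2"), (0.60, "d6")],
]
print(merge_top_k(one_shard, 2))     # [(0.91, 'd1'), (0.89, 'd4')]
print(merge_top_k(three_shards, 2))  # same answer, but 6 candidates had to be merged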

I am not sure where you got the expectation that more shards will lead to better performance; that is true if you have replicas.

Please let me know if you have more questions. Removing the bug label as this is not a bug.

@navneet1v navneet1v added question Further information is requested and removed bug Something isn't working labels Jun 5, 2024
@layavadi
Author

layavadi commented Jun 5, 2024

@navneet1v Thanks for the clarification. If that is the case, then as we scale clients, throughput should increase in the case of 1 shard / 1 segment. However, throughput flattens and CPU utilisation is limited to just 1 core. Is it the case that when multiple clients are searching on a 1 shard / 1 segment index, they all serialise through 1 CPU? BTW I did try 1 shard / 1 segment with 1 replica on a second node. It didn't scale either.

@layavadi
Author

@navneet1v is there any limit on search threads based on shards or CPU cores? Is it possible that with one shard we are using only 1 thread for searches?

@navneet1v

@navneet1v is there any limit on search threads based on shards or CPU cores? Is it possible that with one shard we are using only 1 thread for searches?

No, there is no limit like this. Every search request you send is picked up by 1 search thread from the search thread pool. The search thread pool size is ((# of cores * 3) / 2) + 1. So I would check whether you are going beyond this number of search clients.

I would also check if there is any bottleneck on the client machine which you are using to send the requests. You can check this by ensuring your client machine's cores > the number of search clients you are setting in OSB.
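
For illustration, a quick Python sketch of those two checks, based on the pool-size formula quoted above. The 4-core figures are assumptions here (an r6i.xlarge has 4 vCPUs), and the real pool size depends on the processors actually allocated to the OpenSearch process.

# Back-of-the-envelope check: search thread pool size vs. the number of OSB
# search clients, using the formula quoted above. Core counts are assumptions.
def search_thread_pool_size(cores: int) -> int:
    return (cores * 3) // 2 + 1

node_cores = 4       # r6i.xlarge data node (CPU request is 3000m, so it may effectively see fewer)
client_cores = 4     # load-generation machine
search_clients = 2   # "search_clients" from the workload params above

print(search_thread_pool_size(node_cores))   # 7 -> 2 search clients is well below the pool size
print(search_clients <= client_cores)        # True -> the client machine should not be the bottleneck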

BTW I did try 1 shard / 1 segment with 1 replica on a second node. It didn't scale either.

This should not happen; it points me towards the client machine not being able to send enough traffic to the OpenSearch cluster.

@layavadi
Author

@navneet1v The client machine has 4 cores. I had shown the config to @prudhvigodithi before, and the client was not saturated. I tried with the NYC Taxis workload and it works fine. With 3 shards I am able to saturate the CPU but with poor performance for the same number of clients, whereas with the same number of clients I am not able to saturate beyond 1 core on the OpenSearch node. Are there any search metrics I can get to understand what is happening?

@layavadi
Author

@navneet1v and others, I think the issue is with Prometheus monitoring. Because of the short duration of the test, which is within 1m, the 5m average was discarding the peaks and I got misled by the node exporter metrics. My sincere apologies. I logged into the node and did real-time monitoring, which showed all 4 cores being used. Thanks again for jumping in. Sorry for the false alarm.
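
For illustration, a minimal Python sketch (made-up numbers, not the actual node-exporter data) of the smoothing effect described above: a roughly one-minute, 4-core burst shows up clearly in a 1m average but is almost flattened away by a 5m average.

# Made-up CPU samples: one scrape every 15s, a ~1-minute burst of 4 busy cores
# in an otherwise idle 10-minute window.
samples_per_min = 4
idle, busy = 0.05, 4.0
series = [idle] * 20 + [busy] * samples_per_min + [idle] * 16

def window_avg(series, minutes, end):
    n = minutes * samples_per_min
    window = series[end - n:end]
    return sum(window) / len(window)

end = 24                           # index just after the burst ends
print(window_avg(series, 1, end))  # 4.0  -> a 1m average shows all 4 cores busy
print(window_avg(series, 5, end))  # 0.84 -> a 5m average hides most of the peak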

@navneet1v

@navneet1v and others, I think the issue is with Prometheus monitoring. Because of the short duration of the test, which is within 1m, the 5m average was discarding the peaks and I got misled by the node exporter metrics. My sincere apologies. I logged into the node and did real-time monitoring, which showed all 4 cores being used. Thanks again for jumping in. Sorry for the false alarm.

No problem, I can understand, been there. Happy your issue is resolved.
