[BUG] Poor scaling within a single node while running vector search workload #531
Comments
Hi @layavadi, to get better clarity on this, are you seeing similar issues when running tests on workloads other than vectorsearch (such as NYC Taxis, PMC, or http_logs)?
Ian
I haven't tried it on other workloads.
Vadi
There are multiple facets to this issue:
Most likely, this item should be followed up with the vector search team as indicated above. Please close this issue if that is the case.
@gkamat I did try with NYC Taxi; with just 1 client I was able to saturate the CPU, so it is not an issue with the benchmark. As you mentioned, it is with vectorsearch itself.
@VijayanB Looks like both nmslib and faiss have the same issue.
@navneet1v Have you seen this pattern in your experiments?
@layavadi when we are talking about throughput, is this indexing throughput or search throughput?
@navneet1v This is search only.
Vadi
@layavadi if this is search, then what you are getting is expected for a single node, where 1 shard/1 seg will get more throughput than 3 shards/1 seg per shard. Below are some of the things to keep in mind here:
I am not sure where you got the expectation that more shards will lead to better performance; this is true only if you have replicas. Please let me know if you have more questions. Removing the bug label as this is not a bug.
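(For reference, a minimal sketch of what adding a replica looks like, assuming the index created by the workload is named target_index as in the parameter file below, that $ENDPOINT and the admin credentials are the same ones used in the reproduction command, and that a second data node exists to host the replica:)

curl -sk -u admin:<password> -X PUT "https://$ENDPOINT/target_index/_settings" \
  -H 'Content-Type: application/json' \
  -d '{"index": {"number_of_replicas": 1}}'

With a replica in place, search requests against the same shard can be served by either copy, which is what allows search throughput to scale beyond a single node.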
@navneet1v Thanks for the clarification. If that is the case, then as we scale clients, throughput should increase in the case of 1 shard/1 seg. However, throughput flattens and CPU utilisation is limited to just 1 core. Is it the case that when multiple clients are searching on a 1 shard/1 seg index, they all serialise through 1 CPU? BTW, I did try 1 shard/1 seg with 1 replica on a second node. It didn't scale either.
@navneet1v Is there any limit on search threads based on shards or CPU cores? Is it possible that with one shard we are using only 1 thread for searches?
No, there is no limit like this. Every search request you send is picked up by 1 search thread from the search thread pool. The search thread pool size is ((# of cores * 3) / 2) + 1, so I would check whether you are going beyond this number of search clients. I would also check whether there is any bottleneck on the client machine you are using to send the requests. You can check this by ensuring your client machine has more cores than the number of search clients you are setting in OSB.
This should not happen; it points me towards the client machine not being able to send enough traffic to the OpenSearch cluster.
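(As a worked example of that formula: a 4-core data node gives ((4 × 3) / 2) + 1 = 7 search threads. One way to watch the pool for queueing or rejections during a run, assuming the same $ENDPOINT and admin credentials as in the reproduction command below, is the _cat thread pool API; the column selection here is only an illustration:)

curl -sk -u admin:<password> "https://$ENDPOINT/_cat/thread_pool/search?v&h=node_name,name,size,active,queue,rejected"

If active never rises above 1 while multiple search clients are running, the bottleneck is upstream of the search threads (client side or network) rather than the pool size.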
@navneet1v The client machine has 4 cores. I had shown the config to @prudhvigodithi before, and the client was not saturated. I tried with the NYC Taxi workload and it works fine. With 3 shards I am able to saturate the CPU, but with poor performance for the same number of clients, whereas with the same number of clients I am not able to saturate beyond 1 core on the OpenSearch node. Are there any search metrics I can get to understand what is happening?
@navneet1v and others, I think the issue is with the Prometheus monitoring. Because of the short duration of the test, which is within 1 minute, the 5m average was discarding the peaks and I got misled by the node exporter metrics. My sincere apologies. I logged into the node and did real-time monitoring, which showed all 4 cores being used. Thanks again for jumping in. Sorry for the false alarm.
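(To illustrate the smoothing effect, assuming a standard node_exporter setup and a Prometheus server reachable at the placeholder <prometheus-host>: a CPU-busy query averaged over a 5m window flattens a burst that lasts only about a minute, while a 1m window shows it. A sketch of the instant query via the Prometheus HTTP API:)

curl -sG "http://<prometheus-host>:9090/api/v1/query" \
  --data-urlencode 'query=100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[1m])) * 100)'

Swapping [1m] for [5m] in the same query reproduces the averaged-down numbers that hid the peak.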
No problem, I can understand, been there. Happy your issue is resolved.
Describe the bug
While running the OpenSearch Benchmark tool, it was noticed that with the vectorsearch workload, having multiple shards or multiple segments results in poor performance compared to 1 shard/1 segment. CPU utilisation with multiple shards (single segment) is much higher than with 1 shard/1 segment, but performance (response time and throughput) was worse than with the 1 shard/1 segment config.
To Reproduce
r6i.xlarge single data node, with data node pods deployed on Kubernetes through the OpenSearch Helm chart. Each data node has the following settings:
opensearchJavaOpts: "-Xms6G -Xmx6G"
resources:
  requests:
    cpu: "3000m"
    memory: "8Gi"
service:
  type: LoadBalancer
persistence:
  size: 51Gi
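(A quick sanity check that the pod really has those resources, and how much CPU it uses during a run; this is only an illustration, the namespace and pod name are placeholders, and kubectl top requires metrics-server:)

kubectl -n <namespace> get pod <opensearch-data-pod> -o jsonpath='{.spec.containers[0].resources}'
kubectl -n <namespace> top pod <opensearch-data-pod>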
And the benchmark parameter file is (using the Lucene engine with l2 space for the vector field):
{
  "target_index_name": "target_index",
  "target_field_name": "target_field",
  "target_index_body": "indices/lucene-index.json",
  "target_index_primary_shards": 1,
  "target_index_dimension": 768,
  "target_index_space_type": "l2",
  "target_index_bulk_size": 100,
  "target_index_bulk_index_data_set_format": "hdf5",
  "target_index_bulk_index_data_set_corpus": "cohere-1m",
  "target_index_bulk_indexing_clients": 10,
  "target_index_max_num_segments": 1,
  "hnsw_ef_search": 256,
  "hnsw_ef_construction": 256,
  "query_k": 100,
  "query_body": {
    "docvalue_fields": ["_id"],
    "stored_fields": "none"
  },
  "query_data_set_format": "hdf5",
  "query_data_set_corpus": "cohere-1m",
  "query_count": 10000,
  "search_clients": 2
}
opensearch-benchmark execute-test --target-hosts $ENDPOINT --workload vectorsearch --workload-params ${PARAMS_FILE} --pipeline benchmark-only --kill-running-processes --client-options=basic_auth_user:admin,basic_auth_password:Clouder@4213,verify_certs:false
Expected behavior
It is expected to scale linearly as we add more clients with more shards. But 1 shard/1 segment only consumes 1 core, compared to 3 shards/1 segment, which consumes more than 3 cores yet results in poorer performance than 1 shard/1 segment.