
[BUG] msearch hangs when dealing with a high number of records. #517

Closed
manugarri opened this issue Sep 30, 2023 · 1 comment
Labels
bug Something isn't working untriaged Need triage

Comments


manugarri commented Sep 30, 2023

What is the bug?

I'm running a search job on a big batch file (900K records), so I'm using multi-search (`msearch`). The cluster has 3 data nodes and 3 master nodes.

I split the records into batches. The odd thing is that with batches of 5,000 records the job takes around 200 seconds to finish, and the AWS metrics show no apparent memory/CPU issue on any of the nodes.

However, with 10,000 records per `msearch` call, something strange happens.

For a while the cluster performs the search operations; I can see active/queued threads on the thread pool API endpoint `/_cat/thread_pool/search`. After a certain point, though, there are no more active/queued/rejected threads in the thread pool, yet the Python `msearch` call just hangs, and it hangs forever. I have to kill the Jupyter kernel to recover.
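For context, the batching described above looks roughly like this (a minimal sketch; `records`, the per-record index field, and the `match` query are placeholders, not the real job):

```python
import json

def build_msearch_bodies(records, batch_size=5000):
    """Split records into NDJSON _msearch payloads of at most batch_size searches.

    Each record becomes a header line (index routing) plus a query line,
    as the _msearch body format requires.
    """
    bodies = []
    for start in range(0, len(records), batch_size):
        lines = []
        for rec in records[start:start + batch_size]:
            # Records may target any of ~50 indices, so each search
            # carries its own index in the header line.
            lines.append(json.dumps({"index": rec["index"]}))
            lines.append(json.dumps({"query": {"match": {"name": rec["name"]}}}))
        # _msearch bodies must end with a trailing newline.
        bodies.append("\n".join(lines) + "\n")
    return bodies
```

Each returned body is then sent with one `search_client.msearch(...)` call; bumping `batch_size` from 5000 to 10000 is what triggers the hang.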

How can one reproduce the bug?

I can't share the data I'm using, unfortunately, and the data being searched is correlated with the number of records that makes the search hang.

But in a nutshell, running this:

msearch_result = search_client.msearch(
    msearch_query,
)

with a high volume of records makes the job hang, not on the OpenSearch side but on the Python client side.

It is important to note that the records query any of the 50 or so indices we have, so not all searches in the `msearch` call go to the same index.

However, using the `requests` library directly (with the aws-auth library for authentication) works perfectly:

# this works with no problem
resp = requests.post(
    'https://' + endpoint + '/_msearch',
    data=msearch_query,
    headers={'Content-Type': 'application/json'},
    timeout=500,
)

What is the expected behavior?

The Python client should handle the request, or, if the return body from the multi-search operation is too big, raise an appropriate exception.
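To illustrate the "raise an appropriate exception" expectation, here is a hedged sketch of a wrapper that turns the silent failure into an explicit error (the exception class and function name are made up for illustration; ideally the client would do something like this internally):

```python
class MsearchTimeoutError(Exception):
    """Raised when the gateway answers with a timeout body instead of results."""

def check_msearch_response(text):
    """Fail loudly if the response is the gateway's 'Request Timeout' body.

    The timeout body observed in this issue is not even valid JSON
    (it has a trailing comma), so a substring check is used here
    instead of json.loads.
    """
    if "Request Timeout" in text:
        raise MsearchTimeoutError("msearch timed out at the gateway: %r" % text)
    return text
```

With a wrapper like this, a hung or timed-out batch at least surfaces as a Python exception instead of a kernel that has to be killed.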

What is your host/environment?

opensearchpy 2.2.0

OS:
ProductName: macOS
ProductVersion: 14.0
BuildVersion: 23A344

@manugarri manugarri added bug Something isn't working untriaged Need triage labels Sep 30, 2023
manugarri commented Sep 30, 2023

UPDATE: I realised that the issue still happens when using the `requests` library. I'm not sure why an `msearch` request would hang after the cluster is done with the actual search, but it is not an issue with this library.

In fact, sometimes the query succeeds but the return message is `'{\n "message": "Request Timeout",\n}'`. Curiously, the only queries that fail are those that take more than 300 seconds, which means this is probably related to some networking timeout setting I can't seem to find.
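If the ~300-second ceiling really is a fixed network timeout somewhere in the path, one workaround is to shrink the batch size until each request finishes under it. A minimal sketch of that retry logic (the `send` callable is a stand-in for the real msearch request, which is assumed to raise `TimeoutError` on the timeout body):

```python
def msearch_with_backoff(records, send, batch_size=10000, min_batch=1000):
    """Send records in batches, halving the batch size whenever `send`
    signals a timeout, so each individual request stays under the limit.

    `send` takes a list of records and either returns their responses
    or raises TimeoutError.
    """
    results = []
    start = 0
    while start < len(records):
        batch = records[start:start + batch_size]
        try:
            results.extend(send(batch))
            start += len(batch)  # this slice succeeded; move on
        except TimeoutError:
            if batch_size <= min_batch:
                raise  # can't shrink further; give up
            batch_size //= 2  # retry the same slice with a smaller batch
    return results
```

This trades a few wasted timed-out requests for jobs that eventually complete, without needing to find the underlying timeout setting.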
