"Too many open files" error + memory leak due to dangling TCP connections #176

Open
somurzakov opened this issue Jul 20, 2024 · 1 comment

somurzakov commented Jul 20, 2024

If you are using grequests to make thousands of connections to different IP addresses, you have probably noticed a couple of issues:

  1. "Too many open files" errors, because requests keeps TCP connections alive and does not close the TCP sockets.
  2. Memory leaks and slow growth in memory consumption, caused by the issue above.

I have the same issue: when creating thousands of connections, I hit the OS limit for open sockets.
The Stack Overflow solution did NOT work for me.

Manually closing the response object with resp.close() didn't help either: the connection would still be present in the connection pool, left in the CLOSE_WAIT state (per the lsof -i output).

What worked for me is manually closing the HTTPSConnectionPool object while processing the HTTP response:

import grequests

urls = ["https://<your_url>/" for _ in range(10000)]
reqs = [grequests.get(url) for url in urls]
for resp in grequests.imap(reqs, size=500):
    # do something with resp.content
    # Test "is not None" rather than truthiness: a Response is falsy for 4xx/5xx statuses
    if resp is not None:
        resp.raw._pool.close()  # close the connection pool and all TCP connections inside

You can check the number of open HTTP connections with lsof -i | grep python | grep https | wc -l before and after creating a few thousand connections, with and without manual socket closing, to verify that it works.
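
If you would rather check from inside the script than with lsof, here is a minimal sketch that counts the sockets held open by the current process. It assumes a platform that exposes per-process descriptors under /proc/self/fd or /dev/fd (Linux, macOS); the name count_open_sockets is mine, not part of grequests or requests. Unlike the lsof pipeline it counts every socket, not only HTTPS ones.

```python
import os
import stat


def count_open_sockets() -> int:
    """Count file descriptors of this process that refer to sockets."""
    for fd_dir in ("/proc/self/fd", "/dev/fd"):
        if os.path.isdir(fd_dir):
            break
    else:
        raise RuntimeError("no per-process fd directory on this platform")

    count = 0
    for name in os.listdir(fd_dir):
        try:
            mode = os.fstat(int(name)).st_mode
        except (OSError, ValueError):
            # The descriptor used for the directory listing itself is
            # already closed by now; skip anything we cannot stat.
            continue
        if stat.S_ISSOCK(mode):
            count += 1
    return count
```

Calling it before the imap loop and again afterwards should show whether the number of sockets keeps growing or returns to roughly where it started.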

If you close the connection immediately after processing the response, you will never hit the limit on open file descriptors/sockets, whereas with the Stack Overflow solution the connections remain open until the timeout expires.

The underlying reason seems to be that the HTTPSConnection is part of the Response object, which is embedded in each AsyncRequest object. This means that as long as you carry around the AsyncRequest object, the Response object will never be garbage collected, and the underlying HTTPS connection will never be closed.
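
The reference chain described above can be illustrated with stand-in classes. This is only a sketch of the mechanism: FakeAsyncRequest, FakeResponse and FakeConnection are hypothetical placeholders, not the real grequests/requests types, and the immediate cleanup relies on CPython's reference counting.

```python
import gc
import weakref


class FakeConnection:
    """Stand-in for the underlying HTTPS connection."""


class FakeResponse:
    """Stand-in for a Response that owns a connection."""

    def __init__(self):
        self.connection = FakeConnection()


class FakeAsyncRequest:
    """Stand-in for an AsyncRequest that embeds its Response."""

    def __init__(self):
        self.response = FakeResponse()


def demo():
    req = FakeAsyncRequest()
    conn_ref = weakref.ref(req.response.connection)

    reqs = [req]  # e.g. the list of requests you keep around
    del req
    gc.collect()
    alive_while_held = conn_ref() is not None  # list still references it

    reqs.clear()  # drop the last reference to the request
    gc.collect()
    freed_after_drop = conn_ref() is None
    return alive_while_held, freed_after_drop
```

Both values come out True here: the connection stays alive for as long as the request list holds the request, and becomes collectable once that last reference is dropped.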

To solve the issue, you need to do one of the following:

  1. Delete or overwrite both the AsyncRequest and Response objects, to make sure no references are left dangling and the HTTPS connection gets garbage collected.
  2. Manually close the connection pool with resp.raw._pool.close(). That way you can still retain and work with the request/response objects, but the underlying TCP connection will be closed and its resources cleaned up, so you will no longer face the "Too many open files" error.
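
Since _pool is a private urllib3 attribute and could be missing or change between versions, a slightly more defensive variant of option 2 might look like the sketch below. The helper name close_response_pool is hypothetical; it only assumes that the pool object, when present, has a close() method.

```python
def close_response_pool(resp) -> bool:
    """Close the connection pool behind a response, if one is reachable.

    Returns True if a pool was found and closed, False otherwise.
    Relies on the private attribute resp.raw._pool, so treat it as
    best-effort rather than a supported API.
    """
    raw = getattr(resp, "raw", None)
    pool = getattr(raw, "_pool", None)
    if pool is None:
        return False
    pool.close()
    return True
```

In the imap loop above you would call close_response_pool(resp) once you are done with resp.content, instead of touching resp.raw._pool directly.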

spyoungtech (Owner) commented Jul 20, 2024

Thanks for the report and writeup. Probably related to #137
