
RAM increase slowly #137

Open
Usernamezhx opened this issue Apr 2, 2020 · 7 comments

@Usernamezhx

First of all, thanks for your work. When I use grequests, the RAM increases slowly. Code such as:

import grequests
import requests

def exception_handler(request, exception):
    print("Request failed request:{} \n exception:{}".format(request, exception))

if __name__ == '__main__':
    task = []
    f_file= "./data_scp/3031_xiuxiu_coverImage_v1.dat"

    session = requests.session()
    with open(f_file,"r") as r_f:
        for i in r_f:
            tmp = i.strip("\n").split(",")
            url = tmp[-1]
            feed_id = tmp[0]
            rs = grequests.request("GET", url, session=session)
            task.append(rs)

    resp = grequests.imap(task, size=30, exception_handler=exception_handler)

    for i in resp:
        if i.status_code ==200:
            print(i.status_code)

The 3031_xiuxiu_coverImage_v1.dat file looks like:
6650058925696645684,http://***8.jpg
6650058925696645684,http://***8.jpg
6650058925696645684,http://***8.jpg
6650058925696645684,http://***8.jpg

My grequests version is 0.4.0. Thanks in advance.

@spyoungtech
Owner

Thanks for reporting this issue. I'll try to see if I can reproduce this issue and figure out where memory is building up.

If you feel inclined, you can try profiling your own application, for example using memory-profiler, which may be able to tell you where memory is building up.

Though, some memory increase should be expected as you take in response data. I assume you mean it's a very large or unexpected buildup of memory :)
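If installing memory-profiler isn't an option, the standard library's tracemalloc can also point at the lines where allocations accumulate. A minimal sketch (the payloads list here is just a stand-in for buffered response data, not grequests itself):

```python
import tracemalloc

tracemalloc.start()

# Stand-in for accumulating response payloads in memory.
payloads = [b"x" * 100_000 for _ in range(50)]

snapshot = tracemalloc.take_snapshot()
stats = snapshot.statistics("lineno")
for stat in stats[:3]:
    print(stat)  # file:line plus the bytes allocated there
```

The line building the payloads list should dominate the output, which is exactly the kind of signal that helps locate a leak.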

@Usernamezhx
Author

Usernamezhx commented Apr 3, 2020

Thanks for your reply. My data size is (100000,).
The RAM builds up from 400 MB to 5000 MB and is still increasing.
The profile:

Line #    Mem usage    Increment   Line Contents
================================================
    10   49.098 MiB   49.098 MiB   @profile
    11                             def test():
    12   49.098 MiB    0.000 MiB       with open(f_file,"r") as r_f:
    13  425.777 MiB    0.258 MiB           for i in r_f:
    14  425.777 MiB    0.258 MiB               tmp = i.strip("\n").split(",")
    15  425.777 MiB    0.258 MiB               url = tmp[-1]
    16  425.777 MiB    0.223 MiB               feed_id = tmp[0]
    17  425.777 MiB    0.258 MiB               rs = grequests.request("GET", url,session=session)
    18  425.777 MiB    0.773 MiB               task.append(rs)
    19                             
    20  425.777 MiB    0.000 MiB       resp = grequests.imap(task, size=30,exception_handler=exception_handler)
    21                             
    22 3647.770 MiB    5.512 MiB       for i in resp:
    23 3647.758 MiB    0.227 MiB           if i.status_code ==200:
    24 3647.758 MiB    0.184 MiB               print(i.status_code)

@spyoungtech
Owner

spyoungtech commented Apr 3, 2020

So, I'm getting closer to figuring out what is going on. Here are a few things I've discovered thus far...

grequests opening (and not closing) a new session for each request prevents memory from being freed

Take the following code:

import grequests
from memory_profiler import profile

@profile
def test():
    url = "https://httpbin.org/status/200"
    reqs = [grequests.get(url) for _ in range(100)]
    responses = grequests.imap(reqs, size=5)
    for resp in responses:
        ...
    print('ok')  # memory should be freed by now

Notice that the memory builds up (104MiB) and is never really released, despite no (apparent) references existing anymore. The size will also get bigger if I increase the number of requests.

Line #    Mem usage    Increment   Line Contents
================================================
     5   35.977 MiB   35.977 MiB   @profile
     6                             def test():
     7   35.977 MiB    0.000 MiB       url = "https://httpbin.org/status/200"
     8   36.477 MiB    0.062 MiB       reqs=[grequests.get(url) for _ in range(100)]
     9   36.477 MiB    0.000 MiB       responses = grequests.imap(reqs, size=10)
    10  104.605 MiB  104.605 MiB       for resp in responses:
    11  104.605 MiB    0.000 MiB           ...
    12  104.613 MiB    0.008 MiB       print('ok') # memory should be freed by now

But if I modify the function to use a requests.Session object for its session...

import grequests
import requests
from memory_profiler import profile

sesh = requests.Session()

@profile
def test():
    url = "https://httpbin.org/status/200"
    reqs = [grequests.get(url, session=sesh) for _ in range(500)]
    responses = grequests.imap(reqs, size=5)
    for resp in responses:
        ...
    print('ok')  # memory should be freed by now

With this change, there is not nearly as much buildup in memory (the amount is partially dependent on the pool size used; a bigger pool will build up more memory).
Also, now that we're using a session, increasing the number of requests does not increase the amount of memory built up either. It is the same for 100 or 500 requests.

Line #    Mem usage    Increment   Line Contents
================================================
     5   36.090 MiB   36.090 MiB   @profile
     6                             def test():
     7   36.090 MiB    0.000 MiB       url = "https://httpbin.org/status/200"
     8   36.090 MiB    0.000 MiB       reqs=[grequests.get(url, session=sesh) for _ in range(500)]
     9   36.090 MiB    0.000 MiB       responses = grequests.imap(reqs, size=5)
    10   42.051 MiB   42.051 MiB       for resp in responses:
    11   42.051 MiB    0.000 MiB           ...
    12   42.059 MiB    0.008 MiB       print('ok')  # memory should be freed by now

Memory not freed due to references in request list

Using the very first code example and profiling from the previous section (which does not use a session), another issue with freeing memory can be seen:

    10  104.605 MiB  104.605 MiB       for resp in responses:
    11  104.605 MiB    0.000 MiB           ...
    12  104.613 MiB    0.008 MiB       print('ok') # memory should be freed by now

By the time print('ok') runs, the generator has been exhausted and it SHOULD have freed up memory, but it doesn't. This is because the request list is still holding onto references, preventing garbage collection.
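This effect is easy to reproduce without grequests at all. In the sketch below, FakeResponse is a stand-in for a response object and the list plays the role of the reqs list above; the weakref probe tells us whether the first response is still alive:

```python
import gc
import weakref

class FakeResponse:
    """Stand-in for a Response holding a large payload."""
    def __init__(self):
        self.content = b"x" * 1_000_000

# Mirror the pattern above: a list holds every request/response.
reqs = [FakeResponse() for _ in range(5)]
probe = weakref.ref(reqs[0])

results = (r.content[:1] for r in reqs)  # lazy, like grequests.imap
for chunk in results:
    pass  # generator fully exhausted

gc.collect()
assert probe() is not None  # still alive: the reqs list holds references

del reqs  # drop the only remaining strong references
gc.collect()
assert probe() is None  # now the responses can be collected
```

Exhausting the generator alone frees nothing; only dropping the list does.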

Adding del reqs allows the memory to be freed once the generator is exhausted.

import grequests
from memory_profiler import profile

@profile
def test():
    url = "https://httpbin.org/status/200"
    reqs=[grequests.get(url) for _ in range(100)]
    responses = grequests.imap(reqs, size=5)
    del reqs
    for resp in responses:
        ...
    print('ok')  # memory should be freed by now

With the references from the request list removed, memory is now freed (more) properly.

Line #    Mem usage    Increment   Line Contents
================================================
     5   35.977 MiB   35.977 MiB   @profile
     6                             def test():
     7   35.977 MiB    0.000 MiB       url = "https://httpbin.org/status/200"
     8   36.477 MiB    0.062 MiB       reqs=[grequests.get(url) for _ in range(100)]
     9   36.477 MiB    0.000 MiB       responses = grequests.imap(reqs, size=5)
    10   36.477 MiB    0.000 MiB       del reqs
    11  104.176 MiB  104.176 MiB       for resp in responses:
    12  104.176 MiB    0.004 MiB           ...
    13   56.660 MiB    0.000 MiB       print('ok')  # memory should be freed by now

A yet remaining problem...

    13   56.660 MiB    0.000 MiB       print('ok')  # memory should be freed by now

Notice that, while we are freeing some memory, not everything is freed up. Specifically, we have 56 MiB at the end of this function, but it should be closer to the ~36 MiB we started with. This number increases with the number of requests (with 500 requests, ~86 MiB will be left).

Since you're already using a session, I think whatever is holding on to this little bit of memory that's building up is causing your memory leak. I'm still working on figuring out exactly what that is!

@spyoungtech
Owner

I have a partial fix for the initial issues in #138 -- but I don't think that will help your situation. Still working on it!

@Usernamezhx
Author

Usernamezhx commented Apr 10, 2020

I have no idea about it. When I del the request list, RAM builds up more slowly, but it still increases.

import grequests
from io import BytesIO
from PIL import Image

def main():
    # feed_id_url: list of (feed_id, url) pairs, loaded elsewhere
    r = (grequests.get(item[1]) for item in feed_id_url)  # item[1] means url

    for idx, i in enumerate(grequests.imap(r, size=30)):
        print(idx)
        if i.status_code == 200:
            try:
                img = Image.open(BytesIO(i.content))
            except Exception as e:
                print(e)
                continue

That can work: the RAM stops increasing. But when it runs to about idx == 1000000 it will stop. I mean with ps -aux | grep python the process still exists, but it stops at idx == 1000000. The try/except catches nothing. It is very strange.

@spyoungtech
Owner

That does sound strange. Unfortunately, I have no idea why it would stop suddenly. I've tested locally with as many or more requests, and it never flat out stops.

I have run into similar strange issues in the past. Perhaps consider updating/changing the version of gevent and/or the version of Python you're using to see if that changes anything. That's really just a guess, though.

@somurzakov

somurzakov commented Sep 18, 2024

@Usernamezhx
This might be too late of a response, but can you try manually releasing the HTTP connection pool after processing each response?

for idx, i in enumerate(grequests.imap(r, size=30)):
    if i.status_code == 200:
        i.raw._pool.close()  # _pool is a private urllib3 attribute

The reason you are hitting the limit at precisely 1,000,000 is ulimit -n: the upper limit on how many open file descriptors the operating system allows your process.
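That limit can also be read from Python via the stdlib resource module (Unix only), which reports the same value as ulimit -n:

```python
import resource

# RLIMIT_NOFILE is the per-process cap on open file descriptors,
# the same number `ulimit -n` prints.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"soft limit: {soft}, hard limit: {hard}")
```

Once the process has that many descriptors open (leaked sockets included), any further socket creation fails or blocks.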

Since you are not releasing the TCP pool and TCP socket handle, the connection is left in a dangling state.
If you manually close it with response.raw._pool.close(), the TCP socket is released back to the OS and you can open as many connections as you want.

You can see how many open sockets you have by running lsof -i | grep python | grep https.
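On Linux you can also count the current process's open descriptors without lsof by listing /proc/self/fd (Linux-specific, just a quick check):

```python
import os

# Each entry in /proc/self/fd is one open descriptor of this process
# (sockets included), so a leak shows up as steady growth here.
open_fds = os.listdir("/proc/self/fd")
print(len(open_fds), "open file descriptors")
```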

Also, in your very first example you have a variable task that contains a list of all AsyncRequests.
Keep in mind that each AsyncRequest holds a reference to its Response object, each Response holds a reference to the raw response and the HTTPS connection pool, and each pool holds a TCP handle.

So as long as you carry around a list of AsyncRequests, the referenced Responses and the underlying TCP pools and connections will never be closed, garbage collected, or returned to the operating system.
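That reference chain can be sketched with plain Python objects (the class names are illustrative stand-ins, not the real grequests/urllib3 types): holding the list keeps even the bottom-most handle alive, and dropping it frees the whole chain.

```python
import gc
import weakref

# Illustrative stand-ins for the chain described above.
class TCPHandle:
    pass

class Pool:
    def __init__(self):
        self.handle = TCPHandle()

class Response:
    def __init__(self):
        self.pool = Pool()

class AsyncRequest:
    def __init__(self):
        self.response = Response()

task = [AsyncRequest() for _ in range(3)]
probe = weakref.ref(task[0].response.pool.handle)

assert probe() is not None  # the task list keeps the whole chain alive

del task
gc.collect()
assert probe() is None  # dropping the list frees everything down to the handle
```

This is why iterating a generator of requests (and not keeping a task list) lets memory stay flat.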
