SNOW-1015703: Memory Leak when running queries with many PARTITION clauses #1859
Comments
@willgraf So the problem you observed is that the Python connector used more memory when the SQL had multiple PARTITION clauses?
Yes, the other SQL that uses <1GB of RAM is the plain `SELECT *`. I was also surprised to see this behavior, as I would have thought the memory usage would be independent of the query, since the server runs the query.
It might be related to our nanoarrow C extension change: starting from v3.3.0 we switched to using nanoarrow as our underlying data converter. I will take a closer look at it.
I think I have found the leaking code -- it's inside the decimal data extraction in the Cython extension module. I have a PR to fix it: #1867. Thanks for the report!
Hey @willgraf, we have released v3.7.1, which includes the fix. Please try it out and let me know if it works.
Python version
Python 3.9.18 (main, Aug 25 2023, 13:20:04) [GCC 9.4.0]
Operating system and processor architecture
Linux-5.10.199-190.747.amzn2.x86_64-x86_64-with-glibc2.31
Installed packages
What did you do?
Summary
This service uses `snowflake-connector-python` in a Celery worker to execute a Snowflake query and extract results to an S3 bucket. Simple queries (`SELECT * FROM table`) behave as expected. However, queries with several `PARTITION` clauses cause a memory leak and the process is OOM-killed.

Steps to reproduce
I generated a mock dataset with 10M rows via a Python worksheet. Then I ran a query and repeatedly called `fetchmany` until all results had been fetched.

Versioning

I found this behavior was introduced in version 3.3.0 and persists up through 3.6.0; v3.2.1 works as expected.
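The fetch loop described above follows the standard DB-API 2.0 cursor pattern that the Snowflake connector also implements. A minimal sketch of that pattern, using an in-memory sqlite3 database as a stand-in for the Snowflake connection (table name, row count, and chunk size are illustrative, not from the report):

```python
import sqlite3

# Stand-in for the Snowflake connection; snowflake-connector-python
# exposes the same DB-API 2.0 cursor interface (execute / fetchmany).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE mock_table (id INTEGER, val TEXT)")
cur.executemany(
    "INSERT INTO mock_table VALUES (?, ?)",
    [(i, f"row-{i}") for i in range(10_000)],
)

cur.execute("SELECT * FROM mock_table")
total = 0
while True:
    rows = cur.fetchmany(1_000)   # fetch in fixed-size chunks
    if not rows:
        break
    total += len(rows)            # process / upload each chunk here

print(total)  # 10000
```

Fetching in chunks like this is exactly why the leak is surprising: memory should stay roughly proportional to the chunk size, not the full result set.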
What did you expect to see?
I expected the memory footprint to be the same as for the `SELECT *` query, and did NOT expect the query itself to change the memory footprint of the container using the `snowflake-connector-python` lib. The `PARTITION` query uses up nearly 8GB of RAM while the vanilla `SELECT` query uses <1GB.
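To compare footprints like the 8GB vs. <1GB figures above without waiting for the OOM killer, peak RSS can be sampled from inside the worker with the stdlib `resource` module (Unix-only; a measurement sketch, not part of the original report):

```python
import resource

# Peak resident set size of this process so far. On Linux ru_maxrss is
# reported in kilobytes; on macOS it is reported in bytes.
peak_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
print(f"peak RSS: {peak_kb} kB")
```

Logging this value after each `fetchmany` batch makes it easy to see whether memory grows with the number of batches fetched (a leak) or plateaus at roughly one batch's worth (expected).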
Can you set logging to DEBUG and collect the logs?
No response