
backed="r" is leaking memory #1597

Open
2 of 3 tasks
agemagician opened this issue Aug 14, 2024 · 2 comments

Comments

@agemagician

agemagician commented Aug 14, 2024

Please make sure these conditions are met

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of anndata.
  • (optional) I have confirmed this bug exists on the master branch of anndata.

Report

Hello,

Using the lazy loader (backed mode), I am trying to load a large h5ad file that does not fit into memory. Opening it works as expected: the whole file is not loaded into memory up front.
However, once I start reading the data row by row, memory usage keeps growing until it roughly matches the file size.

I expected the lazy loader to read each row and release the associated memory once the row is deleted, but that is not what happens.
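One way to narrow this down is to check whether the growth comes from Python-level objects that are never freed or from memory held somewhere `tracemalloc` cannot see (for example, inside a C library). The sketch below samples traced Python memory alongside the process's peak resident set size while reading rows. It is a minimal, stdlib-only illustration: `profile_reads`, `peak_rss_mib`, and the `read_row` callback are hypothetical names, and the `resource` module is only available on Unix.

```python
import resource
import sys
import tracemalloc
from typing import Callable, List, Tuple


def peak_rss_mib() -> float:
    """Peak resident set size of this process, in MiB (Unix only)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux but in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def profile_reads(
    read_row: Callable[[int], object], n_rows: int, every: int = 10_000
) -> List[Tuple[int, float, float]]:
    """Call read_row(i) for each row, discard the result, and record
    (row index, traced Python memory in MiB, peak RSS in MiB) every `every` rows."""
    samples = []
    tracemalloc.start()
    try:
        for i in range(n_rows):
            row = read_row(i)
            del row  # drop the only reference, as in the reporter's loop
            if i % every == 0 or i == n_rows - 1:
                current, _peak = tracemalloc.get_traced_memory()
                samples.append((i, current / 2**20, peak_rss_mib()))
    finally:
        tracemalloc.stop()
    return samples
```

With the backed object from the report, this could be called as `profile_reads(lambda i: adata_r.X[i].toarray(), adata_r.n_obs)`. If the traced column stays flat while peak RSS keeps climbing, the memory is likely held outside the `row` arrays themselves. Note that `tracemalloc` slows the loop down, so trying it on a subset of rows first may be more practical.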

How can we solve this issue?

Code:

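import scanpy as sc
from tqdm import tqdm

file_name = "file.h5ad"

# Open the file in backed, read-only mode; X stays on disk.
adata_r = sc.read_h5ad(file_name, backed="r")

# Read X one row at a time and discard each densified row immediately.
for idx in tqdm(range(adata_r.X.shape[0])):
    row = adata_r.X[idx].toarray()
    del row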

100%|▌| 800000/800000 [20:00<20:00, 700.60it/s]
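Reading one row per call also means one separate on-disk lookup per row. A common workaround, offered here as a sketch rather than a confirmed fix for this issue, is to read contiguous blocks of rows with slices so that each call returns a bounded amount of data. `row_blocks` is a hypothetical helper, and the block size of 10,000 is an arbitrary assumption to tune against available memory.

```python
from typing import Iterator


def row_blocks(n_rows: int, block_size: int) -> Iterator[slice]:
    """Yield contiguous row slices [start, stop) that together cover range(n_rows)."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    for start in range(0, n_rows, block_size):
        yield slice(start, min(start + block_size, n_rows))


# Assumed usage with the backed object from the report:
# for sl in row_blocks(adata_r.n_obs, 10_000):
#     block = adata_r.X[sl].toarray()  # dense copy of at most 10_000 rows
#     ...                              # process the block
#     del block
```

Slicing keeps each read contiguous along the row axis, which suits a CSR matrix; whether it changes the memory growth reported here is untested.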

Versions

-----
anndata             0.10.8
scanpy              1.9.8
session_info        1.0.0
tqdm                4.66.5
-----
IPython             8.21.0
jupyter_client      8.6.0
jupyter_core        5.7.1
jupyterlab          2.3.2
notebook            6.4.10
-----
Python 3.10.12 (main, Jul 29 2024, 16:56:48) [GCC 11.4.0]
Linux-5.10.219-208.866.amzn2.x86_64-x86_64-with-glibc2.35
-----
Session information updated at 2024-08-14 12:33
@ilan-gold
Contributor

ilan-gold commented Aug 14, 2024

Hello @agemagician, we will need a bit more to go on here. Your example did not reproduce the problem for me on a large dataset I have locally. Could you share your dataset? Could X be stored as CSC rather than CSR?
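The CSC question matters because of layout: in a CSR matrix each row's nonzeros are stored contiguously, while in a CSC matrix a single row is spread across every column, so reading one row can touch far more of the file. As an assumption based on anndata's on-disk format, the sparse layout of `X` is recorded in the group's `encoding-type` attribute, and older files may use an `h5sparse_format` attribute instead. A minimal sketch that inspects these attributes, with `sparse_layout` as a hypothetical helper:

```python
from typing import Mapping


def sparse_layout(attrs: Mapping[str, object]) -> str:
    """Return "csr", "csc", or "unknown" from the attributes of an h5ad `X` group."""
    encoding = attrs.get("encoding-type")
    if isinstance(encoding, bytes):
        encoding = encoding.decode()
    if encoding in ("csr_matrix", "csc_matrix"):
        return encoding[:3]
    legacy = attrs.get("h5sparse_format")  # assumed attribute name in older files
    if isinstance(legacy, bytes):
        legacy = legacy.decode()
    if legacy in ("csr", "csc"):
        return legacy
    return "unknown"


# Assumed usage with h5py:
# import h5py
# with h5py.File("file.h5ad", "r") as f:
#     print(sparse_layout(f["X"].attrs))
```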


This issue has been automatically marked as stale because it has not had recent activity.
Please add a comment if you want to keep the issue open. Thank you for your contributions!

github-actions bot added the stale label Oct 30, 2024

3 participants