
memory #13

Open
ShobiStassen opened this issue Feb 28, 2019 · 2 comments


@ShobiStassen

Hi,
I am trying to run the full 1.3M-cell 10X mouse dataset (using the 1M_neurons_filtered_gene_bc_matrices_h5.h5 file from the 10X website).
I have 126GB of RAM and an Intel® Xeon(R) W-2123 CPU @ 3.60GHz × 8, which is above the requirements you mention are needed to run the full cluster.py method without subsampling.
I get a memory error at the filter_genes_dispersion stage; should I modify the code in any way (without subsampling)?
Thanks, Shobi

adata = sc.read_10x_h5(filename)
adata.var_names_make_unique()
sc.pp.recipe_zheng17(adata)

running recipe zheng17
filtered out 3983 genes that are detected in less than 1 counts
Traceback (most recent call last):
  File "/home/shobi/PycharmProjects/my_first_conda_project/10X_mousebrain.py", line 61, in <module>
    main()
  File "/home/shobi/PycharmProjects/my_first_conda_project/10X_mousebrain.py", line 58, in main
    basic_analysis(DIR+'1M_neurons_filtered_gene_bc_matrices_h5.h5')
  File "/home/shobi/PycharmProjects/my_first_conda_project/10X_mousebrain.py", line 24, in basic_analysis
    sc.pp.recipe_zheng17(adata)
  File "/home/shobi/anaconda3/envs/my_first_conda_project/lib/python3.6/site-packages/scanpy/preprocessing/_recipes.py", line 108, in recipe_zheng17
    adata.X, flavor='cell_ranger', n_top_genes=n_top_genes, log=False)
  File "/home/shobi/anaconda3/envs/my_first_conda_project/lib/python3.6/site-packages/scanpy/preprocessing/_deprecated/highly_variable_genes.py", line 109, in filter_genes_dispersion
    mean, var = materialize_as_ndarray(_get_mean_var(X))
  File "/home/shobi/anaconda3/envs/my_first_conda_project/lib/python3.6/site-packages/scanpy/preprocessing/_utils.py", line 10, in _get_mean_var
    mean = X.mean(axis=0)
  File "/home/shobi/anaconda3/envs/my_first_conda_project/lib/python3.6/site-packages/scipy/sparse/base.py", line 1077, in mean
    inter_self = self.astype(inter_dtype)
  File "/home/shobi/anaconda3/envs/my_first_conda_project/lib/python3.6/site-packages/scipy/sparse/data.py", line 74, in astype
    return self.copy()
  File "/home/shobi/anaconda3/envs/my_first_conda_project/lib/python3.6/site-packages/scipy/sparse/data.py", line 91, in copy
    return self._with_data(self.data.copy(), copy=True)
  File "/home/shobi/anaconda3/envs/my_first_conda_project/lib/python3.6/site-packages/scipy/sparse/compressed.py", line 1124, in _with_data
    return self.__class__((data, self.indices.copy(), self.indptr.copy()),
MemoryError
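
For context, the copy that fails is made inside scipy's sparse .mean(), which materialises the whole matrix again before computing per-gene statistics. As a rough illustration only (the helper name and chunk size are made up, and this is not part of scanpy), the per-gene mean and variance could instead be accumulated over row chunks so that only one block of cells is touched at a time:

import numpy as np

def chunked_mean_var(X, chunk_size=100000):
    # Column-wise (per-gene) mean and variance of a sparse CSR matrix,
    # accumulated over row chunks to avoid copying the full matrix at once.
    n, d = X.shape
    s = np.zeros(d, dtype=np.float64)    # running sum per gene
    sq = np.zeros(d, dtype=np.float64)   # running sum of squares per gene
    for start in range(0, n, chunk_size):
        chunk = X[start:start + chunk_size]
        s += np.asarray(chunk.sum(axis=0)).ravel()
        sq += np.asarray(chunk.multiply(chunk).sum(axis=0)).ravel()
    mean = s / n
    var = sq / n - mean ** 2             # population variance
    return mean, var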

@ShobiStassen
Author

I also wanted to add that the initial filtering and normalization steps in recipe_zheng17() already used around 70GB of RAM. Is this expected? (The readme says around 30GB should be sufficient.)
adata = sc.read_10x_h5(filename)
sc.pp.filter_genes(adata, min_counts=1)
sc.pp.normalize_per_cell(adata, key_n_counts='n_counts_all')
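
To pin down which step accounts for the memory, one rough check (a sketch only; on Linux, ru_maxrss is reported in kilobytes, which the conversion below assumes) is to print the process's peak RSS after each call:

import resource
import scanpy as sc

def peak_gb():
    # peak resident set size of this process, in GB (Linux reports kilobytes)
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1e6

adata = sc.read_10x_h5(filename)   # same `filename` as above
print('after read: ~%.1f GB peak RSS' % peak_gb())
sc.pp.filter_genes(adata, min_counts=1)
sc.pp.normalize_per_cell(adata, key_n_counts='n_counts_all')
print('after filter + normalize: ~%.1f GB peak RSS' % peak_gb())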

@davisidarta

I have also replicated these findings on a 128GB-RAM six-core Xeon P52 workstation and on an HPCC. Baseline memory usage is around 30GB; it peaks at ~140GB during PCA and scaling, and other computations take around ~60GB. The results were identical whether run on the workstation or on the HPCC.
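
If the peak really sits in scaling and PCA, one way to cap it (a sketch, assuming a scanpy version whose sc.pp.pca supports chunked computation; the chunk size is illustrative) is incremental PCA over blocks of cells, which only densifies one chunk at a time:

import scanpy as sc

# Incremental PCA over row blocks: only one chunk of cells is densified
# and centred at a time instead of the full 1.3M-cell matrix.
sc.pp.pca(adata, n_comps=50, chunked=True, chunk_size=100000)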
