
memory #13

Open
ShobiStassen opened this issue Feb 28, 2019 · 2 comments


@ShobiStassen

Hi,
I am trying to run the full 1.3M-cell 10X mouse dataset (using the 1M_neurons_filtered_gene_bc_matrices_h5.h5 file from the 10X website).
I have 126GB of RAM and an Intel® Xeon(R) W-2123 CPU @ 3.60GHz × 8, which is above the requirements you mention are needed to run the full cluster.py method without subsampling.
I get a memory error at the filter_genes_dispersion stage; should I modify the code in any way (without subsampling)?
Thanks, Shobi

adata = sc.read_10x_h5(filename)
adata.var_names_make_unique()
sc.pp.recipe_zheng17(adata)

running recipe zheng17
filtered out 3983 genes that are detected in less than 1 counts
Traceback (most recent call last):
  File "/home/shobi/PycharmProjects/my_first_conda_project/10X_mousebrain.py", line 61, in <module>
    main()
  File "/home/shobi/PycharmProjects/my_first_conda_project/10X_mousebrain.py", line 58, in main
    basic_analysis(DIR+'1M_neurons_filtered_gene_bc_matrices_h5.h5')
  File "/home/shobi/PycharmProjects/my_first_conda_project/10X_mousebrain.py", line 24, in basic_analysis
    sc.pp.recipe_zheng17(adata)
  File "/home/shobi/anaconda3/envs/my_first_conda_project/lib/python3.6/site-packages/scanpy/preprocessing/_recipes.py", line 108, in recipe_zheng17
    adata.X, flavor='cell_ranger', n_top_genes=n_top_genes, log=False)
  File "/home/shobi/anaconda3/envs/my_first_conda_project/lib/python3.6/site-packages/scanpy/preprocessing/_deprecated/highly_variable_genes.py", line 109, in filter_genes_dispersion
    mean, var = materialize_as_ndarray(_get_mean_var(X))
  File "/home/shobi/anaconda3/envs/my_first_conda_project/lib/python3.6/site-packages/scanpy/preprocessing/_utils.py", line 10, in _get_mean_var
    mean = X.mean(axis=0)
  File "/home/shobi/anaconda3/envs/my_first_conda_project/lib/python3.6/site-packages/scipy/sparse/base.py", line 1077, in mean
    inter_self = self.astype(inter_dtype)
  File "/home/shobi/anaconda3/envs/my_first_conda_project/lib/python3.6/site-packages/scipy/sparse/data.py", line 74, in astype
    return self.copy()
  File "/home/shobi/anaconda3/envs/my_first_conda_project/lib/python3.6/site-packages/scipy/sparse/data.py", line 91, in copy
    return self._with_data(self.data.copy(), copy=True)
  File "/home/shobi/anaconda3/envs/my_first_conda_project/lib/python3.6/site-packages/scipy/sparse/compressed.py", line 1124, in _with_data
    return self.__class__((data, self.indices.copy(), self.indptr.copy()),
MemoryError
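
For context, the copy that fails is made inside scipy's sparse .mean(), which materialises the whole matrix again before computing per-gene statistics. As a rough illustration only (the helper name and chunk size are made up, and this is not part of scanpy), the per-gene mean and variance could instead be accumulated over row chunks so that only one block of cells is touched at a time:

import numpy as np

def chunked_mean_var(X, chunk_size=100000):
    # Column-wise (per-gene) mean and variance of a sparse CSR matrix,
    # accumulated over row chunks to avoid copying the full matrix at once.
    n, d = X.shape
    s = np.zeros(d, dtype=np.float64)    # running sum per gene
    sq = np.zeros(d, dtype=np.float64)   # running sum of squares per gene
    for start in range(0, n, chunk_size):
        chunk = X[start:start + chunk_size]
        s += np.asarray(chunk.sum(axis=0)).ravel()
        sq += np.asarray(chunk.multiply(chunk).sum(axis=0)).ravel()
    mean = s / n
    var = sq / n - mean ** 2             # population variance
    return mean, var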

@ShobiStassen
Author

I also wanted to add that the initial filtering and normalization steps in recipe_zheng17() already used around 70GB of RAM. Is this expected? (The readme says around 30GB should be sufficient.)
adata = sc.read_10x_h5(filename)
sc.pp.filter_genes(adata, min_counts=1)
sc.pp.normalize_per_cell(adata, key_n_counts='n_counts_all')
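
To pin down which step accounts for the memory, one rough check (a sketch only; on Linux, ru_maxrss is reported in kilobytes, which the conversion below assumes) is to print the process's peak RSS after each call:

import resource
import scanpy as sc

def peak_gb():
    # peak resident set size of this process, in GB (Linux reports kilobytes)
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1e6

adata = sc.read_10x_h5(filename)   # same `filename` as above
print('after read: ~%.1f GB peak RSS' % peak_gb())
sc.pp.filter_genes(adata, min_counts=1)
sc.pp.normalize_per_cell(adata, key_n_counts='n_counts_all')
print('after filter + normalize: ~%.1f GB peak RSS' % peak_gb())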

@davisidarta

I have also replicated these findings on a 128GB-RAM six-core Xeon P52 workstation and on an HPCC. Baseline memory usage is around 30GB; it peaks at ~140GB during PCA and scaling, and other computations take around ~60GB. The results were identical whether run on the workstation or on the HPCC.
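
If the peak really sits in scaling and PCA, one way to cap it (a sketch, assuming a scanpy version whose sc.pp.pca supports chunked computation; the chunk size is illustrative) is incremental PCA over blocks of cells, which only densifies one chunk at a time:

import scanpy as sc

# Incremental PCA over row blocks: only one chunk of cells is densified
# and centred at a time instead of the full 1.3M-cell matrix.
sc.pp.pca(adata, n_comps=50, chunked=True, chunk_size=100000)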
