High memory usage when using assay() on large RaggedExperiments #25

biobenkj · 2020-02-04T14:14:22Z

RaggedExperiment continues to rule for all our 'omics-related work! I noticed something interesting yesterday while running compactSummarizedExperiment(), when I attempt to access the names of the assays in a large RE:

# RaggedExperiment in question
> aaml
class: RaggedExperiment
dim: 36019710 1401
assays(2): pc compartments
rownames: NULL
colnames(1401): 813584_Dx 814465_Dx ... RO02776B RO02815
colData names(99): Timepoint Gender ... MLLT10 KMT2A

# size
> object_size(aaml)
1.01 GB

# names access via assay()
# high memory usage (100s of GB)
> names(assay(aaml))

# names access via assayNames()
# near instant
> assayNames(aaml)

The call is either near instantaneous with assayNames(), or requires 100s of GB of memory with names(assay(my_RE)). Do you know why this might be the case? I'll work on getting a smaller reproducible example if there is interest.

Thanks again for all that you do and RaggedExperiments!


mtmorgan · 2020-02-04T16:28:06Z

I believe, without actually checking, that the names are stored independently of the underlying data representation, and the cost is associated with adding names and hence duplicating the underlying data. If it's 'easy' to simulate the data for a reproducible example that would be great.

LiNk-NY · 2020-02-04T18:06:46Z

Hi Ben, @biobenkj
I'm glad to hear you are making use of this data representation!
The trick behind RaggedExperiment is providing a matrix representation of a GRangesList object. In the background, the stored representation is a GRangesList, so accessing the metadata is relatively straightforward. When using assay(), the GRangesList representation has to be converted to a matrix. This involves creating quite a large sparse matrix from the mcols in the original GRangesList, which is a costly operation.
I agree, a minimal and reproducible example would be helpful. We'll see what we can do to increase the efficiency of this conversion. Thank you.
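
For a rough sense of scale, here is a back-of-the-envelope estimate in base R of what a fully materialized matrix with the dimensions reported above (36019710 x 1401) would occupy. It is a sketch, not a measurement: it assumes 8 bytes per cell (double storage) and ignores dimnames and any intermediate copies, none of which is confirmed in this thread.

```r
# Dimensions as printed by show(aaml) in the original post
n_ranges  <- 36019710   # rows
n_samples <- 1401       # columns

# Plain numeric (double) arithmetic; integer literals (36019710L * 1401L)
# would overflow R's 32-bit integer range
n_cells <- n_ranges * n_samples

# Assumption: each cell is stored as a double (8 bytes).
# Integer or logical storage would use 4 bytes per cell instead.
bytes_per_cell <- 8
dense_gb <- n_cells * bytes_per_cell / 1e9

round(dense_gb)
# → 404
```

Under those assumptions the estimate lands at roughly 400 GB, which is in line with the "100s of GB" reported above.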

LiNk-NY · 2020-11-12T17:09:46Z

@biobenkj Any updates on this?
Would a dgCMatrix representation help? Have you tested this?
We can create additional functionality to return this data representation.
If you can provide a reproducible example to help this move along, that would be great. Thanks!
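
To illustrate why a dgCMatrix might help, here is a small sketch using the Matrix package on simulated data. The dimensions, the one-value-per-row layout, and the variable names are assumptions chosen for illustration and are not taken from aaml. Note also that a dgCMatrix treats unstored cells as 0, so this says nothing about how missing ranges would need to be encoded.

```r
library(Matrix)

set.seed(1)

# Hypothetical toy dimensions, far smaller than the object in this issue
n_row <- 10000   # stands in for ranges
n_col <- 100     # stands in for samples

# Assumption: each row has a value in exactly one column,
# mimicking ranges that belong to a single sample
i <- seq_len(n_row)
j <- sample.int(n_col, n_row, replace = TRUE)
x <- runif(n_row)

# Column-compressed sparse matrix (class dgCMatrix)
sp <- sparseMatrix(i = i, j = j, x = x, dims = c(n_row, n_col))

# Dense equivalent, with every cell materialized
dense <- as.matrix(sp)

# Compare in-memory sizes of the two representations
print(object.size(sp), units = "Kb")
print(object.size(dense), units = "Kb")
```

In this toy layout the sparse object stores only the 10000 filled cells, while the dense matrix stores all 1000000 cells, so the gap widens as the number of samples grows.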
