
Capacity Error with readXenium() Function and Issues Combining Samples Using cbind() #48

Open
aasingh2 opened this issue Aug 5, 2024 · 7 comments


aasingh2 commented Aug 5, 2024

Hello,

Thank you for the package! I am encountering a couple of issues and would appreciate your guidance.

Issue 1: Capacity Error with readXenium()

I am working with multiple Xenium samples and can successfully read most of them using the readXenium() function. However, I receive the following error message for a few samples:
```
Error: Capacity error: array cannot contain more than 2147483646 bytes, have 2157274215
```

It seems that this error is related to the arrow package used to read Parquet files. Is there a way to resolve this issue, or a workaround that you would recommend?

Issue 2: Combining Samples with cbind()

I intended to use cbind() to combine multiple samples into a single SpatialFeatureExperiment. Unfortunately, I encountered an error because my samples have different numbers of rows (e.g., differing number of control probes, antisense probes, etc.). The error is as follows:

```
Error in FUN(X[[i]], ...) : column(s) 'ID' in 'mcols' are duplicated and the data do not match
```

The error disappears when I use cbind() after subsetting to only the genes on the Xenium gene panel (i.e., excluding the negative control probes, antisense probes, etc.). Is there a way to combine the samples without subsetting them first?
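For reference, a minimal sketch of that subsetting workaround, assuming `sfe1` and `sfe2` are SpatialFeatureExperiment objects already read in (the object names are hypothetical):

```r
# cbind() requires identical features (rownames/rowData) in every sample,
# so restrict each object to the genes shared by all samples first.
shared_genes <- Reduce(intersect, list(rownames(sfe1), rownames(sfe2)))
sfe_all <- cbind(sfe1[shared_genes, ], sfe2[shared_genes, ])
```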

Thank you for the help!

@alikhuseynov (Collaborator)

Hi, could you please include the output of traceback() when the error occurs?

Issue 1:

  • yes, it is arrow-related, but I have never seen this error when loading Xenium data. What version of XOA is this data from? I think we will need some time to tackle that.

Issue 2:

  • cbind() would only work if the genes are the same in all samples. We don't support a full join like merge() yet, but see this issue:
  • Merge method for SFE #29
  • also, using genes present in some samples but not in others would bias the downstream analysis in any case.
  • If you want to keep all background probes, renaming them to the same names across all samples would probably work.
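One way that renaming idea could look, assuming the background probes differ across samples only by a per-sample suffix (the suffix pattern and list name below are purely hypothetical):

```r
# Harmonize probe names so rownames match across samples; the regex
# strips an assumed per-sample suffix such as "_sample1" (hypothetical).
harmonize_names <- function(sfe) {
  rownames(sfe) <- sub("_sample\\d+$", "", rownames(sfe))
  sfe
}
sfe_list <- lapply(sfe_list, harmonize_names)
```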

@lambdamoses (Collaborator)

Actually, I have encountered the arrow error before. I haven't implemented it yet, but I can try modifying the code to split the transcript spots, write them to multiple smaller GeoParquet files, and then use DuckDB to concatenate the files.
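A rough sketch of the splitting idea (not the package's actual implementation), assuming `mols` is the large sf object of transcript spots that overflows Arrow's ~2 GB per-array limit; the chunk count is an assumption:

```r
library(sfarrow)

# Write the spots in chunks so each GeoParquet file stays well under
# Arrow's 2^31-byte array capacity.
n_chunks <- 4
chunk_id <- cut(seq_len(nrow(mols)), breaks = n_chunks, labels = FALSE)
for (i in seq_len(n_chunks)) {
  sfarrow::st_write_parquet(mols[chunk_id == i, ],
                            sprintf("tx_spots_part%02d.parquet", i))
}
```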

@lambdamoses (Collaborator)

Seurat-style full join is very problematic in that genes present in sample 1 but not in sample 2 are NAs in sample 2, but Seurat fills in 0, which is not an appropriate stand-in for NA.

aasingh2 (Author) commented Aug 6, 2024

Hi @alikhuseynov and @lambdamoses ,

Thank you for the quick reply. Regarding the issue with joining samples, I am now planning to perform quality control (QC) prior to merging and to remove the probes that are not on the gene panel. I believe this will allow cbind() to work correctly.

We are using XOA version 3.0.0.15. Below is the traceback of the error encountered in Issue 1:

```
> traceback()
9: Table__from_dots(dots, schema, option_use_threads())
8: arrow::Table$create(df)
7: sfarrow::st_write_parquet(mols, file_out)
6: withCallingHandlers(expr, warning = function(w) if (inherits(w,
       classes)) tryInvokeRestart("muffleWarning"))
5: suppressWarnings(sfarrow::st_write_parquet(mols, file_out))
4: formatTxSpots(file, dest = dest, spatialCoordsNames = spatialCoordsNames,
       gene_col = gene_col, z = z, phred_col = phred_col, min_phred = min_phred,
       split_col = split_col, flip = flip, z_option = z_option,
       file_out = file_out, BPPARAM = BPPARAM, return = TRUE)
3: addTxSpots(sfe, file = fn, sample_id = sample_id, spatialCoordsNames = spatialCoordsNames,
       gene_col = gene_col, z = z, phred_col = "qv", min_phred = min_phred,
       split_col = split_col, z_option = z_option, flip = flip,
       file_out = file_out, BPPARAM = BPPARAM)
2: addTxTech(sfe, data_dir, sample_id, tech = "Xenium", min_phred = min_phred,
       BPPARAM = BPPARAM, flip = (flip == "geometry"), file_out = file_out)
1: readXenium(data_dir = "./output-XETG00291__0018868__1802-2017__20240726__093125",
       sample_id = "1802_2017", image = "morphology_focus", segmentations = c("cell",
       "nucleus"), add_molecules = TRUE, file_out = "Xe_1802_2017")
```

I had the error for 5 out of the 13 samples that I tried to read in.

alikhuseynov (Collaborator) commented Aug 6, 2024

Thanks, so it is multimodal Xenium data, and those 5 samples probably have very large transcript (tx) files.
The error happens when writing the tx file; we will add support for splitting large tx files. Until then, if you don't need transcript coordinates in your analysis, you can set add_molecules = FALSE, or file_out = NULL (which reads but does not write the processed transcript data).
You can do QC per sample and subset before combining, but again, cbind() would only work if all samples have the same number of features with the same names.
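Applied to the readXenium() call from the traceback above, the workaround would look like this (arguments taken from the original call):

```r
# Skip the transcript-spot processing step that triggers the Arrow
# capacity error by not loading the molecules at all:
sfe <- readXenium(
  data_dir = "./output-XETG00291__0018868__1802-2017__20240726__093125",
  sample_id = "1802_2017",
  image = "morphology_focus",
  segmentations = c("cell", "nucleus"),
  add_molecules = FALSE  # or keep add_molecules = TRUE with file_out = NULL
)
```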

@lambdamoses (Collaborator)

We might not need to use DuckDB after all. I just found out that the sfarrow package can partition a large sf object before writing to GeoParquet: https://wcjochem.github.io/sfarrow/reference/write_sf_dataset.html
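A sketch of how that could look with `write_sf_dataset()`, assuming `mols` is the sf object of transcript spots and using an artificial chunk column for partitioning (the column name and chunk count are assumptions):

```r
library(dplyr)
library(sfarrow)

# write_sf_dataset() partitions by the dplyr grouping columns, writing
# one GeoParquet file per group under the output directory.
mols$chunk <- cut(seq_len(nrow(mols)), breaks = 4, labels = FALSE)
mols |>
  group_by(chunk) |>
  write_sf_dataset("tx_spots_dataset", format = "parquet")
```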

alikhuseynov (Collaborator) commented Sep 23, 2024

That's great! Splitting using partitioning would be the easiest way, I think.
https://wcjochem.github.io/sfarrow/articles/example_sfarrow.html#partitioned-datasets-1
