insitudiff enhancements #141

patrickjdanaher · 2024-12-03T23:13:05Z

better colors in auto-coloring functions
initializeISD should expose subsample size as an argument
harder: can we map each cell to 10-20 control neighborhoods, then record the minimal difference per gene? (to avoid nearly-identical neighborhoods of neighboring cells, should gerrymander controls into ~10 regions, then find the closest within each region)
default to "diff" residuals, update vignette accordingly?
"diff" residuals should scale genes. Scaling by inverse sqrt looked good in wnv data.
clustering modules: allow for single-gene modules, and allow getPerturbations to process single-gene modules
cell type attribution (see below)
control-free analysis (see below)
improving quality: quickly identify hpgs, then remove from the control-matching exercise
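
A rough sketch of the "minimal difference per gene" idea from the list above, in base R. Everything here is an assumption for illustration: `min_diff_by_region` is a hypothetical helper (not part of InSituDiff), `query` and `ctrl` are neighborhood-by-gene expression matrices, `ctrl_xy` holds the control coordinates, and k-means on coordinates stands in for whatever gerrymandering scheme is eventually chosen.

```r
# Hypothetical helper: not part of the InSituDiff API.
min_diff_by_region <- function(query, ctrl, ctrl_xy, n_regions = 10) {
  stopifnot(ncol(query) == ncol(ctrl), nrow(ctrl) == nrow(ctrl_xy))
  # Partition controls into spatial regions (k-means is a stand-in).
  region <- kmeans(ctrl_xy, centers = n_regions)$cluster
  by_region <- split(seq_len(nrow(ctrl)), region)
  out <- matrix(NA_real_, nrow(query), ncol(query), dimnames = dimnames(query))
  for (i in seq_len(nrow(query))) {
    q <- query[i, ]
    # Squared Euclidean distance from this query to every control.
    d2 <- rowSums(sweep(ctrl, 2, q)^2)
    # Index of the closest control within each region.
    best <- vapply(by_region, function(idx) idx[which.min(d2[idx])], integer(1))
    # Query minus matched controls: rows = regions, columns = genes.
    diffs <- -sweep(ctrl[best, , drop = FALSE], 2, q)
    # Per gene, keep the difference with the smallest magnitude.
    pick <- apply(abs(diffs), 2, which.min)
    out[i, ] <- diffs[cbind(pick, seq_len(ncol(diffs)))]
  }
  out
}
```

Taking the smallest-magnitude difference is conservative: a gene is only called perturbed if it differs from its best match in every region. The per-query loop is brute force and would need a real nearest-neighbor search at scale.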

patrickjdanaher · 2024-12-06T16:34:49Z

Implement cell type attribution. Excerpted code:

  # cell type specific perturbation for the gene:
  perturbbycell <- matrix(NA, length(usecells), length(usecelltypes),
                          dimnames = list(usecells, usecelltypes))
  for (cell in usecelltypes) {
    # zero out the gene's expression in all cells not of this cell type
    tempnorm <- norm[, gene, drop = FALSE] * (clust == cell)
    temp <- getPerturbations(tempnorm,
                             cells = usecells,
                             obj,
                             eps = 0.1,
                             residtype = "diff")
    perturbbycell[, cell] <- temp
  }

Plotting: total cell abundance vs. total cell perturbation:

  par(mar = c(5, 5.5, 2, 0.5))
  abundance <- as.vector(table(clust)[colnames(perturbbycell)])
  totalperturbation <- colSums(perturbbycell)
  # empty plot (type = "n"), then label each cell type at its coordinates
  plot(abundance, totalperturbation, type = "n",
       xlab = "Abundance", ylab = "Total perturbation", xlim = c(0, 20300), cex.lab = 1.5)
  text(abundance, totalperturbation, colnames(perturbbycell), cex = 0.7)

patrickjdanaher · 2024-12-10T19:10:39Z

Concept: control-free analysis:

Each cellular neighborhood (CN) could be mapped to the most similar CNs anywhere else in the tissue, excluding a small radius around itself. Then we could ID perturbations.

Probably we'd only want to look at strong positive perturbations. And probably we'd want to use the approach (mentioned earlier) in which you map to 10-20 similar CNs and score each gene's minimal perturbation from them.

Or, you could look at perturbations vs. all other tissues.
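
A minimal sketch of the within-tissue matching step, assuming a neighborhood-by-gene matrix `cn` and a coordinate matrix `xy` with matching rows. `match_self_controls` and its arguments are hypothetical names, not existing InSituDiff functions, and the distance metric (Euclidean on expression) is just one possible choice.

```r
# Hypothetical helper: not part of the InSituDiff API.
match_self_controls <- function(cn, xy, k = 10, exclude_radius = 0.1) {
  stopifnot(nrow(cn) == nrow(xy))
  n <- nrow(cn)
  # Brute-force pairwise distances: fine for a sketch, but too slow and
  # memory-hungry for hundreds of thousands of cells.
  d_expr <- as.matrix(dist(cn))
  d_xy <- as.matrix(dist(xy))
  # Disallow matches inside the exclusion radius (this includes self).
  d_expr[d_xy <= exclude_radius] <- Inf
  matches <- matrix(NA_integer_, n, k)
  for (i in seq_len(n)) {
    ord <- order(d_expr[i, ])[seq_len(k)]
    # Leave NA where fewer than k eligible neighborhoods exist.
    ord[!is.finite(d_expr[i, ord])] <- NA
    matches[i, ] <- ord
  }
  matches
}
```

Row `i` of the result holds the indices of the CNs most similar to CN `i` outside the excluded radius. Those indices could then feed the minimal-perturbation scoring described above, keeping only strong positive values.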

patrickjdanaher · 2025-01-21T18:24:24Z

InSituDiff objects can get huge: 500 Mb in one case of 600k cells and 100 neighbors.

Solutions:

Encourage fewer neighbors
Have an option to not save it, but rather to calculate it again on the fly? (Maaaybe. For operations on a subset of cells, this could make sense - you'd just get the KNN for the 5000 cells you're operating on. But for operations over all cells, you'd just have to re-run KNN for all cells. Probably better to store the result.)
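
For scale: a dense 600,000 x 100 matrix of doubles is 600,000 * 100 * 8 bytes, about 480 MB, so the reported size would be consistent with one stored neighbor matrix, though that is a guess about where the memory goes. A sketch of the on-the-fly option for a subset of cells follows, in base R. `knn_for_subset` is a hypothetical helper, not an InSituDiff function, and the brute-force search stands in for a proper KNN library.

```r
# Hypothetical helper: not part of the InSituDiff API.
knn_for_subset <- function(xy, subset_idx, k = 100) {
  stopifnot(k < nrow(xy))
  nn <- lapply(subset_idx, function(i) {
    # Squared Euclidean distance from cell i to every cell.
    d2 <- rowSums(sweep(xy, 2, xy[i, ])^2)
    d2[i] <- Inf  # exclude the cell itself
    order(d2)[seq_len(k)]
  })
  # One row per requested cell, one column per neighbor.
  do.call(rbind, nn)
}
```

The result is only `length(subset_idx)` x `k`, so nothing the size of the full neighbor matrix has to be stored. The cost is one pass over all cells per requested cell, which is why this only pays off for small subsets.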
