insitudiff enhancements #141

Open
9 tasks
patrickjdanaher opened this issue Dec 3, 2024 · 3 comments

@patrickjdanaher
Collaborator

patrickjdanaher commented Dec 3, 2024

  • better colors in auto-coloring functions
  • initializeISD should expose subsample size as an argument
  • harder: can we map each cell to 10-20 control neighborhoods, then record the minimal difference per gene? (To avoid the nearly identical neighborhoods of neighboring cells, we should gerrymander the controls into ~10 regions, then find the closest match within each region.)
  • default to "diff" residuals, update vignette accordingly?
  • "diff" residuals should scale genes. Scaling by inverse sqrt looked good in wnv data.
  • clustering modules: allow for single-gene modules, and allow getPerturbations to process single-gene modules
  • cell type attribution (see below)
  • control-free analysis (see below)
  • improving quality: quickly identify hpgs, then remove from the control-matching exercise
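
The "minimal difference per gene" idea in the list above could be sketched roughly as follows. This is a base-R illustration only: `min_diff_across_regions` is a hypothetical helper, not an InSituDiff function, and it assumes the control neighborhoods have already been assigned to regions and that Euclidean distance is an acceptable measure of similarity.

```r
# Hypothetical helper, not part of InSituDiff: per-gene minimal difference
# between one cell's neighborhood profile and its best match in each control region.
min_diff_across_regions <- function(profile, control_profiles, region) {
  # profile: numeric vector of neighborhood expression, one value per gene
  # control_profiles: matrix, control neighborhoods in rows, genes in columns
  # region: vector assigning each control neighborhood to a region (assumed given)
  stopifnot(length(profile) == ncol(control_profiles),
            length(region) == nrow(control_profiles))
  # Euclidean distance from the cell's profile to every control neighborhood
  d <- sqrt(rowSums(sweep(control_profiles, 2, profile)^2))
  # index of the closest control neighborhood within each region
  best <- vapply(split(seq_along(d), region),
                 function(idx) idx[which.min(d[idx])], integer(1))
  # per-gene differences (cell minus control) vs. each region's best match
  diffs <- -sweep(control_profiles[best, , drop = FALSE], 2, profile)
  # for each gene, keep the difference with the smallest absolute value
  apply(diffs, 2, function(x) x[which.min(abs(x))])
}

# Toy example: four control neighborhoods in two regions, two genes
controls <- rbind(c(1, 5), c(2, 9), c(4, 4), c(8, 0))
colnames(controls) <- c("geneA", "geneB")
region <- c("r1", "r1", "r2", "r2")
res <- min_diff_across_regions(c(geneA = 3, geneB = 6), controls, region)
```

Taking the smallest difference across regions is conservative: a gene is only scored as perturbed if it differs from the best match in every region.
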
@patrickjdanaher
Collaborator Author

Implement cell type attribution. Excerpted code:

  # cell-type-specific perturbation for the gene:
  # rows = cells, columns = cell types
  perturbbycell <- matrix(NA, length(usecells), length(usecelltypes),
                          dimnames = list(usecells, usecelltypes))
  for (cell in usecelltypes) {
    # zero out the gene's expression in every cell not of this type
    tempnorm <- norm[, gene, drop = FALSE] * (clust == cell)
    temp <- getPerturbations(tempnorm,
                             cells = usecells,
                             obj,
                             eps = 0.1,
                             residtype = "diff")
    perturbbycell[, cell] <- temp
  }

Plotting: total cell abundance vs. total cell perturbation:

  par(mar = c(5, 5.5, 2, 0.5))
  abundance <- as.vector(table(clust)[colnames(perturbbycell)])
  totalperturbation <- colSums(perturbbycell)
  # draw the points invisibly (col = 0), then label each one with its cell type
  plot(abundance, totalperturbation, col = 0,
       xlab = "Abundance", ylab = "Total perturbation", xlim = c(0, 20300), cex.lab = 1.5)
  text(abundance, totalperturbation, colnames(perturbbycell), cex = 0.7)

@patrickjdanaher
Collaborator Author

Concept: control-free analysis:

Each cellular neighborhood (CN) could be mapped to the most similar CNs anywhere else in the tissue, excluding a small radius around itself. Then we could ID perturbations.

Probably we'd only want to look at strong positive perturbations. And probably we'd want to use the approach (mentioned earlier) in which you map to 10-20 similar CNs and score each gene's minimal perturbation from them.

Or, you could look at perturbations vs. all other tissues.
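
The within-tissue matching step could look something like the sketch below. It is a brute-force base-R illustration, not InSituDiff code: `match_distant_neighborhoods` and its arguments are hypothetical, and Euclidean distance on the neighborhood expression profiles is an assumed similarity measure.

```r
# Hypothetical helper, not part of InSituDiff: for one cell, find the k
# neighborhoods elsewhere in the tissue with the most similar expression profile.
match_distant_neighborhoods <- function(i, xy, profiles, radius, k = 10) {
  # xy: matrix of cell coordinates (cells x 2)
  # profiles: matrix of neighborhood expression (cells x genes)
  stopifnot(nrow(xy) == nrow(profiles))
  # spatial distance from cell i to every cell
  spatial <- sqrt(rowSums(sweep(xy, 2, xy[i, ])^2))
  # candidates lie outside the exclusion radius (this also drops cell i itself)
  candidates <- which(spatial > radius)
  # expression distance from cell i's neighborhood to each candidate
  expr <- sqrt(rowSums(sweep(profiles[candidates, , drop = FALSE], 2, profiles[i, ])^2))
  # the k candidates with the smallest expression distance
  candidates[order(expr)[seq_len(min(k, length(candidates)))]]
}

# Toy example: five cells on a line, one gene
xy <- rbind(c(0, 0), c(1, 0), c(10, 0), c(20, 0), c(30, 0))
profiles <- matrix(c(5, 5.1, 9, 5.2, 2), ncol = 1)
res <- match_distant_neighborhoods(1, xy, profiles, radius = 5, k = 2)
```

In the toy example, cell 2 has the most similar profile to cell 1 but sits inside the exclusion radius, so it is never considered as a match. For real data sizes a proper nearest-neighbor search would be needed in place of the brute-force distances.
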

@patrickjdanaher
Collaborator Author

InSituDiff objects can get huge: 500 MB in one case with 600k cells and 100 neighbors.

Solutions:

  • Encourage fewer neighbors
  • Have an option to not save it, but rather to calculate it again on the fly? (Maaaybe. For operations on a subset of cells, this could make sense - you'd just get the KNN for the 5000 cells you're operating on. But for operations over all cells, you'd just have to re-run KNN for all cells. Probably better to store the result.)
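
For a sense of scale, here is a back-of-envelope calculation. It assumes the neighbor information is held as a dense cells x neighbors matrix; the thread does not say how the object actually stores it, so this is only a rough consistency check.

```r
# Assumption: one dense matrix with a row per cell and a column per neighbor
n_cells <- 600000
n_neighbors <- 100
entries <- n_cells * n_neighbors        # 6e7 entries

bytes_double <- entries * 8             # numeric (double) storage, 8 bytes each
bytes_integer <- entries * 4            # integer storage, 4 bytes each

mb_double <- bytes_double / 1e6         # 480 MB
mb_integer <- bytes_integer / 1e6       # 240 MB
```

Under that assumption, a single double-precision matrix of this shape already accounts for most of the reported size, and halving the neighbor count would roughly halve it.
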
