Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement bootstrapCluster for clusters generated in Seurat #75

Open
lucygarner opened this issue Oct 26, 2020 · 6 comments
Open

Implement bootstrapCluster for clusters generated in Seurat #75

lucygarner opened this issue Oct 26, 2020 · 6 comments

Comments

@lucygarner
Copy link

Hi,

I would like to establish the stability of the clusters that I have identified using Seurat (FindNeighbors and FindClusters). I was wondering how difficult it would be for me to reimplement the bootstrapCluster approach for this? Would the best way for this be to just write out a function that performs Seurat-like clustering on log normalised counts and then input this into bootstrapCluster? Would I include the PCA function in there as well?

Many thanks,
Lucy

@LTLA
Copy link
Collaborator

LTLA commented Oct 26, 2020

If you write a function that accepts a log-normalized expression matrix (or a PC matrix, if transposed=TRUE), then you should just be able to give it to FUN and the rest will "just work". Assuming that your function doesn't have bugs, that is.

Note that scran::bootstrapCluster is moving to bluster::bootstrapStability in the upcoming release.

@lucygarner
Copy link
Author

lucygarner commented Oct 27, 2020

Thank you, I have written the following to extract the PC matrix from my Seurat object (named object) and run clustering. Does this seem reasonable?

data.use <- Embeddings(object = object[["pca"]])
data.use <- data.use[, 1:30]    # number of PCs to use

seurat_clustering <- function(PC_matrix) {
    neighbor.graphs <- FindNeighbors(object = PC_matrix,
                                     k.param = 20,
                                     compute.SNN = TRUE,
                                     prune.SNN = 1/15,
                                     nn.method = "rann",
                                     annoy.metric = "euclidean",
                                     nn.eps = 0,
                                     verbose = TRUE,
                                     force.recalc = FALSE)

    clustering.results <- FindClusters(object = neighbor.graphs$snn,
                                       modularity.fxn = 1,
                                       initial.membership = NULL,
                                       node.sizes = NULL,
                                       resolution = 0.5,
                                       method = "matrix",
                                       algorithm = 1,
                                       n.start = 10,
                                       n.iter = 10,
                                       random.seed = 0,
                                       group.singletons = TRUE,
                                       temp.file.location = NULL,
                                       edge.file.name = NULL,
                                       verbose = TRUE)
    
    return(clustering.results$res.0.5)
}

bootstrap_clusters <-  bootstrapCluster(data.use, FUN = seurat_clustering, transposed = TRUE)

How is bluster::bootstrapStability technically different to scran::bootstrapCluster and would you recommend switching over straight away?

Many thanks,
Lucy

@LTLA
Copy link
Collaborator

LTLA commented Oct 27, 2020

Seems fine to me.

As for the difference: there is none, other than the name change and movement to a new package. Just moved some functions out of scran because the package was getting a bit too cluttered. The new package and function should be available when BioC 3.12 comes out tomorrow.

@lucygarner
Copy link
Author

lucygarner commented Oct 28, 2020

Great thanks. I tried the following, however I am getting strange results. I'm slightly confused as it seemed to be working yesterday!

bootstrap_clusters <- bluster::bootstrapStability(x = data.use, FUN = seurat_clustering, transposed = TRUE)

data.use is a matrix with cell IDs as row names and PCs and column names.

image

image

Any thoughts as to what might be happening here?

Best wishes,
Lucy

@LTLA
Copy link
Collaborator

LTLA commented Nov 19, 2020

Sorry, I just noticed this. Looks like you didn't set a seed, so that's probably why you got different results. I'm not quite sure why you got NA's for some of the cluster pairs. It would seem like pairwiseRand() is at fault but I can't see how that could happen.

I should add that your matrix already looks pretty strange. It is pretty surprising to have low diagonal entries along with high off-diagonal entries. If you can give me a reproducible example, I can try to debug it.

@LTLA
Copy link
Collaborator

LTLA commented Nov 19, 2020

I think I got it:

pairwiseRand(factor(sample(3, 100, replace=TRUE), levels=1:8), sample(3, 100, replace=TRUE))
##             1          2            3   4   5   6   7   8
## 1 0.005751586 0.03516088  0.002100879 NaN NaN NaN NaN NaN
## 2          NA 0.01338677 -0.037901716 NaN NaN NaN NaN NaN
## 3          NA         NA -0.018066332 NaN NaN NaN NaN NaN
## 4          NA         NA           NA NaN NaN NaN NaN NaN
## 5          NA         NA           NA  NA NaN NaN NaN NaN
## 6          NA         NA           NA  NA  NA NaN NaN NaN
## 7          NA         NA           NA  NA  NA  NA NaN NaN
## 8          NA         NA           NA  NA  NA  NA  NA NaN

Your clustering function may be returning unused levels - see how 11 is a level but is not used in the factor - causing the final matrix to have NAs for all affected levels. Well, technically they're NaNs, I don't know if your screenshot's matrix view distinguishes between the two. I can't think of any other way of generating pure NAs - looking at the code, it shouldn't even be possible.

If this indeed the case, I'm guessing that Seurat is being a bit too smart for its own good when creating the cluster assignment. pairwiseRand() will simply respect these unused levels, even though they are rather silly in this case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants