can't replicate 170505_seurat/seurat.ipynb notebook #6
Comments
Hi Fidel,
I'm rerunning this notebook all the time, of course, and the results are consistent with the ones that are uploaded and were run with Scanpy 1.1. Since you came up with the idea of comparing figures, I wanted to write a test for each notebook based on the images, but I still haven't automated it. Does that make sense?
Thanks for the reply. I like the idea of having tests based on notebooks! There are many functions not currently tested, and that would easily add many more tests. However, for the notebook that I tried to replicate, the test would have failed for reasons that are not straightforward to identify. Beyond that, most of the images would certainly fail automatic tests. For the automatic plotting tests of scanpy I had to save the images without any layout enhancement; otherwise the tests fail. As a result, the test images have labels that are cut off, making them useful only for tests and nothing else. My suggestion would be to add, as part of any PR, an automatic message that asks the developer to run the notebooks and manually check that they are OK before submitting the PR.
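For reference, an image-based check along these lines could look roughly like the sketch below. It is only an illustration, not how scanpy's test suite is set up: it uses `compare_images` from `matplotlib.testing.compare`, and the helper `save_plot`, the file names, and the toy data are all made up for the example.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.testing.compare import compare_images


def save_plot(path, y_values):
    """Hypothetical helper: render a tiny line plot with fixed size and dpi."""
    fig, ax = plt.subplots(figsize=(3, 3), dpi=80)
    ax.plot([0, 1, 2], y_values)
    ax.set_xlabel("x")
    ax.set_ylabel("y")
    fig.savefig(path)  # no tight_layout / bbox_inches, so the canvas size stays fixed
    plt.close(fig)


tmpdir = tempfile.mkdtemp()
expected = os.path.join(tmpdir, "expected.png")
same = os.path.join(tmpdir, "same.png")
changed = os.path.join(tmpdir, "changed.png")

save_plot(expected, [0, 1, 4])
save_plot(same, [0, 1, 4])     # same data, rendered a second time
save_plot(changed, [4, 1, 0])  # different data, so the picture differs

# compare_images returns None when the images agree within `tol`,
# and a message describing the mismatch otherwise.
result_same = compare_images(expected, same, tol=0)
result_changed = compare_images(expected, changed, tol=0)
print("same plot:", result_same)
print("changed plot:", "mismatch" if result_changed is not None else "match")
```

In practice a nonzero `tol` is probably needed, since fonts and rendering can differ slightly between machines, which is part of why such tests are fragile.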
About the I have noticed that the In contrast, in the original notebook the |
OK, let me check this again. Indeed, there was a pull request for the logreg method a week ago, and I just noticed that we only covered t-test and wilcoxon-rank with tests; for the logreg method, we simply use scikit-learn. I'll add a test...
But maybe this is something more general: it depends on what your notion of "good" is. I guess you mean: which gene, taken as a single predictor, gives me the best discriminative (predictive) power for identifying a cluster? In that case the logreg method will fail completely, as it's a multivariate method. If you ask for sets of genes that together give you the best predictive power in a linear model, then logreg provides the answer. There are cases where this is meaningful.
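To make the univariate-versus-multivariate point concrete, here is a small synthetic sketch. It does not use scanpy or real data: the three "genes" are invented, and a least-squares linear model stands in for logistic regression to keep the example NumPy-only. The assumption is that the qualitative effect (a redundant, correlated gene ranks high on its own but gets little weight in a joint model) carries over.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
labels = np.repeat([0, 1], n // 2)  # two clusters of equal size

# Three made-up genes (illustrative only):
gene_a = 2.0 * labels + rng.normal(size=n)  # strong marker
gene_b = gene_a + rng.normal(size=n)        # noisy copy of gene_a, so redundant given gene_a
gene_c = 0.5 * labels + rng.normal(size=n)  # weak but independent marker
X = np.column_stack([gene_a, gene_b, gene_c])
names = np.array(["gene_a", "gene_b", "gene_c"])


def welch_t(X, labels):
    """Per-gene Welch t-statistic: each gene is scored on its own."""
    x1, x0 = X[labels == 1], X[labels == 0]
    se = np.sqrt(x1.var(axis=0, ddof=1) / len(x1) + x0.var(axis=0, ddof=1) / len(x0))
    return (x1.mean(axis=0) - x0.mean(axis=0)) / se


def linear_coefs(X, labels):
    """Joint least-squares fit of the labels on all genes (stand-in for logreg)."""
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, labels.astype(float), rcond=None)
    return coef[1:]  # drop the intercept


univariate_rank = names[np.argsort(-np.abs(welch_t(X, labels)))]
multivariate_rank = names[np.argsort(-np.abs(linear_coefs(X, labels)))]
print("univariate ranking:  ", list(univariate_rank))
print("multivariate ranking:", list(multivariate_rank))
```

In this toy setup, gene_b scores well as a single predictor but contributes almost nothing once gene_a is in the model, so the two rankings disagree even though neither is wrong. Coefficient magnitudes also depend on feature scale, which is roughly equal here by construction.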
Everything is fine with the current state of master of Scanpy. It perfectly recovers the standard clustering tutorial. Let me now create some tests for it.
Way too late, but here is the test for the notebook: https://github.com/theislab/scanpy/tree/master/scanpy/tests/notebooks
Your non-reproducibility issues are all due to the PCA. Calling the PCA with
I tried to replicate the .. notebook using scanpy but I got some different results (see notebook here):
sc.tl.rank_genes_groups(adata, 'louvain', method='logreg')
seem quite different compared to the results from the default method (which are similar to the original notebook for some groups). For example, for louvain cluster '0', the top-ranking genes in the original notebook are LDHB and CD3D. I see these two genes using the default ranking method. However, for the 'logreg' method the list of top genes is quite different. Would it be possible for you to re-run the notebook to see if you get the same results that I get? Maybe the data that you are using is different from the data I use (I downloaded the pbmc3k data from 10x)?