Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fgsea never ending #19

Closed
llrs opened this issue Feb 15, 2018 · 10 comments
Closed

fgsea never ending #19

llrs opened this issue Feb 15, 2018 · 10 comments
Labels

Comments

@llrs
Copy link
Contributor

llrs commented Feb 15, 2018

I have some problem for a while, it seems like fgsea never return the results:

system.time(gseaSizeEffect <- fgsea(grouping, comp1, nperm = 10000))
##    user  system elapsed 
##   0.168   0.116   0.517 
system.time(gseaSizeEffect <- fgsea(grouping, comp1, nperm = 20000))
##
## Timing stopped at: 0.28 0.416 731.2
system.time(gseaSizeEffect <- fgsea(grouping, comp1, nperm = 1000))
##    user  system elapsed 
##   0.052   0.000   0.058 
system.time(gseaSizeEffect <- fgsea(grouping, comp1, nperm = 2000))
## 
## Timing stopped at: 0.204 0.32 147.1

So while the 10k permutations takes 0.116 and 0.516 for the system and in total, when I double it it takes much more and I had to stop the function (I had this same function running for 3 days without returning any value), despite being reported that to the user it took less than a second the output wasn't ready after 10 minutes. And a similar thing happened when I used 1k permutation and doubled to 2k.

Sorry I couldn't make it reproducible, I really don't understand what can be triggering this. Here is some information about the data I am testing:

length(comp1)
## 505
length(grouping)
## 127
table(lengths(grouping))
## 
##  1  2  3  4  5  6  7  8  9 11 12 14 21 22 42 
## 61 16 10 13  4  3  3  4  1  4  3  2  1  1  1 
quantile(comp1)
##         0%        25%        50%        75%       100% 
## -0.2198301  0.0000000  0.0000000  0.0000000  0.2639637 

Perhaps the problem is that ~78% of my stats are 0, and so they are equivalent in the position. But when I removed them I also had the same problem.

The session_info("fgsea"):

Session info -----------------------------------------------------------------------------------------------------------------------------
 setting  value                       
 version  R version 3.4.3 (2017-11-30)
 system   i686, linux-gnu             
 ui       RStudio (1.1.423)           
 language en_US                       
 collate  en_US.UTF-8                 
 tz       Europe/Madrid               
 date     2018-02-15                  

Packages ---------------------------------------------------------------------------------------------------------------------------------
 package        * version  date       source         
 assertthat       0.2.0    2017-04-11 CRAN (R 3.4.3) 
 BH               1.65.0-1 2017-08-24 CRAN (R 3.4.2) 
 BiocParallel     1.12.0   2017-11-02 Bioconductor   
 cli              1.0.0    2017-11-05 CRAN (R 3.4.3) 
 colorspace       1.3-2    2016-12-14 CRAN (R 3.4.2) 
 compiler         3.4.3    2017-12-01 local          
 crayon           1.3.4    2017-09-16 CRAN (R 3.4.2) 
 data.table     * 1.10.4-3 2017-10-27 CRAN (R 3.4.3) 
 dichromat        2.0-0    2013-01-24 CRAN (R 3.4.2) 
 digest           0.6.15   2018-01-28 CRAN (R 3.4.3) 
 fastmatch        1.1-0    2017-01-28 CRAN (R 3.4.3) 
 fgsea          * 1.4.1    2018-01-26 Bioconductor   
 futile.logger    1.4.3    2016-07-10 CRAN (R 3.4.2) 
 futile.options   1.0.0    2010-04-06 CRAN (R 3.4.2) 
 ggplot2        * 2.2.1    2016-12-30 CRAN (R 3.4.3) 
 graphics       * 3.4.3    2017-12-01 local          
 grDevices      * 3.4.3    2017-12-01 local          
 grid             3.4.3    2017-12-01 local          
 gridExtra        2.3      2017-09-09 CRAN (R 3.4.2) 
 gtable           0.2.0    2016-02-26 CRAN (R 3.4.2) 
 labeling         0.3      2014-08-23 CRAN (R 3.4.2) 
 lambda.r         1.2      2017-09-16 CRAN (R 3.4.2) 
 lazyeval         0.2.1    2017-10-29 CRAN (R 3.4.2) 
 magrittr         1.5      2014-11-22 CRAN (R 3.4.2) 
 MASS             7.3-48   2017-12-24 CRAN (R 3.4.3) 
 methods        * 3.4.3    2017-12-01 local          
 munsell          0.4.3    2016-02-13 CRAN (R 3.4.2) 
 parallel       * 3.4.3    2017-12-01 local          
 pillar           1.1.0    2018-01-14 CRAN (R 3.4.3) 
 plyr             1.8.4    2016-06-08 CRAN (R 3.4.2) 
 R6               2.2.2    2017-06-17 CRAN (R 3.4.3) 
 RColorBrewer   * 1.1-2    2014-12-07 CRAN (R 3.4.2) 
 Rcpp           * 0.12.15  2018-01-20 cran (@0.12.15)
 reshape2       * 1.4.3    2017-12-11 CRAN (R 3.4.3) 
 rlang            0.1.6    2017-12-21 CRAN (R 3.4.3) 
 scales           0.5.0    2017-08-24 CRAN (R 3.4.2) 
 snow             0.4-2    2016-10-14 CRAN (R 3.4.2) 
 stats          * 3.4.3    2017-12-01 local          
 stringi          1.1.6    2017-11-17 CRAN (R 3.4.2) 
 stringr          1.2.0    2017-02-18 CRAN (R 3.4.3) 
 tibble           1.4.2    2018-01-22 CRAN (R 3.4.3) 
 tools            3.4.3    2017-12-01 local          
 utf8             1.1.3    2018-01-03 CRAN (R 3.4.3) 
 utils          * 3.4.3    2017-12-01 local          
 viridisLite      0.3.0    2018-02-01 CRAN (R 3.4.3) 
@assaron
Copy link
Member

assaron commented Feb 15, 2018

Hi Lluís,

Does it sitll happen if you set nproc=1?

@llrs
Copy link
Contributor Author

llrs commented Feb 15, 2018

Same data:

system.time(gseaSizeEffect <- fgsea(grouping, comp1, nperm = 10000))
## 
## Timing stopped at: 0.068 0.12 43.18
system.time(gseaSizeEffect <- fgsea(grouping, comp1, nperm = 10000, nproc = 1))
##    user  system elapsed 
##   0.260   0.028   0.289 
system.time(gseaSizeEffect <- fgsea(grouping, comp1, nperm = 20000, nproc = 1))
# I had to stop it after ~5 minutes (and restart Rstudio)

# In a R --vanilla session (After loading the previous part of the script):
system.time(gseaSizeEffect <- fgsea(grouping, comp1, nperm = 10000, nproc = 1))
# I had to close the terminal after ~5 minutes
# In a R --vanilla session 
#(Only loading the data and some packages: fgsea, reactome.db, org.Hs.eg.db, data.table):
system.time(gseaSizeEffect <- fgsea(grouping, comp1, nperm = 10000, nproc = 1))
# I had to close the terminal after ~5 minutes

Thanks for the quick response

@assaron
Copy link
Member

assaron commented Feb 15, 2018

Is it specific for your grouping and comp1 objects? How about exampleRanks and examplePathways? If yes, can you send these objects (with save(grouping, comp1, file="fgsea_test.rda")) here or by e-mail?

@llrs
Copy link
Contributor Author

llrs commented Feb 15, 2018

Yes, it seems specific to my data (sorry I should have checked that before).
I compressed here the data (in zip format) in order to be able to send it through the issue.

@assaron
Copy link
Member

assaron commented Feb 15, 2018

I can reproduce this. We'll look into it a bit later.

@assaron assaron added the bug label Feb 15, 2018
@llrs
Copy link
Contributor Author

llrs commented Feb 16, 2018

Oh 😮 , I didn't expect to be reproducible, I thought it was a problem in my computer 😭
That's worst than I expected. Let me know if there is anything I can do.

@ToledoEM
Copy link
Contributor

It doesn't seem to be driven by the number of permutations by it self, since repeated execution lead to the same outcome.

library(fgsea)
set.seed(123456789)
for (i in 1:10){
print(system.time(gseaSizeEffect <- fgsea(grouping, comp1, nperm = 1000,nproc = 1)))
}

   user  system elapsed 
  0.082   0.002   0.064 
   user  system elapsed 
  0.838   0.008   0.153 
   user  system elapsed 
  0.222   0.002   0.038 
   user  system elapsed 
  0.230   0.003   0.039 
   user  system elapsed 
  0.222   0.002   0.038 
   user  system elapsed 
  0.215   0.003   0.037 
   user  system elapsed 
  0.214   0.002   0.036 
   user  system elapsed 
  0.208   0.002   0.035 
   user  system elapsed 
  0.226   0.002   0.038 
   user  system elapsed 
  0.238   0.003   0.042 

Then executing the same code

   user  system elapsed 
  0.031   0.001   0.036 
   user  system elapsed 
  0.028   0.000   0.029 
   user  system elapsed 
  0.028   0.000   0.028 
   user  system elapsed 
  0.028   0.000   0.029 

### Frozen for 5 minutes ---

Sometimes I can execute this twice or thrice without an error, sometimes just once.

Also reproducible if sort.comp1 <- rev(sort(comp1)) it is used.

R version 3.4.3 (2017-11-30)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.3

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
 [1] compiler_3.4.3 magrittr_1.5   tools_3.4.3    pillar_1.1.0   glue_1.2.0     tibble_1.4.1   yaml_2.1.16    Rcpp_0.12.15   tidyr_0.7.2    rlang_0.1.6    purrr_0.2.4 

@llrs
Copy link
Contributor Author

llrs commented Feb 16, 2018

@ToledoEM Thanks, nice report.
This makes me think that the error is in the c++ code.

assaron added a commit that referenced this issue Feb 16, 2018
@assaron
Copy link
Member

assaron commented Feb 16, 2018

It works now, however, current algorithm does not properly support equal ranks (zero or not) as the ranking is not full in such case. So be cautious about the results. See #18

@llrs
Copy link
Contributor Author

llrs commented Feb 16, 2018

Thanks for the fix.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants