Some terms in body site might need review #158

Closed
sdgamboa opened this issue Feb 1, 2023 · 1 comment

sdgamboa commented Feb 1, 2023

While using bugsigdbr, I noticed that most terms in the 'Body site' column appear to be lowercase, but some start with an uppercase letter. In some cases this produces duplicate entries that differ only in case. Also, I don't think HIV is a body site. See the code below for details.

library(bugsigdbr)
bsdb <- importBugSigDB(cache = FALSE)
body_sites <- sort(unique(bsdb$`Body site`))
terms <- c(
    ## uppercase
    'Genitals', 'Semen', 'Vaginal fluid',
    
    ## repetition due to uppercase
    'posterior fornix of vagina', 
    'Posterior fornix of vagina',
    
    ## not a body site
    'HIV'
)
body_sites[body_sites %in% terms]
#> [1] "Genitals"                   "HIV"                       
#> [3] "posterior fornix of vagina" "Posterior fornix of vagina"
#> [5] "Semen"                      "Vaginal fluid"
## signatures annotated with any of the terms above
df <- bsdb[bsdb$`Body site` %in% terms,]
sigs <- getSignatures(df = df)
names(sigs)
#>  [1] "bsdb:270/1/1_human-papilloma-virus-infection:HPV+_vs_HPV-_UP"                                                                                        
#>  [2] "bsdb:270/1/2_human-papilloma-virus-infection:HPV+_vs_HPV-_DOWN"                                                                                      
#>  [3] "bsdb:270/2/1_human-papilloma-virus-infection:HPV+-(persistance)_vs_HPV+-(clearance)_UP"                                                              
#>  [4] "bsdb:270/2/2_human-papilloma-virus-infection:HPV+-(persistance)_vs_HPV+-(clearance)_DOWN"                                                            
#>  [5] "bsdb:280/1/1_cervical-glandular-intraepithelial-neoplasia:high-grade-squamus-intraepithelial-lesion_vs_low-grade-squamus-intraepithelial-lesion_UP"  
#>  [6] "bsdb:280/1/2_cervical-glandular-intraepithelial-neoplasia:high-grade-squamus-intraepithelial-lesion_vs_low-grade-squamus-intraepithelial-lesion_DOWN"
#>  [7] "bsdb:297/1/1_human-papilloma-virus-infection:non-pregnant-high-risk-HPV_vs_non-pregnant-no-HPV_UP"                                                   
#>  [8] "bsdb:297/1/2_human-papilloma-virus-infection:non-pregnant-high-risk-HPV_vs_non-pregnant-no-HPV_DOWN"                                                 
#>  [9] "bsdb:297/2/1_human-papilloma-virus-infection:pregnant-high-risk-HPV_vs_pregnant-no-HPV_UP"                                                           
#> [10] "bsdb:297/2/2_human-papilloma-virus-infection:pregnant-high-risk-HPV_vs_pregnant-no-HPV_DOWN"                                                         
#> [11] "bsdb:434/1/1_human-papilloma-virus-infection:HPV-+_vs_healthy-control_UP"                                                                            
#> [12] "bsdb:434/1/2_human-papilloma-virus-infection:HPV-+_vs_healthy-control_DOWN"                                                                          
#> [13] "bsdb:434/2/1_cervical-glandular-intraepithelial-neoplasia:LSIL_vs_healthy-control_UP"                                                                
#> [14] "bsdb:434/2/2_cervical-glandular-intraepithelial-neoplasia:LSIL_vs_healthy-control_DOWN"                                                              
#> [15] "bsdb:434/3/1_cervical-glandular-intraepithelial-neoplasia:HSIL_vs_healthy-control_UP"                                                                
#> [16] "bsdb:434/3/2_cervical-glandular-intraepithelial-neoplasia:HSIL_vs_healthy-control_DOWN"                                                              
#> [17] "bsdb:434/4/1_cervical-cancer:cervical-cancer_vs_healthy-control_UP"                                                                                  
#> [18] "bsdb:434/4/2_cervical-cancer:cervical-cancer_vs_healthy-control_DOWN"                                                                                
#> [19] "bsdb:436/1/1_cervical-glandular-intraepithelial-neoplasia:CIN2+/cervical-cancer_vs_healthy-control_UP"                                               
#> [20] "bsdb:522/2/1_endometriosis:Endometriosis-patients_vs_Controls-undergoing-laparoscopic-surgery-for-benign-tumors_DOWN"                                
#> [21] "bsdb:522/2/2_endometriosis:Endometriosis-patients_vs_Controls-undergoing-laparoscopic-surgery-for-benign-tumors_UP"                                  
#> [22] "bsdb:525/1/1_endometriosis:Women-with-EM-associated-CPPS_vs_Women-without-CPPS-presenting-for-routine-examinations_DOWN"                             
#> [23] "bsdb:525/1/2_endometriosis:Women-with-EM-associated-CPPS_vs_Women-without-CPPS-presenting-for-routine-examinations_UP"                               
#> [24] "bsdb:525/2/1_endometriosis:Women-with-EM-associated-CPPS_vs_Women-without-CPPS-presenting-for-routine-examinations_UP"                               
#> [25] "bsdb:525/2/2_endometriosis:Women-with-EM-associated-CPPS_vs_Women-without-CPPS-presenting-for-routine-examinations_DOWN"                             
#> [26] "bsdb:553/1/1_spontaneous-abortion:First-or-second-trimester-miscarriage_vs_Viable-control-pregnancy_DOWN"                                            
#> [27] "bsdb:553/2/1_spontaneous-abortion:First-trimester-miscarriage_vs_Viable-control-pregnancy_DOWN"                                                      
#> [28] "bsdb:553/2/2_spontaneous-abortion:First-trimester-miscarriage_vs_Viable-control-pregnancy_UP"                                                        
#> [29] "bsdb:553/3/1_spontaneous-abortion:complete/incomplete-miscarriage_vs_missed-miscarriage_DOWN"                                                        
#> [30] "bsdb:590/1/1_infection:HIV-1-infection_vs_HIV-1-negative_DOWN"                                                                                       
#> [31] "bsdb:590/1/2_infection:HIV-1-infection_vs_HIV-1-negative_UP"                                                                                         
#> [32] "bsdb:590/1/3_infection:HIV-1-infection_vs_HIV-1-negative_UP"                                                                                         
#> [33] "bsdb:594/1/1_periodontitis:HIV+_vs_HIV–_UP"                                                                                                          
#> [34] "bsdb:594/1/2_periodontitis:HIV+_vs_HIV–_DOWN"                                                                                                        
#> [35] "bsdb:596/1/1_HIV-infection:HIV-infected-men_vs_HIV-uninfected-men_DOWN"                                                                              
#> [36] "bsdb:596/1/2_HIV-infection:HIV-infected-men_vs_HIV-uninfected-men_UP"                                                                                
#> [37] "bsdb:598/1/1_HIV-infection:HIV-seroconverted_vs_HIV-seronegative_UP"                                                                                 
#> [38] "bsdb:598/1/2_HIV-infection:HIV-seroconverted_vs_HIV-seronegative_UP"                                                                                 
#> [39] "bsdb:598/2/1_HIV-infection:HIV-seroconverted_vs_HIV-seronegative_UP"
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R Under development (unstable) (2022-12-25 r83502)
#>  os       Pop!_OS 22.04 LTS
#>  system   x86_64, linux-gnu
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       America/New_York
#>  date     2023-02-01
#>  pandoc   2.19.2 @ /usr/lib/rstudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package       * version date (UTC) lib source
#>  assertthat      0.2.1   2019-03-21 [2] CRAN (R 4.3.0)
#>  BiocFileCache   2.7.1   2022-12-09 [1] Bioconductor
#>  bit             4.0.5   2022-11-15 [2] CRAN (R 4.3.0)
#>  bit64           4.0.5   2020-08-30 [2] CRAN (R 4.3.0)
#>  blob            1.2.3   2022-04-10 [2] CRAN (R 4.3.0)
#>  bugsigdbr     * 1.5.2   2022-11-24 [1] Bioconductor
#>  cachem          1.0.6   2021-08-19 [2] CRAN (R 4.3.0)
#>  cli             3.6.0   2023-01-09 [1] CRAN (R 4.3.0)
#>  crayon          1.5.2   2022-09-29 [2] CRAN (R 4.3.0)
#>  curl            5.0.0   2023-01-12 [2] CRAN (R 4.3.0)
#>  DBI             1.1.3   2022-06-18 [2] CRAN (R 4.3.0)
#>  dbplyr          2.3.0   2023-01-16 [2] CRAN (R 4.3.0)
#>  digest          0.6.31  2022-12-11 [2] CRAN (R 4.3.0)
#>  dplyr           1.1.0   2023-01-29 [2] CRAN (R 4.3.0)
#>  evaluate        0.20    2023-01-17 [2] CRAN (R 4.3.0)
#>  fansi           1.0.4   2023-01-22 [2] CRAN (R 4.3.0)
#>  fastmap         1.1.0   2021-01-25 [2] CRAN (R 4.3.0)
#>  filelock        1.0.2   2018-10-05 [1] CRAN (R 4.3.0)
#>  fs              1.6.0   2023-01-23 [2] CRAN (R 4.3.0)
#>  generics        0.1.3   2022-07-05 [2] CRAN (R 4.3.0)
#>  glue            1.6.2   2022-02-24 [2] CRAN (R 4.3.0)
#>  htmltools       0.5.4   2022-12-07 [2] CRAN (R 4.3.0)
#>  httr            1.4.4   2022-08-17 [2] CRAN (R 4.3.0)
#>  knitr           1.42    2023-01-25 [2] CRAN (R 4.3.0)
#>  lifecycle       1.0.3   2022-10-07 [2] CRAN (R 4.3.0)
#>  magrittr        2.0.3   2022-03-30 [2] CRAN (R 4.3.0)
#>  memoise         2.0.1   2021-11-26 [2] CRAN (R 4.3.0)
#>  pillar          1.8.1   2022-08-19 [2] CRAN (R 4.3.0)
#>  pkgconfig       2.0.3   2019-09-22 [2] CRAN (R 4.3.0)
#>  purrr           1.0.1   2023-01-10 [1] CRAN (R 4.3.0)
#>  R.cache         0.16.0  2022-07-21 [1] CRAN (R 4.3.0)
#>  R.methodsS3     1.8.2   2022-06-13 [1] CRAN (R 4.3.0)
#>  R.oo            1.25.0  2022-06-12 [1] CRAN (R 4.3.0)
#>  R.utils         2.12.2  2022-11-11 [1] CRAN (R 4.3.0)
#>  R6              2.5.1   2021-08-19 [2] CRAN (R 4.3.0)
#>  Rcpp            1.0.10  2023-01-22 [1] CRAN (R 4.3.0)
#>  reprex          2.0.2   2022-08-17 [2] CRAN (R 4.3.0)
#>  rlang           1.0.6   2022-09-24 [2] CRAN (R 4.3.0)
#>  rmarkdown       2.20    2023-01-19 [2] CRAN (R 4.3.0)
#>  RSQLite         2.2.20  2022-12-22 [1] CRAN (R 4.3.0)
#>  rstudioapi      0.14    2022-08-22 [2] CRAN (R 4.3.0)
#>  sessioninfo     1.2.2   2021-12-06 [1] CRAN (R 4.3.0)
#>  styler          1.9.0   2023-01-15 [1] CRAN (R 4.3.0)
#>  tibble          3.1.8   2022-07-22 [2] CRAN (R 4.3.0)
#>  tidyselect      1.2.0   2022-10-10 [2] CRAN (R 4.3.0)
#>  tzdb            0.3.0   2022-03-28 [2] CRAN (R 4.3.0)
#>  utf8            1.2.3   2023-01-31 [2] CRAN (R 4.3.0)
#>  vctrs           0.5.2   2023-01-23 [2] CRAN (R 4.3.0)
#>  vroom           1.6.1   2023-01-22 [2] CRAN (R 4.3.0)
#>  withr           2.5.0   2022-03-03 [2] CRAN (R 4.3.0)
#>  xfun            0.37    2023-01-31 [2] CRAN (R 4.3.0)
#>  yaml            2.3.7   2023-01-23 [2] CRAN (R 4.3.0)
#> 
#>  [1] /home/samuel/R/x86_64-pc-linux-gnu-library/4.3
#>  [2] /home/samuel/Apps/R-devel/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────

Created on 2023-02-01 with reprex v2.0.2
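
Rather than listing suspect terms by hand, case-only duplicates could also be found programmatically. The following is a minimal sketch, not part of bugsigdbr: `find_case_duplicates` is a hypothetical helper name, and a small toy vector stands in for `bsdb$`Body site`` so that the example runs without downloading the database.

```r
## Hypothetical helper (not part of bugsigdbr): return every term that
## collides with another term once letter case is ignored.
find_case_duplicates <- function(x) {
    x <- unique(x[!is.na(x)])
    key <- tolower(x)
    dup_keys <- unique(key[duplicated(key)])
    x[key %in% dup_keys]
}

## Toy stand-in for unique(bsdb$`Body site`) (assumption, not real data)
body_sites <- c(
    "Genitals", "posterior fornix of vagina",
    "Posterior fornix of vagina", "Semen", "feces"
)
find_case_duplicates(body_sites)
```

On the toy vector this returns the two spellings of "posterior fornix of vagina". Terms such as 'HIV' that are not body sites at all would still need a manual or ontology-based check.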

@lgeistlinger
Collaborator

Thanks @sdgamboa.

This is already fixed in the devel version of BugSigDB and will be included in the next release.
Note that all body site terms are exported in capitalized form according to #111.

dat <- bugsigdbr::importBugSigDB(version = "devel", cache = FALSE)
terms <- c("HIV", "posterior fornix of vagina")
terms %in% dat$Condition
#> [1] FALSE FALSE
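
For users working with an older export, a local workaround could be to normalize the terms before comparing them. This is only a sketch under the assumption that "capitalized form" means an uppercase first letter with the rest left unchanged; the actual export logic discussed in #111 is not shown here, and `capitalize_first` is a hypothetical helper, not a bugsigdbr function.

```r
## Hypothetical helper (assumption: "capitalized" = uppercase first letter).
## Not part of bugsigdbr.
capitalize_first <- function(x) {
    out <- paste0(toupper(substr(x, 1, 1)), substr(x, 2, nchar(x)))
    out[is.na(x)] <- NA_character_  # keep missing values missing
    out
}

## Toy input (assumption): two spellings that differ only in case
terms <- c("posterior fornix of vagina", "Posterior fornix of vagina", "Semen")
unique(capitalize_first(terms))
#> [1] "Posterior fornix of vagina" "Semen"
```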
