Download reference data from UCSC for RefSeq #8

Open
wenbostar opened this issue Jun 17, 2019 · 0 comments
Comments

@wenbostar
Owner

The CDS and protein data were downloaded from UCSC on the same day. Running the following code produced the warning message shown at the end of its output:

library(PGA)
annotation_path <- tempdir()
pepfasta <- "~/Downloads/hg19_refGenePro.fa"
CDSfasta <- "~/Downloads/hg19_refGeneCDS.fa"
PrepareAnnotationRefseq2(genome='hg19', CDSfasta, pepfasta, annotation_path,
                         dbsnp=NULL, splice_matrix=FALSE, COSMIC=FALSE)
Build TranscriptDB object (txdb.sqlite) ... 
Download the refGene table ... OK
Download the hgFixed.refLink table ... OK
Extract the 'transcripts' data frame ... OK
Extract the 'splicings' data frame ... OK
Download and preprocess the 'chrominfo' data frame ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... OK
 done
Prepare gene/transcript/protein id mapping information (ids.RData) ...  done
Prepare exon annotation information (exon_anno.RData) ...  done
Prepare protein sequence (proseq.RData) ...  done
Prepare protein coding sequence (procodingseq.RData)...  done
Warning message:
In .extractCdsLocsFromUCSCTxTable(ucsc_txtable) :
  UCSC data anomaly in 433 transcript(s): the cds cumulative length is not a multiple of 3
  for transcripts ‘NM_033425’ ‘NM_006510’ ‘NM_001146344’ ‘NM_001010890’ ‘NM_001300891’
  ‘NM_001300891’ ‘NM_017940’ ‘NM_002537’ ‘NM_003954’ ‘NM_006510’ ‘NM_001278563’
  ‘NM_001291815’ ‘NM_001359231’ ‘NM_001354658’ ‘NM_001350198’ ‘NM_001243042’
  ‘NM_001243042’ ‘NM_002570’ ‘NM_001128590’ ‘NM_001271870’ ‘NM_001271872’ ‘NM_001329984’
  ‘NM_001037501’ ‘NM_001037675’ ‘NM_001277444’ ‘NM_001351365’ ‘NM_001297654’
  ‘NM_001288952’ ‘NM_001134939’ ‘NM_001301371’ ‘NM_153334’ ‘NM_001348286’ ‘NM_001348208’
  ‘NM_001348208’ ‘NM_001348208’ ‘NM_001348208’ ‘NM_001348208’ ‘NM_001289152’ ‘NM_199349’
  ‘NM_138324’ ‘NM_138323’ ‘NM_138322’ ‘NM_138319’ ‘NM_005671’ ‘NM_001143962’ ‘NM_000500’
  ‘NM_145171’ ‘NM_001318833’ ‘NM_006904 [... truncated]
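Some accessions appear more than once in the visible part of the warning (for example NM_001300891 and NM_006510), so the count of 433 may refer to table rows rather than distinct transcripts. That interpretation is an assumption, not something the warning states. A minimal sketch for tallying the IDs, using only the first ten accessions copied from the output above (the rest of the message was truncated by R):

```r
# Excerpt of the warning text above: the first ten accessions only.
# Straight quotes are used here instead of the curly quotes in the log.
warning_excerpt <- paste(
  "'NM_033425' 'NM_006510' 'NM_001146344' 'NM_001010890' 'NM_001300891'",
  "'NM_001300891' 'NM_017940' 'NM_002537' 'NM_003954' 'NM_006510'"
)

# Pull out every RefSeq mRNA accession (NM_ followed by digits).
ids <- regmatches(warning_excerpt, gregexpr("NM_[0-9]+", warning_excerpt))[[1]]

# Count occurrences and list the accessions that are reported more than once.
id_counts  <- table(ids)
repeated   <- names(id_counts[id_counts > 1])
unique_ids <- unique(ids)

cat("reported:", length(ids), " distinct:", length(unique_ids), "\n")
cat("repeated:", paste(repeated, collapse = ", "), "\n")
```

Applied to the full, untruncated message, the same approach would show how many distinct accessions are behind the 433 reported anomalies.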
sessionInfo()
R version 3.5.3 (2019-03-11)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Amazon Linux AMI 2018.03

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
 [1] PGA_1.13.3           rTANDEM_1.22.1       Rcpp_1.0.1
 [4] XML_3.98-1.20        data.table_1.12.2    Biostrings_2.50.2
 [7] XVector_0.22.0       GenomicRanges_1.34.0 GenomeInfoDb_1.18.2
[10] IRanges_2.16.0       S4Vectors_0.20.1     BiocGenerics_0.28.0

loaded via a namespace (and not attached):
 [1] Biobase_2.42.0              httr_1.4.0
 [3] bit64_0.9-7                 assertthat_0.2.1
 [5] BiocManager_1.30.4          blob_1.1.1
 [7] BSgenome_1.50.0             GenomeInfoDbData_1.2.0
 [9] Rsamtools_1.34.1            remotes_2.0.4
[11] progress_1.2.2              pillar_1.4.1
[13] RSQLite_2.1.1               lattice_0.20-38
[15] glue_1.3.1                  digest_0.6.19
[17] RColorBrewer_1.1-2          colorspace_1.4-1
[19] Matrix_1.2-17               plyr_1.8.4
[21] pkgconfig_2.0.2             pheatmap_1.0.12
[23] customProDB_1.22.1          biomaRt_2.38.0
[25] zlibbioc_1.28.0             purrr_0.3.2
[27] scales_1.0.0                processx_3.3.1
[29] BiocParallel_1.16.6         tibble_2.1.3
[31] ggplot2_3.2.0               AhoCorasickTrie_0.1.0
[33] SummarizedExperiment_1.12.0 GenomicFeatures_1.34.8
[35] lazyeval_0.2.2              magrittr_1.5
[37] crayon_1.3.4                memoise_1.1.0
[39] ps_1.3.0                    MASS_7.3-51.4
[41] RMariaDB_1.0.6.9000         tools_3.5.3
[43] prettyunits_1.0.2           hms_0.4.2
[45] matrixStats_0.54.0          stringr_1.4.0
[47] munsell_0.5.0               DelayedArray_0.8.0
[49] AnnotationDbi_1.44.0        ade4_1.7-13
[51] compiler_3.5.3              rlang_0.3.4
[53] grid_3.5.3                  RCurl_1.95-4.12
[55] VariantAnnotation_1.28.13   bitops_1.0-6
[57] gtable_0.3.0                curl_3.3
[59] DBI_1.0.0.9001              R6_2.4.0
[61] GenomicAlignments_1.18.1    Nozzle.R1_1.1-1
[63] dplyr_0.8.1                 rtracklayer_1.42.2
[65] seqinr_3.4-5                bit_1.1-14
[67] readr_1.3.1                 stringi_1.4.3
[69] tidyselect_0.2.5
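The warning is raised from the coordinates in the refGene table, not from the downloaded FASTA files, so it may be useful to check separately whether the CDS sequences themselves have lengths that are a multiple of 3. The sketch below does this with base R only. The helper names `fasta_lengths` and `frame_anomalies` are hypothetical, as are the record IDs in the demonstration file, and it assumes each FASTA header starts with `>` followed by an ID with no spaces:

```r
# Read a FASTA file and return a named integer vector of sequence lengths.
fasta_lengths <- function(path) {
  lines <- readLines(path, warn = FALSE)
  lines <- lines[nzchar(lines)]
  is_header <- startsWith(lines, ">")
  # Each header starts a new record; cumsum gives every line its record index.
  record <- cumsum(is_header)
  seq_chars <- tapply(nchar(lines[!is_header]), record[!is_header], sum)
  # The ID is taken as the first whitespace-delimited token after ">".
  ids <- sub("^>(\\S+).*$", "\\1", lines[is_header])
  lens <- integer(length(ids))  # records without sequence lines stay at 0
  lens[as.integer(names(seq_chars))] <- as.integer(seq_chars)
  names(lens) <- ids
  lens
}

# Return only the records whose length is not a multiple of 3.
frame_anomalies <- function(path) {
  lens <- fasta_lengths(path)
  lens[lens %% 3L != 0L]
}

# Demonstration on a tiny made-up FASTA file (hypothetical IDs and sequences).
demo <- tempfile(fileext = ".fa")
writeLines(c(">tx_ok some description", "ATGGCC", "TAA",
             ">tx_bad", "ATGGCCT", "A"), demo)
demo_result <- frame_anomalies(demo)
print(demo_result)
```

On the real data this would be called as `frame_anomalies("~/Downloads/hg19_refGeneCDS.fa")`. Whether the IDs in that file match the accessions named in the warning has not been verified here.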
Repository owner deleted a comment from shanzida45670 Jul 24, 2019