Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

EpiCompare #2568

Closed
10 tasks done
serachoi1230 opened this issue Mar 18, 2022 · 50 comments
Closed
10 tasks done

EpiCompare #2568

serachoi1230 opened this issue Mar 18, 2022 · 50 comments
Assignees
Labels
3a. accepted will be ingested into Bioconductor daily builder for distribution TIMEOUT WARNINGS

Comments

@serachoi1230
Copy link

Update the following URL to point to the GitHub repository of
the package you wish to submit to Bioconductor

Confirm the following by editing each check box to '[x]'

  • I understand that by submitting my package to Bioconductor,
    the package source and all review commentary are visible to the
    general public.

  • I have read the Bioconductor Package Submission
    instructions. My package is consistent with the Bioconductor
    Package Guidelines.

  • I understand Bioconductor Package Naming Policy and acknowledge
    Bioconductor may retain use of package name.

  • I understand that a minimum requirement for package acceptance
    is to pass R CMD check and R CMD BiocCheck with no ERROR or WARNINGS.
    Passing these checks does not result in automatic acceptance. The
    package will then undergo a formal review and recommendations for
    acceptance regarding other Bioconductor standards will be addressed.

  • My package addresses statistical or bioinformatic issues related
    to the analysis and comprehension of high throughput genomic data.

  • I am committed to the long-term maintenance of my package. This
    includes monitoring the support site for issues that users may
    have, subscribing to the bioc-devel mailing list to stay aware
    of developments in the Bioconductor community, responding promptly
    to requests for updates from the Core team in response to changes in
    R or underlying software.

  • I am familiar with the Bioconductor code of conduct and
    agree to abide by it.

I am familiar with the essential aspects of Bioconductor software
management, including:

  • The 'devel' branch for new packages and features.
  • The stable 'release' branch, made available every six
    months, for bug fixes.
  • Bioconductor version control using Git
    (optionally via GitHub).

For questions/help about the submission process, including questions about
the output of the automatic reports generated by the SPB (Single Package
Builder), please use the #package-submission channel of our Community Slack.
Follow the link on the home page of the Bioconductor website to sign up.

@bioc-issue-bot
Copy link
Collaborator

Hi @serachoi1230

Thanks for submitting your package. We are taking a quick
look at it and you will hear back from us soon.

The DESCRIPTION file for this package is:

Package: EpiCompare
Type: Package
Title: Comparison, Benchmarking & QC of Epigenetic Datasets
Version: 0.99.0
Authors@R: 
    person(given = "Sera",
	    family = "Choi",
	    role = c("cre"),
 email = "[email protected]",
 comment = c(ORCID = "0000-0002-5077-1984"))
Description: EpiCompare is used to compare and analyse epigenetic datasets for quality control and benchmarking purposes. The package outputs an HTML report consisting of three sections: (1. General metrics) Metrics on peaks (percentage of blacklisted and non-standard peaks, and peak widths) and fragments (duplication rate) of samples, (2. Peak overlap) Percetnage and statistical significance of overlapping and non-overlapping peaks. Also includes upset plot and (3. Functional annotation) functional annotation (ChromHMM, ChIPseeker and enrichment analysis) of peaks. Also includes peak enrichment around TSS.
URL: https://github.com/neurogenomics/EpiCompare
BugReports: https://github.com/neurogenomics/EpiCompare/issues
License: GPL-3
Depends: R(>= 4.1)
LazyData: FALSE
biocViews:
	ComparativeGenomics, Epigenetics, Genetics,
	QualityControl
SystemRequirements: GNU make
Imports:
	GenomicRanges,
	genomation,
	IRanges,
	reshape2,
	viridis,
	ggplot2,
	ggpubr,
	ChIPseeker,
	BRGenomics,
	clusterProfiler,
	stats,
	methods,
	plotly,
	stringr,
	dplyr,
	tidyr,
	UpSetR,
	org.Hs.eg.db,
	TxDb.Hsapiens.UCSC.hg19.knownGene,
	rmarkdown
RoxygenNote: 7.1.2.9000
Encoding: UTF-8
Suggests: 
    testthat (>= 3.0.0),
    badger,
    knitr,
    htmlwidgets
Config/testthat/edition: 3
VignetteBuilder: knitr

@bioc-issue-bot bioc-issue-bot added the 1. awaiting moderation submitted and waiting clearance to access resources label Mar 18, 2022
@vjcitn
Copy link
Collaborator

vjcitn commented Mar 19, 2022

vignette has no content. issue filed at base repo

@vjcitn vjcitn added the 3e. pending pre-review changes review has indicated blocking concern that needs attention label Mar 19, 2022
@serachoi1230
Copy link
Author

Hi Vince, thank you for your review. I have updated the vignette (uses BiocStyle) and I can also confirm that the package runs through both CMD check and BiocCheck::BiocCheck cleanly. Thank you very much!

@vjcitn vjcitn added pre-check passed pre-review performed and ready to be added to git and removed 3e. pending pre-review changes review has indicated blocking concern that needs attention labels Mar 23, 2022
@bioc-issue-bot
Copy link
Collaborator

A reviewer has been assigned to your package. Learn what to expect
during the review process.

IMPORTANT: Please read this documentation for setting
up remotes to push to git.bioconductor.org. It is required to push a
version bump to git.bioconductor.org to trigger a new build.

Bioconductor utilized your github ssh-keys for git.bioconductor.org
access. To manage keys and future access you may want to active your
Bioconductor Git Credentials Account

@bioc-issue-bot bioc-issue-bot added 2. review in progress assign a reviewer and a more thorough review of package code and documentation taking place and removed 1. awaiting moderation submitted and waiting clearance to access resources pre-check passed pre-review performed and ready to be added to git labels Mar 25, 2022
@bioc-issue-bot
Copy link
Collaborator

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "WARNINGS, ERROR, TIMEOUT".
This may mean there is a problem with the package that you need to fix.
Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active
for 21 days.

Remember: if you submitted your package after July 7th, 2020,
when making changes to your repository push to
[email protected]:packages/EpiCompare to trigger a new build.
A quick tutorial for setting up remotes and pushing to upstream can be found here.

@bioc-issue-bot
Copy link
Collaborator

Received a valid push on git.bioconductor.org; starting a build for commit id: 09d58e877d4102415c0169eda33662f1fdb7cf97

@bioc-issue-bot
Copy link
Collaborator

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "WARNINGS, ERROR, TIMEOUT".
This may mean there is a problem with the package that you need to fix.
Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active
for 21 days.

Remember: if you submitted your package after July 7th, 2020,
when making changes to your repository push to
[email protected]:packages/EpiCompare to trigger a new build.
A quick tutorial for setting up remotes and pushing to upstream can be found here.

@bioc-issue-bot
Copy link
Collaborator

Received a valid push on git.bioconductor.org; starting a build for commit id: 402bba0865e634c0c9b588c95d604e6c36c7d6c6

@bioc-issue-bot
Copy link
Collaborator

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "WARNINGS, ERROR".
This may mean there is a problem with the package that you need to fix.
Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active
for 21 days.

Remember: if you submitted your package after July 7th, 2020,
when making changes to your repository push to
[email protected]:packages/EpiCompare to trigger a new build.
A quick tutorial for setting up remotes and pushing to upstream can be found here.

@bioc-issue-bot
Copy link
Collaborator

Received a valid push on git.bioconductor.org; starting a build for commit id: 7695e0025d9784cfa32632ae11091f884559c955

@bioc-issue-bot
Copy link
Collaborator

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "WARNINGS, ERROR".
This may mean there is a problem with the package that you need to fix.
Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active
for 21 days.

Remember: if you submitted your package after July 7th, 2020,
when making changes to your repository push to
[email protected]:packages/EpiCompare to trigger a new build.
A quick tutorial for setting up remotes and pushing to upstream can be found here.

@bioc-issue-bot
Copy link
Collaborator

Received a valid push on git.bioconductor.org; starting a build for commit id: fd158b0a86ba27d2a7c211d2fb18cb38899b34e2

@bioc-issue-bot
Copy link
Collaborator

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "WARNINGS".
This may mean there is a problem with the package that you need to fix.
Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active
for 21 days.

Remember: if you submitted your package after July 7th, 2020,
when making changes to your repository push to
[email protected]:packages/EpiCompare to trigger a new build.
A quick tutorial for setting up remotes and pushing to upstream can be found here.

@serachoi1230
Copy link
Author

Hi @PeteHaitch,

It appears to me that the remaining warnings are all coming from loading the package genomation, which EpiCompare uses. I'm not quite sure what I can do to fix that? Do you have any suggestions?

Thank you so much!

Best wishes,
Sera

@PeteHaitch
Copy link

The warning is saying that there's a clash between the grid::pattern() function and the Biostrings::pattern() S4 generic function.
This happens because grid is attached to the search path by genomation, which Depends on grid.
Such clashes are somewhat common, and I don't think it's a bit problem that needs 'fixing', but I'll defer to @vjcitn for clarification.

I have another package to review ahead of EpiCompare in the queue, but we should be able to get through the review process before the next Bioconductor release (see https://bioconductor.org/developers/release-schedule/)

@bioc-issue-bot
Copy link
Collaborator

Received a valid push on git.bioconductor.org; starting a build for commit id: 53ca848410b271a6fa5b15bfdecd32f45fdae1ed

@bioc-issue-bot
Copy link
Collaborator

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "WARNINGS, ERROR, TIMEOUT".
This may mean there is a problem with the package that you need to fix.
Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active
for 21 days.

Remember: if you submitted your package after July 7th, 2020,
when making changes to your repository push to
[email protected]:packages/EpiCompare to trigger a new build.
A quick tutorial for setting up remotes and pushing to upstream can be found here.

@bioc-issue-bot bioc-issue-bot removed the 2. review in progress assign a reviewer and a more thorough review of package code and documentation taking place label Apr 21, 2022
@serachoi1230 serachoi1230 reopened this Apr 21, 2022
@bioc-issue-bot
Copy link
Collaborator

Dear @serachoi1230 ,

We have reopened the issue to continue the review process.
Please remember to push a version bump to git.bioconductor.org
to trigger a new build.

@bioc-issue-bot bioc-issue-bot added the 2. review in progress assign a reviewer and a more thorough review of package code and documentation taking place label Apr 21, 2022
@bioc-issue-bot
Copy link
Collaborator

Received a valid push on git.bioconductor.org; starting a build for commit id: 69b66f2070e6cde351343bbac315b7da299e4bf6

@bioc-issue-bot
Copy link
Collaborator

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "WARNINGS, TIMEOUT".
This may mean there is a problem with the package that you need to fix.
Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active
for 21 days.

Remember: if you submitted your package after July 7th, 2020,
when making changes to your repository push to
[email protected]:packages/EpiCompare to trigger a new build.
A quick tutorial for setting up remotes and pushing to upstream can be found here.

@bioc-issue-bot
Copy link
Collaborator

Received a valid push on git.bioconductor.org; starting a build for commit id: 4a4b039647a9f1fdcb4ed1806dacf39d09ba5a58

@bioc-issue-bot
Copy link
Collaborator

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "WARNINGS".
This may mean there is a problem with the package that you need to fix.
Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active
for 21 days.

Remember: if you submitted your package after July 7th, 2020,
when making changes to your repository push to
[email protected]:packages/EpiCompare to trigger a new build.
A quick tutorial for setting up remotes and pushing to upstream can be found here.

@serachoi1230
Copy link
Author

serachoi1230 commented Apr 21, 2022

Hi @PeteHaitch,

Sorry, I accidentally closed the issue, please ignore!

Please see below the line-by-line response to your comments.

We are aware that the Bioconductor release deadline is this Friday and there is very limited time to sort absolutely everything. We tackled all "required" points, but in the interest of time, we were not able to sort all "recommended" points. We hope that you understand - we prioritised some "recommended" issues and removed those, but we will continue to work to eliminate all in future releases.

Please let us know if further modifications are needed - we hope we can get this done by Friday.
Thank you so much for your effort!

REQUIRED

  • Do the snake_case functions really need to be exported? My understanding from reading the documentation is that a user would typically only need to run the EpiCompare() function to generate the HTML report. Or do you expect users will want/need to run the snake_case functions (e.g., fragment_info(), gather_files(), etc.) directly? Keeping the snake_case functions as internal (i.e. not exported) would simplify the package API, which is generally a good thing.

  • Response: Yes, you're right in that typically users would only need to run EpiCompare(). However, we believed it would also be useful to make the individual functions available for those that are looking to generate individual figures and tables instead of generating a full report, which can take awhile.

  • Please use AnnotationHub to retrieve the necessary chain file(s) rather than including inst/extdata/hg38ToHg19.over.chain.gz in the package.

  • Response: I have updated this and removed inst/extdata/hg38ToHg19.over.chain.gz in the package.

  • Please check if other downloaded resources (e.g., those ENCODE resources downlaoded by EpiCompare:::get_chrHMM_annotation()) are already available from AnnotationHub. My brief investigation suggests these aren't currently available (see below). If indeed these aren't currently available from AnnotationHub, then the next best thing is to (1) create AnnotationHub resources for these files or (2) use BiocFileCache to cache these downloads (https://contributions.bioconductor.org/r-code.html?q=biocfile#web-querying-and-file-caching). @lshep is there a strong preference for (1) over (2)?

  • Response: In the interest of time, I have implemented the second option as it is a lot quicker to do

  • Please clarify why EpiCompare is restricted to just hg19 or hg38.
    It doesn't seem any additional work to support all genomes with (1) a TxDb resource and (2) an optional(?) blacklist.
    All functions that use TxDB object (except overlap_stat_plot(); see below) seem to be agnostic to the species.
    I note that the supplied blacklists are only hg19 and hg38, but the blacklists exist for other species and genome builds (as documented in https://github.com/Boyle-Lab/Blacklist) and perhaps EpiCompare can be tweaked so that it is still useful without such a blacklist?

  • Response: Thank you for the suggestion. We considered this, but one of the functional annotation tools EpiCompare uses is ChromHMM, which provides annotation files for human cell-lines only (as far as we are aware). Therefore, for the consistency and interest of time, we decided to only include hg19 and hg38 for now, but we will work to support all genomes in future releases.

  • The hg38_blacklist is documented as downloaded from a secondary source (https://github.com/Boyle-Lab/Blacklist); please use the primary source.Related, documented sources of hg19_blacklist and hg38_blacklist are different; each should refer to the primary source rather than secondary source.

  • Response: I have now modified this.

  • In §3.3 of vignette, please set output_dir of EpiCompare() to tempdir(), which is what is done in the man page for the function (?EpiCompare). You might add a comment to the vignette and documentation that a user will probably want to specify a different output_dir when running the function on their own data.

  • Response: I have modified this.

  • Please also ensure this chunk is evaluated so that it is routinely tested by the Bioconductor build infrastructure.
    In my testing I found that in order to have this code chunk be successfully evaluated that I needed to re-name (or remove the names) some of the code chunks in either vignettes/EpiCompare.Rmd or in inst/markdown/EpiCompare.Rmd so that there weren't any duplicated labels between the two (specifically, the setup and Session Info chunks).

  • Response: I have modified this

  • gather_files() uses parallel::mclapply() rather than BiocParallel::bplapply(). Please use BiocParallel for all parallelisation (https://contributions.bioconductor.org/r-code.html?q=biocparall#parallel-recommendations).

  • Response: I have modified this

  • I don't think import_narrowPeak() is required. It reinvents the wheel and is slower than the 'official' Bioconductor function, BiocIO::import() (formerly rtracklayer::import())

  • Response: The function BiocIO::import returns an error when loading down to the custom format of the ENCODE bed files so the function EpiCompare::import_narrowPeak deals with different peak file formats. Code:

> path <- file.path("https://www.encodeproject.org/files",
+                    "ENCFF044JNJ/@@download/ENCFF044JNJ.bed.gz")
> path <- EpiCompare::write_example_peaks(datasets="encode_H3K27ac")
Writing ==> /var/folders/hd/jm8lzp7s4dl_wlkykzhz66x80000gn/T//RtmpwfRtrB/processed_results/encode_H3K27ac.narrowPeaks.bed
> peaks <- EpiCompare::import_narrowPeak(path=path)
5,142 peaks imported.
> BiocIO::import(path)
Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec,  : 
scan() expected 'an integer', got '"start"' 

and even without the writing locally you get the same issue:

> path <- file.path("https://www.encodeproject.org/files",
+                    "ENCFF044JNJ/@@download/ENCFF044JNJ.bed.gz")
> BiocIO::import(path)
Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec,  : 
  scan() expected 'an integer', got '12.59446'
  • Commented-out example in import_narrowPeak() doesn't work. I think the URL should be https://www.encodeproject.org/files/ENCFF044JNJ/@@download/ENCFF044JNJ.bed.gz. Please fix if retaining import_narrowPeak() as an exported function.

  • Response: This example does work now

  • write_example_peaks() doesn't write a valid narrowPeak file.

  • Response: Does write a valid file now, related to the issue above

  • Do not use direct slot access with @; please use the accessor instead (e.g., instead of x@elementMetadata use elementMetadata(x)).

  • Response: I have modified all direct slot access that use @

  • It is generally advisable to import all of methods into NAMESPACE rather than specific functions.

  • Response: I have modified this

  • Please check how you are importing S4 generics and methods into the NAMESPACE with roxygen2. If you want to import a method then use @importMethodsFrom rather than @importFrom (https://r-pkgs.org/namespace.html#namespace-NAMESPACE).

  • Response: I have modified this

  • overlap_stat_plot() hardcodes the use of TxDb.Hsapiens.UCSC.hg19.knownGene::TxDb.Hsapiens.UCSC.hg19.knownGene. Why not allow the user to supply this as a txdb or annotation argument like you are in other functions (e.g., tss_plot())?

  • Response: Thank you for spotting this, I have now fixed the function to include an annotation argument.

  • Please make an effort to address all NOTES produced by BiocCheck::BiocCheck(). In particular, I would expect these to be remedied as advised (unless there are exceptional reasons):

[1] "Update R version dependency from 4.1 to 4.2.0."
[2] " Avoid sapply(); use vapply()"   
  • Response: I have eliminated these NOTES, but now I'm getting a new NOTE:
* NOTE: Avoid using '=' for assignment and use '<-' instead

I cannot find this error anywhere in the package and it's also not picked up locally when I run BiocCheck::BiocCheck

  • Data must be fully documented. I took a look at 1 example at random (hg19_blacklist) andfound this wasn't the case.

  • Response: Thank you again for spotting this, I have revised the documentation and modified this

  • Other examples of where the supplied data are not sufficiently documented or there are other issues with its documentation (there may be more, please check):chromHMM_annotation_K562: The code in the Source of the documentation isn't properly formatted. CnR_H3K27ac: Always use accessors rather than direct slot access, i.e. seqnames(CnR_H3K27ac) rather than CnR_H3K27ac@seqnames. encode_H3K27ac: \url{} isn't valid R code (it's valid markup in .Rd but it is being used here in pseudocode).

  • Response: I have revised the documentation of all supplied data and made appropriate corrections.

-[ ] In light of above comments, please strongly consider using BiocIO for all import/export functionality instead of creating your own or using other packages (e.g., genomation::readBed(), ChIPseeker::readPeakFile())

  • Response: Thank you for the suggestion. In the interest of time, we have decided to stick with other packages for now, but in future releases we will work to convert to using BiocIO.

RECOMMENDED

  • Please consider using GRangesList rather than a list of GRanges objects. Using a GRangesList is arguably more natural for a Bioconductor workflow and comes with some benefits, e.g., the ability to set the SeqInfo of a GRangesList, which would allow for checking of genome compatibility between supplied peaks and annotations by comparing their respective seqinfo().

  • Response: Thank you for your suggestion. We also agree that this is a better way of working with GRanges object, but in the interest of time, we will work to implement this in EpiCompare in later releases.

  • Please run a spellcheck over the documentation and vignette

  • Response: I have done this

  • Please standardise function names that reference chromHMM. E.g., plot_chrmHMM() should be plot_chromHMM()? get_chrHMM_annotation() is yet another variant ...

  • Response: I have done this

  • The code could be re-factored so that some of the 'heavier' dependencies (e.g., TxDb.Hsapiens.UCSC.hg19.knownGene) could go in Suggests and their usage made conditional (see https://contributions.bioconductor.org/description.html?q=suggests#depends-imports-suggests-enhances)

  • Response: I have moved the heavier dependencies to Suggests

  • Why is GNU make listed in SystemRequirements in DESCRIPTION?

  • Response: I have now removed this.

  • Consider adding some more specific biocViews, e.g., ChIPSeq (see https://www.bioconductor.org/packages/release/BiocViews.html)

  • Response: I have done this

  • Consider using just one of camelCase or snake_case for function names.

  • Response: I believe almost all function names use snake_case. The only exception is EpiCompare, which follows the package name. I hope this is okay.

  • Consider using |> (from base) instead of %>%.

  • Response: We have kept the use of %>%, but we have imported from magrittr

  • If using %>%, import from magrittr rather than dplyr.

  • Response: I have done this

  • Consider using base R functionality instead of add-on packages. E.g, using base::strsplit() instead of stringr::str_split() and base::strwrap() instead of stringr::str_wrap() (neither are drop-in replacements but similar) would eliminate the somewhat-heavy dependency on stringr.

    • Response: Thank you for your suggestion. In the interest of time, we have kept it the same, but we will work to remove these heavy dependencies in later releases
  • Correction to NEWS.md: v0.99.0 should be "EpiCompare released submitted to Bioconductor."

  • Response: I have done this

  • Strongly recommend adding a inst/CITATION file (https://contributions.bioconductor.org/citation.html)

  • Response: I have added CITATION file, but we are yet to make a publication. Once we publish a paper, we will update this.

  • Recommend making the installation instructions in the vignette their own section.

  • Response: I have done this

  • Please consider adding a package-level man page (https://contributions.bioconductor.org/docs.html#package-level-documentation)

  • Response: We will work on this

  • Good job writing unit tests. Please take a look at covr::report() to spot gaps in unit test coverage.

  • Response: Thank you, we will continue to work on unit tests in future releases

@PeteHaitch
Copy link

Hi @serachoi1230,

Thank you for the revisions.
There are some issues that seem to have been overlooked from the initial review and a couple of other points from the changes that need to be fixed.

Cheers,
Pete

Required

  • Re import() not working for you. There's 2 issues:
  1. You need to specify an appropriate value to the format argument.
  2. Please use rtracklayer::import() rather than BiocIO::import() (I think this is a bug that I will report to the BiocIO and rtracklayer maintainters; my understanding is that BiocIO::import() is intended to supercede rtracklayer::import())

Here's a reprex showing import() can read a correctly formatted narrowPeak file just fine (and as I showed in the initial review, it's also faster than EpiCompare::import_narrowPeak()).

suppressPackageStartupMessages(library(rtracklayer))

# The example from Sara
path <- "https://www.encodeproject.org/files/ENCFF044JNJ/@@download/ENCFF044JNJ.bed.gz"

# Supplying the appropriate `format` works
import(path, format = "narrowPeak")
#> GRanges object with 51176 ranges and 6 metadata columns:
#>           seqnames              ranges strand |        name     score
#>              <Rle>           <IRanges>  <Rle> | <character> <numeric>
#>       [1]    chr10 100120820-100121527      * |  Peak_17683        36
#>       [2]    chr10 100204840-100205093      * |  Peak_55123        13
#>       [3]    chr10 100205451-100207334      * |   Peak_3178       155
#>       [4]    chr10 100218690-100219261      * |  Peak_17684        36
#>       [5]    chr10 100219390-100220406      * |  Peak_19403        32
#>       ...      ...                 ...    ... .         ...       ...
#>   [51172]     chrX       950933-951431      * |  Peak_14012        47
#>   [51173]     chrX       951487-952037      * |  Peak_19964        32
#>   [51174]     chrX   97828833-97829748      * |  Peak_19510        32
#>   [51175]     chrX   99998086-99998334      * |  Peak_33145        19
#>   [51176]     chrY     9989029-9989319      * |  Peak_52125        14
#>           signalValue    pValue    qValue      peak
#>             <numeric> <numeric> <numeric> <integer>
#>       [1]    12.59446  26.34571  23.81650       297
#>       [2]     4.30475   5.46569   3.43493       152
#>       [3]    44.75533 133.83092 130.23181      1493
#>       [4]    12.59446  26.34571  23.81650       377
#>       [5]     9.84834  22.82066  20.34716       199
#>       ...         ...       ...       ...       ...
#>   [51172]    16.68090   36.3314  33.65495       156
#>   [51173]    11.44904   22.2138  19.75487       317
#>   [51174]     9.84834   22.8207  20.34716       164
#>   [51175]     6.76534   10.6263   8.41839       164
#>   [51176]     4.49197    5.9362   3.88656       158
#>   -------
#>   seqinfo: 24 sequences from an unspecified genome; no seqlengths

Created on 2022-04-22 by the reprex package (v2.0.1)

Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R Under development (unstable) (2021-10-27 r81107)
#>  os       macOS Big Sur/Monterey 10.16
#>  system   x86_64, darwin17.0
#>  ui       X11
#>  language (EN)
#>  collate  en_AU.UTF-8
#>  ctype    en_AU.UTF-8
#>  tz       Australia/Melbourne
#>  date     2022-04-22
#>  pandoc   2.16.2 @ /Applications/RStudio.app/Contents/MacOS/quarto/bin/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package              * version  date (UTC) lib source
#>  Biobase                2.55.2   2022-04-07 [1] Bioconductor
#>  BiocGenerics         * 0.41.2   2021-11-19 [1] Bioconductor
#>  BiocIO                 1.5.0    2021-11-05 [1] Bioconductor
#>  BiocParallel           1.29.20  2022-04-05 [1] Bioconductor
#>  Biostrings             2.63.3   2022-03-29 [1] Bioconductor
#>  bitops                 1.0-7    2021-04-24 [1] CRAN (R 4.2.0)
#>  cli                    3.2.0    2022-02-14 [1] CRAN (R 4.2.0)
#>  crayon                 1.5.1    2022-03-26 [1] CRAN (R 4.2.0)
#>  DelayedArray           0.21.2   2021-11-19 [1] Bioconductor
#>  digest                 0.6.29   2021-12-01 [1] CRAN (R 4.2.0)
#>  ellipsis               0.3.2    2021-04-29 [1] CRAN (R 4.2.0)
#>  evaluate               0.15     2022-02-18 [1] CRAN (R 4.2.0)
#>  fansi                  1.0.3    2022-03-24 [1] CRAN (R 4.2.0)
#>  fastmap                1.1.0    2021-01-25 [1] CRAN (R 4.2.0)
#>  fs                     1.5.2    2021-12-08 [1] CRAN (R 4.2.0)
#>  GenomeInfoDb         * 1.31.7   2022-04-03 [1] Bioconductor
#>  GenomeInfoDbData       1.2.8    2022-04-13 [1] Bioconductor
#>  GenomicAlignments      1.31.2   2021-11-19 [1] Bioconductor
#>  GenomicRanges        * 1.47.6   2022-01-12 [1] Bioconductor
#>  glue                   1.6.2    2022-02-24 [1] CRAN (R 4.2.0)
#>  highr                  0.9      2021-04-16 [1] CRAN (R 4.2.0)
#>  htmltools              0.5.2    2021-08-25 [1] CRAN (R 4.2.0)
#>  IRanges              * 2.29.1   2021-11-19 [1] Bioconductor
#>  knitr                  1.38     2022-03-25 [1] CRAN (R 4.2.0)
#>  lattice                0.20-45  2021-09-22 [1] CRAN (R 4.2.0)
#>  lifecycle              1.0.1    2021-09-24 [1] CRAN (R 4.2.0)
#>  magrittr               2.0.3    2022-03-30 [1] CRAN (R 4.2.0)
#>  Matrix                 1.4-1    2022-03-23 [1] CRAN (R 4.2.0)
#>  MatrixGenerics         1.7.0    2021-10-26 [1] Bioconductor
#>  matrixStats            0.62.0   2022-04-19 [1] CRAN (R 4.2.0)
#>  pillar                 1.7.0    2022-02-01 [1] CRAN (R 4.2.0)
#>  pkgconfig              2.0.3    2019-09-22 [1] CRAN (R 4.2.0)
#>  purrr                  0.3.4    2020-04-17 [1] CRAN (R 4.2.0)
#>  R.cache                0.15.0   2021-04-30 [1] CRAN (R 4.2.0)
#>  R.methodsS3            1.8.1    2020-08-26 [1] CRAN (R 4.2.0)
#>  R.oo                   1.24.0   2020-08-26 [1] CRAN (R 4.2.0)
#>  R.utils                2.11.0   2021-09-26 [1] CRAN (R 4.2.0)
#>  RCurl                  1.98-1.6 2022-02-08 [1] CRAN (R 4.2.0)
#>  reprex                 2.0.1    2021-08-05 [1] CRAN (R 4.2.0)
#>  restfulr               0.0.13   2017-08-06 [1] CRAN (R 4.2.0)
#>  rjson                  0.2.21   2022-01-09 [1] CRAN (R 4.2.0)
#>  rlang                  1.0.2    2022-03-04 [1] CRAN (R 4.2.0)
#>  rmarkdown              2.13     2022-03-10 [1] CRAN (R 4.2.0)
#>  Rsamtools              2.11.0   2021-11-19 [1] Bioconductor
#>  rstudioapi             0.13     2020-11-12 [1] CRAN (R 4.2.0)
#>  rtracklayer          * 1.55.4   2022-03-23 [1] Bioconductor
#>  S4Vectors            * 0.33.17  2022-04-06 [1] Bioconductor
#>  sessioninfo            1.2.2    2021-12-06 [1] CRAN (R 4.2.0)
#>  stringi                1.7.6    2021-11-29 [1] CRAN (R 4.2.0)
#>  stringr                1.4.0    2019-02-10 [1] CRAN (R 4.2.0)
#>  styler                 1.7.0    2022-03-13 [1] CRAN (R 4.2.0)
#>  SummarizedExperiment   1.25.3   2021-12-08 [1] Bioconductor
#>  tibble                 3.1.6    2021-11-07 [1] CRAN (R 4.2.0)
#>  utf8                   1.2.2    2021-07-24 [1] CRAN (R 4.2.0)
#>  vctrs                  0.4.1    2022-04-13 [1] CRAN (R 4.2.0)
#>  withr                  2.5.0    2022-03-03 [1] CRAN (R 4.2.0)
#>  xfun                   0.30     2022-03-02 [1] CRAN (R 4.2.0)
#>  XML                    3.99-0.9 2022-02-24 [1] CRAN (R 4.2.0)
#>  XVector                0.35.0   2021-10-26 [1] Bioconductor
#>  yaml                   2.3.5    2022-02-21 [1] CRAN (R 4.2.0)
#>  zlibbioc               1.41.0   2021-10-26 [1] Bioconductor
#> 
#>  [1] /Library/Frameworks/R.framework/Versions/4.2/Resources/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────
  • Please fix the %\VignetteIndexEntry{} field in the vignette (it's currently 'MungeSumStats')

  • requireNamespace() should only be used for Suggested packages (e.g., data.table). It should not be needed or used for packages listed in Imports (e.g., GenomicRanges). Please see https://r-pkgs.org/description.html for when and how to use requireNamespace().

(base) Peter@Peters-MacBook-Pro-2 EpiCompare (master)*$ grep requireNamespace R/*
R/gather_files.R:  requireNamespace("data.table")
R/gather_files.R:  requireNamespace("BiocParallel")
R/get_chromHMM_annotation.R:  requireNamespace("GenomicRanges")
R/get_chromHMM_annotation.R:  requireNamespace("BiocFileCache")
R/get_chromHMM_annotation.R:  if (!requireNamespace("BiocFileCache"))
R/import_narrowPeak.R:  requireNamespace("data.table")
R/import_narrowPeak.R:  requireNamespace("rtracklayer")
R/overlap_stat_plot.R:  requireNamespace("GenomicRanges")
R/plot_enrichment.R:  requireNamespace("clusterProfiler")
R/plot_enrichment.R:  requireNamespace("org.Hs.eg.db")
  • Please upate the installation instructions in the README and vignette to the official Bioconductor installation instructions.These will be:
if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("EpiCompare") 
  • write_example_peaks() is still not writing a valid narrowPeaks file.
suppressPackageStartupMessages(library(EpiCompare))
#> Warning: replacing previous import 'Biostrings::pattern' by 'grid::pattern' when
#> loading 'genomation'
suppressPackageStartupMessages(library(rtracklayer))
suppressPackageStartupMessages(library(S4Vectors))

path <- write_example_peaks(datasets = "encode_H3K27ac")
#> Writing ==> /var/folders/f1/6pjy5xbn0_9_7xwq6l7fj2yc0000gn/T//Rtmporozk4/processed_results/encode_H3K27ac.narrowPeaks.bed


# Different column names after export+import round trip.
colnames(mcols(encode_H3K27ac))
#> [1] "name"        "score"       "strand"      "signalValue" "pValue"     
#> [6] "qValue"      "peak"

colnames(mcols(import_narrowPeak(path)))
#> 5,142 peaks imported.
#> [1] "name"        "score"       "strand.1"    "signalValue" "pValue"     
#> [6] "qValue"      "peak"

# 'Official' Bioconductor function cannot import exported dataset.
import(path, format = "narrowPeak")
#> Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, : scan() expected 'an integer', got '"start"'

# Character fields are quoted in exported dataset.
#> system(paste0("head -n 5 ", path))
#> "seqnames"	"start"	"end"	"width"	"strand"	"name"	"score"	"strand.1"	"signalValue"	"pValue"	"qValue"	"peak"
#> "chr1"	10002541	10004163	1623	"*"	"Peak_2650"	170	"."	31.36983	148.02612	144.32048	1348
#> "chr1"	100064339	100065136	798	"*"	"Peak_11419"	58	"."	16.17942	46.30823	43.50269	329
#> "chr1"	100065447	100065946	500	"*"	"Peak_14209"	46	"."	10.24276	35.46492	32.79924	335
#> "chr1"	100066041	100066635	595	"*"	"Peak_33569"	19	"."	5.48637	10.29599	8.09466	456
  • Still usage of @ instead of accessors
(base) Peter@Peters-MacBook-Pro-2 EpiCompare (master)*$ grep @ R/* | grep -v "#'"
R/import_narrowPeak.R:                                "@@download/attachment",
R/overlap_upset_plot.R:   sample_name <- rep(peaklist_names[i], length(overlap@to))
R/overlap_upset_plot.R:   df <- data.frame(peak=overlap@from, sample=sample_name)
  • Still problems with documentation of data (same issues is highlighted in initial review). Please look at the rendered documentation to double-check how they look when you make changes.

Recommended

@bioc-issue-bot
Copy link
Collaborator

Received a valid push on git.bioconductor.org; starting a build for commit id: 692fb3264c20ec4b038f2de3920519597876eb14

@bioc-issue-bot
Copy link
Collaborator

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "WARNINGS, TIMEOUT".
This may mean there is a problem with the package that you need to fix.
Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active
for 21 days.

Remember: if you submitted your package after July 7th, 2020,
when making changes to your repository push to
[email protected]:packages/EpiCompare to trigger a new build.
A quick tutorial for setting up remotes and pushing to upstream can be found here.

@bioc-issue-bot
Copy link
Collaborator

Received a valid push on git.bioconductor.org; starting a build for commit id: 9d025c96915c2878eeb6dfd8f7d5805006e05910

@bioc-issue-bot
Copy link
Collaborator

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "WARNINGS, TIMEOUT".
This may mean there is a problem with the package that you need to fix.
Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active
for 21 days.

Remember: if you submitted your package after July 7th, 2020,
when making changes to your repository push to
[email protected]:packages/EpiCompare to trigger a new build.
A quick tutorial for setting up remotes and pushing to upstream can be found here.

@serachoi1230
Copy link
Author

serachoi1230 commented Apr 22, 2022

Hi @PeteHaitch ,

Thank you very much for getting back to me, please see below line-by-line response to your review.

Also, we believe that the runtime timeout is an anomaly as this has happened before, where it completes all checks locally (mac) in 8 mins and on linux, 5 mins, and the timeout goes away with no apparent reason.

REQUIRED

  • import() not working for you. There's 2 issues: You need to specify an appropriate value to the format argument.
    Please use rtracklayer::import() rather than BiocIO::import() (I think this is a bug that I will report to the BiocIO and rtracklayer maintainters; my understanding is that BiocIO::import() is intended to supercede rtracklayer::import())

  • Response: I removed the function you asked and changed the write function so it writes in valid narrow peak format

  • Please fix the %\VignetteIndexEntry{} field in the vignette (it's currently 'MungeSumStats')

  • Response: Thank you for spotting this, I have now fixed this

  • requireNamespace() should only be used for Suggested packages (e.g., data.table). It should not be needed or used for packages listed in Imports (e.g., GenomicRanges). Please see https://r-pkgs.org/description.html for when and how to use requireNamespace().

  • Response: I have now fixed this

  • Please upate the installation instructions in the README and vignette to the official Bioconductor installation instructions.

  • I have added the installation instructions

  • write_example_peaks() is still not writing a valid narrowPeaks file.

  • Response: this has been fixed now

  • Still usage of @ instead of accessors

  • Response: Sorry about that, I've now removed all @. Please note that

R/import_narrowPeak.R:                                "@@download/attachment",

comes from

import_narrowPeak <- function(path,
                              as_file = file.path(
                                "https://www.encodeproject.org/documents",
                                "948203bb-8eb2-42a2-8b12-1c10f356c998",
                                "@@download/attachment",
                                "narrowPeak.as"
                              ),

which is part of an address

  • Still problems with documentation of data (same issues is highlighted in initial review). Please look at the rendered documentation to double-check how they look when you make changes.
  • Response: Sorry about this, I have now double-checked the rendered documentation and made changes.

RECOMMENDED

@PeteHaitch PeteHaitch added 3a. accepted will be ingested into Bioconductor daily builder for distribution and removed 2. review in progress assign a reviewer and a more thorough review of package code and documentation taking place labels Apr 22, 2022
@bioc-issue-bot
Copy link
Collaborator

Your package has been accepted. It will be added to the
Bioconductor nightly builds.

Thank you for contributing to Bioconductor!

Reviewers for Bioconductor packages are volunteers from the Bioconductor
community. If you are interested in becoming a Bioconductor package
reviewer, please see Reviewers Expectations.

@PeteHaitch
Copy link

Thanks @serachoi1230 . With your efforts to respond to review comments on a tight deadline, I've accepted the package in the hope it will sneak in before the deadline. @lshep will that be possible?

It's the weekend where I am but I'll take a quick look over the most recent changes to provide any useful feedback for the future development of EpiCompare.

Thank you for your contribution to Bioconductor!

@PeteHaitch
Copy link

Minor things to tidy up:

I've reported a couple of issues that were identified during this review with importing/exporting narrowPeaks files:

  1. Glitch importing narrowPeaks file (workaround requires rtracklayer) BiocIO#2
  2. Lack of functionality to export narrowPeaks files lawremi/rtracklayer#64

If when writing your own package you come across a perceived limitation of a core Bioconductor package, I'd encourage you to report it.
This should hopefully save you work of re-implementing functionality but also benefit the broader community.

@serachoi1230
Copy link
Author

Hi @PeteHaitch,

Thank you very much for all your efforts! We will continue to work on the issues!

Best wishes,
Sera

@mtmorgan
Copy link
Contributor

mtmorgan commented Apr 23, 2022

Briefly, EpiCompare should be doing the equivalent of rtracklayer::import(..., format = "narrowPeak").

This snippet from above

> path <- file.path("https://www.encodeproject.org/files",
+                    "ENCFF044JNJ/@@download/ENCFF044JNJ.bed.gz")
> BiocIO::import(path)
Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec,  : 
  scan() expected 'an integer', got '12.59446'

should (a) not use file.path() on the URL, since this is not a file path (on Windows the file path separator is \, for instance) but simply paste(..., sep="/") or paste0() with the appropriate separator prepended to the relevant part of the path, (b) use rtracklayer::import() and (c) add a format = "narrowPeak" argument. So

path <- paste0(
    "https://www.encodeproject.org",
    "/files/ENCFF044JNJ/@@download/ENCFF044JNJ.bed.gz"
)
rtracklayer::import(path, format = "narrowPeak")

rtracklayer itself makes use of BiocIO, but the user of the rtracklayer method does not need to know this.

@lshep
Copy link
Contributor

lshep commented Apr 25, 2022

Yes I'll process the final accepted packages later today so they will be included in the release.

@lshep
Copy link
Contributor

lshep commented Apr 25, 2022

The master branch of your GitHub repository has been added to Bioconductor's git repository.

To use the git.bioconductor.org repository, we need an 'ssh' key to associate with your github user name. If your GitHub account already has ssh public keys (https://github.com/serachoi1230.keys is not empty), then no further steps are required. Otherwise, do the following:

  1. Add an SSH key to your github account
  2. Submit your SSH key to Bioconductor

See further instructions at

https://bioconductor.org/developers/how-to/git/

for working with this repository. See especially

https://bioconductor.org/developers/how-to/git/new-package-workflow/
https://bioconductor.org/developers/how-to/git/sync-existing-repositories/

to keep your GitHub and Bioconductor repositories in sync.

Your package will be included in the next nigthly 'devel' build (check-out from git at about 6 pm Eastern; build completion around 2pm Eastern the next day) at

https://bioconductor.org/checkResults/

(Builds sometimes fail, so ensure that the date stamps on the main landing page are consistent with the addition of your package). Once the package builds successfully, you package will be available for download in the 'Devel' version of Bioconductor using BiocManager::install("EpiCompare"). The package 'landing page' will be created at

https://bioconductor.org/packages/EpiCompare

If you have any questions, please contact the bioc-devel mailing list (https://stat.ethz.ch/mailman/listinfo/bioc-devel); this issue will not be monitored further.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3a. accepted will be ingested into Bioconductor daily builder for distribution TIMEOUT WARNINGS
Projects
None yet
Development

No branches or pull requests

6 participants