This workflow visualizes the distribution of a continuous variable (in bedgraph format) at each position within a set of genomic intervals (provided as a GFF file). Its purpose is to show how the variable is distributed around specific positions within those intervals.
To execute the workflow, follow these steps:
- Run the script using Nextflow:
    nextflow run main.nf
- The workflow requires two main input files:
  - A GFF file containing genomic intervals.
  - A bedgraph file containing genome-wide continuous variable data.
- The intervals will be aligned at their centers, enabling the detection of distributional differences.
- GFF File (Genomic Intervals): The GFF file lists the genomic intervals to analyze. Intervals are aligned at their centers, so that intervals of varying lengths can be compared position by position. Correct alignment is crucial for accurate distributional comparisons.
- Bedgraph File (Continuous Variable): The bedgraph file contains the genome-wide continuous variable data whose distribution is analyzed within the genomic intervals.
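The center alignment described above can be illustrated with a minimal Python sketch. This is not the workflow's own code: the function names `center` and `align_to_center`, the rounding of the center, and the choice to ignore strand are all assumptions made for illustration.

```python
from collections import defaultdict


def center(start, end):
    """Center of a 1-based inclusive interval, rounded down (an assumption)."""
    return (start + end) // 2


def align_to_center(intervals, signal):
    """Group signal values by their offset from each interval's center.

    intervals: iterable of (start, end) tuples, 1-based inclusive
    signal:    dict mapping genomic position -> value
    Returns a dict mapping offset (position - center) -> list of values.
    """
    by_offset = defaultdict(list)
    for start, end in intervals:
        mid = center(start, end)
        for pos in range(start, end + 1):
            if pos in signal:
                by_offset[pos - mid].append(signal[pos])
    return dict(by_offset)


# Hypothetical toy data: two intervals of different lengths.
intervals = [(10, 14), (100, 106)]
signal = {pos: float(pos % 10) for pos in range(1, 200)}
aligned = align_to_center(intervals, signal)
```

After alignment, offset 0 holds the value at the center of every interval, and negative or positive offsets hold the values upstream or downstream of it. Longer intervals simply contribute values at more extreme offsets.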
See the examples provided for input format demonstrations. Example data used in this workflow is from Dar and Sorek (2018).
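For orientation, here is a small Python sketch of how the two input formats are laid out. A GFF feature line has nine tab-separated columns with 1-based, inclusive coordinates; a bedgraph line has four columns (chromosome, start, end, value) with 0-based, half-open coordinates. The function names and the sample lines below are invented for illustration and are not taken from the example data.

```python
def parse_gff_line(line):
    """Parse one GFF feature line into (seqid, start, end, strand)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError("expected 9 tab-separated GFF columns")
    seqid, _source, _type, start, end, _score, strand, _phase, _attrs = fields
    return seqid, int(start), int(end), strand


def parse_bedgraph_line(line):
    """Parse one bedgraph data line into (chrom, start, end, value)."""
    chrom, start, end, value = line.split()
    return chrom, int(start), int(end), float(value)


def bedgraph_to_positions(chrom, start, end, value):
    """Expand a 0-based half-open bedgraph span to 1-based positions."""
    return {(chrom, pos): value for pos in range(start + 1, end + 1)}


# Hypothetical sample lines, not from the workflow's example data.
gff_line = "chr1\texample\tgene\t10\t14\t.\t+\t.\tID=gene1"
bedgraph_line = "chr1\t9\t14\t2.5"

interval = parse_gff_line(gff_line)
span = parse_bedgraph_line(bedgraph_line)
positions = bedgraph_to_positions(*span)
```

Converting the bedgraph span to 1-based positions puts both files in the same coordinate system, so the bedgraph span 9 to 14 covers exactly the GFF interval 10 to 14.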
Different normalization strategies are implemented in this workflow to suit various analysis requirements. You can specify the normalization strategy within the main.nf file.
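To give a sense of what choosing a normalization strategy can mean, the sketch below applies three common options to the values of a single interval. The strategy names (`"none"`, `"max"`, `"zscore"`) and the `normalize` function are hypothetical; the strategies actually available are the ones defined in main.nf.

```python
from statistics import mean, pstdev


def normalize(values, strategy="none"):
    """Normalize one interval's values with a hypothetical strategy name."""
    values = list(values)
    if not values or strategy == "none":
        return values
    if strategy == "max":
        # Scale so the largest absolute value becomes 1.
        peak = max(abs(v) for v in values)
        return [v / peak if peak else 0.0 for v in values]
    if strategy == "zscore":
        # Center on the mean and scale by the population standard deviation.
        mu = mean(values)
        sd = pstdev(values)
        return [(v - mu) / sd if sd else 0.0 for v in values]
    raise ValueError(f"unknown strategy: {strategy}")


scaled = normalize([1.0, 2.0, 4.0], "max")
standardized = normalize([1.0, 3.0], "zscore")
```

Scaling by the maximum keeps the shape of each interval's profile while removing differences in overall signal level. A z-score additionally centers each profile, which can help when baseline levels differ between intervals.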
This workflow requires Nextflow version 21.04.3.5560 or later. You can check the installed version with nextflow -version.