Skip to content

Latest commit

 

History

History
40 lines (30 loc) · 1.54 KB

README.org

File metadata and controls

40 lines (30 loc) · 1.54 KB

Panmethyl

Panmethyl maps methylation data from long-reads to pangenomes.

Inputs

It takes the following inputs:

  • --out - directory in which panmethyl will write the output files.
  • --bams - CSV file listing the BAM files to be mapped to the pangenome.

The format of this CSV file is:

sample,path
name1,path/to/bam1
name2,path/to/bam2

These BAM files must be annotated with the appropriate methylation information. For example, the location of modified bases must be encoded in the MM tag and the likelihoods of methylation must be encoded in the ML tag.

  • --graph - Pangenome in rGFA format. Currently, only graphs created with minigraph are supported, but can be extended to all graphs by replacing minigraph with other graph aligners.

Outputs

Panmethyl outputs a .graphMethylaion plain text file for each entry in --bams. This file is a CSV file listing the graph node, the position of the modified base, its strand, the coverage on the modified base, and the average methylation level, encoded on a scale from 0 to 255 (as in the ML tag).

Geeneral steps in the pipeline

  1. Index the position of every CpG dinucleotide in the input graph (bin/index_cpg.py).
  2. Convert BAM file to FASTQ (samtools).
  3. Annotate reads in FASTQ with methylation information from the BAM (tagtobed).
  4. Map the FASTQ to the pangenome with minigraph.
  5. Lift the methylation annotation from the reads to the graph (bin/lift_5mC.py).
  6. Count the average methylation level of CpGs (bin/nodes_methylation.py).