Skip to content

Latest commit

 

History

History
23 lines (17 loc) · 2.46 KB

pinfish.md

File metadata and controls

23 lines (17 loc) · 2.46 KB

Pinfish pipeline programs

Minimap2

A versatile sequence alignment program that aligns DNA or mRNA sequences against a large reference database. Typical use cases include: (1) mapping PacBio or Oxford Nanopore genomic reads to the human genome; (2) finding overlaps between long reads with error rate up to ~15%; (3) splice-aware alignment of PacBio Iso-Seq or Nanopore cDNA or Direct RNA reads against a reference genome; (4) aligning Illumina single- or paired-end reads; (5) assembly-to-assembly alignment; (6) full-genome alignment between two closely related species with divergence below ~15%.

Pinfish

  • spliced_bam2gff - a tool for converting sorted BAM files containing spliced alignments (generated by minimap2 or GMAP) into GFF2 format. Each read will be represented as a distinct transcript. This tool comes handy when visualizing spliced reads at particular loci and to provide input to the rest of the toolchain.
  • cluster_gff - this tool takes a sorted GFF2 file as input and clusters together reads having similar exon/intron structure and creates a rough consensus of the clusters by taking the median of exon boundaries from all transcripts in the cluster.
  • polish_clusters - this tool takes the cluster definitions generated by cluster_gff and for each cluster creates an error corrected read by mapping all reads on the read with the median length (using minimap2) and polishing it using racon. The polished reads can be mapped to the genome using minimap2 or GMAP.
  • collapse_partials - this tool takes GFFs generated by either cluster_gff or polish_clusters and filters out transcripts which are likely to be based on RNA degradation products from the 5' end. The tool clusters the input transcripts into "loci" by the 3' ends and discards transcripts which have a compatible transcripts in the loci with more exons.

Gffread

GFF/GTF utility providing format conversions, filtering, FASTA sequence extraction and more.

GffCompare

  • compare and evaluate the accuracy of RNA-Seq transcript assemblers (Cufflinks, Stringtie).
  • collapse (merge) duplicate transcripts from multiple GTF/GFF3 files (e.g. resulted from assembly of different samples)
  • classify transcripts from one or multiple GTF/GFF3 files as they relate to reference transcripts provided in a annotation file (also in GTF/GFF3 format)

http://ccb.jhu.edu/software/stringtie/gffcompare.shtml