Ivy edited this page Aug 26, 2021 · 56 revisions

ecc_finder Version: v1.0.0

ecc_finder identifies eccDNA loci by mapping to a reference genome.

Long read mapping

Algorithm overview.

Long read pipeline algorithm overview

Usage

Usage option in detail: map-ont

usage: ecc_finder.py map-ont <reference.idx> <query.fq> (option)

mapping options

For a large genome such as human, minimap2 takes a few minutes to generate a minimizer index for the reference before mapping. To avoid repeating this step on every run, build the index once with the minimap2 -d option and reuse it.

minimap2 -d reference.idx reference.fa
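The index only needs to be built once, so a wrapper script can skip the step when the file already exists. The following is a minimal sketch of that idea, not part of ecc_finder itself. The helper names index_command and ensure_index are hypothetical, and the sketch assumes minimap2 is on the PATH.

```python
import subprocess
from pathlib import Path


def index_command(reference_fa, index_path):
    """Return the minimap2 indexing command shown above as an argument list."""
    return ["minimap2", "-d", str(index_path), str(reference_fa)]


def ensure_index(reference_fa, index_path):
    """Build the index only if it is missing; return True if minimap2 was run.

    Hypothetical helper: assumes minimap2 is installed and on the PATH.
    """
    if Path(index_path).exists():
        return False  # reuse the index from an earlier run
    subprocess.run(index_command(reference_fa, index_path), check=True)
    return True
```

Calling ensure_index("reference.fa", "reference.idx") before each mapping run would then pay the indexing cost only the first time.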

ecc_finder maps the sequences in <query.fq> to the reference index file <reference.idx>. <query.fq> files can be uncompressed or bgzipped. Use -t to set the number of threads minimap2 uses for mapping. The --mm2-params option passes custom alignment parameters to minimap2, such as --split-prefix=tmp for a large genome. Use -a to set the minimum alignment length for query sequences.
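To show how these options might combine, here is a small sketch that assembles a map-ont command line from the usage string above. The function build_map_ont_command is hypothetical. The sketch assumes that -t, -a and --mm2-params each take a single value and that options follow the positional arguments; check the tool's own help output for the exact syntax, in particular for how the --mm2-params value should be quoted.

```python
import shlex


def build_map_ont_command(reference_idx, query, threads=None,
                          min_aln_len=None, mm2_params=None):
    """Assemble an ecc_finder map-ont command as an argument list.

    Assumption: each option below takes exactly one value.
    """
    cmd = ["ecc_finder.py", "map-ont", reference_idx, query]
    if threads is not None:
        cmd += ["-t", str(threads)]          # threads used by minimap2
    if min_aln_len is not None:
        cmd += ["-a", str(min_aln_len)]      # minimum alignment length
    if mm2_params is not None:
        cmd += ["--mm2-params", mm2_params]  # custom minimap2 parameters
    return cmd


# Example: 8 threads and a split prefix for a large genome.
cmd = build_map_ont_command("reference.idx", "query.fq",
                            threads=8, mm2_params="--split-prefix=tmp")
print(shlex.join(cmd))  # shell-quoted form, suitable for logging
```

Building the command as a list rather than a single string keeps each argument separate, which avoids shell quoting problems if the list is later passed to subprocess.run.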

input/output options

By default, ecc_finder places all output and intermediate files in a directory named eccFinder_output, but this can be changed with -o. ecc_finder does not overwrite intermediate files that already exist in the output directory, which avoids regenerating expensive alignment files. Set -w to overwrite any preexisting files. Use the -x option to add the "ecc.ont" prefix to each sequence in the output.
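The skip-unless-overwrite behaviour described above can be illustrated with a short sketch. This is not ecc_finder's actual code; the function should_generate and the file name used in the comment are hypothetical and only model the rule that an existing intermediate file is reused unless overwriting is requested.

```python
from pathlib import Path


def should_generate(path, overwrite=False):
    """Return True if an intermediate file must be (re)generated.

    Models the documented rule: existing files are kept unless the
    user asks to overwrite them (the role -w plays on the command line).
    """
    return overwrite or not Path(path).exists()


# Example use inside a pipeline step (file name is made up):
# if should_generate("eccFinder_output/alignments.paf", overwrite=False):
#     ...run the expensive alignment...
```

A practical consequence is that rerunning with different mapping parameters in the same output directory may reuse stale alignments, so either set -w or choose a new directory with -o.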

output

All output is in eccFinder_output, or whichever directory -o specifies.

The eccDNA locus in FASTA and csv format.

Short read mapping

Algorithm overview.

Long read pipeline algorithm overview

Usage

usage: ecc_finder.py map-sr <reference.idx> <query.fq> (option)

Long read assembly

Algorithm overview.

Long read pipeline algorithm overview

Usage

asm-ont

usage: ecc_finder.py asm-ont <query.fq> (option)

Short read assembly

Algorithm overview.

Long read pipeline algorithm overview

Usage

usage: ecc_finder.py asm-sr <query.fq> (option)

Output

All output is in eccFinder_output, or whichever directory -o specifies.

eccDNA.fasta

The eccDNA locus in FASTA format.

eccDNA.csv

The eccDNA locus in csv format.

Col  Type    Description
1    string  Reference sequence name
2    int     Reference start on original strand
3    int     Reference end on original strand
4    int     Circular read number at the locus
5    int     Repeat units of all circular reads
6    int     Read coverage at the locus
7    int     EccDNA sequence length
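For downstream analysis, the seven columns can be loaded into named records. The sketch below is one possible way to do this and makes several assumptions: the file is comma-separated, column 3 is the reference end coordinate, and any row that does not contain seven fields with integer values in columns 2 to 7 (for example a header line) is skipped. The names EccLocus and read_loci are hypothetical, and the sample row is made up for illustration.

```python
import csv
import io
from typing import NamedTuple


class EccLocus(NamedTuple):
    ref_name: str        # column 1
    start: int           # column 2
    end: int             # column 3 (assumed to be the reference end)
    circular_reads: int  # column 4
    repeat_units: int    # column 5
    coverage: int        # column 6
    length: int          # column 7


def read_loci(handle):
    """Parse eccDNA.csv-style rows from an open text handle."""
    loci = []
    for row in csv.reader(handle):
        if len(row) != 7:
            continue  # not a data row
        try:
            loci.append(EccLocus(row[0], *(int(v) for v in row[1:])))
        except ValueError:
            continue  # non-numeric fields, e.g. a header line
    return loci


# Made-up sample row, used only to demonstrate the parser.
sample = io.StringIO("chr1,1000,1500,3,6,12,500\n")
for locus in read_loci(sample):
    print(locus.ref_name, locus.start, locus.end, locus.length)
```

With a real file, the same function can be called as read_loci(open("eccFinder_output/eccDNA.csv", newline="")), adjusting the path if -o was used.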