Ivy edited this page Aug 27, 2021 · 56 revisions

Welcome to the ecc_finder wiki!

Extrachromosomal circular DNA (eccDNA) has been observed in different species for decades, and growing evidence suggests that this type of DNA molecule may play an important role in rapid adaptation. Characterizing the full landscape of eccDNA has therefore become critical, and several protocols exist for enriching eccDNA and sequencing it with short or long reads.

However, no bioinformatic tool is currently available to identify eccDNA from Nanopore reads. More importantly, existing tools based on Illumina short reads lack an efficient, standardized pipeline, notably for identifying eccDNA originating from repeated loci, and cannot be applied to very large genomes. Here we introduce a comprehensive tool that addresses both issues.

ecc_finder works well on eccDNA-seq data (mobilome-seq, Circle-Seq, or CIDER-seq) from Arabidopsis, human, and wheat (genome sizes ranging from 120 Mb to 17 Gb).

ecc_finder is designed to:

  1. identify eccDNA from Nanopore reads;
  2. identify eccDNA from Illumina paired-end short reads;
  3. provide bona fide locus boundaries of eccDNA-producing loci, to investigate the origins of eccDNA.

Validation

  1. To assess ecc_finder accuracy for long reads, a confidence score is assigned based on the tandem repeat pattern in the read alignment. Except for satellites, a linear read does not repeat itself in a self-alignment, whereas a circular read is repeated two or more times because it has gone through rolling circle amplification experimentally. A circular sequence therefore yields sub-read alignments in the same direction, repeated two or more times on the same boundary.

  2. To assess ecc_finder accuracy for short reads, the confidence score for an eccDNA locus reflects how evenly split and discordant reads are distributed throughout its internal region; a bona fide locus shows an even distribution.
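
The two validation ideas above can be illustrated with a minimal Python sketch. This is not ecc_finder's actual implementation: the function names, the `tolerance` and `n_bins` parameters, and the input tuples are all hypothetical, chosen only to show the reasoning. For long reads, it counts same-strand sub-read alignments that share a boundary; for short reads, it measures what fraction of a locus is touched by split or discordant reads.

```python
def count_tandem_subreads(alignments, tolerance=50):
    """Largest number of same-strand sub-read alignments sharing one boundary.

    alignments: (ref_start, ref_end, strand) tuples for the sub-read
    alignments of ONE read (hypothetical input format).
    tolerance: allowed boundary wobble in bp (an assumed value).
    """
    best = 0
    for start, end, strand in alignments:
        n = sum(
            1
            for s, e, st in alignments
            if st == strand
            and abs(s - start) <= tolerance
            and abs(e - end) <= tolerance
        )
        best = max(best, n)
    return best


def looks_circular(alignments, min_repeats=2, tolerance=50):
    """A read is a circular candidate if its unit repeats two or more times."""
    return count_tandem_subreads(alignments, tolerance) >= min_repeats


def coverage_evenness(positions, locus_start, locus_end, n_bins=10):
    """Fraction of equal-width bins in [locus_start, locus_end) that contain
    at least one split or discordant read position (1.0 = every bin is hit)."""
    length = locus_end - locus_start
    if length <= 0 or n_bins <= 0:
        raise ValueError("locus must be non-empty and n_bins positive")
    hit_bins = set()
    for p in positions:
        if locus_start <= p < locus_end:
            hit_bins.add((p - locus_start) * n_bins // length)
    return len(hit_bins) / n_bins


# A rolling-circle read: three copies of the same unit, same strand.
rca_read = [(1000, 1500, "+"), (1002, 1498, "+"), (999, 1501, "+")]
# A linear read: a single alignment, no repeated unit.
linear_read = [(1000, 5000, "+")]

print(looks_circular(rca_read), looks_circular(linear_read))
print(coverage_evenness(range(50, 1000, 100), 0, 1000))
```

A real scoring scheme would also need to handle satellites, which the text above names as an exception, and would likely weight read counts rather than only bin occupancy.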

ecc_finder_pipeline

There are four modes for analyzing eccDNA data from different sequencing technologies:

Long-read-mapping

For details on the algorithm and usage, see the Long-read-mapping page.

Short-read-mapping

For details on the algorithm and usage, see the Short-read-mapping page.

Long-read-assembly

For details on the algorithm and usage, see the Long-read-assembly page.

Short-read-assembly

For details on the algorithm and usage, see the Short-read-assembly page.
