Home
ecc_finder Version: v1.0.0
ecc_finder identifies eccDNA loci by mapping to a reference genome.
Usage option in detail: map-ont
usage: ecc_finder.py map-ont <reference.idx> <query.fq> (option)
For a large genome such as human, minimap2 takes a few minutes to generate a minimizer index for the reference before mapping. To avoid repeating this step on every run, build the index once with the -d option and reuse the saved file:
minimap2 -d reference.idx reference.fa
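When ecc_finder is driven from a script, the index can be built only when it is missing. The following is a minimal sketch, not part of ecc_finder: the helper names index_command and ensure_index are hypothetical, and it assumes minimap2 is on the PATH. The only minimap2 call it makes is the -d command shown above.

```python
import subprocess
from pathlib import Path


def index_command(reference_fa, index_path):
    """Return the minimap2 command that writes a minimizer index to index_path."""
    return ["minimap2", "-d", str(index_path), str(reference_fa)]


def ensure_index(reference_fa, index_path, run=subprocess.run):
    """Build the index only if index_path does not exist yet.

    Returns True when the index was built, False when an existing file was reused.
    The run argument is a hook (hypothetical, for illustration) so the command
    can be inspected without actually launching minimap2.
    """
    if Path(index_path).exists():
        return False
    run(index_command(reference_fa, index_path), check=True)
    return True
```

Calling ensure_index("reference.fa", "reference.idx") before each mapping run then costs nothing once the index file is in place.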
ecc_finder maps the sequences in <query.fq> to the reference index file <reference.idx>. Query files can be uncompressed or bgzipped.
Use -t to set the number of threads minimap2 uses for mapping. The --mm2-params option passes custom alignment parameters to minimap2, such as --split-prefix=tmp for a large genome. Use -a to set the minimum alignment length for the query.
By default, ecc_finder places all output and intermediate files in a directory named eccFinder_output, but this can be changed with -o. ecc_finder will not overwrite intermediate files that already exist in the output directory, which saves time by not regenerating expensive alignment files. Use -w to overwrite any preexisting files.
Use the -x option to add the "ecc.ont" prefix to each sequence in the output.
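The options above can be assembled into a command line programmatically. This is a rough sketch under stated assumptions: the function map_ont_command is hypothetical, it assumes -t, -a and -o each take one value, that -w is a bare switch, and that --mm2-params accepts its value in the --mm2-params=VALUE form. The -x option is left out because its exact form is not described here. Check ecc_finder.py map-ont -h before relying on any of this.

```python
import shlex


def map_ont_command(reference_idx, query, threads=None, min_aln_len=None,
                    mm2_params=None, out_dir=None, overwrite=False):
    """Build an argument list for `ecc_finder.py map-ont` (assumed option forms)."""
    cmd = ["ecc_finder.py", "map-ont", str(reference_idx), str(query)]
    if threads is not None:
        cmd += ["-t", str(threads)]            # threads used by minimap2
    if min_aln_len is not None:
        cmd += ["-a", str(min_aln_len)]        # minimum alignment length
    if mm2_params:
        # Single token so the leading dashes of the value are not parsed as options.
        cmd.append(f"--mm2-params={mm2_params}")
    if out_dir is not None:
        cmd += ["-o", str(out_dir)]            # output directory
    if overwrite:
        cmd.append("-w")                       # overwrite existing intermediate files
    return cmd


if __name__ == "__main__":
    # Print a shell-quoted command line for inspection; nothing is executed.
    print(shlex.join(map_ont_command("reference.idx", "query.fq",
                                     threads=8, overwrite=True)))
```

Returning a list rather than a string keeps the result safe to pass to subprocess.run without shell quoting issues.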
All output is in eccFinder_output, or whichever directory -o specifies.
The eccDNA loci are written in FASTA and CSV format.
usage: ecc_finder.py map-sr <reference.idx> <query.fq> (option)
usage: ecc_finder.py asm-ont <query.fq> (option)
usage: ecc_finder.py asm-sr <query.fq> (option)
All output is in eccFinder_output, or whichever directory -o specifies.
eccDNA.fasta
The eccDNA locus in FASTA format.
eccDNA.csv
The eccDNA locus in csv format.
Col | Type | Description |
---|---|---|
1 | string | Reference sequence name |
2 | int | Reference start on original strand |
3 | int | Reference end on original strand |
4 | int | Circular read number at the locus |
5 | int | Repeat units of all circular reads |
6 | int | Read coverage at the locus |
7 | int | EccDNA sequence length |
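
The column layout above maps naturally onto a typed record. The sketch below is illustrative only: the names EccLocus and parse_ecc_csv are hypothetical, the file is assumed to be comma-delimited, column 3 is assumed to hold the reference end coordinate, and any row whose columns 2 to 7 are not all integers (for example a header line, if one is present) is skipped.

```python
import csv
from typing import Iterable, List, NamedTuple


class EccLocus(NamedTuple):
    ref_name: str        # col 1: reference sequence name
    start: int           # col 2: reference start
    end: int             # col 3: reference end (assumed)
    circular_reads: int  # col 4: circular read number at the locus
    repeat_units: int    # col 5: repeat units of all circular reads
    coverage: int        # col 6: read coverage at the locus
    length: int          # col 7: eccDNA sequence length


def parse_ecc_csv(lines: Iterable[str]) -> List[EccLocus]:
    """Parse lines of an eccDNA.csv-style file into EccLocus records."""
    loci = []
    for row in csv.reader(lines):
        if len(row) < 7:
            continue  # blank or truncated line
        try:
            numbers = [int(value) for value in row[1:7]]
        except ValueError:
            continue  # header or non-integer row, skipped by assumption
        loci.append(EccLocus(row[0], *numbers))
    return loci
```

With an open file handle, for instance `with open("eccFinder_output/eccDNA.csv", newline="") as fh: loci = parse_ecc_csv(fh)`, the records can then be filtered by fields such as length or coverage.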