Domain Detection
BiG-SCAPE uses the hmmscan tool from the HMMER suite to predict protein domains, using an HMM profile database (commonly Pfam), in the protein sequences detected in the input .gbk files. It uses the trusted cutoff (TC) bit score thresholds stored in each model to set the reporting and inclusion thresholds. The TC threshold is generally taken to be the score of the lowest-scoring known true positive that is above all known false positives. The coordinates used for extracting and handling the domain sequences are the envelope coordinates.
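As a rough illustration of the step above, the sketch below builds a command-line hmmscan call that applies the models' TC thresholds (--cut_tc) and writes a per-domain table (--domtblout), then reads the per-domain score and envelope coordinates from one row of that table. The helper names, file names, and the example row are hypothetical, and this is not necessarily how BiG-SCAPE itself invokes HMMER.

```python
def hmmscan_command(hmm_db: str, fasta: str, domtblout: str) -> list[str]:
    """Build an hmmscan call that uses each model's trusted cutoffs."""
    # --cut_tc: use the TC bit scores in the model as reporting/inclusion thresholds
    # --domtblout: write one row per predicted domain to a table file
    return ["hmmscan", "--cut_tc", "--domtblout", domtblout, hmm_db, fasta]


def parse_domtblout_line(line: str) -> dict:
    """Extract the fields of interest from one non-comment domtblout row."""
    # The table has 22 whitespace-separated columns followed by a free-text
    # description, so the split is capped to keep the description in one piece.
    fields = line.split(maxsplit=22)
    return {
        "domain": fields[0],            # target (profile) name
        "accession": fields[1],         # profile accession
        "cds": fields[3],               # query (protein) name
        "score": float(fields[13]),     # per-domain bit score
        "env_start": int(fields[19]),   # envelope start (1-based)
        "env_end": int(fields[20]),     # envelope end (inclusive)
    }


# Made-up row in domtblout layout, for illustration only.
EXAMPLE_LINE = (
    "ExampleDom PF00000.1 253 cds_1 - 1200 1.2e-80 268.1 0.0 1 1 "
    "3.4e-84 2.1e-80 267.4 0.0 2 253 35 284 34 284 0.97 Example description"
)

if __name__ == "__main__":
    print(" ".join(hmmscan_command("Pfam-A.hmm", "proteins.fasta", "out.domtblout")))
    print(parse_domtblout_line(EXAMPLE_LINE))
```

In a real table, lines starting with # are comments and would need to be skipped before parsing.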
After domain prediction, a filtering step discards overlapping domains based on the per-domain bit score. Pairs of domains within the same CDS are compared, and two domains are considered to overlap if the overlap fraction of either domain (i.e. overlapping amino acids / domain length) is higher than overlap_cutoff. This cutoff is set in config.yml as DOMAIN_OVERLAP_CUTOFF and is 0.1 by default. When two such domains are detected, the domain with the higher bit score is kept. Domains in CDSs on opposite strands are not considered to overlap.
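The overlap rule can be sketched as follows for the domains of a single CDS. This is a minimal illustration and not BiG-SCAPE's actual implementation: the Domain class and function names are hypothetical, envelope coordinates are assumed to be 1-based and inclusive, and the greedy highest-score-first order is an assumption about how ties between multiple overlapping pairs are resolved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Domain:
    name: str
    env_start: int  # 1-based, inclusive (assumed)
    env_end: int    # inclusive (assumed)
    score: float    # per-domain bit score

    @property
    def length(self) -> int:
        return self.env_end - self.env_start + 1


def overlap_aa(a: Domain, b: Domain) -> int:
    """Number of amino acid positions shared by two envelopes."""
    return max(0, min(a.env_end, b.env_end) - max(a.env_start, b.env_start) + 1)


def exceeds_cutoff(a: Domain, b: Domain, cutoff: float = 0.1) -> bool:
    """True if the overlap fraction of either domain is above the cutoff."""
    shared = overlap_aa(a, b)
    return shared / a.length > cutoff or shared / b.length > cutoff


def filter_overlapping(domains: list[Domain], cutoff: float = 0.1) -> list[Domain]:
    """Keep the higher-scoring domain of every overlapping pair."""
    kept: list[Domain] = []
    # Visit domains from highest to lowest score, so that a domain is only
    # ever discarded in favour of one with a score at least as high.
    for dom in sorted(domains, key=lambda d: d.score, reverse=True):
        if not any(exceeds_cutoff(dom, other, cutoff) for other in kept):
            kept.append(dom)
    return sorted(kept, key=lambda d: d.env_start)


if __name__ == "__main__":
    cds_domains = [
        Domain("A", 1, 100, 50.0),
        Domain("B", 95, 200, 30.0),   # shares 6 aa with A: below the cutoff for both
        Domain("C", 90, 120, 20.0),   # shares 11 of its 31 aa with A: above the cutoff
    ]
    print([d.name for d in filter_overlapping(cds_domains)])  # ['A', 'B']
```

Because the fraction is computed against each domain's own length, a short domain nested inside a long one is flagged even when the shared region is a small part of the longer domain.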