Distance Calculation

Catarina Loureiro edited this page Dec 13, 2024 · 1 revision

BiG-SCAPE’s pairwise distance calculation combines three indices that measure:

  • the fraction of shared domain types, measured by the Jaccard Index (JI) - the number of distinct domain types shared by the two compared BGCs divided by the total number of distinct domain types found in either of them.
  • the similarity of adjacent domain pairs, measured by the Adjacency Index (AI) - the number of distinct adjacent domain pairs shared by the two BGCs divided by the total number of distinct adjacent domain pairs found in either of them.
  • the similarity between aligned domain sequences, i.e. Domain Sequence Similarity, or DSS index - a score that considers the sequence similarity for every domain type.
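
As a rough illustration of the first two indices, the sketch below computes JI and AI from two ordered lists of domain identifiers. It is not BiG-SCAPE's own code: the function names and the example Pfam accessions are hypothetical, and treating an adjacent pair as unordered is an assumption made here for simplicity.

```python
def jaccard_index(domains_a, domains_b):
    """Distinct shared domain types divided by all distinct domain types."""
    set_a, set_b = set(domains_a), set(domains_b)
    union = set_a | set_b
    if not union:
        return 0.0  # no domains at all: define the index as 0
    return len(set_a & set_b) / len(union)


def adjacent_pairs(domains):
    """Distinct pairs of neighbouring domains (assumption: order within a pair is ignored)."""
    return {tuple(sorted(pair)) for pair in zip(domains, domains[1:])}


def adjacency_index(domains_a, domains_b):
    """Distinct shared adjacent pairs divided by all distinct adjacent pairs."""
    pairs_a, pairs_b = adjacent_pairs(domains_a), adjacent_pairs(domains_b)
    union = pairs_a | pairs_b
    if not union:
        return 0.0  # fewer than two domains in both inputs
    return len(pairs_a & pairs_b) / len(union)


# Hypothetical domain strings for two BGCs, in genomic order.
bgc_a = ["PF00109", "PF02801", "PF00698"]
bgc_b = ["PF00109", "PF02801", "PF00550"]

ji = jaccard_index(bgc_a, bgc_b)    # 2 shared types out of 4 distinct types
ai = adjacency_index(bgc_a, bgc_b)  # 1 shared pair out of 3 distinct pairs
```

Both functions return a value between 0 and 1, where 1 means the two inputs have identical sets of domain types (JI) or of adjacent pairs (AI).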

Anchor Domains

The DSS score is further subdivided into two components: one that accounts for anchor domains and one that accounts for non-anchor domains. Anchor domains are well-known core scaffold domains, e.g. those of the PKS and NRPS classes, and are given a higher weight in the DSS calculation.

BiG-SCAPE 2’s default anchor domains list resides in the config.yml file and can be modified by the user.

Weight Distribution

The contribution, i.e. weight, of each of the three scores (JI, AI, DSS) to the final BiG-SCAPE distance was tuned in BiG-SCAPE 1 for each of the BGC class groups defined in v1. To use these tuned weights, pass --legacy-weights. Otherwise, BiG-SCAPE 2 defaults to the distance metric based on the mix weight distribution. --legacy-weights can be combined with --classify legacy to fully reproduce BiG-SCAPE 1 behavior. To combine it with the newer --classify [mode] options, ensure that the input .gbk files were processed with antiSMASH v6 or above.

Weights are listed in the order JI, AI, DSS, anchor boost:

  LEGACY_WEIGHTS = {
      "PKSI": {"weights": (0.22, 0.02, 0.76, 1.0)},
      "PKSother": {"weights": (0.0, 0.68, 0.32, 4.0)},
      "NRPS": {"weights": (0.0, 0.0, 1.0, 4.0)},
      "RiPP": {"weights": (0.28, 0.01, 0.71, 1.0)},
      "saccharide": {"weights": (0.0, 1.0, 0.0, 1.0)},
      "terpene": {"weights": (0.2, 0.05, 0.75, 2.0)},
      "PKS-NRP_Hybrids": {"weights": (0.0, 0.22, 0.78, 1.0)},
      "other": {"weights": (0.01, 0.02, 0.97, 4.0)},
      "mix": {"weights": (0.2, 0.05, 0.75, 2.0)},
  }
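
To make the role of the weights concrete, the sketch below combines three index values into a single distance. It rests on two assumptions that this page does not spell out: that the final distance is one minus the weighted sum of the three similarity indices, and that the anchor boost has already been applied inside the DSS calculation, so it is not used again here. The function name and the input values are hypothetical.

```python
# Weight tuples copied from the table above: (JI, AI, DSS, anchor boost).
MIX_WEIGHTS = (0.2, 0.05, 0.75, 2.0)
NRPS_WEIGHTS = (0.0, 0.0, 1.0, 4.0)


def combined_distance(ji, ai, dss, weights):
    """Assumed combination: distance = 1 - weighted sum of the three indices."""
    ji_w, ai_w, dss_w, _anchor_boost = weights  # boost assumed to act within DSS only
    similarity = ji_w * ji + ai_w * ai + dss_w * dss
    return 1.0 - similarity


# Hypothetical index values for one pair of BGCs.
ji, ai, dss = 0.5, 0.4, 0.8

mix_distance = combined_distance(ji, ai, dss, MIX_WEIGHTS)    # 1 - (0.10 + 0.02 + 0.60)
nrps_distance = combined_distance(ji, ai, dss, NRPS_WEIGHTS)  # 1 - 0.80, DSS only
```

In every row of the table the first three weights sum to 1, so under this assumption a pair with all three indices equal to 1 gets a distance of 0.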