From 1ce65d7a84b5a96900fc56df30d928cac44daef7 Mon Sep 17 00:00:00 2001
From: JeanMainguy
Date: Thu, 11 Jan 2024 20:39:13 +0100
Subject: [PATCH] commit figure in svg

---
 paper/binette_overview.svg | 683 +++++++++++++++++++++++++++++++++++++
 paper/paper.md | 2 +-
 2 files changed, 684 insertions(+), 1 deletion(-)
 create mode 100644 paper/binette_overview.svg

diff --git a/paper/binette_overview.svg b/paper/binette_overview.svg
new file mode 100644
index 0000000..43f7ecd
--- /dev/null
+++ b/paper/binette_overview.svg
@@ -0,0 +1,683 @@
+[SVG markup not shown. Text labels in the figure: panels "A" and "B"; "Input bin sets"; "Creation of intermediate bins"; "Intermediate bins"; "Bins are scored with CheckM2"; "All bins sorted by their score"; "Selection of non redundant bins"; "Final bins"; "Bin A"; "Bin B"; "contig n"; "share at least a contig"; "Intermediate bins between A and B"; "Union bin" (A∪B); "Intersection bin" (A∩B); "Difference bins" (A-B, B-A)]
diff --git a/paper/paper.md b/paper/paper.md
index edd5c47..6c824d9 100644
--- a/paper/paper.md
+++ b/paper/paper.md
@@ -38,7 +38,7 @@ We present Binette, a bin refinement tool inspired by metaWRAP's bin refinement
 Binette is a Python reimplementation and enhanced version of the bin refinement module used in metaWRAP. It takes as input sets of bins generated by various binning tools. Using these input bin sets, Binette constructs new hybrid bins using basic set operations. Specifically, a bin can be defined as a set of contigs, and when two or more bins share at least one contig, Binette generates new bins based on their intersection, difference, and union. This approach differs from metaWRAP, which exclusively generates hybrid bins based on bin intersections and allows Binette to expand the range of possible bins.
-![Overview of Binette Steps. (A) Binette Workflow Overview: Input bins serve as the basis for generating intermediate bins. Each bin undergoes a scoring process utilizing quality metrics provided by CheckM2. Subsequently, the bins are sorted based on their scores, and a selection process is executed to retain non-redundant bins.
(B) Intermediate Bin Creation Example: Bins are represented as square shapes, each containing colored lines representing the contigs they contain. Creation of intermediate bins involves the initial bins sharing at least one contig. Set operations are applied to the contigs within the bins to generate these intermediate bins.](./binette_overview.png)
+![Overview of Binette Steps. (A) Binette Workflow Overview: Input bins serve as the basis for generating intermediate bins. Each bin undergoes a scoring process utilizing quality metrics provided by CheckM2. Subsequently, the bins are sorted based on their scores, and a selection process is executed to retain non-redundant bins. (B) Intermediate Bin Creation Example: Bins are represented as square shapes, each containing colored lines representing the contigs they contain. Creation of intermediate bins involves the initial bins sharing at least one contig. Set operations are applied to the contigs within the bins to generate these intermediate bins.](./binette_overview.svg)
 Bin completeness and contamination are assessed using CheckM2 [@chklovski2023checkm2]. Bins are scored using the following scoring function: $completeness - weight * contamination$, with the default weight set to 2. These scored bins are then sorted, facilitating the selection of a final new set of non-redundant bins. The ability to score bins is based on CheckM2 rather than CheckM1 as in the metaWRAP pipeline. CheckM2 uses a novel approach to evaluate bin quality based on machine learning techniques. This approach improves speed and also provides better results than CheckM1. Binette initiates CheckM2 processing by running its initial steps once for all contigs within the input bins. These initial steps involve gene prediction using Prodigal and alignment against the CheckM2 database using Diamond [@buchfink2015diamond].
Binette uses Pyrodigal [@larralde2022pyrodigal], a Python module that provides bindings and an interface to Prodigal [@hyatt2010prodigal]. The intermediate CheckM2 results are then used to assess the quality of individual bins, eliminating redundant calculations and speeding up the refinement process.
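
For readers of this patch, the refinement steps described in the paper text above (set operations on bins that share a contig, the score `completeness - weight * contamination`, and the selection of non-redundant bins) can be illustrated with a minimal Python sketch. This is not Binette's actual code: the function names, the contig names, and the greedy selection rule (keep the best-scoring bin, then skip any bin that shares a contig with one already kept) are assumptions made for illustration only.

```python
def intermediate_bins(bin_a, bin_b):
    """Return new bins derived from two bins (frozensets of contig names).

    Hypothetical helper: bins that share no contig yield nothing.
    """
    if not bin_a & bin_b:
        return set()
    candidates = {
        bin_a | bin_b,  # union bin
        bin_a & bin_b,  # intersection bin
        bin_a - bin_b,  # difference bin A-B
        bin_b - bin_a,  # difference bin B-A
    }
    # Drop empty results and results identical to an input bin.
    return {b for b in candidates if b and b not in (bin_a, bin_b)}


def score(completeness, contamination, weight=2.0):
    """Scoring function from the paper text; the default weight is 2."""
    return completeness - weight * contamination


def select_non_redundant(scored_bins):
    """Assumed greedy selection over (contigs, score) pairs.

    Bins are visited by decreasing score; a bin is kept only if none of
    its contigs belongs to a bin that was already kept.
    """
    selected = []
    used_contigs = set()
    ranked = sorted(scored_bins, key=lambda item: item[1], reverse=True)
    for contigs, _ in ranked:
        if used_contigs.isdisjoint(contigs):
            selected.append(contigs)
            used_contigs |= contigs
    return selected


# Toy input: two bins sharing contig_2 and contig_3 (made-up names).
bin_a = frozenset({"contig_1", "contig_2", "contig_3"})
bin_b = frozenset({"contig_2", "contig_3", "contig_4"})
new_bins = intermediate_bins(bin_a, bin_b)
```

In this toy case `new_bins` holds the union, the intersection, and the two differences. In practice the completeness and contamination passed to `score` would come from CheckM2 rather than being set by hand.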