Commit

fix concoct citation
JeanMainguy committed Nov 9, 2023
1 parent 0cc4ddd commit 21f3ade
Showing 2 changed files with 13 additions and 8 deletions.
paper/paper.bib: 15 changes (10 additions & 5 deletions)
@@ -36,13 +36,18 @@ @article{kang2019metabat
publisher={PeerJ Inc.}
}

-@article{alneberg2013concoct,
-title={CONCOCT: clustering contigs on coverage and composition},
-author={Alneberg, Johannes and Bjarnason, Brynjar Sm{\'a}ri and de Bruijn, Ino and Schirmer, Melanie and Quick, Joshua and Ijaz, Umer Z and Loman, Nicholas J and Andersson, Anders F and Quince, Christopher},
-journal={arXiv preprint arXiv:1312.4038},
-year={2013}
+@article{alneberg2014concoct,
+title={Binning metagenomic contigs by coverage and composition},
+author={Alneberg, Johannes and Bjarnason, Brynjar Sm{\'a}ri and De Bruijn, Ino and Schirmer, Melanie and Quick, Joshua and Ijaz, Umer Z and Lahti, Leo and Loman, Nicholas J and Andersson, Anders F and Quince, Christopher},
+journal={Nature methods},
+volume={11},
+number={11},
+pages={1144--1146},
+year={2014},
+publisher={Nature Publishing Group US New York}
}

+
@article{nissen2021improved,
title={Improved metagenome binning and assembly using deep variational autoencoders},
author={Nissen, Jakob Nybo and Johansen, Joachim and Alles{\o}e, Rosa Lundbye and S{\o}nderby, Casper Kaae and Armenteros, Jose Juan Almagro and Gr{\o}nbech, Christopher Heje and Jensen, Lars Juhl and Nielsen, Henrik Bj{\o}rn and Petersen, Thomas Nordahl and Winther, Ole and others},
paper/paper.md: 6 changes (3 additions & 3 deletions)
@@ -23,17 +23,17 @@ bibliography: paper.bib


# Statement of need
-Metagenomics enables the study of microbial communities and their individual members through shotgun sequencing. An essential phase of metagenomic analysis is the recovery of metagenome-assembled genomes (MAGs). MAGs serve as a gateway to additional analyses, including the exploration of organism-specific metabolic pathways, and form the basis for comprehensive large-scale metagenomic surveys [@Nayfach_Shi_Seshadri_Pollard_Kyrpides_2019] [@Acinas_Sánchez_et_al_2021].
+Metagenomics enables the study of microbial communities and their individual members through shotgun sequencing. An essential phase of metagenomic analysis is the recovery of metagenome-assembled genomes (MAGs). MAGs serve as a gateway to additional analyses, including the exploration of organism-specific metabolic pathways, and form the basis for comprehensive large-scale metagenomic surveys [@Nayfach_Shi_Seshadri_Pollard_Kyrpides_2019, @Acinas_Sánchez_et_al_2021].

-In a metagenomic analysis, sequence reads are first assembled into longer sequences called contigs. These contigs are then grouped into bins based on common characteristics in a process called metagenomic binning to obtain MAGs. There are several tools that can be used to binned contigs into MAGs. These tools are based on various statistical and machine learning methods and use contig characteristics such as tetranucleotide frequencies, GC content and similar abundances across samples [@kang2019metabat] [@alneberg2013concoct] [@nissen2021improved].
+In a metagenomic analysis, sequence reads are first assembled into longer sequences called contigs. These contigs are then grouped into bins based on common characteristics in a process called metagenomic binning to obtain MAGs. There are several tools that can be used to binned contigs into MAGs. These tools are based on various statistical and machine learning methods and use contig characteristics such as tetranucleotide frequencies, GC content and similar abundances across samples [@kang2019metabat, @alneberg2014concoct, @nissen2021improved].

The approach of applying multiple binning methods and combining them has proven useful to obtain more and better quality MAGs from metagenomic datasets.This combination process is called bin-refinement and several tools exist to perform such tasks, such as DASTool [@sieber2018dastool], MagScot [@ruhlemann2022magscot] and the bin-refinement module of the metaWRAP pipeline [@uritskiy2018metawrap]. Of these, metaWRAP's bin-refinement tool has demonstrated remarkable efficiency in benchmark analysis [@meyer2022critical]. However, it has certain limitations, most notably its inability to integrate more than three binning results. In addition, it repeatedly uses CheckM [@parks2015checkm] to assess bin quality throughout its execution, which contributes to its slower performance. Furthermore, since it is embedded in a larger framework, it may present challenges when attempting to integrate it into an independent analysis pipeline.

We present Binette, a bin refinement tool inspired by metaWRAP's bin refinement module, which addresses the limitations of the latter and ensures better results.

# Summary
Binette is a Python reimplementation of the bin refinement module used in metaWRAP. It takes as input sets of bins generated by various binning tools. Using these input bin sets, Binette constructs new hybrid bins using basic set operations. Specifically, a bin can be defined as a set of contigs, and when two or more bins share at least one contig, Binette generates new bins based on their intersection, difference, and union. This approach differs from metaWRAP, which exclusively generates hybrid bins based on bin intersections and allows Binette to expand the range of possible bins .
-Bin completeness and contamination are assessed using CheckM2 [@chklovski2023checkm2]. Bins are scored using the following scoring function: completeness - weight * contamination, with the default weight set to 3. These scored bins are then sorted, facilitating the selection of a final new set of non-redundant bins.
+Bin completeness and contamination are assessed using CheckM2 [@chklovski2023checkm2]. Bins are scored using the following scoring function: $completeness - weight * contamination$, with the default weight set to 3. These scored bins are then sorted, facilitating the selection of a final new set of non-redundant bins.
The ability to score bins is based on CheckM2 rather than CheckM1 as in the metaWRAP pipeline. CheckM2 uses a novel approach to evaluate bin quality based on machine learning techniques. This approach improves speed and also provides better results than CheckM1. Binette initiates CheckM2 processing by running its initial steps once for all contigs within the input bins. These initial steps involve gene prediction using Prodigal and alignment against the CheckM2 database using Diamond [@buchfink2015diamond]. Binette uses Pyrodigal [@larralde2022pyrodigal], a Python module that provides bindings and an interface to Prodigal. The intermediate Checkm2 results are then used to assess the quality of individual bins, eliminating redundant calculations and speeding up the refinement process.
Binette serves as the bin refinement tool within the metaGWS metagenomic analysis pipeline, providing a robust and faster alternative to the bin refinement module of the metaWRAP pipeline as well as other similar bin refinement tools.

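The `+` lines in the paper.md hunk combine adjacent bracketed citations into a single group. This assumes the paper is rendered with Pandoc, as the `bibliography: paper.bib` front matter suggests. Pandoc's citation syntax separates citations inside one bracket with semicolons, as in `[@a; @b]`. It reads any text that follows a key after a comma as that citation's locator or suffix. Below is a minimal sketch of a hypothetical helper, `merge_citations`, that merges runs of adjacent key-only groups using the semicolon separator. It assumes the groups contain no prefixes, locators or suffixes.

```python
import re

# A bracketed group holding only citation keys, e.g. "[@a]" or "[@a; @b]".
# Simplifying assumption: no prefixes, locators or suffixes inside the brackets.
GROUP = r"\[@[^\s;\],]+(?:;\s*@[^\s;\],]+)*\]"
# Two or more such groups separated only by whitespace, e.g. "[@a] [@b]".
RUN = re.compile(rf"{GROUP}(?:\s+{GROUP})+")
# Captures the key after each "@".
KEY = re.compile(r"@([^\s;\],]+)")


def merge_citations(text: str) -> str:
    """Hypothetical helper: merge runs like '[@a] [@b]' into '[@a; @b]'."""
    def _merge(match: re.Match) -> str:
        keys = KEY.findall(match.group(0))
        return "[" + "; ".join("@" + key for key in keys) + "]"

    return RUN.sub(_merge, text)
```

Applied to `surveys [@x] [@y].`, it returns `surveys [@x; @y].`. A lone group such as `[@x]` is left unchanged.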

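The context paragraph on metagenomic binning names tetranucleotide frequencies and GC content as contig features. The sketch below shows one way to compute both for a single contig sequence. The function names are hypothetical, and this is not the feature extraction used by any of the cited binners. As a simplifying assumption, it counts all 256 tetramers separately, whereas binning tools often merge each tetramer with its reverse complement.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"
# All 256 possible tetranucleotides, in a fixed order so vectors are comparable.
TETRAMERS = ["".join(p) for p in product(BASES, repeat=4)]


def gc_content(seq: str) -> float:
    """Fraction of G and C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in BASES)
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


def tetranucleotide_frequencies(seq: str) -> list[float]:
    """Relative frequency of each 4-mer, skipping windows with non-ACGT characters."""
    seq = seq.upper()
    counts = Counter(
        seq[i:i + 4]
        for i in range(len(seq) - 3)
        if set(seq[i:i + 4]) <= set(BASES)
    )
    total = sum(counts.values())
    return [counts[t] / total if total else 0.0 for t in TETRAMERS]


# One GC value followed by 256 tetramer frequencies for a toy contig.
features = [gc_content("GGCCAATT")] + tetranucleotide_frequencies("GGCCAATT")
```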
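The Summary paragraph in the paper.md hunk treats each bin as a set of contigs. It derives new bins from the intersection, difference and union of bins that share at least one contig. Here is a minimal sketch of that idea using Python `frozenset`s. The function name `hybrid_bins` is hypothetical and is not Binette's API. For brevity, the sketch only combines pairs of bins, whereas the text also covers groups of more than two.

```python
from itertools import combinations


def hybrid_bins(bins: list[frozenset[str]]) -> set[frozenset[str]]:
    """Return the input bins plus bins derived from every overlapping pair.

    Simplifying assumption: only pairs of bins are combined.
    """
    candidates = set(bins)
    for a, b in combinations(bins, 2):
        if a & b:  # the two bins share at least one contig
            for new_bin in (a & b, a - b, b - a, a | b):
                if new_bin:  # a difference can be empty, e.g. when a is inside b
                    candidates.add(new_bin)
    return candidates
```

Take the bins `{c1, c2, c3}` and `{c3, c4}`. Combining them adds `{c3}`, `{c1, c2}`, `{c4}` and `{c1, c2, c3, c4}` to the two input bins. A bin that shares no contig with any other bin is passed through unchanged.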

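The Summary defines the scoring function `completeness - weight * contamination`, with a default weight of 3. It then only says that scored bins are sorted before a non-redundant set is chosen. The greedy rule in the sketch below is an assumption made for illustration: take the best-scoring bin, then skip any bin that shares a contig with one already chosen. The names `Bin`, `score` and `select_non_redundant` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bin:
    """Hypothetical container for a candidate bin and its quality estimates."""
    contigs: frozenset[str]
    completeness: float   # percent
    contamination: float  # percent


def score(b: Bin, weight: float = 3.0) -> float:
    """Scoring function from the text: completeness - weight * contamination."""
    return b.completeness - weight * b.contamination


def select_non_redundant(bins: list[Bin], weight: float = 3.0) -> list[Bin]:
    """Assumed greedy rule: visit bins from best to worst score and keep a bin
    only if none of its contigs is already in a kept bin."""
    selected: list[Bin] = []
    used: set[str] = set()
    for b in sorted(bins, key=lambda x: score(x, weight), reverse=True):
        if not (b.contigs & used):
            selected.append(b)
            used |= b.contigs
    return selected


b1 = Bin(frozenset({"c1", "c2"}), completeness=90.0, contamination=5.0)  # score 75
b2 = Bin(frozenset({"c2", "c3"}), completeness=95.0, contamination=1.0)  # score 92
b3 = Bin(frozenset({"c4"}), completeness=60.0, contamination=0.0)        # score 60
kept = select_non_redundant([b1, b2, b3])  # b2, then b3; b1 overlaps b2 on c2
```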
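The last Summary paragraph describes running the costly CheckM2 preparation steps (gene prediction and Diamond alignment) once for all contigs. Those intermediate results are then reused for every candidate bin. The sketch below shows only the caching pattern behind that idea. `annotate` is a hypothetical stand-in for the per-contig steps, and features are plain strings. The actual CheckM2 quality prediction is not modelled.

```python
from typing import Callable


def precompute_annotations(
    contigs: dict[str, str],
    annotate: Callable[[str], frozenset[str]],
) -> dict[str, frozenset[str]]:
    """Run the expensive per-contig step exactly once for every input contig."""
    return {name: annotate(seq) for name, seq in contigs.items()}


def bin_features(
    bin_contigs: frozenset[str],
    cache: dict[str, frozenset[str]],
) -> frozenset[str]:
    """Assemble a bin's features from the cache instead of recomputing them."""
    features: set[str] = set()
    for contig in bin_contigs:
        features |= cache[contig]
    return frozenset(features)
```

Every candidate bin reuses the same per-contig cache. As a result, the number of `annotate` calls grows with the number of contigs rather than with the number of candidate bins.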