update paper
JeanMainguy committed Nov 10, 2023
1 parent 57a9809 commit c5bf759
Showing 2 changed files with 30 additions and 9 deletions.
20 changes: 20 additions & 0 deletions paper/paper.bib
@@ -148,3 +148,23 @@ @article{larralde2022pyrodigal
year={2022}
}

@article{hyatt2010prodigal,
title={Prodigal: prokaryotic gene recognition and translation initiation site identification},
author={Hyatt, Doug and Chen, Gwo-Liang and LoCascio, Philip F and Land, Miriam L and Larimer, Frank W and Hauser, Loren J},
journal={BMC Bioinformatics},
volume={11},
pages={1--11},
year={2010},
publisher={Springer}
}



@article{metagWGS_inprep,
title={MetagWGS, a complete workflow to analyse metagenomic data (from Illumina reads or PacBio HiFi reads)},
author={Noirot, Céline and Mainguy, Jean and Hoede, Claire}, % need completion with all authors...
journal={Journal},
year={in preparation}

}

19 changes: 10 additions & 9 deletions paper/paper.md
@@ -10,35 +10,36 @@ tags:
authors:
- name: Jean Mainguy
orcid: 0009-0006-9160-9744
affiliation: "1, 2"
- name: Claire Hoede
orcid: 0000-0001-5054-7731
affiliation: "1, 2"
corresponding: true
affiliations:
- name: Université de Toulouse, INRAE, BioinfOmics, GenoToul Bioinformatics facility, 31326, Castanet-Tolosan, France
index: 1
- name: Université de Toulouse, INRAE, UR 875 MIAT, 31326, Castanet-Tolosan, France
index: 2
date: 30 November 2023
bibliography: paper.bib
---


# Statement of need
Metagenomics enables the study of microbial communities and their individual members through shotgun sequencing. An essential phase of metagenomic analysis is the recovery of metagenome-assembled genomes (MAGs). MAGs serve as a gateway to additional analyses, including the exploration of organism-specific metabolic pathways, and form the basis for comprehensive large-scale metagenomic surveys [@Nayfach2019global_human_gut_microbiome;@Acinas_Sánchez_et_al_2021].

In a metagenomic analysis, sequence reads are first assembled into longer sequences called contigs. These contigs are then grouped into bins based on shared characteristics, a process called metagenomic binning, to obtain MAGs. Several tools can be used to bin contigs into MAGs. They rely on various statistical and machine learning methods and use contig characteristics such as tetranucleotide frequencies, GC content, and similar abundance profiles across samples [@kang2019metabat;@alneberg2014concoct;@nissen2021improved].
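
To make these contig characteristics concrete, the following minimal Python sketch computes two of them, GC content and tetranucleotide (4-mer) frequencies, for a single contig. It is illustrative only and is not taken from any of the cited binners. The function names are hypothetical, and real binners typically merge each 4-mer with its reverse complement and combine these features with abundance information, which this sketch does not do.

```python
from collections import Counter
from itertools import product

VALID_BASES = set("ACGT")


def gc_content(seq: str) -> float:
    """Fraction of G and C among the unambiguous (A/C/G/T) bases of a contig."""
    seq = seq.upper()
    n_acgt = sum(seq.count(base) for base in "ACGT")
    if n_acgt == 0:
        return 0.0  # e.g. a contig made only of Ns
    return (seq.count("G") + seq.count("C")) / n_acgt


def tetranucleotide_frequencies(seq: str) -> dict[str, float]:
    """Relative frequency of each of the 256 possible 4-mers in a contig.

    Windows containing ambiguous bases (e.g. N) are skipped. Unlike many
    real binners, reverse-complementary 4-mers are NOT merged here.
    """
    seq = seq.upper()
    counts = Counter(
        seq[i:i + 4]
        for i in range(len(seq) - 3)
        if set(seq[i:i + 4]) <= VALID_BASES
    )
    total = sum(counts.values())
    all_kmers = ("".join(p) for p in product("ACGT", repeat=4))
    return {k: (counts[k] / total if total else 0.0) for k in all_kmers}
```

For a short contig such as `ATGCGCGTATAT`, `gc_content` returns 5/12 (about 0.42), and each of the nine distinct 4-mers it contains receives a frequency of 1/9.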

Applying multiple binning methods and combining their results has proven useful for obtaining more MAGs of better quality from metagenomic datasets. This combination process is called bin refinement, and several tools can perform it, such as DAS Tool [@sieber2018dastool], MAGScoT [@ruhlemann2022magscot] and the bin refinement module of the metaWRAP pipeline [@uritskiy2018metawrap]. Of these, metaWRAP's bin refinement module has performed particularly well in benchmark analyses [@meyer2022critical]. However, it has certain limitations, most notably that it cannot combine more than three binning results. In addition, it repeatedly runs CheckM [@parks2015checkm] to assess bin quality throughout its execution, which slows it down. Finally, because it is embedded in a larger framework, it can be difficult to integrate into an independent analysis pipeline.

We present Binette, a bin refinement tool inspired by metaWRAP's bin refinement module that addresses the limitations of this module and produces better results.

# Summary
Binette is a Python reimplementation and enhanced version of the bin refinement module used in metaWRAP. It takes as input sets of bins generated by various binning tools. From these input bin sets, Binette constructs new hybrid bins using basic set operations: since a bin can be defined as a set of contigs, whenever two or more bins share at least one contig, Binette generates new bins from their intersection, difference, and union. This differs from metaWRAP, which generates hybrid bins from bin intersections only, and allows Binette to explore a wider range of possible bins.
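
The set-based construction of hybrid bins can be sketched in a few lines of Python by modelling each bin as a `frozenset` of contig identifiers. This is a simplified illustration under stated assumptions, not Binette's actual code: `hybrid_bins` is a hypothetical name, only pairs of overlapping bins are combined (whereas the text also covers combinations of more than two bins), and the input bins themselves are kept as candidates.

```python
from itertools import combinations

# A bin is modelled as an immutable set of contig identifiers.
Bin = frozenset[str]


def hybrid_bins(bin_sets: list[set[Bin]]) -> set[Bin]:
    """Derive candidate bins from overlapping bins of several binning tools.

    Simplifications (assumptions of this sketch): only *pairs* of bins are
    combined, and the input bins are kept as candidates alongside the
    hybrids.
    """
    all_bins: set[Bin] = set().union(*bin_sets)
    candidates: set[Bin] = set(all_bins)
    for a, b in combinations(all_bins, 2):
        if a.isdisjoint(b):
            continue  # only bins sharing at least one contig are combined
        for new_bin in (a & b, a - b, b - a, a | b):
            if new_bin:  # skip empty differences (e.g. when a is a subset of b)
                candidates.add(new_bin)
    return candidates
```

With one tool producing the bins {c1, c2, c3} and {c4}, and another producing {c2, c3, c5}, the only overlapping pair yields {c2, c3}, {c1}, {c5} and {c1, c2, c3, c5}, for seven candidate bins in total including the three input bins. An intersection-only strategy would add only {c2, c3}.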

Bin completeness and contamination are assessed using CheckM2 [@chklovski2023checkm2]. Each bin is scored as $\text{completeness} - \text{weight} \times \text{contamination}$, with a default weight of 3. The scored bins are then sorted by score, which facilitates the selection of a final set of non-redundant bins. Binette relies on CheckM2 rather than on CheckM1, which the metaWRAP pipeline uses. CheckM2 evaluates bin quality with machine learning models, an approach that is both faster and more accurate than CheckM1. Binette runs the initial CheckM2 steps only once, on all contigs of the input bins. These steps are gene prediction with Prodigal and alignment against the CheckM2 database with Diamond [@buchfink2015diamond]. Binette uses Pyrodigal [@larralde2022pyrodigal], a Python module that provides bindings and an interface to Prodigal [@hyatt2010prodigal]. The intermediate CheckM2 results are then reused to assess the quality of each individual bin, which eliminates redundant computations and speeds up the refinement process.
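
The scoring function and the final selection step can be sketched as follows. The score follows the formula given above, with a default weight of 3. The completeness and contamination values would come from CheckM2 but are supplied by hand here for illustration. The selection strategy is an assumption: the text only states that bins are sorted, so this sketch uses a simple greedy pass that keeps the best-scoring bin and discards any later bin sharing a contig with an already selected one. `ScoredBin` and `select_non_redundant` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoredBin:
    contigs: frozenset[str]
    completeness: float   # percent, e.g. as estimated by CheckM2
    contamination: float  # percent

    def score(self, weight: float = 3.0) -> float:
        # completeness - weight * contamination (default weight of 3)
        return self.completeness - weight * self.contamination


def select_non_redundant(bins: list[ScoredBin], weight: float = 3.0) -> list[ScoredBin]:
    """Greedy selection (an assumed strategy, not confirmed by the text).

    Bins are visited from best to worst score; a bin is kept only if none
    of its contigs already belongs to a previously kept bin.
    """
    selected: list[ScoredBin] = []
    used_contigs: set[str] = set()
    for sb in sorted(bins, key=lambda sb: sb.score(weight), reverse=True):
        if used_contigs.isdisjoint(sb.contigs):
            selected.append(sb)
            used_contigs |= sb.contigs
    return selected
```

For instance, a bin {c1, c2, c3} with 90% completeness and 5% contamination scores 75, while a bin {c2, c3} with 80% completeness and 1% contamination scores 77, so the default weight favours the smaller, cleaner bin. With a weight of 1 the order is reversed (85 versus 79). A larger weight therefore penalizes contamination more strongly.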


Binette serves as the bin refinement tool within the [metagWGS](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs) metagenomic analysis pipeline [@metagWGS_inprep], providing a robust and faster alternative to the bin refinement module of the metaWRAP pipeline as well as other similar bin refinement tools.


# References
