diff --git a/README.md b/README.md
index 1a950f3..c5a5ac3 100644
--- a/README.md
+++ b/README.md
@@ -2,7 +2,7 @@
The core idea of Mabs is to optimize parameters of a genome assembler in order to make an assembly where **protein-coding genes** are assembled more accurately than when the assembler is run with its default parameters.
Briefly, Mabs works as follows:
1) It makes a series of genome assemblies by Hifiasm or Flye, using different values of parameters of these programs. Mabs uses a couple of tricks to accelerate the assembly process.
-2) For each genome assembly, Mabs evaluates the quality of BUSCO genes' assembly using a special metric that I call "AG". For how AG is calculated, see [calculate_AG](#internal_link_to_calculate_AG).
+2) For each genome assembly, Mabs evaluates the quality of BUSCO genes' assembly using a special metric that I call "AG". For how AG is calculated, see [calculate_AG](#calculate_ag).
3) The genome assembly with the largest AG is considered the best.
For details about the algorithm of Mabs see [a preprint on BioRxiv](https://www.biorxiv.org/content/10.1101/2022.12.19.521016v2).
@@ -13,25 +13,22 @@ For details about the algorithm of Mabs see [a preprint on BioRxiv](https://www.
## Table of Contents
-- [Installation](#internal_link_to_Installation)
-- [How to use](#internal_link_to_How_to_use)
- - [Mabs-hifiasm](#internal_link_to_Mabs-hifiasm)
- - [Mabs-flye](#internal_link_to_Mabs-flye)
- - [The output of Mabs](#internal_link_to_The_output_of_Mabs)
- - [Testing Mabs-hifiasm and Mabs-flye](#internal_link_to_Testing_Mabs-hifiasm_and_Mabs-flye)
-- [calculate_AG](#internal_link_to_calculate_AG)
-- [Questions and Answers](#internal_link_to_Questions_and_Answers)
+- [Installation](#installation)
+- [How to use](#how-to-use)
+ - [Mabs-hifiasm](#a-mabs-hifiasm)
+ - [Mabs-flye](#b-mabs-flye)
+ - [The output of Mabs](#c-the-output-of-mabs)
+ - [Testing Mabs-hifiasm and Mabs-flye](#d-testing-mabs-hifiasm-and-mabs-flye)
+- [calculate_AG](#calculate_ag)
+- [Questions and Answers](#questions-and-answers)
-
## Installation
Mabs requires Python 3, Perl 5, GCC, Zlib-dev, Make.
To install Mabs, download the latest version from [Releases](https://github.com/shelkmike/Mabs/releases), then extract the archive and run
`bash install.sh`
-
## How to use
Two main components of Mabs are Mabs-hifiasm and Mabs-flye. Mabs-hifiasm works as a parameter optimizer of Hifiasm, while Mabs-flye works as a parameter optimizer of Flye.
-
#### a) Mabs-hifiasm
Mabs-hifiasm is intended for PacBio HiFi (also known as CCS) reads. Also, it can be used for very accurate (accuracy ≥99%) Nanopore reads, as their characteristics are similar to characteristics of HiFi reads.
To run Mabs-hifiasm, a user should provide two values:
@@ -55,7 +52,6 @@ Example 1:
Example 2:
`mabs-hifiasm.py --pacbio_hifi_reads hifi_reads.fastq --short_hi-c_reads_R1 hi-c_reads_trimmed_R1.fastq --short_hi-c_reads_R2 hi-c_reads_trimmed_R2.fastq --ultralong_nanopore_reads ultralong_reads.fastq --download_busco_dataset diptera_odb10.2020-08-05.tar.gz --threads 40`
-
#### b) Mabs-flye
Mabs-flye is intended for Nanopore reads and PacBio CLR reads (also known as "old PacBio reads"). Similarly to Mabs-hifiasm, Mabs-flye requires two values:
1. A path to reads, provided via options "--nanopore_reads", "--pacbio_clr_reads" or "--pacbio_hifi_reads". If you have several read datasets created by different technologies, these options can be used simultaneously. Keep in mind that if you have only HiFi reads, it's better to use Mabs-hifiasm.
@@ -72,7 +68,6 @@ Example 1:
Example 2:
`mabs-flye.py --nanopore_reads nanopore_reads.fastq --pacbio_hifi_reads pacbio_hifi_reads.fastq --download_busco_dataset diptera_odb10.2020-08-05.tar.gz --threads 40`
-
#### c) The output of Mabs
Both Mabs-hifiasm and Mabs-flye have a similar output structure. Both of them create a folder which, by default, is named "Mabs_results". The name can be changed via the "--output_folder" option. The two main files that a user may need are:
1) ./Mabs_results/mabs_logs.txt
@@ -80,7 +75,6 @@ This file contains information on how Mabs-hifiasm or Mabs-flye run and whether
2) ./Mabs_results/The_best_assembly/assembly.fasta
These are the contigs you need.
-
#### d) Testing Mabs-hifiasm and Mabs-flye
If you are not sure whether Mabs-hifiasm and Mabs-flye have been installed properly, you can run
`mabs-hifiasm.py --run_test`
@@ -89,7 +83,6 @@ or
These two commands assemble the first chromosome of Saccharomyces cerevisiae, which is approximately 200 kbp. If after the assembly finishes you see a file ./Mabs_results/The_best_assembly/assembly.fasta which is slightly larger than 200 KB, Mabs works correctly.
-
## calculate_AG
Besides Mabs-hifiasm and Mabs-flye, Mabs contains a third tool, named calculate_AG. Its purpose is to assess genome assembly quality.
@@ -117,7 +110,6 @@ This type of diagrams is called sinaplot, see https://cran.r-project.org/web/pac
The recommended usage of calculate_AG is to compare the quality of assemblies of a single genome made by different genome assemblers, or made by a single assembler with different parameters. Besides the value of AG (in the file ./AG_calculation_results/AG.txt), calculate_AG also provides the exact numbers of genes in single-copy orthogroups, in true multicopy orthogroups, and in false multicopy orthogroups; the corresponding values can be found at the end of the file ./AG_calculation_results/logs.txt.
-
## Questions and Answers
1. Is Mabs useful?
In my experience, Mabs-hifiasm is currently the best tool for genome assembly using PacBio HiFi reads.