A guide to verify NanoPhase installation: reconstructing reference-quality MAGs from the ZymoBIOMICS dataset (nanophase v0.2.3)
Note: this dataset was generated from the ZymoBIOMICS gut microbiome standard and was used for the feasibility evaluation in our manuscript. In addition, fastq-dump
from SRA toolkit is required.
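Before downloading, it can help to confirm that the required commands are actually on your PATH. The following is a minimal sketch, not part of nanophase; the helper name check_tools is hypothetical.

```shell
# Report any required command that is not on PATH.
# Prints one "missing: <tool>" line per absent tool and
# returns non-zero if at least one tool is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}
```

For example, `check_tools fastq-dump nanophase` prints nothing and returns 0 when both commands are available (nanophase is only visible after the conda environment has been activated).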
fastq-dump SRR17913199
conda activate nanophase ## if not in the nanophase env
nanophase meta -l SRR17913199.fastq -t 64 -o nanophase-out ## using only Nanopore long reads
It may take a while to finish. The output folder/file structure should look like this:
nanophase-out/
├── 01-LongAssemblies
│ ├── assembly_graph.gfa
│ ├── assembly.fasta
│ ├── assembly_info.txt
│ └── flye.log
├── 02-LongBins
│ ├── INITIAL_BINNING
│ │ ├── metabat2
│ │ ├── tmp.abun.txt
│ │ ├── maxbin2
│ │ └── semibin
│ └── BIN_REFINEMENT
│ ├── maxbin2-bins
│ ├── maxbin2-bins.stats
│ ├── metabat2-bins
│ ├── metabat2-bins.stats
│ ├── metawrap_50_10_bins
│ ├── metawrap_50_10_bins.stats
│ ├── semibin-bins
│ ├── semibin-bins.stats
│ ├── figures
│ ├── bin_refinement.log
│ └── metawrap_50_10_bins.contigs
└── 03-Polishing
├── Racon
│ ├── racon.polish.log
│ ├── bin.1
│ ├── bin.10
│ ├── bin.11
│ ├── bin.12
│ ├── bin.2
│ ├── bin.3
│ ├── bin.4
│ ├── bin.5
│ ├── bin.6
│ ├── bin.7
│ ├── bin.8
│ └── bin.9
├── medaka
│ ├── bin.4-medaka
│ ├── bin.10-medaka
│ ├── bin.3-medaka
│ ├── bin.7-medaka
│ ├── bin.1-medaka
│ ├── bin.12-medaka
│ ├── bin.6-medaka
│ ├── bin.2-medaka
│ ├── bin.8-medaka
│ ├── bin.5-medaka
│ ├── bin.11-medaka
│ ├── bin.9-medaka
│ └── medaka.polish.log
├── checkm-result
│ ├── bins
│ ├── lineage.ms
│ ├── checkm.log
│ ├── checkm_results.tsv
│ └── storage
├── gtdbtk-result
│ ├── gtdbtk.warnings.log
│ ├── gtdbtk.ar53.summary.tsv -> classify/gtdbtk.ar53.summary.tsv
│ ├── align
│ ├── classify
│ ├── gtdbtk.bac120.summary.tsv -> classify/gtdbtk.bac120.summary.tsv
│ ├── gtdbtk.json
│ ├── gtdbtk.log
│ └── identify
├── nanophase.ont.genome.summary
└── Final-bins
├── bin.10.fasta
├── bin.11.fasta
├── bin.12.fasta
├── bin.1.fasta
├── bin.2.fasta
├── bin.3.fasta
├── bin.4.fasta
├── bin.5.fasta
├── bin.6.fasta
├── bin.7.fasta
├── bin.8.fasta
└── bin.9.fasta
48 directories, 34 files
The final reconstructed MAGs can be found in nanophase-out/03-Polishing/Final-bins/
and the summary file of the reconstructed MAGs can be found in nanophase-out/03-Polishing/nanophase.ont.genome.summary
ls nanophase-out/03-Polishing/Final-bins/
bin.10.fasta bin.11.fasta bin.12.fasta bin.1.fasta bin.2.fasta bin.3.fasta bin.4.fasta bin.5.fasta bin.6.fasta bin.7.fasta bin.8.fasta bin.9.fasta
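A quick way to confirm that a run completed is to check for the key outputs shown in the listing above and count the final bins. This is a minimal sketch under the assumption that the layout matches that listing; the helper name check_output is hypothetical and not a nanophase command.

```shell
# Check that a nanophase output directory contains the key results.
# Prints the number of final bins (one FASTA file per bin);
# returns non-zero if a key output is missing or no bins were written.
check_output() {
    outdir=$1
    [ -s "$outdir/01-LongAssemblies/assembly.fasta" ] || { echo "missing assembly.fasta" >&2; return 1; }
    [ -d "$outdir/03-Polishing/Final-bins" ] || { echo "missing Final-bins" >&2; return 1; }
    # grep -c . counts non-empty lines, i.e. the FASTA files that find reports.
    find "$outdir/03-Polishing/Final-bins" -maxdepth 1 -name '*.fasta' | grep -c .
}
```

Running `check_output nanophase-out` on the run above should print the number of files in Final-bins.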
cat nanophase-out/03-Polishing/nanophase.ont.genome.summary
#BinID Completeness Contamination Strain heterogeneity GenomeSize(bp) N_Contig N50(bp) GC GTDB-Taxa
bin.1 100.00 0.00 0.00 2461381 2 2150631 0.26980 d__Bacteria;p__Fusobacteriota;c__Fusobacteriia;o__Fusobacteriales;f__Fusobacteriaceae;g__Fusobacterium;s__Fusobacterium animalis
bin.2 99.44 0.29 0.00 3460839 1 3460839 0.48731 d__Bacteria;p__Bacillota_A;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Roseburia;s__Roseburia hominis
bin.3 99.18 0.55 100.00 1906542 3 1890918 0.52286 d__Bacteria;p__Bacillota;c__Bacilli;o__Lactobacillales;f__Lactobacillaceae;g__Limosilactobacillus;s__Limosilactobacillus fermentum
bin.4 95.88 0.00 0.00 2849782 2 2341158 0.55459 d__Bacteria;p__Verrucomicrobiota;c__Verrucomicrobiae;o__Verrucomicrobiales;f__Akkermansiaceae;g__Akkermansia;s__Akkermansia muciniphila
bin.5 100.00 0.00 0.00 2157911 1 2157911 0.38952 d__Bacteria;p__Bacillota_C;c__Negativicutes;o__Veillonellales;f__Veillonellaceae;g__Veillonella;s__Veillonella rogosae
bin.6 87.91 0.39 0.00 2522000 3 2083939 0.43694 d__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Bacteroidaceae;g__Prevotella;s__Prevotella corporis
bin.7 99.16 0.00 0.00 2028285 1 2028285 0.59292 d__Bacteria;p__Actinomycetota;c__Actinomycetia;o__Actinomycetales;f__Bifidobacteriaceae;g__Bifidobacterium;s__Bifidobacterium adolescentis
bin.8 100.00 0.00 0.00 2911148 1 2911148 0.57806 d__Bacteria;p__Bacillota_A;c__Clostridia;o__Oscillospirales;f__Ruminococcaceae;g__Faecalibacterium;s__Faecalibacterium prausnitzii_E
bin.9 95.94 0.50 0.00 5191902 5 4733348 0.43263 d__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Bacteroidaceae;g__Bacteroides;s__Bacteroides fragilis
bin.10 81.03 0.00 0.00 4065942 1 4065942 0.28625 d__Bacteria;p__Bacillota_A;c__Clostridia;o__Peptostreptococcales;f__Peptostreptococcaceae;g__Clostridioides;s__Clostridioides difficile
bin.11 98.03 0.97 80.00 4491398 26 302533 0.50954 d__Bacteria;p__Pseudomonadota;c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae;g__Escherichia;s__Escherichia coli
bin.12 58.04 0.00 0.00 1191563 44 34294 0.30992 d__Archaea;p__Methanobacteriota;c__Methanobacteria;o__Methanobacteriales;f__Methanobacteriaceae;g__Methanobrevibacter_A;s__Methanobrevibacter_A smithii
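To pick out the higher-quality bins from the summary, the Completeness and Contamination columns can be filtered with awk. This is a minimal sketch that assumes the summary file is tab-delimited with a header line starting with #; the helper name filter_mags and the default cutoffs (completeness >= 90, contamination <= 5) are illustrative choices, not nanophase settings.

```shell
# List bins that meet completeness/contamination cutoffs.
# Usage: filter_mags SUMMARY [MIN_COMPLETENESS] [MAX_CONTAMINATION]
# Prints BinID, Completeness and Contamination, tab-separated.
filter_mags() {
    awk -F'\t' -v min_comp="${2:-90}" -v max_cont="${3:-5}" '
        /^#/ { next }    # skip the header line
        $2 + 0 >= min_comp + 0 && $3 + 0 <= max_cont + 0 { print $1 "\t" $2 "\t" $3 }
    ' "$1"
}
```

With the default cutoffs and the values shown in the table above, `filter_mags nanophase-out/03-Polishing/nanophase.ont.genome.summary` would keep nine of the twelve bins, dropping bin.6, bin.10 and bin.12 for completeness below 90.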
If you also have Illumina short reads, you can use the hybrid strategy (Nanopore long reads + Illumina short reads) to reconstruct reference-quality MAGs
Note: Nanopore sequencing has improved considerably, so short reads are no longer necessary at this time, especially for long-read data generated with Kit 14.
conda activate nanophase ## if not in the nanophase env
nanophase meta -l lr.fa.gz --hybrid -1 sr_1.fa.gz -2 sr_2.fa.gz -t 40 -o hybrid-nanophase-out ## using Nanopore + Illumina reads
A guide to verify NanoPhase installation: reconstructing reference-quality MAGs from an example dataset (nanophase v0.2.2)
Warning: since nanophase v0.2.3 (which added semibin as a new binner and substantially improved genome reconstruction for complex environmental samples), the binning step may fail with an error when only a few contigs are available. The same issue is expected with very simple long-read datasets, so for this example I would suggest using nanophase v0.2.2: mamba create -n nanophase-v0.2.2 -c nanophase nanophase=0.2.2 -y
Sorry about that; we will try to fix this issue in the next release.
Example dataset download
wget https://github.com/example-data/np-example/raw/main/np.test.tar && tar -xvf np.test.tar && rm -rf np.test.tar ## download the example data
If the download takes too long, you may use the following link instead
wget -O np.test.tar https://www.dropbox.com/s/ey9hfntqn789gz0/np.test.tar?dl=0 && tar -xvf np.test.tar && rm -rf np.test.tar
conda activate nanophase ## if not in the nanophase env
nanophase meta -l lr.fa.gz -t 40 -o ont-nanophase-out ## using only Nanopore long reads
It should finish in ~2 hours. The output folder/file structure should look like this:
ont-nanophase-out/
├── 01-LongAssemblies
│ ├── assembly.fasta
│ ├── assembly_info.txt
│ └── flye.log
├── 02-LongBins
│ ├── BIN_REFINEMENT
│ │ ├── bin_refinement.log
│ │ ├── figures
│ │ ├── maxbin2-bins
│ │ │ ├── bin.001.fa
│ │ │ └── bin.002.fa
│ │ ├── maxbin2-bins.stats
│ │ ├── metabat2-bins
│ │ │ └── bin.1.fa
│ │ ├── metabat2-bins.stats
│ │ ├── metawrap_50_10_bins
│ │ │ ├── bin.1.fa
│ │ │ └── bin.2.fa
│ │ ├── metawrap_50_10_bins.contigs
│ │ └── metawrap_50_10_bins.stats
│ └── INITIAL_BINNING
│ ├── maxbin2
│ │ ├── bin.log
│ │ ├── bin.marker
│ │ ├── bin.marker_of_each_bin.tar.gz
│ │ ├── bin.noclass
│ │ ├── bin.summary
│ │ ├── bin.tooshort
│ │ ├── maxbin2_abun.txt
│ │ └── maxbin2-bins
│ ├── metabat2
│ │ ├── bin.log
│ │ ├── metabat2_abun.txt
│ │ └── metabat2-bins
│ └── tmp.abun.txt
└── 03-Polishing
├── Final-bins
│ ├── bin.1.fasta
│ └── bin.2.fasta
├── medaka
│ ├── bin.1-medaka
│ │ └── consensus.fasta
│ ├── bin.2-medaka
│ │ └── consensus.fasta
│ └── medaka.polish.log
├── nanophase.ont.genome.summary
└── Racon
├── bin.1
│ └── bin.1-racon01.fasta
├── bin.2
│ └── bin.2-racon01.fasta
└── racon.polish.log
20 directories, 32 files
The final reconstructed MAGs can be found in ont-nanophase-out/03-Polishing/Final-bins/
and the summary file of the reconstructed MAGs can be found in ont-nanophase-out/03-Polishing/nanophase.ont.genome.summary
ls ont-nanophase-out/03-Polishing/Final-bins/
bin.1.fasta bin.2.fasta
cat ont-nanophase-out/03-Polishing/nanophase.ont.genome.summary
#BinID Completeness Contamination Strain heterogeneity GenomeSize(bp) N_Contig N50(bp) GC IfCir. GTDB-Taxa
bin.1 100.00 0.00 0.00 2028336 1 2028336 0.59294 Y d__Bacteria;p__Actinobacteriota;c__Actinomycetia;o__Actinomycetales;f__Bifidobacteriaceae;g__Bifidobacterium;s__Bifidobacterium
bin.2 99.18 0.55 100.00 1889262 2 1827333 0.52312 N d__Bacteria;p__Firmicutes;c__Bacilli;o__Lactobacillales;f__Lactobacillaceae;g__Limosilactobacillus;s__Limosilactobacillus
Using hybrid strategy (Nanopore long reads + Illumina short reads) to reconstruct reference-quality MAGs
conda activate nanophase ## if not in the nanophase env
nanophase meta -l lr.fa.gz --hybrid -1 sr_1.fa.gz -2 sr_2.fa.gz -t 40 -o hybrid-nanophase-out ## using Nanopore + Illumina reads
It should finish in ~2.5 hours. The output folder/file structure should look like this:
hybrid-nanophase-out/
├── 01-LongAssemblies
│ ├── assembly.fasta
│ ├── assembly_info.txt
│ └── flye.log
├── 02-LongBins
│ ├── BIN_REFINEMENT
│ │ ├── bin_refinement.log
│ │ ├── figures
│ │ ├── maxbin2-bins
│ │ │ ├── bin.001.fa
│ │ │ └── bin.002.fa
│ │ ├── maxbin2-bins.stats
│ │ ├── metabat2-bins
│ │ │ └── bin.1.fa
│ │ ├── metabat2-bins.stats
│ │ ├── metawrap_50_10_bins
│ │ │ ├── bin.1.fa
│ │ │ └── bin.2.fa
│ │ ├── metawrap_50_10_bins.contigs
│ │ └── metawrap_50_10_bins.stats
│ └── INITIAL_BINNING
│ ├── maxbin2
│ │ ├── bin.log
│ │ ├── bin.marker
│ │ ├── bin.marker_of_each_bin.tar.gz
│ │ ├── bin.noclass
│ │ ├── bin.summary
│ │ ├── bin.tooshort
│ │ ├── maxbin2_abun.txt
│ │ └── maxbin2-bins
│ ├── metabat2
│ │ ├── bin.log
│ │ ├── metabat2_abun.txt
│ │ └── metabat2-bins
│ └── tmp.abun.txt
└── 03-Polishing
├── Final-bins
│ ├── bin.1.fasta
│ └── bin.2.fasta
├── medaka
│ ├── bin.1-medaka
│ │ └── consensus.fasta
│ ├── bin.2-medaka
│ │ └── consensus.fasta
│ └── medaka.polish.log
├── nanophase.ARGs.summary.txt
├── nanophase.hybrid.genome.summary
├── POLCA
│ ├── POLCA-bins
│ │ ├── bin.1-polca.fasta
│ │ └── bin.2-polca.fasta
│ └── polca.polish.log
├── Polypolish
│ ├── bin.1-Polypolish
│ │ └── bin.1-polypolish.fasta
│ ├── bin.2-Polypolish
│ │ └── bin.2-polypolish.fasta
│ └── polypolish.polish.log
└── Racon
├── bin.1
│ └── bin.1-racon01.fasta
├── bin.2
│ └── bin.2-racon01.fasta
└── racon.polish.log
25 directories, 39 files
The final reconstructed MAGs can be found in hybrid-nanophase-out/03-Polishing/Final-bins/
and the summary file of the reconstructed MAGs can be found in hybrid-nanophase-out/03-Polishing/nanophase.hybrid.genome.summary
ls hybrid-nanophase-out/03-Polishing/Final-bins/
bin.1.fasta bin.2.fasta
cat hybrid-nanophase-out/03-Polishing/nanophase.hybrid.genome.summary
#BinID Completeness Contamination Strain heterogeneity GenomeSize(bp) N_Contig N50(bp) GC IfCir. GTDB-Taxa
bin.1 100.00 0.00 0.00 2028324 1 2028324 0.59294 Y d__Bacteria;p__Actinobacteriota;c__Actinomycetia;o__Actinomycetales;f__Bifidobacteriaceae;g__Bifidobacterium;s__Bifidobacterium
bin.2 99.00 0.55 100.00 1889441 2 1827513 0.52311 N d__Bacteria;p__Firmicutes;c__Bacilli;o__Lactobacillales;f__Lactobacillaceae;g__Limosilactobacillus;s__Limosilactobacillus
The nanophase args
module is still under active development at this time. We hope to provide more versatile functions in the next release.
We use the SARG database for ARG annotation; the output results below were generated using nanophase v0.2.0
conda activate nanophase ## if not in the nanophase env
nanophase args -i ont-nanophase-out/03-Polishing/Final-bins/ -x fasta -o nanophase.ARGs.summary.txt
It should finish in ~10 mins. If ARGs are identified in the MAGs, nanophase.ARGs.summary.txt
will be written:
cat nanophase.ARGs.summary.txt
BinID_ContigID_OrfID similarity Type Subtype HMM.category Mechanism.group Mechanism.subgroup Mechanism.subgroup2
bin.1_contig_3_55 86.347 mupirocin mupirocin__Bifidobacteria intrinsic ileS conferring resistance to mupirocin Bifidobacteria intrinsic ileS conferring resistance to mupirocin Antibiotic target alteration
bin.2_contig_1_1235 99.638 bacitracin bacitracin__bacA bacA Antibiotic target alteration undecaprenyl pyrophosphate
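To get a per-bin overview of the ARG hits, the first column can be collapsed to its bin ID and counted. This is a minimal sketch that assumes the summary is tab-delimited with one header line and that contig names start with contig_ (as in the rows above); the helper name count_args_per_bin is hypothetical.

```shell
# Count annotated ARG ORFs per bin.
# Usage: count_args_per_bin ARG_SUMMARY
# Prints "<BinID><TAB><count>", sorted by bin ID.
count_args_per_bin() {
    awk -F'\t' '
        NR == 1 { next }                  # skip the header line
        {
            bin = $1
            sub(/_contig_.*/, "", bin)    # keep only the bin ID prefix
            n[bin]++
        }
        END { for (b in n) print b "\t" n[b] }
    ' "$1" | sort
}
```

For example, `count_args_per_bin nanophase.ARGs.summary.txt` prints one line per bin that carries at least one annotated ARG.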