Includes minor fixes
abmiguez committed Aug 24, 2022
1 parent 6d62358 commit 700dc3d
Showing 16 changed files with 68 additions and 48 deletions.
2 changes: 2 additions & 0 deletions README.md
@@ -7,6 +7,8 @@
 * Better representation of, not only the human gut microbiome but also many other animal and ecological environments
 * Estimation of metagenome composed by microbes not included in the database with parameter `--unclassified_estimation`
 * Compatibility with MetaPhlAn 3 databases with parameter `--mpa3`
+
+Full [changeset](https://github.com/biobakery/MetaPhlAn/blob/master/changeset.txt)
 -------------
 
 ## Description
2 changes: 1 addition & 1 deletion bioconda_recipe/meta.yaml
@@ -28,7 +28,7 @@ requirements:
     - dendropy
     - numpy
     - cmseq
-    - phylophlan
+    - phylophlan >=3.0.3
     - biom-format
     - matplotlib-base
     - biopython
28 changes: 17 additions & 11 deletions changeset.txt
@@ -1,19 +1,25 @@
-=== Version 4
-* Adoption of the species-level genome bins system (SGBs).
-* New MetaPhlAn marker genes extracted identified from ~1M microbial genomes.
-* Ability to profile 21,978 known (kSGBs) and 4,992 unknown (uSGBs) microbial species.
-* Better representation of, not only the human gut microbiome but also many other animal and ecological environments.
-* Estimation of metagenome composed by microbes not included in the database with parameter --unclassified_estimation.
+=== Version 4.0.1
+* The new --offline parameter stops MetaPhlAn from automatically checking for updates
+* Fixes "KeyError: 't'" error when running MetaPhlAn with the --CAMI_format_output parameter
+* Improved StrainPhlAn's gaps management with the newest version of PhyloPhlAn (version 3.0.3)
+* Improved set of colors for the plot_tree_graphlan.py script
 
+=== Version 4.0.0
+* Adoption of the species-level genome bins system (SGBs)
+* New MetaPhlAn marker genes extracted identified from ~1M microbial genomes
+* Ability to profile 21,978 known (kSGBs) and 4,992 unknown (uSGBs) microbial species
+* Better representation of, not only the human gut microbiome but also many other animal and ecological environments
+* Estimation of metagenome composed by microbes not included in the database with parameter --unclassified_estimation
+* Compatibility with MetaPhlAn 3 databases with parameter --mpa3
 
 === Version 3.1
-* 433 low-quality species were removed from the MetaPhlAn 3.1 marker database and 2,680 species were added (for a new total of 15,766; a 17% increase).
-* Marker genes for a subset of existing bioBakery 3 species were also revised.
-* Most existing bioBakery 3 species pangenomes were updated with revised or expanded gene content.
-* MetaPhlAn 3.1 software has been updated to work with revised marker database.
+* 433 low-quality species were removed from the MetaPhlAn 3.1 marker database and 2,680 species were added (for a new total of 15,766; a 17% increase)
+* Marker genes for a subset of existing bioBakery 3 species were also revised
+* Most existing bioBakery 3 species pangenomes were updated with revised or expanded gene content
+* MetaPhlAn 3.1 software has been updated to work with revised marker database
 
 === Version 3.0
-* New MetaPhlAn marker genes extracted with a newer version of ChocoPhlAn based on UniRef
+* New MetaPhlAn marker genes extracted with a newer version of ChocoPhlAn based on UniRef
 * Estimation of metagenome composed by unknown microbes with parameter `--unknown_estimation`
 * Automatic retrieval and installation of the latest MetaPhlAn database with parameter `--index latest`
 * Virus profiling with `--add_viruses`
20 changes: 12 additions & 8 deletions metaphlan/__init__.py
@@ -238,22 +238,25 @@ def download_unpack_zip(url,download_file_name,folder,software_name):
     except EnvironmentError:
         print("WARNING: Unable to remove the temp download: " + download_file)
 
-def resolve_latest_database(bowtie2_db,mpa_latest_url, force=False):
-    if os.path.exists(os.path.join(bowtie2_db,'mpa_latest')):
+def resolve_latest_database(bowtie2_db,mpa_latest_url, force=False, offline=False):
+    if not offline and os.path.exists(os.path.join(bowtie2_db,'mpa_latest')):
         ctime_latest_db = int(os.path.getctime(os.path.join(bowtie2_db,'mpa_latest')))
         if int(time.time()) - ctime_latest_db > 31536000: #1 year in epoch
             os.rename(os.path.join(bowtie2_db,'mpa_latest'),os.path.join(bowtie2_db,'mpa_previous'))
             download(mpa_latest_url, os.path.join(bowtie2_db,'mpa_latest'), force=True)
 
     if not os.path.exists(os.path.join(bowtie2_db,'mpa_latest') or force):
+        if offline:
+            print("Database cannot be downloaded with the --offline option activated")
+            sys.exit()
         download(mpa_latest_url, os.path.join(bowtie2_db,'mpa_latest'))
 
     with open(os.path.join(bowtie2_db,'mpa_latest')) as mpa_latest:
         latest_db_version = [line.strip() for line in mpa_latest if not line.startswith('#')]
 
     return ''.join(latest_db_version)
 
-def check_and_install_database(index, bowtie2_db, bowtie2_build, nproc, force_redownload_latest):
+def check_and_install_database(index, bowtie2_db, bowtie2_build, nproc, force_redownload_latest, offline):
     # Create the folder if it does not already exist
     if not os.path.isdir(bowtie2_db):
         try:
@@ -266,7 +269,7 @@ def check_and_install_database(index, bowtie2_db, bowtie2_build, nproc, force_re
 
     use_zenodo = False
     try:
-        if urllib.request.urlopen("http://cmprod1.cibio.unitn.it/biobakery4/metaphlan_databases/mpa_latest").getcode() != 200:
+        if not offline and urllib.request.urlopen("http://cmprod1.cibio.unitn.it/biobakery4/metaphlan_databases/mpa_latest").getcode() != 200:
             # use_zenodo = True
             pass
     except:
@@ -284,10 +287,9 @@ def check_and_install_database(index, bowtie2_db, bowtie2_build, nproc, force_re
     #try downloading from the segatalab website. If fails, use zenodo
     if index == 'latest':
         mpa_latest = 'http://cmprod1.cibio.unitn.it/biobakery4/metaphlan_databases/mpa_latest'
-
-        index = resolve_latest_database(bowtie2_db, mpa_latest, force_redownload_latest)
+        index = resolve_latest_database(bowtie2_db, mpa_latest, force_redownload_latest, offline)
 
-        if os.path.exists(os.path.join(bowtie2_db,'mpa_previous')):
+        if not offline and os.path.exists(os.path.join(bowtie2_db,'mpa_previous')):
             with open(os.path.join(bowtie2_db,'mpa_previous')) as mpa_previous:
                 previous_db_version = ''.join([line.strip() for line in mpa_previous if not line.startswith('#')])
 
@@ -302,7 +304,9 @@ def check_and_install_database(index, bowtie2_db, bowtie2_build, nproc, force_re
 
     if len(glob(os.path.join(bowtie2_db, "*{}*".format(index)))) >= 7:
         return index
-
+    if offline:
+        print("Database cannot be downloaded with the --offline option activated")
+        sys.exit()
     # download the tar archive and decompress
     sys.stderr.write("\nDownloading MetaPhlAn database\nPlease note due to "
                      "the size this might take a few minutes\n")
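Review note on `resolve_latest_database`: in the unchanged condition `if not os.path.exists(os.path.join(bowtie2_db,'mpa_latest') or force):` the `or force` is evaluated inside the call to `os.path.exists`. A non-empty path string is always truthy, so `force` appears to have no effect there. The sketch below is not code from this commit; `needs_download_as_written` and `needs_download` are hypothetical helper names used only to illustrate the difference and one way the offline guard could be combined with the check.

```python
import os


def needs_download_as_written(marker_path, force=False):
    # Mirrors the condition in the diff: `or force` sits inside
    # os.path.exists(), and a non-empty path always wins the `or`,
    # so `force` is ignored.
    return not os.path.exists(marker_path or force)


def needs_download(marker_path, force=False, offline=False):
    # Hypothetical variant: download when the marker file is missing or a
    # re-download is forced; refuse (instead of downloading) when offline.
    wanted = force or not os.path.exists(marker_path)
    if wanted and offline:
        raise RuntimeError(
            "Database cannot be downloaded with the --offline option activated")
    return wanted


# os.curdir always exists, so it stands in for an existing 'mpa_latest' file.
print(needs_download_as_written(os.curdir, force=True))
print(needs_download(os.curdir, force=True))
```

Raising an exception instead of calling `sys.exit()` is a choice made for this sketch so that callers and tests can catch the condition; the commit itself prints a message and exits.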
10 changes: 6 additions & 4 deletions metaphlan/metaphlan.py
@@ -4,8 +4,8 @@
               'Nicola Segata ([email protected]), '
               'Duy Tin Truong, '
               'Francesco Asnicar ([email protected])')
-__version__ = '4.0.0'
-__date__ = '22 Aug 2022'
+__version__ = '4.0.1'
+__date__ = '24 Aug 2022'
 
 import sys
 try:
@@ -337,6 +337,8 @@ def read_params(args):
         help="The number of CPUs to use for parallelizing the mapping [default 4]")
     arg('--install', action='store_true',
         help="Only checks if the MetaPhlAn DB is installed and installs it if not. All other parameters are ignored.")
+    arg('--offline', action='store_true',
+        help="If used, MetaPhlAn will not check for new database updates.")
     arg('--force_download', action='store_true',
         help="Force the re-download of the latest MetaPhlAn database.")
     arg('--read_min_len', type=int, default=70,
@@ -952,7 +954,7 @@ def main():
     ESTIMATE_UNK = pars['unclassified_estimation']
 
     # check if the database is installed, if not then install
-    pars['index'] = check_and_install_database(pars['index'], pars['bowtie2db'], pars['bowtie2_build'], pars['nproc'], pars['force_download'])
+    pars['index'] = check_and_install_database(pars['index'], pars['bowtie2db'], pars['bowtie2_build'], pars['nproc'], pars['force_download'], pars['offline'])
 
     if pars['install']:
         sys.stderr.write('The database is installed\n')
@@ -1129,7 +1131,7 @@ def main():
     if CAMI_OUTPUT:
         for clade, taxid, relab in sorted( outpred, reverse=True,
                             key=lambda x:x[2]+(100.0*(8-(x[0].count("|"))))):
-            if taxid:
+            if taxid and clade.split('|')[-1][0] != 't':
                 rank = ranks2code[clade.split('|')[-1][0]]
                 leaf_taxid = taxid.split('|')[-1]
                 taxpathsh = '|'.join([remove_prefix(name) if '_unclassified' not in name else '' for name in clade.split('|')])
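The last hunk above is the fix for the `KeyError: 't'` listed in the changeset: the rank lookup is keyed by the first letter of the last clade level, and SGB-level entries start with `t`. The following is a minimal sketch of that guard, not the project's code. The contents of `ranks2code` and the clade strings are assumptions made for illustration, and `cami_rank` is a hypothetical helper name.

```python
# Assumed rank-letter table; the real mapping is defined in metaphlan.py.
ranks2code = {'k': 'superkingdom', 'p': 'phylum', 'c': 'class', 'o': 'order',
              'f': 'family', 'g': 'genus', 's': 'species'}


def cami_rank(clade):
    """Return the rank name for a clade string, or None for 't__' (SGB) leaves."""
    prefix = clade.split('|')[-1][0]
    if prefix == 't':
        # Without this guard, ranks2code['t'] raises KeyError.
        return None
    return ranks2code[prefix]


# Made-up clade strings in MetaPhlAn's pipe-separated style.
clades = [
    'k__Bacteria|p__Firmicutes',
    'k__Bacteria|p__Firmicutes|c__C|o__O|f__F|g__G|s__G_sp|t__SGB0001',
]
for clade in clades:
    print(clade.split('|')[-1], '->', cami_rank(clade))
```

Filtering before the lookup, as the commit does, skips SGB-level rows in the CAMI output rather than inventing a rank name for them.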
6 changes: 3 additions & 3 deletions metaphlan/strainphlan.py
@@ -4,8 +4,8 @@
               'Francesco Asnicar ([email protected]), '
               'Moreno Zolfo ([email protected]), '
               'Francesco Beghini ([email protected])')
-__version__ = '4.0.0'
-__date__ = '22 Aug 2022'
+__version__ = '4.0.1'
+__date__ = '24 Aug 2022'
 
 
 import sys
@@ -415,7 +415,7 @@ def sample_markers_to_fasta(sample_path, filtered_samples, tmp_dir, filtered_cla
         for r in sample:
             if r['marker'] in filtered_clade_markers:
                 marker_name = parse_marker_name(r['marker'])
-                seq = SeqRecord(Seq(r['sequence'][trim_sequences:-trim_sequences].replace("*","-")), id=marker_name, description=marker_name)
+                seq = SeqRecord(Seq(r['sequence'][trim_sequences:-trim_sequences].replace("*","-").replace('-','N')), id=marker_name, description=marker_name)
                 SeqIO.write(seq, marker_fna, 'fasta')
 
 
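The changed line masks both `*` and `-` characters as `N` before the marker is written; the companion `--convert_N2gap` flag added to the PhyloPhlAn call in `external_exec.py` in this commit presumably converts them back to gaps downstream. A minimal sketch of the string handling follows, using plain strings instead of Biopython objects. `mask_and_trim` is a hypothetical helper, and the special case for a zero trim length is a choice made here, not something the diff does.

```python
def mask_and_trim(sequence, trim_sequences):
    """Mask '*' and '-' as 'N', then trim both ends by trim_sequences bases."""
    # The replacements are per-character, so masking before or after the
    # slice gives the same result as the chained call in the diff.
    masked = sequence.replace('*', '-').replace('-', 'N')
    if trim_sequences > 0:
        return masked[trim_sequences:-trim_sequences]
    # In Python, s[0:-0] is the empty string, so a zero trim is handled
    # separately in this sketch.
    return masked


print(mask_and_trim('AC*GT-ACGT', 2))
print(mask_and_trim('AC*GT-ACGT', 0))
```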
4 changes: 2 additions & 2 deletions metaphlan/utils/add_metadata_tree.py
@@ -1,8 +1,8 @@
 #!/usr/bin/env python
 __author__ = ('Duy Tin Truong ([email protected]), '
               'Aitor Blanco Miguez ([email protected])')
-__version__ = '4.0.0'
-__date__ = '22 Aug 2022'
+__version__ = '4.0.1'
+__date__ = '24 Aug 2022'
 
 import argparse as ap
 import pandas
6 changes: 3 additions & 3 deletions metaphlan/utils/external_exec.py
@@ -3,8 +3,8 @@
               'Francesco Asnicar ([email protected]), '
               'Moreno Zolfo ([email protected]), '
               'Francesco Beghini ([email protected])')
-__version__ = '4.0.0'
-__date__ = '22 Aug 2022'
+__version__ = '4.0.1'
+__date__ = '24 Aug 2022'
 
 import os, sys, re, shutil, tempfile
 import subprocess as sb
@@ -147,7 +147,7 @@ def execute_phylophlan(samples_markers_dir, conf_file, min_entries, min_markers,
             " --databases_folder "+tmp_dir+" -t n -f "+conf_file+
             " --diversity low"+accuracy+" --genome_extension fna"+
             " --force_nucleotides --min_num_entries "+str(min_entries)+
-            " --min_num_markers "+str(min_markers),
+            " --convert_N2gap --min_num_markers "+str(min_markers),
         "input" : "-i",
         "output_path" : "--output_folder",
         "output" : "-o",
4 changes: 2 additions & 2 deletions metaphlan/utils/extract_markers.py
@@ -4,8 +4,8 @@
               'Francesco Asnicar ([email protected]), '
               'Moreno Zolfo ([email protected]), '
               'Francesco Beghini ([email protected])')
-__version__ = '4.0.0'
-__date__ = '22 Aug 2022'
+__version__ = '4.0.1'
+__date__ = '24 Aug 2022'
 
 import sys
 try:
4 changes: 2 additions & 2 deletions metaphlan/utils/parallelisation.py
@@ -3,8 +3,8 @@
               'Francesco Asnicar ([email protected]), '
               'Moreno Zolfo ([email protected]), '
               'Francesco Beghini ([email protected])')
-__version__ = '4.0.0'
-__date__ = '22 Aug 2022'
+__version__ = '4.0.1'
+__date__ = '24 Aug 2022'
 
 try:
     from .util_fun import error
12 changes: 9 additions & 3 deletions metaphlan/utils/plot_tree_graphlan.py
@@ -1,17 +1,20 @@
 #!/usr/bin/env python
 __author__ = ('Duy Tin Truong ([email protected]), '
               'Aitor Blanco Miguez ([email protected])')
-__version__ = '4.0.0'
-__date__ = '22 Aug 2022'
+__version__ = '4.0.1'
+__date__ = '24 Aug 2022'
 
 import argparse as ap
 import dendropy
 from io import StringIO
 import re
+import random
 from collections import defaultdict
 import matplotlib.colors as colors
 import subprocess
 
+def for_shuffle():
+    return 0.1
 
 def read_params():
     p = ap.ArgumentParser()
@@ -106,7 +109,10 @@ def main():
             count += 1
             node.taxon = dendropy.Taxon(label='node_%d'%count)
     metadatas = sorted(list(metadatas))
-    color_names = list(colors.cnames.keys())
+    color_names = list(colors.TABLEAU_COLORS.keys())
+    color_names_plus = list(colors.CSS4_COLORS.keys())
+    random.shuffle(color_names_plus, for_shuffle)
+    color_names += color_names_plus
     metadata2color = {}
     for i, md in enumerate(metadatas):
         metadata2color[md] = color_names[i % len(color_names)]
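Review note on `random.shuffle(color_names_plus, for_shuffle)`: the optional second argument of `random.shuffle` was deprecated in Python 3.9 and removed in Python 3.11, so this call raises a `TypeError` on newer interpreters. The sketch below shows one possible way to keep a reproducible color order with a seeded `random.Random` instance. It is not the project's code: `build_palette`, `assign_colors`, the seed value, and the short color lists standing in for `TABLEAU_COLORS` and `CSS4_COLORS` are all assumptions, and the resulting order will differ from the one produced by `for_shuffle`.

```python
import random


def build_palette(primary, extra, seed=0):
    """Return the primary colors followed by a reproducibly shuffled copy of extra."""
    shuffled = list(extra)
    # A dedicated seeded generator gives the same order on every run
    # without touching the global random state.
    random.Random(seed).shuffle(shuffled)
    return list(primary) + shuffled


def assign_colors(metadatas, palette):
    # Same cycling rule as in the diff: wrap around when there are more
    # metadata values than colors.
    return {md: palette[i % len(palette)] for i, md in enumerate(sorted(metadatas))}


# Stand-ins for matplotlib's TABLEAU_COLORS and CSS4_COLORS key lists.
primary = ['tab:blue', 'tab:orange', 'tab:green']
extra = ['aliceblue', 'beige', 'coral', 'darkcyan']

palette = build_palette(primary, extra)
print(assign_colors({'gut', 'oral', 'skin', 'soil'}, palette))
```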
4 changes: 2 additions & 2 deletions metaphlan/utils/sample2markers.py
@@ -4,8 +4,8 @@
               'Francesco Asnicar ([email protected]), '
               'Moreno Zolfo ([email protected]), '
               'Francesco Beghini ([email protected])')
-__version__ = '4.0.0'
-__date__ = '22 Aug 2022'
+__version__ = '4.0.1'
+__date__ = '24 Aug 2022'
 
 import sys
 try:
4 changes: 2 additions & 2 deletions metaphlan/utils/sgb_to_gtdb_profile.py
@@ -1,6 +1,6 @@
 __author__ = 'Aitor Blanco ([email protected]'
-__version__ = '4.0.0'
-__date__ = '22 Aug 2022'
+__version__ = '4.0.1'
+__date__ = '24 Aug 2022'
 
 import os, time, sys
 import argparse as ap
4 changes: 2 additions & 2 deletions metaphlan/utils/strain_transmission.py
@@ -1,7 +1,7 @@
 __author__ = ('Aitor Blanco ([email protected]), '
               'Mireia Valles-Colomer ([email protected])')
-__version__ = '4.0.0'
-__date__ = '22 Aug 2022'
+__version__ = '4.0.1'
+__date__ = '24 Aug 2022'
 
 import os, time, sys
 import argparse as ap
4 changes: 2 additions & 2 deletions metaphlan/utils/util_fun.py
@@ -3,8 +3,8 @@
               'Francesco Asnicar ([email protected]), '
               'Moreno Zolfo ([email protected]), '
               'Francesco Beghini ([email protected])')
-__version__ = '4.0.0'
-__date__ = '22 Aug 2022'
+__version__ = '4.0.1'
+__date__ = '24 Aug 2022'
 
 
 import os, sys, re, pickletools, pickle, time, bz2, gzip
2 changes: 1 addition & 1 deletion setup.py
@@ -13,7 +13,7 @@
 
 setuptools.setup(
     name='MetaPhlAn',
-    version='4.0.0',
+    version='4.0.1',
     author='Aitor Blanco-Miguez',
     author_email='[email protected]',
     url='http://github.com/biobakery/MetaPhlAn/',
