Add option to generate a nucl database from the Uniprot core proteins #89

Open
wants to merge 4 commits into
base: master

alexhbnr

Instead of downloading the protein sequences directly from Uniprot, this PR adds an option to retrieve the corresponding nucleotide sequences from ENA via metadata stored in XML format.

It iterates over the same input files that are required for retrieving the amino acid sequences from Uniprot. However, instead of downloading the FastA file directly, it downloads the XML file from the Uniprot server. The XML file is parsed using an XML schema provided on the Uniprot website; the ENA accession ids of the nucleotide sequences are then extracted and the FastA sequences downloaded.
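The extraction step could look roughly like the sketch below. It is not the code from this PR: it uses the standard-library `xml.etree.ElementTree` instead of `xmlschema`, the sample XML is a simplified, made-up entry, and it assumes that the ENA cross-references appear as `dbReference` elements with `type="EMBL"`. The helper names `local_name` and `extract_embl_ids` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Simplified, made-up Uniprot-style entry, used only for illustration.
# Assumption: ENA cross-references are dbReference elements of type "EMBL".
SAMPLE_XML = """<uniprot xmlns="http://uniprot.org/uniprot">
  <entry>
    <accession>P00001</accession>
    <dbReference type="EMBL" id="AB000001">
      <property type="protein sequence ID" value="BAA00001.1"/>
    </dbReference>
    <dbReference type="PDB" id="1ABC"/>
    <dbReference type="EMBL" id="AB000002"/>
  </entry>
</uniprot>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts in front of tag names."""
    return tag.rsplit("}", 1)[-1]


def extract_embl_ids(xml_text):
    """Return the ids of all dbReference elements whose type is 'EMBL'."""
    root = ET.fromstring(xml_text)
    return [
        elem.get("id")
        for elem in root.iter()  # walks the tree in document order
        if local_name(elem.tag) == "dbReference" and elem.get("type") == "EMBL"
    ]


embl_ids = extract_embl_ids(SAMPLE_XML)
print(embl_ids)
```

Matching on the local tag name keeps the sketch independent of the exact namespace URI. The ids collected this way would then be used to request the nucleotide FastA records from ENA, which is left out here.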

@fasnicar fasnicar self-requested a review May 17, 2022 12:34
@fasnicar
Collaborator

Thanks Alex for this PR.
I tried running the new version of phylophlan_setup_database.py after adding the xmlschema package (version 1.10.0 from conda-forge) to my conda env. However, I'm getting the following error:

Traceback (most recent call last):
  File "./phylophlan_setup_database.py", line 25, in <module>
    import xmlschema
  File "/shares/CIBIO-Storage/CM/cmstore/tools/anaconda3/envs/phylophlan-3.0/lib/python3.6/site-packages/xmlschema/__init__.py", line 14, in <module>
    from .resources import normalize_url, normalize_locations, fetch_resource, \
  File "/shares/CIBIO-Storage/CM/cmstore/tools/anaconda3/envs/phylophlan-3.0/lib/python3.6/site-packages/xmlschema/resources.py", line 23, in <module>
    from elementpath import iter_select, XPathContext, XPath2Parser
  File "/shares/CIBIO-Storage/CM/cmstore/tools/anaconda3/envs/phylophlan-3.0/lib/python3.6/site-packages/elementpath/__init__.py", line 18, in <module>
    from .exceptions import ElementPathError, MissingContextError, \
  File "/shares/CIBIO-Storage/CM/cmstore/tools/anaconda3/envs/phylophlan-3.0/lib/python3.6/site-packages/elementpath/exceptions.py", line 12, in <module>
    from .tdop import Token
  File "/shares/CIBIO-Storage/CM/cmstore/tools/anaconda3/envs/phylophlan-3.0/lib/python3.6/site-packages/elementpath/tdop.py", line 405, in <module>
    class Parser(Generic[TK_co], metaclass=ParserMeta):
TypeError: metaclass conflict: the metaclass of a derived class must be a (non-strict) subclass of the metaclasses of all its bases

and I'm not 100% sure how to fix it. Do you have any idea?

@alexhbnr
Author

Which exact version of Python are you using on your system, Francesco? I get different results for different versions of Python 3.6, but so far none of them matches yours.

@fasnicar
Collaborator

I have the 3.6.15 from conda-forge (hb7a2778_0_cpython).

@alexhbnr
Author

OK, when I create a fresh Python 3.6.15 conda environment and install xmlschema, I can import it without any issues. I only get an error with Python 3.6.0 itself. I will dig a bit further over the next few days to find out what's going on there.

@alexhbnr
Author

alexhbnr commented Mar 6, 2023

Hi @fasnicar,

I am very sorry for the long hiatus. This got lost in my long list of to-dos.

I pulled all the recent changes that you added to v3.0.3 into this PR. I installed the latest version of PhyloPhlAn v3.0.3 via conda/mamba into a new environment using the following command: mamba create -n phylophlan_uniprot_test -c bioconda phylophlan=3.0.3

Afterwards, I installed the changes of this PR using pip3: pip3 install -U git+https://github.com/alexhbnr/phylophlan@uniprot_nuclseq

The pip command installed the Python packages xmlschema v2.2.2 and elementpath v4.0.1. When I ran phylophlan_setup_database -h, I didn't get any error message. However, conda/mamba automatically pulled Python version 3.11 rather than v3.6, for which you saw the error.
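To compare environments when re-testing, a small script along these lines could print the interpreter and package versions in use. This is only a sketch and not part of the PR; the helper names `installed_version` and `report_versions` are made up, and the fallback to `pkg_resources` assumes setuptools is available on interpreters older than Python 3.8.

```python
import sys

try:
    # importlib.metadata is part of the standard library from Python 3.8 on.
    from importlib import metadata as importlib_metadata
except ImportError:
    importlib_metadata = None


def installed_version(name):
    """Return the installed version of a distribution, or 'not installed'."""
    if importlib_metadata is not None:
        try:
            return importlib_metadata.version(name)
        except importlib_metadata.PackageNotFoundError:
            return "not installed"
    # Fallback for older interpreters such as Python 3.6 (needs setuptools).
    import pkg_resources
    try:
        return pkg_resources.get_distribution(name).version
    except pkg_resources.DistributionNotFound:
        return "not installed"


def report_versions(packages):
    """Collect the Python version and the versions of the given packages."""
    report = {"python": "{}.{}.{}".format(*sys.version_info[:3])}
    for name in packages:
        report[name] = installed_version(name)
    return report


if __name__ == "__main__":
    for key, value in report_versions(["xmlschema", "elementpath"]).items():
        print("{}: {}".format(key, value))
```

Running it in both the Python 3.6 and the Python 3.11 environment would show whether the two setups differ in their xmlschema and elementpath versions.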

Would you have time to check this PR once more on your system?
