xpath in mzIdentML reader behaves differently from other xpath implementations? #146

Closed
colin-combe opened this issue Apr 9, 2024 · 3 comments

Comments

@colin-combe

Hi,

XPath queries in the pyteomics mzIdentML reader seem to behave differently from other XPath implementations.

I made a small example project that compares the pyteomics mzIdentML reader with a Perl XPath implementation: https://github.com/colin-combe/pyteomics-test.

The project contains text files with the results I'm getting, but in short, with the Perl xpath tool:

xpath -e '//SpectrumIdentificationList[@id="SIL_1572215611447534775"]/*' test.mzid
Found 2 nodes in test.mzid:

xpath -e '//SpectrumIdentificationList[@id="SIL_1572215611447534775"]/SpectrumIdentificationResult' test.mzid
Found 1 nodes in test.mzid:

With the same file, the pyteomics mzIdentML reader (see https://github.com/colin-combe/pyteomics-test/blob/master/test_xpath.py) returns 2 nodes for the first XPath expression above, but zero for the second.

best wishes,
Colin

@levitsky
Owner

levitsky commented Apr 9, 2024

Hi, thanks for isolating this issue. The iterfind method performs a somewhat naive preprocessing of the XPath which aims to help with element namespaces:

path : str
    Element name or XPath-like expression. The path is very close to
    full XPath syntax, but local names should be used for all elements in the path.
    They will be substituted with local-name() checks, up to the (first) predicate.
    The path can be absolute or "free". Please don't specify namespaces.

The problem arises when there is an element after a predicate, like in the second test.

We should definitely look again at the path preprocessing implementation here. I'm pretty sure the "first predicate" limitation can be avoided, but perhaps a more sound approach is needed, like auto-detection of namespaces. I stopped using namespaces and switched to local-name() checks in 2012, and I don't remember why. We also have an xpath() function that inserts namespaces into a query and we don't use it here.
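To illustrate the namespace-based alternative mentioned above, here is a minimal sketch that uses only the standard library's xml.etree.ElementTree, not pyteomics itself. The miniature document, the `detect_namespace` helper and the `m` prefix are assumptions made for this example. It reads the namespace from the root element and uses it to qualify every step of the query, including the step after the predicate:

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature document; real mzIdentML files carry many more elements.
DOC = """<MzIdentML xmlns="http://psidev.info/psi/pi/mzIdentML/1.2">
  <SpectrumIdentificationList id="SIL_1">
    <FragmentationTable/>
    <SpectrumIdentificationResult id="SIR_1"/>
  </SpectrumIdentificationList>
</MzIdentML>"""


def detect_namespace(root):
    """Return the namespace URI of the root element, or '' if it has none."""
    tag = root.tag
    return tag[1:tag.index('}')] if tag.startswith('{') else ''


root = ET.fromstring(DOC)
ns = {'m': detect_namespace(root)}  # 'm' is an arbitrary prefix chosen here

# Bare names do not match elements that live in a namespace.
bare = root.findall(
    './/SpectrumIdentificationList[@id="SIL_1"]/SpectrumIdentificationResult')

# Prefixed names match, including the step that follows the predicate.
qualified = root.findall(
    './/m:SpectrumIdentificationList[@id="SIL_1"]/m:SpectrumIdentificationResult', ns)
children = root.findall('.//m:SpectrumIdentificationList[@id="SIL_1"]/*', ns)

print(len(bare), len(qualified), len(children))
```

ElementTree supports only a subset of XPath, so this shows the idea of detecting the namespace and qualifying names rather than a drop-in replacement for the reader's query handling.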

As a workaround, the following should work with the current implementation:

'//SpectrumIdentificationList[@id="SIL_1572215611447534775"]/*[local-name()="SpectrumIdentificationResult"]'
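The same rewrite can be applied to every element step mechanically. The sketch below is a hypothetical helper, not pyteomics code: `localize_steps` is a name invented for this example. It replaces each element name outside a predicate with a local-name() check and leaves predicates and quoted strings untouched. It handles only plain element steps; axes, functions such as text(), and attribute steps outside predicates are not covered.

```python
import re

# An element name as it appears in a simple path step (no prefix, no axis).
NAME = re.compile(r'[A-Za-z_][\w.-]*')


def localize_steps(path):
    """Rewrite element names outside predicates as *[local-name()="..."] checks."""
    out = []
    depth = 0     # nesting level of [...] predicates
    quote = None  # quote character of the string literal being read, if any
    i = 0
    while i < len(path):
        ch = path[i]
        if quote:
            # Inside a string literal: copy verbatim until the closing quote.
            if ch == quote:
                quote = None
            out.append(ch)
            i += 1
        elif ch in '"\'':
            quote = ch
            out.append(ch)
            i += 1
        elif ch == '[':
            depth += 1
            out.append(ch)
            i += 1
        elif ch == ']':
            depth -= 1
            out.append(ch)
            i += 1
        elif depth == 0 and (ch.isalpha() or ch == '_'):
            # An element name outside any predicate: replace the whole name.
            match = NAME.match(path, i)
            out.append('*[local-name()="%s"]' % match.group())
            i = match.end()
        else:
            out.append(ch)
            i += 1
    return ''.join(out)


query = ('//SpectrumIdentificationList[@id="SIL_1572215611447534775"]'
         '/SpectrumIdentificationResult')
print(localize_steps(query))
```

For the second query from the report, the last step of the result has the same form as the workaround above, and the first step is rewritten in the same way.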

@colin-combe
Author

thanks

@colin-combe
Author

see also #145
