Validation issues for xiFDR-CrossLinkExample.mzid #8

Open
edeutsch opened this issue Jun 9, 2016 · 10 comments

Comments

@edeutsch

edeutsch commented Jun 9, 2016

ERROR: cvParam anchor protein should have a value, but it does not!
ERROR: cvParam protein-pair-level global FDR has a value, but it should not!
ERROR: cvParam residue-pair-level global FDR has a value, but it should not!
WARNING: CV term MS:1002675 ('residue-pair-level global FDR') is not in the cv
WARNING: CV term MS:1002676 ('protein-pair-level global FDR') is not in the cv
WARNING: MS:1000563 should be 'Thermo RAW format' instead of 'Thermo Raw file'
WARNING: MS:1002404 should be 'count of identified proteins' instead of 'count of identified protein'
WARNING: MS:1002511 should be 'cross-link spectrum identification item' instead of 'Cross-linked spectrum identification item.'
WARNING: MS:1002544 should be 'xi' instead of 'xiFDR'
WARNING: MS:1002545 should be 'xi:score' instead of 'The xi result 'Score'.'
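The name warnings above come from comparing each cvParam's `name` attribute with the term name registered in the CV for the same accession. A minimal sketch of that kind of check, assuming the canonical names are available as a plain dictionary (here hard-coded from the validator output above; `check_cv_names` and `CANONICAL_NAMES` are hypothetical names, not part of any validator):

```python
import xml.etree.ElementTree as ET

# Canonical names copied from the validator output in this thread.
# A real check would load them from the CV's OBO file instead.
CANONICAL_NAMES = {
    "MS:1000563": "Thermo RAW format",
    "MS:1002404": "count of identified proteins",
    "MS:1002511": "cross-link spectrum identification item",
    "MS:1002544": "xi",
    "MS:1002545": "xi:score",
}


def check_cv_names(xml_text, canonical=CANONICAL_NAMES):
    """Return de-duplicated warnings for cvParams that disagree with the CV."""
    warnings = []
    for elem in ET.fromstring(xml_text).iter():
        # Strip any XML namespace so '{ns}cvParam' and 'cvParam' both match.
        if elem.tag.rsplit("}", 1)[-1] != "cvParam":
            continue
        acc, name = elem.get("accession"), elem.get("name")
        if acc not in canonical:
            msg = f"WARNING: CV term {acc} ('{name}') is not in the cv"
        elif canonical[acc] != name:
            msg = f"WARNING: {acc} should be '{canonical[acc]}' instead of '{name}'"
        else:
            continue
        if msg not in warnings:
            warnings.append(msg)
    return warnings
```

Calling `check_cv_names(open("xiFDR-CrossLinkExample.mzid").read())` would list each mismatching accession once, however often it occurs in the file.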

@edeutsch
Author

Validation errors found in today's version:
WARNING: MS:1000563 should be 'Thermo RAW format' instead of 'Thermo Raw file'
WARNING: MS:1002404 should be 'count of identified proteins' instead of 'count of identified protein'
WARNING: MS:1002511 should be 'cross-link spectrum identification item' instead of 'Cross-linked spectrum identification item.'
WARNING: MS:1002544 should be 'xi' instead of 'xiFDR'
WARNING: MS:1002545 should be 'xi:score' instead of 'The xi result 'Score'.'
WARNING: MS:1002675 should be 'cross-linking result details' instead of 'residue-pair-level global FDR'
WARNING: XL:00001 should be 'BS3' instead of 'Xlink:BS3'
WARNING: XL:00005 should be 'BS3:d4' instead of 'Xlink:BS3:d4'
WARNING: XL:01000 should be 'BS3!Hydrolyzed' instead of 'Xlink:BS3!Hydrolyzed'
WARNING: XL:01001 should be 'BS3!Amidated' instead of 'Xlink:BS3!Amidated'
WARNING: XL:01008 should be 'BS3:d4!Hydrolyzed' instead of 'Xlink:BS3:d4!Hydrolyzed'
WARNING: XL:01009 should be 'BS3:d4!Amidated' instead of 'Xlink:BS3:d4!Amidated'

@edeutsch
Author

edeutsch commented Jul 5, 2016

After changes to the XLMOD CV, here is a revised list of CV issues with this file:

INFO: Validating file 'xiFDR-CrossLinkExample.mzid'
ERROR: cvParam anchor protein should have a value, but it does not!
ERROR: cvParam residue-pair-level global FDR has a value, but it should not!
WARNING: MS:1000563 should be 'Thermo RAW format' instead of 'Thermo Raw file'
WARNING: MS:1002404 should be 'count of identified proteins' instead of 'count of identified protein'
WARNING: MS:1002511 should be 'cross-link spectrum identification item' instead of 'Cross-linked spectrum identification item.'
WARNING: MS:1002544 should be 'xi' instead of 'xiFDR'
WARNING: MS:1002545 should be 'xi:score' instead of 'The xi result 'Score'.'
WARNING: MS:1002675 should be 'cross-linking result details' instead of 'residue-pair-level global FDR'
WARNING: XL:00001 should be 'cross-linking entity' instead of 'Xlink:BS3'
WARNING: XL:00005 should be 'homofunctional cross-linker' instead of 'Xlink:BS3:d4'
WARNING: XL:01000 should be 'hydrolyzed BS3' instead of 'Xlink:BS3!Hydrolyzed'
WARNING: XL:01001 should be 'amidated BS3' instead of 'Xlink:BS3!Amidated'
WARNING: XL:01008 should be 'hydrolyzed BS3-d4' instead of 'Xlink:BS3:d4!Hydrolyzed'
WARNING: XL:01009 should be 'amidated BS3-d4' instead of 'Xlink:BS3:d4!Amidated'
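Because the expected XLMOD names changed between runs (for example XL:00001 went from 'BS3' to 'cross-linking entity'), it can help to compare two validator reports mechanically. A rough sketch, assuming the reports use exactly the `WARNING: <accession> should be '<expected>' instead of '<found>'` line format shown in this thread; `parse_mismatches` and `changed_expectations` are hypothetical helper names:

```python
import re

# Matches lines such as:
#   WARNING: MS:1002544 should be 'xi' instead of 'xiFDR'
# The greedy groups tolerate quotes inside the names themselves.
NAME_MISMATCH = re.compile(
    r"^WARNING: (?P<accession>\S+) should be '(?P<expected>.*)' "
    r"instead of '(?P<found>.*)'$"
)


def parse_mismatches(report):
    """Map accession -> (expected name, name found in the file)."""
    result = {}
    for line in report.splitlines():
        m = NAME_MISMATCH.match(line.strip())
        if m:
            result[m["accession"]] = (m["expected"], m["found"])
    return result


def changed_expectations(old_report, new_report):
    """Accessions whose expected name differs between two validator runs."""
    old, new = parse_mismatches(old_report), parse_mismatches(new_report)
    return {
        acc: (old[acc][0], new[acc][0])
        for acc in sorted(old.keys() & new.keys())
        if old[acc][0] != new[acc][0]
    }
```

Lines that do not match the pattern (INFO and ERROR lines) are simply ignored, so whole reports can be pasted in unchanged.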

@germa

germa commented Jul 15, 2016

Should we also allow 'loop links', i.e. a cross-link within the same peptide?
Lutz has some of them in his example files, but Figure 4 in the spec doc states:

In mzIdentML, they will be represented by different ProteinDetectionHypothesis(PDH) elements within different ProteinAmbiguityGroup(PAG) elements, sharing the same ID and score.

@andrewrobertjones

@lutzfischer Can you update your xiFDR-CrossLinkExample.mzid so the CV term IDs and term names are correct?

@andrewrobertjones

@germa - Loop links can be represented on peptides, and I think Lutz has some examples of these. At the protein level, these could be represented as associations between different protein chains, if that is what the evidence supports. Others, correct me if I'm wrong.

@edeutsch
Author

Latest validation run still shows all the above issues.

@lutzfischer

After updating the examples and with the latest version of the validator (v1.4.23), the file seems to be OK now.

Only 3 Info messages are left:

Message 1:
    Rule ID: SpectrumIdentificationList_may_rule
    Level: INFO
    Context(/cvParam/@accession ) in 2 locations
    --> None of the given CvTerms were found at '/MzIdentML/DataCollection/AnalysisData/SpectrumIdentificationList/cvParam/@accession' because no values were found:
  - Any children term of MS:1001184 (search statistics). The term can be repeated. The matching value has to be the identifier of the term, not its name.


Message 2:
    Rule ID: SearchDatabase_rule
    Level: INFO
    Context(/searchDatabase/cvParam/@accession ) in 2 locations
    --> None of the given CvTerms were found at '/MzIdentML/DataCollection/Inputs/searchDatabase/cvParam/@accession' because no values were found:
  - Any children term of MS:1001011 (search database details). The term can be repeated. The matching value has to be the identifier of the term, not its name.
  - Any children term of MS:1000561 (data file checksum type). The term can be repeated. The matching value has to be the identifier of the term, not its name.


Message 3:
    Rule ID: SearchDatabaseDatabaseName_rule
    Level: INFO
    Context(/searchDatabase/databaseName/cvParam/@accession ) in 2 locations
    --> None of the given CvTerms were found at '/MzIdentML/DataCollection/Inputs/searchDatabase/databaseName/cvParam/@accession' because no values were found:
  - Any children term of MS:1001013 (database name). The term can be repeated. The matching value has to be the identifier of the term, not its name.
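These INFO messages come from "may" rules that look for a cvParam whose accession sits below a given parent term in the CV. A minimal sketch of such a lookup, assuming the CV's `is_a` relations have already been loaded into a dictionary and that "children" includes indirect descendants (both are assumptions; the function names and the `EX:` accessions in the usage note are made up for illustration):

```python
def is_descendant(accession, ancestor, parents):
    """True if `ancestor` is reachable from `accession` via is_a links.

    `parents` maps each accession to a list of its direct parents.
    The term itself does not count as its own descendant.
    """
    stack, seen = [accession], set()
    while stack:
        term = stack.pop()
        for parent in parents.get(term, ()):
            if parent == ancestor:
                return True
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return False


def may_rule_satisfied(found_accessions, ancestor, parents):
    """True if any accession found at the XPath is below `ancestor`."""
    return any(is_descendant(a, ancestor, parents) for a in found_accessions)
```

With a toy hierarchy such as `{"EX:0001": ["MS:1001184"]}`, `may_rule_satisfied(["EX:0001"], "MS:1001184", parents)` is true, while an empty list of found accessions gives false, which corresponds to the "no values were found" case reported above.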

I assume that this is acceptable for now, so I am closing the issue.

@edeutsch
Author

The following CV problems persist in this file:

ERROR: cvParam anchor protein should have a value, but it does not!
WARNING: MS:1000563 should be 'Thermo RAW format' instead of 'Thermo Raw file'
WARNING: MS:1002404 should be 'count of identified proteins' instead of 'count of identified protein'
WARNING: MS:1002511 should be 'cross-link spectrum identification item' instead of 'Cross-linked spectrum identification item.'
WARNING: MS:1002544 should be 'xi' instead of 'xiFDR'
WARNING: MS:1002545 should be 'xi:score' instead of 'The xi result 'Score'.'
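The ERROR lines in this thread concern whether a cvParam carries a `value` attribute when its CV term requires one, or carries one when the term does not allow it. A small sketch of that check, assuming a hand-written rule table keyed by term name (the thread does not give the accession of 'anchor protein'); `check_cv_values` and `VALUE_REQUIRED` are hypothetical, and a real validator would presumably derive the rules from the CV itself:

```python
import xml.etree.ElementTree as ET

# Whether each term expects a value. Both entries mirror ERROR lines
# quoted in this thread; the table itself is an assumption.
VALUE_REQUIRED = {
    "anchor protein": True,
    "residue-pair-level global FDR": False,
}


def check_cv_values(xml_text, rules=VALUE_REQUIRED):
    """Return de-duplicated errors for missing or unexpected cvParam values."""
    errors = []
    for elem in ET.fromstring(xml_text).iter():
        # Strip any XML namespace so '{ns}cvParam' and 'cvParam' both match.
        if elem.tag.rsplit("}", 1)[-1] != "cvParam":
            continue
        name = elem.get("name")
        if name not in rules:
            continue
        # A missing attribute and an empty string both count as "no value".
        has_value = bool(elem.get("value"))
        if rules[name] and not has_value:
            msg = f"ERROR: cvParam {name} should have a value, but it does not!"
        elif not rules[name] and has_value:
            msg = f"ERROR: cvParam {name} has a value, but it should not!"
        else:
            continue
        if msg not in errors:
            errors.append(msg)
    return errors
```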

@edeutsch edeutsch reopened this Jul 27, 2016
@lutzfischer

Should be fixed now.

@edeutsch
Author

edeutsch commented Aug 5, 2016

This one is still there:

Validating for conflicts with CV in file xiFDR-CrossLinkExample.mzid
WARNING: MS:1002511 should be 'cross-link spectrum identification item' instead of 'cross-linked spectrum identification item'

@edeutsch edeutsch reopened this Aug 5, 2016
@ypriverol ypriverol transferred this issue from HUPO-PSI/mzIdentML Oct 8, 2022