Validator - protein-level global FDR term #2

Open
vrkosk opened this issue Mar 10, 2022 · 0 comments

Mascot Server 2.7 and later export protein FDR using the CV term MS:1001214:

      <Threshold>
        <cvParam accession="MS:1001214" name="protein-level global FDR" cvRef="PSI-MS" value="0.0562" />
      </Threshold>
    </ProteinDetectionProtocol>
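
To make it easy to see which terms a file puts under Threshold, here is a minimal sketch that lists the cvParam entries of a Threshold element. It runs on a trimmed copy of the fragment above; the `threshold_terms` helper is hypothetical, and a real mzIdentML file declares an XML namespace that this sketch does not handle.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the exported fragment (no namespace, Threshold element only).
FRAGMENT = """
<Threshold>
  <cvParam accession="MS:1001214" name="protein-level global FDR" cvRef="PSI-MS" value="0.0562" />
</Threshold>
"""

def threshold_terms(xml_text):
    """Return (accession, name, value) for every cvParam in the fragment."""
    root = ET.fromstring(xml_text.strip())
    return [
        (cv.get("accession"), cv.get("name"), cv.get("value"))
        for cv in root.iter("cvParam")
    ]

terms = threshold_terms(FRAGMENT)
for accession, name, value in terms:
    print(accession, name, value)
```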

Definition in psi-ms.obo:

id: MS:1001214
name: protein-level global FDR
def: "Estimation of the global false discovery rate of proteins." [PSI:PI]
xref: value-type:xsd\:double "The allowed value-type for this CV term."
is_a: MS:1002705 ! protein-level result list statistic
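
For reference, a small sketch of reading the parent term out of that stanza. The `parse_obo_stanza` helper is hypothetical and only handles the `id`, `name`, and `is_a` tags shown here, not the full OBO format.

```python
# Raw string so the escaped colon in the xref line is kept verbatim.
STANZA = r"""
id: MS:1001214
name: protein-level global FDR
def: "Estimation of the global false discovery rate of proteins." [PSI:PI]
xref: value-type:xsd\:double "The allowed value-type for this CV term."
is_a: MS:1002705 ! protein-level result list statistic
"""

def parse_obo_stanza(text):
    """Collect id, name, and the accessions of all is_a parents."""
    term = {"is_a": []}
    for line in text.strip().splitlines():
        key, _, value = line.partition(": ")
        if key == "is_a":
            # Drop the trailing "! human-readable name" comment.
            term["is_a"].append(value.split(" ! ")[0].strip())
        elif key in ("id", "name"):
            term[key] = value.strip()
    return term

term = parse_obo_stanza(STANZA)
print(term["id"], "is_a", term["is_a"])
```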

mzIdentMLValidator-1.4.35-SNAPSHOT.jar doesn't accept this. The error is:

Message 1:
    Rule ID: ProteinDetectionProtocolThreshold_must_rule
    Level: ERROR
    Context(/threshold/cvParam/@accession ) in 2 locations
    --> The result found at: /threshold/cvParam/@accession for which the values is  ''MS:1001214'' didn't match any of the 5 specified CV terms:
  - Any children term of MS:1001153 (search engine specific score). The term can be repeated. The matching value has to be the identifier of the term, not its name.
  - Any children term of MS:1001302 (search engine specific input parameter). The term can be repeated. The matching value has to be the identifier of the term, not its name.
  - The sole term MS:1001494 (no threshold) or any of its children. The term can be repeated. The matching value has to be the identifier of the term, not its name.
  - Any children term of MS:1002572 (protein detection statistical threshold). The term can be repeated. The matching value has to be the identifier of the term, not its name.
  - Any children term of MS:1002706 (protein group-level result list statistic). The term can be repeated. The matching value has to be the identifier of the term, not its name.
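
As I read the message, the rule amounts to the check sketched below. This is not the validator's actual code: `ancestors` and `matches_rule` are hypothetical names, I assume "children" means any descendant via `is_a`, and the parent map contains only the one relationship quoted from psi-ms.obo above (the ancestors of MS:1002705 are left out, on the assumption that none of them is one of the five listed terms).

```python
# (root accession, whether the root term itself is accepted)
ALLOWED = [
    ("MS:1001153", False),  # search engine specific score
    ("MS:1001302", False),  # search engine specific input parameter
    ("MS:1001494", True),   # no threshold (sole term or its children)
    ("MS:1002572", False),  # protein detection statistical threshold
    ("MS:1002706", False),  # protein group-level result list statistic
]

# Truncated is_a map: only the relationship quoted in this issue.
PARENTS = {"MS:1001214": ["MS:1002705"]}

def ancestors(accession, parents):
    """All terms reachable from accession by following is_a links."""
    seen = set()
    stack = list(parents.get(accession, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen

def matches_rule(accession, parents=PARENTS, allowed=ALLOWED):
    """True if the term descends from an allowed root (or is an allowed root itself)."""
    found = ancestors(accession, parents)
    return any(
        root in found or (allow_self and accession == root)
        for root, allow_self in allowed
    )

print(matches_rule("MS:1001214"))  # protein-level global FDR
print(matches_rule("MS:1001494"))  # no threshold
```

Under these assumptions MS:1001214 fails because its only known parent, MS:1002705, is a protein-level term, while the rule lists the protein group-level term MS:1002706.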

The mzIdentML 1.1 specification doesn't mention protein FDR. The 1.2 specification mentions it for the Threshold element in section 6.83, where one of the few examples is "MUST supply term MS:1001447 (prot:FDR threshold) only once". However, MS:1001447 is a child of MS:1002485 "protein-level statistical threshold", which is not one of the five terms listed in the error, so maybe the validator would reject that too. It's also not appropriate for Mascot, which doesn't apply a protein FDR threshold; it just reports what the FDR is.

Neither spec says anything about MS:1002705 "protein-level result list statistic". If MS:1001214 isn't intended to go under ProteinDetectionProtocol/Threshold, it's not clear to me where else it could be reported.

ypriverol transferred this issue from HUPO-PSI/mzIdentML on Oct 8, 2022