Commit: Merge branch 'main' into parse_settings
Showing 4 changed files with 137 additions and 7 deletions.
@@ -1,3 +1,10 @@
-#######
-Modules
-#######
+#############################
+ProteoBench benchmark modules
+#############################
+
+.. toctree::
+   :caption: ProteoBench benchmark modules
+   :glob:
+   :maxdepth: 1
+
+   *
@@ -0,0 +1,56 @@
# Benchmark module life cycle

## Proposal

*Module proposals are not accepted yet. If you are interested, stay tuned.*

A proposal can be started by opening a thread on GitHub Discussions, using a specific template. One of the ProteoBench maintainers will be assigned as editor. At least two reviewers, independent of both the existing ProteoBench contributors and the proposal submitters, should be contacted to review the proposal.

Required information for a proposal:
1. A **description of the new module**:
   - Which aspect of proteomics data analysis is benchmarked?
   - What is the added value compared to already existing modules?
2. **Input data**:
   - Provide a persistent identifier for the dataset (e.g., a ProteomeXchange PXD accession or a DOI). If none exists yet, publish the data on Zenodo and provide that DOI.
   - Optionally, provide the DOI of the dataset publication.
   - If only a subset of the referenced dataset is used, describe which subset.
   - Describe why this dataset was selected.
3. **Workflow output data** (data to be uploaded to ProteoBench for metric calculation)
4. Specify the **minimal information needed from the workflow for metric calculation**. This can be an existing (standardized) file format or a simple, well-described CSV file.
5. **Structured metadata**: Which information is needed to sufficiently describe each benchmark run (e.g., workflow parameters)?
6. **Metrics**:
   - A description of the benchmark metrics
   - The methodology for calculating the benchmark metrics
7. **Visualization**: How can the metric for each benchmark run be shown in a single visualization? (Optionally add a mock figure.)
8. **External reviewers**: Optionally propose at least two reviewers (see above).
9. Will you be able to work on the implementation (coding) yourself, with additional help from the ProteoBench maintainers?
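
Item 4 above allows a simple, well-described CSV as the minimal workflow output. As a sketch of what such a file could look like (the column names and values are purely illustrative, not a ProteoBench specification):

```python
import csv
import io

# Hypothetical minimal workflow output for a peptide-identification module.
# Column names are illustrative only; each real module defines its own format.
workflow_output = """peptide_sequence,charge,protein_ids,intensity
AADLTSPK,2,P12345,1.8e6
LLGQDRK,2,P67890;P11111,9.4e5
"""

rows = list(csv.DictReader(io.StringIO(workflow_output)))
for row in rows:
    # In this sketch, multiple protein accessions are separated by ';'.
    row["protein_ids"] = row["protein_ids"].split(";")

print(len(rows))               # → 2
print(rows[1]["protein_ids"])  # → ['P67890', 'P11111']
```

A format like this is easy for submitters to export from almost any workflow and easy for a module to parse for metric calculation.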

## Implementation

*Implementation may or may not be done by the people who made the proposal.*

Once the proposal is fully reviewed and accepted, the editor moves it from Discussions to Issues. Based on this new issue (which can be labeled "new benchmark module") describing the finalized proposal, the module can be implemented and documented in the ProteoBench codebase. Finally, a pull request (PR) can be opened.

After two positive code reviews by ProteoBench maintainers, the PR can be merged. The PR MUST meet the following requirements:
1. Proper documentation of the benchmarking module
2. Proper documentation of the code
3. All code follows Black styling
4. The latest commit of the PR passes the continuous integration tests

## Beta

When the PR is merged, the new module enters a beta stage: its code base is part of the Python package and it is present on the web platforms, but a prominent banner states that the module is still in beta. After a minimum period of one month and approval by the initial proposers and the external reviewers, the beta label can be removed.

## Live

The benchmark module is accessible to the community without restriction.

## Archived

Archived modules are still valid but have been superseded by a better alternative. They remain visible on the web platforms and in the stable code base, but no new reference runs are accepted. A banner stating this status is displayed.

## Withdrawn

Withdrawn modules have, in hindsight, proved to be flawed in some way and should no longer be used in any context. Their code is removed from the Python package, and the modules and their results are removed from the web platforms.
@@ -1,2 +1,33 @@
# Glossary

We have adopted the ontology proposed by [PSI-MS](https://github.com/HUPO-PSI/psi-ms-CV/blob/master/psi-ms.obo). The following terms are specific to ProteoBench:

## Benchmark module
A benchmark module compares the performance of different data analysis workflows based on module-specific, predefined metrics. It provides a specific set of input files (e.g., mass spectrometry files and a sequence database) and requires specific workflow output files (e.g., identified peptides). Based on these workflow output files, metrics are calculated as defined by the module and can be compared with previously validated benchmark runs. As each benchmark module defines specific workflow input files and metrics, it evaluates only a limited set of characteristics of the data analysis workflow.

## Metric
A single number, resulting from an aggregated calculation on the workflow output, that allows for a comparison between different benchmark runs.
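
As a toy illustration of this definition (not one of ProteoBench's actual metrics), per-peptide variability across replicate runs can be aggregated into one comparable number:

```python
from statistics import mean, stdev

# Toy data: intensities of two peptides across three replicate runs.
intensities = {
    "AADLTSPK": [1.8e6, 1.7e6, 1.9e6],
    "LLGQDRK": [9.4e5, 8.8e5, 9.1e5],
}

def cv(values):
    """Coefficient of variation: sample stdev relative to the mean."""
    return stdev(values) / mean(values)

# Single-number metric: mean CV over all peptides (lower is better here).
metric = mean(cv(v) for v in intensities.values())
print(round(metric, 4))  # → 0.0443
```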

## Workflow
A combination of data analysis tools with associated parameters that takes workflow input files (provided by a benchmark module) and generates workflow output files. Based on the workflow output files, metrics can be calculated that describe the workflow's performance.

## Benchmark run
The result of running a workflow with specific parameter values and calculating the benchmark metrics based on the workflow output.

## Workflow metadata
A set of parameter values (e.g., missed cleavages, mass tolerance), workflow properties (e.g., software name, software version), and workflow configuration files that together include all information required to fully understand and re-execute a given workflow. This should include the workflow options, as well as a detailed description of the click sequence and/or any supplemental parameters unique to the workflow.

### Structured workflow metadata
A fixed set of metadata to be provided through a form with every benchmark run that is submitted for validation.
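
As a sketch, structured workflow metadata amounts to a fixed set of key-value pairs backed by a submission form; the field names below are hypothetical and do not reflect the actual ProteoBench submission schema:

```python
# Hypothetical structured metadata for one benchmark run submission.
# Field names are illustrative, not the ProteoBench submission schema.
structured_metadata = {
    "software_name": "MaxQuant",         # workflow property
    "software_version": "2.0.3.0",       # workflow property
    "enzyme": "Trypsin/P",               # parameter value
    "allowed_missed_cleavages": 2,       # parameter value
    "precursor_mass_tolerance_ppm": 20,  # parameter value
}

# A form-backed schema implies a fixed set of required keys:
required_fields = {"software_name", "software_version", "allowed_missed_cleavages"}
missing = required_fields - structured_metadata.keys()
print(sorted(missing))  # → []
```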

### Unstructured workflow metadata
Additional metadata that is specific to a workflow and therefore cannot be captured in a structured submission form, requiring a free-form text field instead. This metadata does not need to be written as full text, but it should be fully comprehensible.

## Workflow configuration files
Files that contain the parameters for a workflow or for a data analysis tool within a workflow. These files can be specific to the workflow or to the data analysis tool and help to re-execute it with the same parameters (e.g., mqpar.xml).

## Validated benchmark run
A benchmark run accepted by the ProteoBench team to be made publicly available as part of the ProteoBench repository. For validation, the submission must include the workflow output files, structured metadata, unstructured metadata, and (if applicable) workflow configuration files. The workflow metadata must include all information needed to fully understand and re-execute the workflow; i.e., the benchmark run must be fully reproducible.