Rerun of i2MassChroQ on Ion Level module fails #459

Closed
JuliaS92 opened this issue Nov 28, 2024 · 12 comments

@JuliaS92
Contributor

Describe the bug
Downloading the input_df.csv from the Public runs and reloading that as new data raises an error.

To Reproduce
Steps to reproduce the behavior:

  1. Download input_df.csv for i2MassChroQ__20240904_071654
  2. Submit the same file as i2MassChroQ software result
  3. Hit parse and bench

Expected behavior
This should reproduce the results generated from the original input made to create the public run.

Screenshots

File "/mnt/data/git/ProteoBench/webinterface/pages/base_pages/quant.py", line 469, in execute_proteobench
    result_performance, all_datapoints, input_df = self.run_benchmarking_process()
                                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/mnt/data/git/ProteoBench/webinterface/pages/base_pages/quant.py", line 495, in run_benchmarking_process
    return self.ionmodule.benchmarking(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ubuntu/micromamba/envs/proteobench/lib/python3.12/site-packages/proteobench/modules/dda_quant_ion/dda_quant_ion_module.py", line 90, in benchmarking
    raise ParseSettingsError(f"Error parsing the input file: {e}")
ParseSettingsError: Error parsing the input file: 'ProForma'
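
For context, the trailing 'ProForma' in that message is what Python prints when a KeyError for a missing key is formatted into an f-string. That would be consistent with the parser looking up a "ProForma" column that the uploaded file does not have, although the actual cause may differ. The sketch below only reproduces the message format; the ParseSettingsError class, the parse function, and the row contents are stand-ins for illustration, not ProteoBench code.

```python
class ParseSettingsError(Exception):
    """Stand-in for the exception named in the traceback (not the ProteoBench class)."""


def parse(row):
    # Hypothetical parsing step that requires a "ProForma" entry.
    return row["ProForma"]


def benchmarking(row):
    try:
        return parse(row)
    except Exception as e:
        # Same wrapping pattern as the line shown in the traceback.
        raise ParseSettingsError(f"Error parsing the input file: {e}")


try:
    # A row without a "ProForma" key triggers a KeyError inside parse().
    benchmarking({"Sequence": "PEPTIDEK"})
except ParseSettingsError as err:
    message = str(err)
    print(message)  # prints: Error parsing the input file: 'ProForma'
```

Because the wrapper keeps only str(e), the exception type is lost and only the quoted key name survives in the message.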

Desktop (please complete the following information):

  • OS: OSX Sonoma
  • Browser: Firefox
  • ProteoBench version: 0.5.1
@RobbinBouwmeester
Contributor

Hi Julia,

Not entirely sure, but is input_df.csv the direct output of i2MassChroQ? The download should give you multiple files, and not all of them are direct outputs of the tool. Some of the files are formatted intermediates produced by ProteoBench.

@JuliaS92
Contributor Author

It's either that or the params.csv or result_performance.csv. The direct input is not available, at least through the interface. If it is an intermediate format, we should make sure it loads the same way as the original input, especially for testing and rerunning of benchmarks.

@RobbinBouwmeester
Contributor

Unfortunately it does not load intermediate files, and I do not think we should support that via the web interface. We should, however, support downloading the raw input files. @julianu, is this currently not possible?

@RobbinBouwmeester
Contributor

This relates to #458?

@julianu
Contributor

julianu commented Nov 28, 2024

All data stored on the server can be downloaded via:
https://proteobench.cubimed.rub.de/datasets/ (maybe someone should put this into the docs?)

For the DDA modules, the download function also works, as far as I can see.
I am not entirely sure whether "input_df.csv" is the "raw" input; I just link everything that is stored right now.

Edit:
DIA has a bug right now... I will fix this.

@JuliaS92
Contributor Author

Regarding putting it in the documentation, also see the other issue: #457
For rerunning all datasets, we need to be able to rerun from the input_df.csv files if those are the only ones automatically generated.

@RobbinBouwmeester
Contributor

> Regarding putting it in the documentation also see the other issue: #457
> For rerunning all datasets we need to be able to rerun from the input_df.csv files, if those are the only ones automatically generated.

In my opinion it would be better to run it from the raw input, so, as mentioned before, there is no need to run it from input_df.csv. The main reason is that if we change anything in the parsing, we will not be able to reuse the results.
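
A toy sketch of that argument, under invented assumptions: the column names, the two parser versions, and the precursor key format are all made up for illustration and are not ProteoBench's parsing logic. It shows that a stored intermediate freezes whatever the parser did at submission time, while a rerun from the raw text picks up a later parser change.

```python
import csv
import io

# Hypothetical raw tool output: one peptide observed at two charge states.
RAW = "sequence,charge,intensity\nPEPTIDEK,2,100.0\nPEPTIDEK,3,50.0\n"


def parse_v1(raw_text):
    """Hypothetical first parser: the precursor key is just the sequence."""
    rows = csv.DictReader(io.StringIO(raw_text))
    return [
        {"precursor": r["sequence"], "intensity": float(r["intensity"])}
        for r in rows
    ]


def parse_v2(raw_text):
    """Hypothetical revised parser: the precursor key now includes the charge."""
    rows = csv.DictReader(io.StringIO(raw_text))
    return [
        {"precursor": f'{r["sequence"]}/{r["charge"]}', "intensity": float(r["intensity"])}
        for r in rows
    ]


# What an intermediate file written at submission time would contain.
stored_intermediate = parse_v1(RAW)
# What a rerun from the raw input produces after the parser changed.
rerun_from_raw = parse_v2(RAW)

# The intermediate dropped the charge column, so it cannot be upgraded to v2 keys.
print({r["precursor"] for r in stored_intermediate})
print({r["precursor"] for r in rerun_from_raw})
```

In this toy case the intermediate no longer carries the charge, so the newer key cannot be reconstructed from it, whereas the raw text still supports both parsers.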

@mlocardpaulet
Contributor

I may be wrong, but I think that "input_df" is the raw input.
And I agree, we should re-run from the raw input.
I wonder if the issue could come from a change in the i2MassChroQ outputs? I mean, there was so much back and forth with the developer that maybe something changed between the first point that was submitted and now?
If you want, @JuliaS92, we can look at this together.

@RobbinBouwmeester
Contributor

"input_df" is unfortunately not the raw input. See: https://proteobench.cubimed.rub.de/datasets/

@RobbinBouwmeester
Contributor

Should be mostly fixed in #462

@RobbinBouwmeester
Contributor

Only thing left is to zip files when storing.
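
A minimal sketch of what zipping on storage could look like, using only Python's standard zipfile module. The function name zip_raw_inputs and the flat archive layout are assumptions for illustration, not a description of what #462 or the released version actually does.

```python
import zipfile
from pathlib import Path


def zip_raw_inputs(files, archive_path):
    """Write the given files into one compressed archive, keeping only file names."""
    archive_path = Path(archive_path)
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for f in files:
            f = Path(f)
            # arcname drops the directory part so the archive has a flat layout.
            zf.write(f, arcname=f.name)
    return archive_path
```

Storing only the file name keeps server-side paths out of the archive; two inputs with the same file name would collide in this layout, so a real implementation would need to handle that case.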

@RobbinBouwmeester
Contributor

This is done and released in v0.5.5.
