Slow Conversion #95

Open
ozgurdemir opened this issue Dec 18, 2020 · 3 comments
Labels
enhancement New feature or request

Comments

@ozgurdemir

Is there a possibility to speed up the conversion process? The conversion of a file with ~200 scans takes around 20 seconds:

command used:
ThermoRawFileParser.exe -i test.RAW -b test.mzML --noPeakPicking -f 1

real 1m2.862s
user 0m21.265s
sys 0m3.149s

Resulting mzML file size: 52 MB

Thanks for this application btw.

caetera added the enhancement label Dec 18, 2020
@caetera
Collaborator

caetera commented Dec 18, 2020

Hi @ozgurdemir,
it is not very clear how to answer your question, since it is not possible to locate the performance bottleneck from your description. Is it slow only with this specific file, or is the performance slow in general?

The conversion uses a single thread. The obvious suggestion is multi-threading; there is an old issue (#23) about it. This functionality has not been implemented, and I cannot give you an estimate of if or when it will be. Although reading and converting individual spectra can be parallelized relatively easily, the final assembly of the mzML file (especially the indexed one) is much more difficult to implement in parallel, and this step will determine the performance (that is one of the reasons why multi-threaded processing is not there). There may, of course, be other performance issues that, once solved, would improve overall performance even in a single-threaded design.
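
To illustrate the point about assembly, here is a minimal Python sketch. It is not ThermoRawFileParser code: `convert_spectrum` and `assemble_indexed` are hypothetical names, and the XML fragment is a placeholder. Per-spectrum conversion runs in a thread pool, but the byte offset stored in the index for each spectrum depends on the length of everything written before it, so the write loop stays sequential.

```python
import io
from concurrent.futures import ThreadPoolExecutor


def convert_spectrum(scan_number: int) -> bytes:
    """Hypothetical stand-in for converting one scan to an XML fragment."""
    return f'<spectrum id="scan={scan_number}"/>\n'.encode("utf-8")


def assemble_indexed(scan_numbers, out):
    """Convert scans in parallel, then write them in order and record offsets.

    Assumes `out` is an empty binary stream, so offsets start at 0.
    """
    scan_numbers = list(scan_numbers)
    with ThreadPoolExecutor() as pool:
        # map() yields results in input order, whatever order the workers finish in.
        fragments = list(pool.map(convert_spectrum, scan_numbers))

    offsets = {}
    position = 0
    for scan_number, fragment in zip(scan_numbers, fragments):
        # An offset is only known once all earlier fragments have been sized.
        offsets[scan_number] = position
        out.write(fragment)
        position += len(fragment)
    return offsets


buffer = io.BytesIO()
index = assemble_indexed([1, 2, 3], buffer)
```

In this toy version the sequential part is trivial; in a real converter it is the step that limits how much the parallel part can help.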

One can run multiple copies of ThermoRawFileParser in parallel to make better use of the computer's resources. Of course, this only helps when converting several raw files.
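
A minimal sketch of running several copies at once, in Python. It assumes the executable is on the PATH and reuses only the flags from the command in the original report (`-i`, `-b`, `--noPeakPicking`, `-f 1`); the helper names `build_command` and `convert_all` and the default worker count are illustrative.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def build_command(raw_file: Path, exe: str = "ThermoRawFileParser.exe") -> list:
    """Build one conversion command, mirroring the flags from the original report."""
    mzml_file = raw_file.with_suffix(".mzML")
    return [exe, "-i", str(raw_file), "-b", str(mzml_file), "--noPeakPicking", "-f", "1"]


def convert_all(raw_files, max_workers: int = 4):
    """Run one parser process per raw file, at most max_workers at a time."""
    commands = [build_command(Path(f)) for f in raw_files]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Each thread only waits on its own child process, so threads suffice here.
        results = list(pool.map(lambda cmd: subprocess.run(cmd, check=False), commands))
    # Return the exit codes so the caller can check which conversions failed.
    return [result.returncode for result in results]
```

Calling `convert_all(["a.raw", "b.raw"])` would start one process per file. A sensible `max_workers` depends on the number of cores and on disk throughput, since each process reads a raw file and writes an mzML file.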

@ozgurdemir
Author

Hi @caetera, thx for the rapid response. I'm not seeing this for a particular file. So you're right, this is probably more of a question than an issue. I was just wondering why the conversion from one format to another takes so much time, without knowing anything about the process itself. Maybe there are some calculations, compression, etc. involved.

I agree. Parallelizing code is always tricky. Plus, if there are bottlenecks in the single-threaded implementation, they will still be present in the multi-core implementation. I'm not familiar with C# tooling, but did you ever profile the conversion process to detect bottlenecks?

@nielshulstaert
Contributor

Hi, I agree with @caetera: going multithreaded could make it faster but will be a challenge. I'm sure improvements could be made even without it; suggestions are always welcome (it was my first C# project). Could you share the RAW file? I'll see what I can do with profiling. Also, using the --noPeakPicking flag increases the file size, as you are probably aware. Thanks for using the parser.
