mzDB output #83

Open
dominik-kopczynski opened this issue Jul 10, 2020 · 4 comments
Labels
enhancement New feature or request

Comments

@dominik-kopczynski
Contributor

Hi folks,

would it be possible to add a further output format to the ThermoRawFileParser, namely the mzDB file format [1]? Thanks to its indexing strategy and database structure, an mzDB file is much faster to read than an mzML file is to parse.

Cheers,
Dominik

[1] https://pubmed.ncbi.nlm.nih.gov/25505153/
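
To illustrate the indexing argument, here is a minimal Python sketch. mzDB is built on SQLite, but the table and column names below (`spectrum`, `rt`, `ms_level`, `peaks`) are a hypothetical toy schema chosen for illustration, not the actual mzDB layout. It only shows why an indexed database can answer a retention-time window query without scanning every spectrum, which a sequential mzML parse would have to do.

```python
import sqlite3

# Hypothetical toy schema, NOT the real mzDB layout: one row per spectrum,
# with the retention time (rt) indexed so range queries can skip a full scan.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE spectrum ("
    "id INTEGER PRIMARY KEY, rt REAL, ms_level INTEGER, peaks BLOB)"
)
conn.execute("CREATE INDEX spectrum_rt_idx ON spectrum (rt)")

# Fill the table with 10,000 dummy spectra, one every 0.5 time units.
rows = [(i, i * 0.5, 1 if i % 10 == 0 else 2, b"") for i in range(1, 10001)]
conn.executemany("INSERT INTO spectrum VALUES (?, ?, ?, ?)", rows)


def spectra_in_rt_window(connection, rt_min, rt_max):
    """Return the ids of spectra whose retention time lies in [rt_min, rt_max]."""
    cursor = connection.execute(
        "SELECT id FROM spectrum WHERE rt BETWEEN ? AND ? ORDER BY rt",
        (rt_min, rt_max),
    )
    return [row[0] for row in cursor]


ids = spectra_in_rt_window(conn, 100.0, 102.0)
print(ids)  # [200, 201, 202, 203, 204]

# Ask SQLite how it intends to execute the query; the plan text shows
# whether the index or a full table scan is used.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT id FROM spectrum WHERE rt BETWEEN ? AND ?",
    (100.0, 102.0),
).fetchall()
for step in plan:
    print(step)
```

The real mzDB format uses its own, more elaborate indexing scheme described in [1]; this sketch only conveys the general idea of random access through an index.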

dominik-kopczynski added the enhancement label on Jul 10, 2020
@ypriverol
Collaborator

It should be easy, and supporting several output formats is the original idea of the library; we already have the Parquet file export.

@caetera
Collaborator

caetera commented Jul 10, 2020

Hi @dominik-kopczynski, I think our good friend David has some plans for mzDB and ThermoRawFileParser.

@david-bouyssie

It should indeed be possible to implement the whole conversion logic inside the ThermoRawFileParser library.
However, I decided to test a different solution for the mzDB conversion, one that should allow me to reuse existing Java/Scala code. If we are happy with this experiment, it should also help other folks working in C++, R, and so on to use the ThermoRawFileParser library. It might be useful for data visualization on Linux, for instance.

I have forked the current project and made some changes to enable the embedding:
david-bouyssie@bf1e6f3

In parallel, I have forked the Embeddinator-4000 project and created some Windows Docker files to simplify building the fork from source:
https://github.com/david-bouyssie/e4k-dockers

I'm using Embeddinator-4000 to generate the glue code (a C-like library wrapping the C# one and a JAR file containing the wrapper).
Now I'm working on the integration into the mzdb4s project: https://github.com/mzdb/mzdb4s
I already have a prototype that works on Windows. The next step is to make it work on Linux, but that should not be a big problem.

Feedback is welcome ;)

@david-bouyssie

david-bouyssie commented Sep 9, 2020

Here is a first pre-release including two converters (raw->mzDB and mzDB->MGF):
https://github.com/mzdb/mzdb4s/releases/download/0.2/mzdb-conversion-tools_0.2.zip

Note that thermo2mzDB is a native executable targeting Ubuntu Linux. I could also deliver a Java program if needed.
