Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Modularize / de-hardcode parser and partly record rendering: CH$LINK, various subtags, adducts #303

Open
meowcat opened this issue Apr 15, 2021 · 12 comments

Comments

@meowcat
Copy link
Contributor

meowcat commented Apr 15, 2021

To make CH$LINK more easily customizable and more modular, the link types and corresponding format could be defined in an external file e.g. a YAML of this style (I don't know if that's the best solution):

ch_link:
	CAYMAN: '<a href="https://www.caymanchem.com/product/${link}" target="_blank">${link}</a>'
	CHEBI: '<a href=\"https://www.ebi.ac.uk/chebi/searchId.do?chebiId=CHEBI:${link}" target="_blank">${link}</a>"
	INCHIKEY: '<a href=\"https://www.google.ch/search?q=&quot;${link}&quot;" target="_blank">${link}</a>"

The Record,java would parse the links using StringSubstitutor:
https://commons.apache.org/proper/commons-text/javadocs/api-release/org/apache/commons/text/StringSubstitutor.html

The petitparser for record parsing could also be built modularly, as I tried on my fork (for now without a config file):

meowcat@28d9b50

My egoistic reason why I would want this is that I would like to specify some additional internal links in our lab. It would be good if I can do that without hardcoding them. But I think this could be a good idea in general. Also, the CH$LINK types would then be defined in one single place instead of multiple places.

@meowcat
Copy link
Contributor Author

meowcat commented Apr 20, 2021

Also e.g. here, wouldn't it be nice to read the adducts from a definition file rather than hardcode them?

def("ion_type",
StringParser.of("[M]+*")
.or(StringParser.of("[M]++"))
.or(StringParser.of("[M]+"))
.or(StringParser.of("[M+H]+,[M-H2O+H]+"))
.or(StringParser.of("[M+H]+"))
.or(StringParser.of("[M+2H]++"))
.or(StringParser.of("[2M+H]+"))
.or(StringParser.of("[M+Li]+*"))
.or(StringParser.of("[M-H+Li]+*"))
.or(StringParser.of("[M+Na]+*"))
.or(StringParser.of("[M-H+Na]+*"))
.or(StringParser.of("[M+Na]+"))
.or(StringParser.of("[M+K]+"))
.or(StringParser.of("[M+K]+"))
.or(StringParser.of("[M-H2O+H]+,[M-2H2O+H]+"))
.or(StringParser.of("[M-H2O+H]+"))
.or(StringParser.of("[M+15]+"))
.or(StringParser.of("[M-H+Na]+"))
.or(StringParser.of("[2M+Na]+"))
.or(StringParser.of("[M+2Na-H]+"))
.or(StringParser.of("[M+NH3+H]+"))

I am willing to work on modularizing the parser as a whole if this is desired.

@schymane
Copy link
Member

I agree that a definition file would make sense, new adduct types are likely to keep coming in - wasn't @michaelwitting working on a list of these?

(I assume, however, that this is not directly related to CH$LINK but rather related in a generic sense?)

@meowcat meowcat changed the title Custom CH$LINK Modularize / de-hardcode parser and partly record rendering: CH$LINK, various subtags, adducts Apr 20, 2021
@michaelwitting
Copy link
Contributor

Yes, we started some time ago to collect differant adduct and their different notations in different software tools here:

https://docs.google.com/spreadsheets/d/1r4dPw1shIEy_W2BkfgPsihinwg-Nah654VlNTn8Gxo0/edit?usp=sharing

Could be revamped...

@tsufz
Copy link
Member

tsufz commented Apr 21, 2021

Where will be the location of the definition file? I suggest as a part of the MassBank-data to follow the idea that it is user editable and independent of MassBank-sever code updates?

@sneumann and @meier-rene, what do you think about?

@meowcat
Copy link
Contributor Author

meowcat commented Apr 23, 2021

I suggest as a part of the MassBank-data

Potentially, yes. Possibly with a fallback in the server code if no file is provided in data?

As for the format, this could be YAML, JSON or simple text format.

@meowcat
Copy link
Contributor Author

meowcat commented May 3, 2021

@meier-rene
I implemented a demo of modularized parsing for the CH$LINK tag here on https://github.com/meowcat/MassBank-web/tree/local-nz. The parser uses /MassBank-data/.config/recordformat/ch_link.ini when present, or otherwise falls back to /src/main/resources/recordformat/ch_link.ini.

Note:

  • this is intended as a demo and not as a pull request - you may have a better idea of what config file format to use (JSON? YAML? TOML?), where to store it, if we should have a common file for multiple definitions in the record or separate files. Particularly you might want to write nicer Java than I do.
  • this doesn't yet address modularizing the display of links in Record.java. For all except PUBCHEM, this is easy to solve with StringSubstitutor and a template file map. For PUBCHEM, I am not sure I have a good idea without hardcoding it.

https://github.com/meowcat/MassBank-web/blob/6c6348601c997601f27d1d29f2afac56d5aae965/MassBank-Project/MassBank-lib/src/main/java/massbank/RecordParserDefinition.java#L61-L103

https://github.com/meowcat/MassBank-web/blob/6c6348601c997601f27d1d29f2afac56d5aae965/MassBank-Project/MassBank-lib/src/main/java/massbank/RecordParserDefinition.java#L920-L932

https://github.com/meowcat/MassBank-web/blob/local-nz/MassBank-Project/MassBank-lib/src/main/resources/recordformat/ch_link.ini

https://github.com/meowcat/MassBank-web/blob/6c6348601c997601f27d1d29f2afac56d5aae965/MassBank-Project/MassBank-lib/src/main/java/massbank/cli/RefreshDatabase.java#L84

@meowcat
Copy link
Contributor Author

meowcat commented Nov 10, 2021

Hi,

I have modularized / de-hardcoded CH$LINK and LICENSE tags, which can be read from the data directory and have defaults in resources.
https://github.com/meowcat/MassBank-web/tree/local-nz/MassBank-Project/MassBank-lib/src/main/resources/recordformat
(and corresponding java files)

Would there be any chance of these changes (or another, potentially better implementation) being upstreamed? As of now, I have to merge this manually whenever anything changes in the main branch.

@meier-rene
Copy link
Contributor

Of course I can integrate it. A pull request would have been best, but I can also grab the required files from your repo.
So please help me to sum up the required changes.
You have resource files which get loaded at program start and some changes in the parser definition? Anything else?

@meowcat
Copy link
Contributor Author

meowcat commented Nov 10, 2021

I can make a PR - I initially intended this to be a proof of concept to be followed up by "something better" but I guess it's not so bad.

The resource file gets loaded upon first use:
https://github.com/meowcat/MassBank-web/blob/6c6348601c997601f27d1d29f2afac56d5aae965/MassBank-Project/MassBank-lib/src/main/java/massbank/RecordParserDefinition.java#L61-L103
The subtag parser is generated from the array read from the resource file:
https://github.com/meowcat/MassBank-web/blob/6c6348601c997601f27d1d29f2afac56d5aae965/MassBank-Project/MassBank-lib/src/main/java/massbank/RecordParserDefinition.java#L920-L932

If you are fine with this implementation, I will put it onto the newest version and make a PR.

@meier-rene
Copy link
Contributor

meier-rene commented Nov 10, 2021

If you are ok with it, than you can skip the PR. Im nearly done with copying over your additions. Im working on this files any anyhow atm.

@meier-rene
Copy link
Contributor

meier-rene commented Nov 10, 2021

With 864137c i have integrated your suggestion for a de-hardcoded parser in the dev branch. Its only slightly modified from your version. To integrate this procedure for other tags its probably just copy and paste. The display issue is still open...

@meowcat
Copy link
Contributor Author

meowcat commented Nov 11, 2021

Thanks!

The display issue is still open...

StrSubstitutor could help maybe? I have to look at it again

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants