capitalization of ontology terms in drilldown menu #111

Closed
lwaldron opened this issue Jan 17, 2022 · 42 comments
Labels
priority necessary for early utility

Comments

@lwaldron
Member

I noticed that drilldown for condition shows:

The studies themselves all consistently display the correct ontology term, https://bugsigdb.org/HIV_infection (lower-case "infection"). Would it be a big job to consistently show only the main ontology term, with correct capitalization, on the drilldown search?

@tosfos
Collaborator

tosfos commented Jan 28, 2022

I think we should start thinking about standardizing on a particular capitalization system and then actually modifying all the pages to match one particular capitalization. Once we do that, it would actually be better to keep the various capitalizations in the drilldown so that it will be easier to detect pages with non-standard capitalization.

We can easily modify all the pages to use the standard capitalization. I assume the correct one is the one used in the Glossary page title. Please let me know what you think.

@lwaldron
Member Author

Sounds good to me.

@tosfos
Collaborator

tosfos commented Apr 19, 2022

This is (finally) all done! You can verify by checking that there are no duplicate values for condition, here.

@lgeistlinger
Collaborator

This is extremely helpful not only on the site but also for downstream analysis.
It seems, however, that we still have some duplicates: under body site, for example, we have "meconium" and "Meconium", "throat" and "Throat", and "vagina" and "Vagina".
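A check like the one described here can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `find_case_variants` and the in-memory list of values are hypothetical, and the real wiki data would have to be exported or queried first.

```python
from collections import defaultdict


def find_case_variants(values):
    """Group values that are identical except for capitalization."""
    groups = defaultdict(set)
    for value in values:
        cleaned = value.strip()
        # casefold() gives a case-insensitive key for comparison.
        groups[cleaned.casefold()].add(cleaned)
    # Keep only keys that were seen with more than one spelling.
    return {key: sorted(variants)
            for key, variants in groups.items()
            if len(variants) > 1}


# Hypothetical sample using the body site values mentioned in this thread.
body_sites = ["meconium", "Meconium", "throat", "Throat", "vagina", "Vagina", "Feces"]
duplicates = find_case_variants(body_sites)
for key, variants in duplicates.items():
    print(key, "->", variants)
```

A check of this kind only finds values that differ in capitalization; it would not catch misspellings, which need a separate review.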

@tosfos
Collaborator

tosfos commented Apr 19, 2022

I don't think we touched the body site fields. But I guess we should! Please verify that everything is OK with Condition and then we'll start our script for the body site field as well.

@lgeistlinger
Collaborator

lgeistlinger commented Apr 19, 2022

Right, you said "for condition" - I missed that.
For condition, I see
"antimicrobial agent" and "Antimicrobial agent", and
"irritable bowel syndrome" and "irritable bowl syndrome" (something to fix on our end?)

@tosfos
Collaborator

tosfos commented Apr 21, 2022

It looks like there were edits that happened after our script completed its work. And the form allowed a non-standard capitalization, which we need to fix.

Relatedly, we should decide whether to standardize on a first-letter capitalization format. Right now, some first letters are capitalized and some aren't, but there doesn't seem to be a reason for that. Standard MediaWiki assumes the first letter should be capitalized, but we can change that.

Should we switch all terms to first-letter capitalized? That doesn't always work, especially in scientific wikis where you could have a page titled something like "pH balance". But I don't notice anything in the condition or body site fields that would require first-letter lowercase.

I don't think we can standardize on first-letter lowercase, because we have page titles beginning with "COVID" and "HIV". But we can manually go through them and use all lowercase for all terms, and make some exceptions as needed. Please let me know.
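The trade-off discussed above can be illustrated with a small Python sketch. It is not the script used on the wiki; the function name `capitalize_first` and the `keep_as_is` exception set are hypothetical. The point is that only the first character is changed, so acronyms such as "HIV" survive, while terms like "pH balance" have to be listed as explicit exceptions.

```python
def capitalize_first(term, keep_as_is=frozenset()):
    """Upper-case only the first character, leaving the rest untouched."""
    if not term or term in keep_as_is:
        return term
    # str.capitalize() is deliberately avoided: it lower-cases the rest
    # of the string and would turn "HIV infection" into "Hiv infection".
    return term[0].upper() + term[1:]


# Hypothetical exception list for terms that must start lowercase.
exceptions = frozenset({"pH balance", "pH measurement"})

print(capitalize_first("breast cancer", exceptions))
print(capitalize_first("HIV infection", exceptions))
print(capitalize_first("pH balance", exceptions))
```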

@lgeistlinger
Collaborator

lgeistlinger commented Apr 21, 2022

I would certainly appreciate some standardization. A first-letter capitalization format would be consistent with how we format "Location of subjects" and "Host species". It would also be applicable to "Body site" (whose values are currently mostly lowercase). Indeed, there are conditions like "pH measurement" that would be exceptions, but at least they are currently not in use.

@tosfos
Collaborator

tosfos commented Jul 15, 2022

We performed the following:

  1. For all the Body sites, the Term field was made first-letter capitalized.
  2. All Experiments where Body site terms are used had their values updated to use first-letter capitalized values.
  3. All Body site Alias field values were updated to the first-letter capitalized version.
  4. All Experiments with Body site values that are linked to non-existent terms were first-letter capitalized.

Please review.

@tosfos
Collaborator

tosfos commented Sep 2, 2022

Can this be closed?

@lgeistlinger
Collaborator

Somehow when coming back to this after a while and looking at the drilldown, we seem to have opened the doors for a bit of wild wild west again in Condition and Body site:

A lot of instances with the same body site / condition with different capitalization:

Body site: feces/Feces; gingiva/Gingiva, lower lip/Lower lip, ...
Condition: breast cancer/Breast cancer, lung cancer/Lung cancer, ...

How can we prevent this from happening upon entering the information?

@lwaldron
Member Author

Although the drilldown shows and even allows entries with different capitalization (see below, I actually tried to enter "Lung cancer"), these are not shown on the page (further below, from https://bugsigdb.org/Study_563/Experiment_2)

image

image

The drill-down options seem to be a distinct issue from what is actually stored (#148). The drill-down entries may not be an issue as long as only the standard allowed value is stored. Can you clarify @tosfos?

@lgeistlinger
Collaborator

Just checking in on this one @tosfos?

@lgeistlinger
Collaborator

Also, body sites are inconsistently exported in the CSV file for experiments, including lowercase notation for:

duodenum (eg Study 682, Experiment 4)
feces (eg Study 323, Experiment 1)
nose (eg Study 262, Experiment 1)
skin of cheek (Study 388, Experiment 1)
tongue (Study 388, Experiment 1)
gingiva (Study 388, Experiment 1)
oropharynx (Study 388, Experiment 1)
buccal mucosa (Study 322, Experiment 1)
lower lip (Study 322, Experiment 1)

Can those all be capitalized in the export? Thanks!
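Until the export itself is fixed, a downstream workaround could normalize the column after download. The following Python sketch makes several assumptions that are not confirmed in this thread: the column is called "Body site", multiple body sites share one cell separated by commas, and the sample rows are made up to mirror the cases listed above.

```python
import csv
import io


def capitalize_body_sites(csv_text, column="Body site", separator=","):
    """Return CSV text with each body site first-letter capitalized."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Split a multi-valued cell, dropping empty fragments.
        sites = [s.strip() for s in row[column].split(separator) if s.strip()]
        row[column] = separator.join(s[0].upper() + s[1:] for s in sites)
        writer.writerow(row)
    return out.getvalue()


# Hypothetical excerpt of an experiments export.
sample = (
    "Study,Experiment,Body site\n"
    "Study 323,Experiment 1,feces\n"
    'Study 322,Experiment 1,"buccal mucosa,lower lip"\n'
)
result = capitalize_body_sites(sample)
print(result)
```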

@tosfos
Collaborator

tosfos commented Nov 14, 2022

Sorry I missed these replies for some reason. Will check.

@tosfos
Collaborator

tosfos commented Nov 25, 2022

We have an updated script ready to go that fixes this issue. We are currently blocking out significant changes to production as requested, until the end of the month. Please let me know if this should be moved forward now as it will likely only have a small impact.

@lgeistlinger
Collaborator

If this can be introduced without interruption to the availability of the wiki, I would say let's go ahead. The manuscript including the wiki (a reviewer requested an anonymous guest account to try out the wiki) is currently under active review at Nature Biotech, and we are supposed to hear back from them during the next couple of days. We thus want to make sure that there are no interruptions that would interfere with the review process.

@lgeistlinger
Collaborator

Hi @tosfos - we have now received the reviews (which were very enthusiastic!) and can thus resume work on the wiki, including the updated script you mentioned that fixes this issue. Many thanks!

@tosfos
Collaborator

tosfos commented Dec 14, 2022

This should be complete. We're testing it now.

@lgeistlinger
Collaborator

This looks good for body site. The problem seems to persist for condition.

@tosfos
Collaborator

tosfos commented Dec 30, 2022

We can't standardize on first-letter-uppercase for Condition, but we can standardize on one capitalization for all instances of a Condition term. Will do.

@tosfos
Collaborator

tosfos commented Jan 4, 2023

Should be consistent now. Please review.

@lgeistlinger
Collaborator

Looks good! Thanks!

@lgeistlinger
Collaborator

This problem re-appeared in the drilldown as well as in the export for Condition where we see many terms with inconsistent capitalization, eg

  • breast cancer / Breast cancer
  • gastric cancer / Gastric cancer
  • urinary tract infection / Urinary tract infection
  • ...

@lgeistlinger lgeistlinger reopened this Dec 26, 2023
@lgeistlinger lgeistlinger added the priority necessary for early utility label Dec 26, 2023
@lgeistlinger
Collaborator

Bumping priority as this is related to the new release waldronlab/bugsigdbr#49

@tosfos
Collaborator

tosfos commented Dec 27, 2023

Sorry about that issue. We have a fix for it that we plan to deploy overnight.

@tosfos
Collaborator

tosfos commented Dec 28, 2023

Before we move forward with this, I want to be sure that we're all on the same page about handling the capitalization of Conditions. It looks like what we were shooting for was:

We can’t standardize on first-letter uppercase, but we should standardize on one capitalization option for each term. We should use whatever the Capitalization is stored as on the Condition glossary page.

See my comment above.

What that means is that the drilldown will display terms like "breast cancer" but also terms like "Eczema". Note the difference in capitalization. The reason for the difference is historical. The terms are mostly just set to whatever they were in the original CSV import, and we didn't feel comfortable with automatically forcing all of the Conditions to either first-letter capitalized or first-letter lowercase.
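As a rough illustration of "one capitalization per term, taken from the glossary", here is a Python sketch. The helper names and the sample glossary list are hypothetical; the actual fix runs inside the wiki and is not shown in this thread.

```python
def build_canonical_lookup(glossary_terms):
    """Map the case-insensitive form of each term to its glossary spelling."""
    return {term.casefold(): term for term in glossary_terms}


def canonicalize(value, lookup):
    """Return the glossary spelling if one exists, else the value unchanged."""
    return lookup.get(value.strip().casefold(), value)


# Hypothetical glossary excerpt with mixed, historically inherited capitalization.
glossary = ["breast cancer", "Eczema", "HIV infection"]
lookup = build_canonical_lookup(glossary)

print(canonicalize("Breast cancer", lookup))   # stored spelling wins
print(canonicalize("eczema", lookup))
print(canonicalize("unknown term", lookup))    # no match: left unchanged
```

With this approach the drilldown would show exactly one entry per glossary term, even though the first letters are not uniform across terms.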

Is that a valid approach that we can use for this fix?

@lgeistlinger
Collaborator

We can’t standardize on first-letter uppercase, but we should standardize on one capitalization option for each term. We should use whatever the Capitalization is stored as on the Condition glossary page.

This sounds like a good solution. Although I am curious whether it would be feasible to force all conditions to first-letter capitalized to be consistent with the notation for body site. I understand that this would involve some rework of the glossary pages?

@tosfos
Collaborator

tosfos commented Jan 8, 2024

We went ahead and fixed the capitalization inconsistency issue. It will take some additional effort to switch everything to first-letter capitalized, but I don't think it will be a big deal. It probably does make sense, especially since the Drilldown filters show Capitalized conditions first.

I took a quick look at the Conditions list and I don't see any that need to be first-letter lowercase.

@lgeistlinger
Collaborator

Sounds good, I am in favor of an all conditions first-letter capitalized solution.

@tosfos
Collaborator

tosfos commented Jan 12, 2024

This is done now:
image
Note the highlighted terms that are still lowercase. These come from Condition values on the Experiment pages that do not match a glossary term or its alias. These conditions should be considered for addition to the glossary, or perhaps replaced with another existing value. Specifically, "gonorrhea" should probably be added as a Condition. The other three appear to not be conditions and are input errors that should be addressed manually.
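Values like these can be listed automatically by comparing Experiment values against glossary terms and their aliases. The Python sketch below is an assumption-laden illustration: `unmatched_values` is a hypothetical helper, and the glossary excerpt, including the alias "antibiotic", is invented for the example.

```python
def unmatched_values(experiment_values, glossary):
    """List values that match neither a glossary term nor any alias.

    `glossary` maps each term to a list of its aliases.
    """
    known = set()
    for term, aliases in glossary.items():
        known.add(term.casefold())
        known.update(alias.casefold() for alias in aliases)
    return sorted({v for v in experiment_values if v.casefold() not in known})


# Hypothetical glossary excerpt: term -> aliases.
glossary = {
    "Antimicrobial agent": ["antibiotic"],
    "Breast cancer": [],
}
values = ["Breast cancer", "antibiotic", "gonorrhea",
          "cotrimoxazole", "socioeconomic status"]

to_review = unmatched_values(values, glossary)
print(to_review)
```

Each value in the resulting list is a candidate either for a new glossary entry or for manual replacement with an existing term.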

@lgeistlinger
Collaborator

lgeistlinger commented Jan 15, 2024

Thanks! I changed "cotrimoxazole" to "antimicrobial agent".

Can we add "gonorrhea" to the glossary? (DOID:7551, link: http://purl.obolibrary.org/obo/DOID_7551)

The other two terms (mother's own milk, socioeconomic status) are interesting cases. They are valid conditions but they are not defined in the Experiment Factor Ontology.

Would it make sense to define these terms in the glossary based on other ontologies? @tosfos @lwaldron

This would at least make sure that these terms are based on controlled vocabulary, even though none of the downstream reasoning based on the EFO would apply for these terms.

We could eg use:

P.S.: Note that there are still two versions of "socioeconomic status" in the above screenshot though (one all lowercase, and one first letter capitalized)

@lgeistlinger
Collaborator

As there don't seem to be any objections, I suggest going ahead with adding these three terms to the glossary as suggested above. Please let me know if you need additional information to add these terms to the glossary @tosfos.

@lwaldron
Member Author

Sounds good to me. Would be ideal if we had an easy way to add glossary terms when needed, although it should be rare as EFO seems to handle almost everything (and we could probably have these terms added to EFO by request, so not essential).

@lgeistlinger
Collaborator

Requested that the two missing terms be added to the EFO:

EBISPOT/efo#2167
EBISPOT/efo#2168

@lgeistlinger
Collaborator

Please let me know if you need additional information to add these terms to the glossary @tosfos.

Just thought I'd quickly check back on this one @tosfos

@tosfos
Collaborator

tosfos commented Jan 29, 2024

Sorry about the delay!

Gonorrhea was added to the Glossary here. You can edit it to add a definition and aliases here.

Socioeconomic status was added to the Glossary here. You can edit it to add a definition and aliases here.

Maternal milk was added to the Glossary here. You can edit it to add a definition and aliases here.

I also added the information from
EBISPOT/efo#2167
EBISPOT/efo#2168

@tosfos
Collaborator

tosfos commented Jan 29, 2024

Would be ideal if we had an easy way to add glossary terms when needed

There is already an easy way to do this. We'll document it.

@tosfos
Collaborator

tosfos commented Jan 29, 2024

http://purl.obolibrary.org/obo/ExO_0000114

Note that this is not a valid link right now even though the ontology says that it is.

@tosfos
Collaborator

tosfos commented Jan 29, 2024

Right now there is no icon being shown for gonorrhea:

image

Should we add one? What should it be?

EDIT: This updated itself. It just needs a definition added.

image

@lgeistlinger
Collaborator

Great, thanks! I added defs, links, and synonyms to the terms.

Note that this is not a valid link right now even though the ontology says that it is.

Yeah but that is the link provided by the OLS so I guess we have to live with it ...

@lgeistlinger
Collaborator

Would be ideal if we had an easy way to add glossary terms when needed

There is already an easy way to do this. We'll document it.

Closing this issue and making the documentation of how to add new glossary terms a separate issue #217.
