capitalization of ontology terms in drilldown menu #111
Comments
I think we should start thinking about standardizing on a particular capitalization system and then actually modifying all the pages to match one particular capitalization. Once we do that, it would actually be better to keep the various capitalizations in the drilldown so that it will be easier to detect pages with non-standard capitalization. We can easily modify all the pages to use the standard capitalization. I assume the correct one is the one used in the Glossary page title. Please let me know what you think. |
Sounds good to me. |
This is (finally) all done! You can verify by checking that there are no duplicate values for condition, here. |
This is extremely helpful not only on the site but also for downstream analysis. |
I don't think we touched the body site fields. But I guess we should! Please verify that everything is OK with Condition and then we'll start our script for the body site field as well. |
Right, you said "for condition" - I missed that. |
It looks like edits were made after our script completed its work, and the form allowed a non-standard capitalization, which we need to fix. Relatedly, we should decide whether we want to standardize on a first-letter capitalization format. Right now, some first letters are capitalized and some aren't, but there doesn't seem to be a reason for that. Standard MediaWiki assumes the first letter should be capitalized, but we can change that. Should we switch all terms to first-letter capitalized? That doesn't always work, especially in scientific wikis where you could have a page titled something like "pH balance". But I don't notice anything in the condition or body site fields that would require first-letter lowercase. I don't think we can standardize on first-letter lowercase, because we have page titles beginning with "COVID" and "HIV". But we can manually go through them and use all lowercase for all terms, and make some exceptions as needed. Please let me know. |
I would certainly appreciate some standardization. A first-letter capitalization format would be consistent with how we format "Location of subjects" and "Host species". It would also be applicable to "Body site" (whose values are currently mostly lowercase). Indeed, there are conditions like "pH measurement" that would be exceptions, but at least they are currently not in use. |
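For reference, the first-letter capitalization rule discussed above could be sketched roughly as follows. This is only an illustration, not the script used on the wiki: the function name `capitalize_first` and the exception set are hypothetical, and the two exception entries are simply the examples mentioned in this thread.

```python
# Hypothetical exception list: terms that must keep a lowercase first letter.
LOWERCASE_FIRST_EXCEPTIONS = {"pH balance", "pH measurement"}


def capitalize_first(term, exceptions=LOWERCASE_FIRST_EXCEPTIONS):
    """Uppercase only the first character, leaving the rest untouched."""
    term = term.strip()
    if not term or term in exceptions:
        return term
    # str.capitalize() is avoided on purpose: it lowercases the remainder,
    # which would turn "HIV infection" into "Hiv infection".
    return term[0].upper() + term[1:]


examples = ["feces", "lower lip", "HIV infection", "COVID-19", "pH balance"]
standardized = [capitalize_first(t) for t in examples]
```

Touching only the first character keeps acronyms such as "HIV" and "COVID" intact, and the exception set covers the rare terms that need a lowercase first letter.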
We performed the following:
Please review. |
Can this be closed? |
Somehow when coming back to this after a while and looking at the drilldown, we seem to have opened the doors for a bit of wild wild west again in Condition and Body site: A lot of instances with the same body site / condition with different capitalization: Body site: feces/Feces; gingiva/Gingiva, lower lip/Lower lip, ... How can we prevent this from happening upon entering the information? |
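One possible way to spot such cases in an export is to group values by their case-insensitive form and report every group with more than one spelling. The sketch below assumes the values are available as a plain list of strings; `find_case_variants` is a hypothetical helper, not part of the wiki or of bugsigdbr.

```python
from collections import defaultdict


def find_case_variants(values):
    """Return groups of values that differ only by capitalization."""
    groups = defaultdict(set)
    for value in values:
        cleaned = value.strip()
        # casefold() gives a case-insensitive key for grouping.
        groups[cleaned.casefold()].add(cleaned)
    # Keep only the keys that were seen with more than one spelling.
    return {key: sorted(variants) for key, variants in groups.items() if len(variants) > 1}


# Sample values taken from the body sites named in this comment.
body_sites = ["feces", "Feces", "gingiva", "Gingiva", "lower lip", "Lower lip", "Duodenum"]
variants = find_case_variants(body_sites)
```

An empty result would mean that no term appears with more than one capitalization. A check like this could in principle also run when the form is saved, though how the form validates input is not described in this thread.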
Although the drilldown shows and even allows entries with different capitalization (see below, I actually tried to enter "Lung cancer"), these are not shown on the page (further below, from https://bugsigdb.org/Study_563/Experiment_2) The drill-down options seem to be a distinct issue from what is actually stored (#148). The drill-down entries may not be an issue as long as only the standard allowed value is stored. Can you clarify @tosfos? |
Just checking in on this one @tosfos? |
Also, body sites are exported inconsistently in the CSV file for experiments, including lowercase notation for duodenum (e.g. Study 682, Experiment 4). Can those all be capitalized in the export? Thanks! |
Sorry I missed these replies for some reason. Will check. |
We have an updated script ready to go that fixes this issue. We are currently blocking out significant changes to production as requested, until the end of the month. Please let me know if this should be moved forward now as it will likely only have a small impact. |
If this can be introduced without interruption to the availability of the wiki, I would say let's go ahead. The manuscript including the wiki (a reviewer requested an anonymous guest account to try out the wiki) is currently under active review at Nature Biotech, and we are supposed to hear back from them during the next couple of days. We thus want to make sure that there are no interruptions that would interfere with the review process. |
Hi @tosfos - we have now received the reviews (which were very enthusiastic!) and can thus now resume with work on the wiki including the updated script that you mentioned that fixes this issue. Many thanks! |
This should be complete. We're testing it now. |
This looks good for body site. The problem seems to persist for condition. |
We can't standardize on first-letter-uppercase for Condition, but we can standardize on one capitalization for all instances of a Condition term. Will do. |
Should be consistent now. Please review. |
Looks good! Thanks! |
This problem re-appeared in the drilldown as well as in the export for Condition where we see many terms with inconsistent capitalization, eg
|
Bumping priority as this is related to the new release waldronlab/bugsigdbr#49 |
Sorry about that issue. We have a fix for it that we plan to deploy overnight. |
Before we move forward with this, I want to be sure that we're all on the same page about how to handle capitalization for Conditions. It looks like what we were shooting for was:
See my comment above. What that means is that the drilldown will display terms like "breast cancer" but also terms like "Eczema". Note the difference in capitalization. The reason for the difference is historical. The terms are mostly just set to whatever they were in the original CSV import, and we didn't feel comfortable with automatically forcing all of the Conditions to either first-letter capitalized or first-letter lowercase. Is that a valid approach that we can use for this fix? |
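To make the proposal concrete, here is a minimal sketch of "one capitalization per term" that does not force first-letter uppercase or lowercase. It assumes that the first spelling encountered (for example, the one from the original CSV import) becomes the canonical one; that rule and the helper names `build_canonical_map` and `standardize` are assumptions for illustration, not the deployed fix.

```python
def build_canonical_map(terms):
    """Map each case-insensitive term to the first spelling seen."""
    canonical = {}
    for term in terms:
        # setdefault keeps the earliest spelling and ignores later variants.
        canonical.setdefault(term.casefold(), term)
    return canonical


def standardize(term, canonical):
    """Return the canonical spelling, or the term unchanged if it is unknown."""
    return canonical.get(term.casefold(), term)


imported = ["breast cancer", "Eczema", "Breast cancer", "eczema"]
canonical = build_canonical_map(imported)
cleaned = [standardize(t, canonical) for t in imported]
```

With this rule the drilldown would show a single entry per Condition, while "breast cancer" and "Eczema" keep their historical, differing capitalization.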
This sounds like a good solution. Although I am curious whether it would be feasible to force all conditions to first-letter capitalized to be consistent with the notation for body site. I understand that this would involve some rework of the glossary pages? |
We went ahead and fixed the capitalization inconsistency issue. It will take some additional effort to switch everything to first-letter capitalized, but I don't think it will be a big deal. It probably does make sense, especially since the Drilldown filters show Capitalized conditions first. I took a quick look at the Conditions list and I don't see any that need to be first-letter lowercase. |
Sounds good, I am in favor of an all conditions first-letter capitalized solution. |
Thanks! I changed "cotrimoxazole" to "antimicrobial agent". Can we add "gonorrhea" to the glossary? (DOID:7551, link: http://purl.obolibrary.org/obo/DOID_7551) The other two terms (mother's own milk, socioeconomic status) are interesting cases. They are valid conditions but they are not defined in the Experiment Factor Ontology. Would it make sense to define these terms in the glossary based on other ontologies? @tosfos @lwaldron This would at least make sure that these terms are based on controlled vocabulary, even though none of the downstream reasoning based on the EFO would apply for these terms. We could eg use:
P.S.: Note that there are still two versions of "socioeconomic status" in the above screenshot though (one all lowercase, and one first letter capitalized) |
As there don't seem to be any objections, I suggest going ahead with adding these three terms to the glossary as suggested above. Please let me know if you need additional information to add these terms to the glossary @tosfos. |
Sounds good to me. Would be ideal if we had an easy way to add glossary terms when needed, although it should be rare as EFO seems to handle almost everything (and we could probably have these terms added to EFO by request, so not essential). |
Requested that the two missing terms are added to the EFO: |
Sorry about the delay! Gonorrhea was added to the Glossary here. You can edit it to add a definition and aliases here. Socioeconomic status was added to the Glossary here. You can edit it to add a definition and aliases here. Maternal milk was added to the Glossary here. You can edit it to add a definition and aliases here. I also added the information from |
There is already an easy way to do this. We'll document it. |
Note that this is not a valid link right now even though the ontology says that it is. |
EDIT: This updated itself. It just needs a definition added. |
Great, thanks! I added defs, links, and synonyms to the terms.
Yeah but that is the link provided by the OLS so I guess we have to live with it ... |
Closing this issue and making the documentation of how to add new glossary terms a separate issue #217. |
I noticed that drilldown for condition shows:
The studies themselves all consistently display the correct ontology term, https://bugsigdb.org/HIV_infection (lower-case "infection"). Would it be a big job to consistently show only the main ontology term, with correct capitalization, on the drilldown search?