On a development version of NameRes, I tried searching for COVID with MONDO|HP filtering turned on, Biolink type filtering to Disease and sorting by shortest_name_length. I got back the following results in this order:
MONDO:0100233 "long COVID-19"
MONDO:0100163 "COVID-19–associated multisystem inflammatory syndrome in children"
MONDO:0100319 "COVID-19–associated multisystem inflammatory syndrome in adults"
MONDO:0100096 "COVID-19"
This is because MONDO:0100233 ("PASC") and MONDO:0100163 ("MISC", "PIMS", "PMIS") have shorter synonyms than MONDO:0100319 ("MIS-A") and MONDO:0100096 ("β-CoV"). Since the shortest synonyms of the latter two are the same length (five characters), we don't have any way of separating them.
This isn't too bad, since COVID-19 is still in the first five results, but it's not ideal.
There are other stats we could measure to help improve this situation:
preferred_name_length: The length of the preferred name
information_content: The information content of the clique (this will be missing for lots of identifiers; we could sort those after every concept that has an information content value)
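As a rough illustration of how these stats could break the tie, here is a minimal Python sketch that sorts the four results above on a compound key. The `Clique` class and `sort_key` function are hypothetical and not part of NameRes or Babel, the synonyms are only the ones quoted in this issue, and no information content values are known here, so they are left as `None`. Sorting information content in ascending order is also just an assumption.

```python
from typing import NamedTuple, Optional, Tuple


class Clique(NamedTuple):
    """Hypothetical stand-in for a NameRes result; not the real schema."""
    curie: str
    preferred_name: str
    synonyms: Tuple[str, ...]
    information_content: Optional[float] = None  # unknown for these examples


def shortest_name_length(clique: Clique) -> int:
    # Length of the shortest name, counting the preferred name and all synonyms.
    return min(len(name) for name in (clique.preferred_name, *clique.synonyms))


def sort_key(clique: Clique):
    # Tie-break order: shortest name, then preferred name length, then
    # information content, with missing values sorted after known ones.
    ic_missing = clique.information_content is None
    ic = 0.0 if ic_missing else clique.information_content
    return (shortest_name_length(clique), len(clique.preferred_name), ic_missing, ic)


RESULTS = [
    Clique("MONDO:0100233", "long COVID-19", ("PASC",)),
    Clique(
        "MONDO:0100163",
        "COVID-19–associated multisystem inflammatory syndrome in children",
        ("MISC", "PIMS", "PMIS"),
    ),
    Clique(
        "MONDO:0100319",
        "COVID-19–associated multisystem inflammatory syndrome in adults",
        ("MIS-A",),
    ),
    Clique("MONDO:0100096", "COVID-19", ("β-CoV",)),
]

ranked = [c.curie for c in sorted(RESULTS, key=sort_key)]
by_preferred = [c.curie for c in sorted(RESULTS, key=lambda c: len(c.preferred_name))]
```

With only these synonyms, `ranked` moves MONDO:0100096 ("COVID-19") ahead of MONDO:0100319, but it still comes after the two cliques with four-letter synonyms. Sorting on preferred_name_length alone (`by_preferred`) would put it first.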
Probably the next step will be to include all three of these stats (shortest_name_length plus the two above) in the Solr database, and then we can make a Vue application to compare different sorting strategies until we find the one we like best.
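Once the stats are indexed, comparing strategies could be as simple as swapping Solr's standard `sort` parameter, which takes a comma-separated list of `field direction` pairs. Here is a small sketch of how a comparison tool might build those queries. The strategy names, the base URL and the helper functions are all made up for illustration, and the field names assume the stats are indexed under the names used in this issue.

```python
from urllib.parse import urlencode

# Hypothetical strategies to compare; field names assume the stats are
# indexed in Solr under the names used in this issue.
SORT_STRATEGIES = {
    "shortest_only": ["shortest_name_length"],
    "shortest_then_preferred": ["shortest_name_length", "preferred_name_length"],
    "preferred_then_shortest": ["preferred_name_length", "shortest_name_length"],
}


def solr_sort(fields, direction="asc"):
    # Solr's sort parameter is a comma-separated list of "<field> <direction>".
    return ", ".join(f"{field} {direction}" for field in fields)


def select_url(base_url, query, strategy, rows=10):
    # base_url is an assumption, e.g. "http://localhost:8983/solr/<core>".
    params = {"q": query, "sort": solr_sort(SORT_STRATEGIES[strategy]), "rows": rows}
    return f"{base_url}/select?{urlencode(params)}"


example_url = select_url(
    "http://localhost:8983/solr/example_core", "COVID", "shortest_then_preferred"
)
```

For information_content, Solr's `sortMissingLast` field property might be one way to push identifiers without a value to the end, but I haven't checked how that would fit with the current schema.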
We can also implement more complex metrics if we need to by adding them to Babel and summarizing them into a score field that we can sort on.
We've made lots of improvements to how we sort our results, so I think we can close this issue and focus on the individual issues that track the cases that still need improvement (e.g. #161). But please reopen if I've missed anything!