Preferred name from incorrect CDB file #31
Comments
Seems to be caused around here: MedCATtrainer/webapp/api/api/utils.py Line 60 in a6745a7
I think the application uses a general CUI lookup table that spans across projects, because the GET request for filling in this "Concept Summary" can only pass the CUI, not the ConceptDB ID. Do you think it would make sense to extract the pretty name from the project-specific CDB instead? In our current use case we're experimenting with different concept databases that have different preferred names for the same CUIs, because we have to do quite some preprocessing to get a clean list of Dutch concept names.
Yes, you're correct in thinking the concepts table has a primary key of the concept CUI from a given CDB. The original intention was to allow many projects to use one CDB's worth of concepts, so that folks are not forced to import concepts from each CDB per project, but this has been limiting at times. We could look to improve the concepts table somehow, or the concept pretty name lookup could be changed to alternatively look within the project-specific CDB.
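To make the second option concrete, here is a minimal sketch of a lookup that prefers the project-specific CDB and falls back to the shared concepts table. It is not MedCATtrainer's actual code: `GLOBAL_CONCEPTS`, `CDB_NAMES`, `pretty_name`, the CDB ids and the CUIs are all hypothetical stand-ins, with plain dictionaries in place of the database table and the loaded CDB objects.

```python
from typing import Dict, Optional

# Hypothetical stand-in for the shared concepts table, keyed by CUI only.
GLOBAL_CONCEPTS: Dict[str, str] = {"CUI_A": "pneumothorax NAO"}

# Hypothetical stand-in for per-CDB name maps, keyed by a CDB id and then by CUI.
CDB_NAMES: Dict[int, Dict[str, str]] = {
    1: {"CUI_A": "pneumothorax NAO"},
    2: {"CUI_A": "pneumothorax", "CUI_B": "methotrexaat"},
}


def pretty_name(cui: str, cdb_id: Optional[int] = None) -> Optional[str]:
    """Return the preferred name for a CUI.

    If a CDB id is given and that CDB knows the CUI, its name wins.
    Otherwise fall back to the shared table (the current behaviour
    described in this thread).
    """
    if cdb_id is not None:
        name = CDB_NAMES.get(cdb_id, {}).get(cui)
        if name is not None:
            return name
    return GLOBAL_CONCEPTS.get(cui)


# Without a CDB id, the shared table decides the name.
print(pretty_name("CUI_A"))
# With the CDB id of the cleaned-up CDB, the project-specific name is used.
print(pretty_name("CUI_A", cdb_id=2))
```

A change along these lines would also require the "Concept Summary" request to carry a project or CDB identifier in addition to the CUI, which it does not do today according to the comment above.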
Let me know if you would like me to look into adding this change. I'm not sure though if other parts of the application rely on this "one concept table" paradigm as well.
@tomolopolis We can close this one for now. When using different CDB universes with different pretty names, this issue can be solved by setting up multiple MedCATtrainer instances.
Hi @tomolopolis, another question and/or bug: I've been cleaning up the concept table with Dutch words that we use for creating the MedCAT CDB file. I noticed that when I use a new CDB file, for a new or existing project, the "Concept Summary" can still display the "Name" (preferred name, or pretty name) of the concept from a different CDB file.
I'm not experienced in Vue so I haven't been able to pinpoint the root of this issue, but I think this is a bug.
Edit: The API returns the wrong pretty_name, I'll look into this tomorrow.
Example
In the last screenshot, the preferred name of "pneumothorax" contains "NAO". This is a suffix in some names from Dutch MedDRA, and not useful for entity linking, so I removed this from the concept table and generated a new CDB file. I tested it in a Jupyter notebook with MedCAT, and the issue seems resolved there.
Also, in this new CDB I've added a new concept, "methotrexaat", to verify that MCT uses the updated CDB. (I still need to add a TUI to this concept, so don't worry about that.)
In MedCATtrainer, the new concept "methotrexaat" is correctly identified, so I'm sure the updated CDB is in use. But the preferred name still contains "NAO". I suspect this name is retrieved from a different CDB file in the same MedCATtrainer instance.
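For reference, the "NAO" cleanup mentioned above can be sketched roughly as follows. This is an illustration, not the actual preprocessing script: `strip_nao` is a hypothetical helper, and it assumes the suffix always appears as an uppercase, whitespace-separated "NAO" at the end of the name.

```python
import re

# Assumption: the suffix is a trailing, whitespace-separated, uppercase "NAO".
_NAO_SUFFIX = re.compile(r"\s+NAO$")


def strip_nao(name: str) -> str:
    """Remove a trailing 'NAO' suffix from a concept name, if present."""
    return _NAO_SUFFIX.sub("", name.strip())


names = ["pneumothorax NAO", "pneumothorax", "methotrexaat"]
print([strip_nao(n) for n in names])
```

Names that do not end in the suffix pass through unchanged, so the function can be applied to the whole concept table before the CDB file is regenerated.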