Preferred name from incorrect CDB file #31

Open
sandertan opened this issue Mar 24, 2021 · 5 comments

Comments

@sandertan
Contributor

sandertan commented Mar 24, 2021

Hi @tomolopolis, another question and/or bug: I've been cleaning up the concept table of Dutch words that we use to create the MedCAT CDB file. I noticed that when I use a new CDB file, for a new or existing project, the "Concept Summary" can still display the "Name" (preferred name / pretty name) of the concept from a different CDB file. I'm not experienced in Vue, so I haven't been able to pinpoint the root of this issue, but I think this is a bug. Edit: The API returns the wrong pretty_name; I'll look into this tomorrow.

Example

In the last screenshot, the preferred name of "pneumothorax" contains "NAO". This is a suffix in some names from Dutch MedDRA, and not useful for entity linking, so I removed this from the concept table and generated a new CDB file. I tested it in a Jupyter notebook with MedCAT, and the issue seems resolved there:
image

Also, in this new CDB I've added a new concept, "methotrexaat", to verify that MCT uses the updated CDB. (I still need to add a TUI to this concept, so don't worry about that.)

In MedCATtrainer, the new concept "methotrexaat" is correctly identified, so I'm sure the updated CDB is in use. But the preferred name still contains "NAO". I suspect this name is retrieved from a different CDB file in the same MedCATtrainer instance.
image

@sandertan
Contributor Author

Seems to be caused around here:

pretty_name = ""

I'll try to debug it.

@sandertan
Contributor Author

sandertan commented Mar 24, 2021

I think the application uses a general CUI lookup table that spans across projects, because the GET request for filling in this "Concept Summary" can only pass the CUI, not the ConceptDB ID: /api/concepts/?cui=C0032326
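
To make the suspected cause concrete, here is a minimal sketch of how a lookup table keyed only by CUI can return a stale preferred name. It is not MedCATtrainer's actual schema or import code: the dictionaries, the function names, the "first imported row wins" behaviour, and the exact name strings are all assumptions made for illustration. Only the CUI comes from the request above.

```python
from typing import Dict, Tuple

# Two hypothetical CDBs that disagree on the preferred name of one CUI.
# The name strings are illustrative, not copied from a real CDB.
cdb_old: Dict[str, str] = {"C0032326": "pneumothorax NAO"}
cdb_new: Dict[str, str] = {"C0032326": "pneumothorax"}


def import_by_cui(table: Dict[str, str], cdb: Dict[str, str]) -> None:
    """Import concepts into a table keyed by CUI only.

    Assumption: a row that already exists is left untouched.
    """
    for cui, name in cdb.items():
        table.setdefault(cui, name)


def import_by_cdb_and_cui(
    table: Dict[Tuple[int, str], str], cdb_id: int, cdb: Dict[str, str]
) -> None:
    """Import concepts into a table keyed by (CDB id, CUI)."""
    for cui, name in cdb.items():
        table[(cdb_id, cui)] = name


# Keyed by CUI only: the second import cannot replace the first name.
shared: Dict[str, str] = {}
import_by_cui(shared, cdb_old)
import_by_cui(shared, cdb_new)
print(shared["C0032326"])  # pneumothorax NAO

# Keyed by (CDB id, CUI): each CDB keeps its own preferred name.
scoped: Dict[Tuple[int, str], str] = {}
import_by_cdb_and_cui(scoped, 1, cdb_old)
import_by_cdb_and_cui(scoped, 2, cdb_new)
print(scoped[(2, "C0032326")])  # pneumothorax
```

With a CUI-only key, a request such as /api/concepts/?cui=C0032326 has no way to say which CDB's name it wants, which would match what I'm seeing.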

Do you think it would make sense to extract the pretty name from the project-specific CDB instead? In our current use case we're experimenting with different concept databases that have different preferred names for the same CUIs, because we have to do quite a bit of preprocessing to get a clean list of Dutch concept names.

@tomolopolis
Member

Yes, you're correct in thinking that the concepts table uses the concept CUI from a given CDB as its primary key. The original intention was to let potentially many projects share one CDB's worth of concepts, so that folks aren't forced to import concepts from each CDB per project, but this has been limiting at times.

We could look to improve the concepts table somehow, or the concept pretty name lookup could be changed to look within the project-specific CDB first.
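
The second option could look roughly like the sketch below: prefer the name from the project's own CDB and only fall back to the shared concepts table when the CDB has no entry. The function name `lookup_pretty_name` and both mappings are hypothetical stand-ins, not existing MedCATtrainer or MedCAT APIs; the empty-string default is an assumption modelled on the `pretty_name = ""` line mentioned earlier in the thread.

```python
from typing import Mapping, Optional


def lookup_pretty_name(
    cui: str,
    project_cdb_names: Optional[Mapping[str, str]],
    shared_concepts: Mapping[str, str],
) -> str:
    """Return the preferred name for a CUI.

    Hypothetical helper: checks the project-specific CDB first, then the
    shared concepts table, and returns "" if neither knows the CUI.
    """
    if project_cdb_names is not None:
        name = project_cdb_names.get(cui)
        if name:
            return name
    return shared_concepts.get(cui, "")


# Illustrative data only; the name strings are made up for this example.
shared_table = {"C0032326": "pneumothorax NAO"}
project_names = {"C0032326": "pneumothorax"}

print(lookup_pretty_name("C0032326", project_names, shared_table))
print(lookup_pretty_name("C0032326", None, shared_table))
```

The fallback keeps the current behaviour for projects that share one CDB's worth of concepts, while projects with their own CDB would get their own names.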

@sandertan
Contributor Author

Let me know if you would like me to look into adding this change. I'm not sure, though, whether other parts of the application rely on this "one concept table" paradigm as well.

@sandertan
Contributor Author

@tomolopolis We can close this one for now. When using different CDB universes with different pretty names, this issue can be worked around by setting up multiple MedCATtrainer instances.
