Categorize "scientific"/"non scientific" medical documents #4

Open
asittampalam opened this issue Mar 17, 2017 · 5 comments

Comments

@asittampalam
Contributor

In a second step we could extract the "scientific" medical documents from our positive set.

asittampalam self-assigned this Mar 17, 2017

@tschimbr
Member

To do this, an expert (medical doctor, coder, ...) will label top-level domains as scientific, pseudo-scientific, or trivial.

@tschimbr
Member

Use these two sets to build a translation service that turns professional scientific texts into non-scientific texts that patients can easily understand.

@asittampalam
Contributor Author

Labeled as scientific (to be updated):
springer.com
pharmazeutische-zeitung.de
springermedizin.at
med2click.de
pathologie-online.de
clinicum.at

Labeled as non-scientific (to be updated):
netdoktor.de
diabetes-ratgeber.net
planet-wissen.de
focus.de
spektrum.de
gesundheitsinformation.de
medizin-transparent.at
haut-ratgeber.ch
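
A minimal sketch of how these lists could be used to label crawled documents by their source URL. The function name `label_for_url` and the choice to also match subdomains (e.g. `link.springer.com`) are assumptions made for illustration, not something decided in this issue.

```python
from urllib.parse import urlparse

# Domain lists copied from the comment above; both are marked "to be updated".
SCIENTIFIC = {
    "springer.com", "pharmazeutische-zeitung.de", "springermedizin.at",
    "med2click.de", "pathologie-online.de", "clinicum.at",
}
NON_SCIENTIFIC = {
    "netdoktor.de", "diabetes-ratgeber.net", "planet-wissen.de", "focus.de",
    "spektrum.de", "gesundheitsinformation.de", "medizin-transparent.at",
    "haut-ratgeber.ch",
}


def matches(host, domains):
    """True if host is one of the domains or a subdomain of one of them."""
    return any(host == d or host.endswith("." + d) for d in domains)


def label_for_url(url):
    """Hypothetical helper: returns "scientific", "non-scientific", or None."""
    host = (urlparse(url).hostname or "").lower()
    if matches(host, SCIENTIFIC):
        return "scientific"
    if matches(host, NON_SCIENTIFIC):
        return "non-scientific"
    return None  # domain not labeled yet
```

Documents from unlabeled domains come back as `None`, so they can be kept aside as the unlabeled pool.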

@asittampalam
Contributor Author

asittampalam commented Jul 21, 2017

Maybe we could use something like "Text Classification from Labeled and Unlabeled Documents using EM" (https://link.springer.com/article/10.1023%2FA%3A1007692713085?LI=true; I haven't read it yet). The idea would be to start with a small labeled set (e.g. part of "scientific") and use a large unlabeled set (e.g. "scientific" + "non-scientific") as leverage to learn a stable "scientific"/"non-scientific" classifier.
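
A rough sketch of the general idea (naive Bayes trained on the labeled set, then refined with EM over the unlabeled set). This is a simplified illustration and not a faithful reimplementation of the linked paper; the function names, the class encoding, and the toy token lists are all made up for the example.

```python
import math
from collections import Counter

CLASSES = (0, 1)  # assumed encoding: 0 = non-scientific, 1 = scientific


def train_nb(docs, weights, vocab, alpha=1.0):
    """M-step: fit multinomial naive Bayes from soft class weights per document."""
    class_mass = [alpha for _ in CLASSES]
    word_counts = [Counter() for _ in CLASSES]
    for tokens, w in zip(docs, weights):
        for c in CLASSES:
            class_mass[c] += w[c]
            for t in tokens:
                word_counts[c][t] += w[c]
    total = sum(class_mass)
    log_prior = [math.log(m / total) for m in class_mass]
    log_like = []
    for c in CLASSES:
        denom = sum(word_counts[c].values()) + alpha * len(vocab)
        log_like.append(
            {t: math.log((word_counts[c][t] + alpha) / denom) for t in vocab}
        )
    return log_prior, log_like


def posterior(tokens, model):
    """E-step for one document: class probabilities under the current model."""
    log_prior, log_like = model
    scores = [
        lp + sum(ll[t] for t in tokens if t in ll)
        for lp, ll in zip(log_prior, log_like)
    ]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    norm = sum(exps)
    return [e / norm for e in exps]


def em_naive_bayes(labeled, unlabeled, n_iter=10):
    """Start from the labeled documents only, then alternate E- and M-steps."""
    vocab = {t for d, _ in labeled for t in d} | {t for d in unlabeled for t in d}
    l_docs = [d for d, _ in labeled]
    l_w = [[1.0 if c == y else 0.0 for c in CLASSES] for _, y in labeled]
    model = train_nb(l_docs, l_w, vocab)
    for _ in range(n_iter):
        u_w = [posterior(d, model) for d in unlabeled]          # E-step
        model = train_nb(l_docs + unlabeled, l_w + u_w, vocab)  # M-step
    return model


# Invented toy data: tokenized documents, two of them labeled.
labeled = [
    (["studie", "randomisiert", "placebo"], 1),
    (["tipps", "hausmittel", "gesund"], 0),
]
unlabeled = [
    ["studie", "kohorte", "placebo"],
    ["kohorte", "randomisiert"],
    ["hausmittel", "ratgeber"],
    ["ratgeber", "tipps", "gesund"],
]
model = em_naive_bayes(labeled, unlabeled)
print(posterior(["kohorte"], model))   # "kohorte" never appears in a labeled doc
print(posterior(["ratgeber"], model))
```

The words "kohorte" and "ratgeber" occur only in unlabeled documents, so a classifier trained on the labeled pair alone cannot tell them apart; after EM they lean towards the class of the words they co-occur with.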

@tschimbr
Member

tschimbr commented Apr 2, 2019

Create a data set of synonym pairs, each consisting of one scientific and one non-scientific term:

  • calculate the difference between the vector embeddings of the two synonyms in each pair
  • check how much these differences vary from pair to pair
  • average the differences
  • translate new scientific words to non-scientific words by applying the averaged difference

Maybe test this out in https://github.com/eonum/medword
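
The steps listed above can be sketched as follows. The 2-D vectors are invented toy values standing in for real embeddings, and the helper names (`difference`, `average`, `consistency`, `translate`) are hypothetical; nothing here uses the medword API.

```python
import math

# Hypothetical toy embeddings; real vectors would come from a trained model.
emb = {
    "hypertonie": [1.0, 1.0],     "bluthochdruck": [1.1, 0.0],
    "myokardinfarkt": [2.0, 1.0], "herzinfarkt": [1.9, 0.0],
    "cephalgie": [3.0, 1.0],      "kopfschmerzen": [3.0, 0.1],
}
# (scientific, non-scientific) synonym pairs used to estimate the offset.
pairs = [("hypertonie", "bluthochdruck"), ("myokardinfarkt", "herzinfarkt")]


def difference(sci, lay):
    """Embedding of the non-scientific term minus that of the scientific term."""
    return [l - s for s, l in zip(emb[sci], emb[lay])]


def average(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def consistency(diffs, offset):
    """Mean cosine similarity between each pair's difference and the average."""
    return sum(cosine(d, offset) for d in diffs) / len(diffs)


def translate(word, offset, candidates):
    """Shift the word by the offset and return the nearest candidate."""
    target = [x + o for x, o in zip(emb[word], offset)]

    def sq_dist(c):
        return sum((x - t) ** 2 for x, t in zip(emb[c], target))

    return min(candidates, key=sq_dist)


diffs = [difference(sci, lay) for sci, lay in pairs]
offset = average(diffs)
lay_words = ["bluthochdruck", "herzinfarkt", "kopfschmerzen"]
print(consistency(diffs, offset))
print(translate("cephalgie", offset, lay_words))
```

A consistency value close to 1 would suggest that a single averaged offset is a reasonable model of the scientific-to-lay shift; a low value would suggest the differences are too heterogeneous to average.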


2 participants