
A Recommendation System for Navigating COVID-19 Research Articles with NLP and Unsupervised ML topic modeling

crystal-ctrl/nlp_project

COVIPEDIA

A Recommendation System for Navigating COVID-19 Research Articles

NLP Unsupervised ML project

Goal

The goal of this project is to build a recommendation system that helps scientists and researchers navigate the current surge of papers about COVID-19, find what is relevant to their work, and uncover hidden semantic relationships between articles. From the COVID-19 Open Research Dataset, I used the abstracts of the subset of articles from January 2020 to May 2021 (about 260,000 articles) as the text corpus. With an LDA topic model, I assigned each document its dominant topic and its relevance to that topic, then grouped the articles by topic for the recommendation system, so researchers can look up articles by the topics related to their work. Lastly, I deployed a Streamlit app on Heroku, using a smaller dataset, that recommends the top 20 related articles for a selected topic.
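The grouping and recommendation step can be sketched as follows. This is a minimal sketch, not the project's actual code: it assumes each abstract's topic distribution is already available as a list of `(topic_id, probability)` pairs (the shape gensim's `LdaModel.get_document_topics` returns), and the article IDs and probabilities are made-up toy values. The helper names `dominant_topic`, `group_by_topic`, and `recommend` are hypothetical.

```python
from collections import defaultdict

# Toy document-topic distributions (hypothetical values). In practice these
# would come from a trained LDA model, e.g. gensim's get_document_topics.
doc_topics = {
    "article_a": [(0, 0.82), (1, 0.10), (2, 0.08)],
    "article_b": [(0, 0.15), (1, 0.70), (2, 0.15)],
    "article_c": [(0, 0.55), (1, 0.05), (2, 0.40)],
    "article_d": [(0, 0.05), (1, 0.25), (2, 0.70)],
    "article_e": [(0, 0.91), (1, 0.04), (2, 0.05)],
}


def dominant_topic(topic_probs):
    """Return the (topic_id, probability) pair with the highest probability."""
    return max(topic_probs, key=lambda pair: pair[1])


def group_by_topic(doc_topics):
    """Map each topic to (doc_id, relevance) pairs for the docs it dominates."""
    groups = defaultdict(list)
    for doc_id, probs in doc_topics.items():
        topic_id, relevance = dominant_topic(probs)
        groups[topic_id].append((doc_id, relevance))
    return groups


def recommend(groups, topic_id, top_n=20):
    """Return up to top_n doc IDs for a topic, most relevant first."""
    ranked = sorted(groups.get(topic_id, []), key=lambda pair: pair[1], reverse=True)
    return [doc_id for doc_id, _ in ranked[:top_n]]


groups = group_by_topic(doc_topics)
print(recommend(groups, topic_id=0))  # ['article_e', 'article_a', 'article_c']
```

Using the dominant topic's probability as the relevance score keeps serving cheap: every article is scored once offline, and answering a topic query is just a sort over that topic's group.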

To learn more, see my blog post and presentation slides.

The pyLDAvis topic model visualization is saved as an HTML file; you can download it from here to view it.

Try out the Heroku app for COVIPEDIA~

Workflow

Technologies

  • Python (pandas, numpy)
  • langdetect
  • regex, string
  • spaCy, scispaCy ("en_core_sci_lg" model for biomedical, scientific, and clinical vocabulary)
  • NLTK
  • scikit-learn
  • Gensim
  • WordCloud
  • pyLDAvis
  • Streamlit
  • Heroku
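The list above names the libraries but not the exact cleaning steps. As an assumption only, a first regex/string pass over an abstract, before any spaCy/scispaCy lemmatization, might look like this. The `STOP_WORDS` set is a tiny hypothetical placeholder (NLTK and spaCy ship full stop-word lists), and `clean_abstract` is a hypothetical helper name.

```python
import re
import string

# Hypothetical, deliberately tiny stop-word list; a real pipeline would use
# NLTK's or spaCy's stop words instead.
STOP_WORDS = {"the", "of", "and", "in", "a", "to", "is", "for", "with", "we"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
DIGIT_RE = re.compile(r"\d+")
# Translation table that turns every ASCII punctuation character into a space.
PUNCT_TABLE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def clean_abstract(text):
    """Lowercase, strip URLs, digits and punctuation, then drop stop words and short tokens."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = DIGIT_RE.sub(" ", text)
    text = text.translate(PUNCT_TABLE)
    return [tok for tok in text.split() if tok not in STOP_WORDS and len(tok) > 2]


print(clean_abstract("We report 12 cases of SARS-CoV-2 infection in Wuhan (https://example.org)."))
# ['report', 'cases', 'sars', 'cov', 'infection', 'wuhan']
```

Note how this naive pass breaks a term like SARS-CoV-2 into fragments; that is one reason a domain-aware tokenizer such as scispaCy's `en_core_sci_lg` is useful for biomedical text.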
