This project uses ML techniques to overlay CoronaWhy Spanish flu data on COVID-19. All datasets are published in the CoronaWhy Data Lake.
We share all meetings on YouTube. Feel free to join us if you would like to contribute.
Download the latest version of the KB Spanish flu dataset:
wget http://datasets.coronawhy.org/api/access/datafile/503748 -O data.tar.gz;gzip -cd data.tar.gz|tar xf -
wget http://datasets.coronawhy.org/api/access/datafile/741787 -O congress.tar.gz;gzip -cd congress.tar.gz|tar xf -
Download Language Identification Model:
wget https://dl.fbaipublicfiles.com/fasttext/supervised-models/lid.176.bin
Install the fasttext module:
pip install fasttext
Run the language detection process:
python3 ./main.py
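To show what the language detection step involves, here is a minimal sketch of calling the downloaded lid.176.bin model from Python. It is not the project's main.py: the helper names load_model and detect_language and the min_confidence threshold are assumptions made for illustration. Only fasttext.load_model and the model's predict method come from the fasttext package.

```python
def load_model(path="lid.176.bin"):
    """Load the fastText language identification model downloaded above."""
    import fasttext  # imported lazily so detect_language can be tested without it
    return fasttext.load_model(path)


def detect_language(model, text, min_confidence=0.5):
    """Return (language_code, confidence), or (None, confidence) when unsure.

    min_confidence is an illustrative threshold, not a value from the project.
    """
    # fastText predicts one line at a time and rejects embedded newlines,
    # so collapse all whitespace into single spaces first.
    line = " ".join(text.split())
    if not line:
        return None, 0.0
    labels, scores = model.predict(line, k=1)
    # Labels look like "__label__en"; keep only the language code.
    code = labels[0].replace("__label__", "")
    confidence = float(scores[0])
    if confidence < min_confidence:
        return None, confidence
    return code, confidence
```

A typical call would be detect_language(load_model(), some_text). Returning None below the threshold lets the caller skip fragments whose language is uncertain, which tends to matter for short or noisy OCR text.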
main.py writes citations.txt, a file of relevant fragments selected by the keywords defined in config.py.
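The sketch below illustrates one way keyword-based fragment selection could work. The actual logic in main.py and the layout of config.py are not shown here, so the KEYWORDS list, the find_fragments helper, and the context window size are all hypothetical.

```python
import re

# Hypothetical keyword list; the project's real keywords live in config.py.
KEYWORDS = ["influenza", "griep"]


def find_fragments(text, keywords=KEYWORDS, context=40):
    """Return the text surrounding each keyword hit.

    Each fragment spans up to `context` characters on either side of the match.
    Matching is case-insensitive.
    """
    if not keywords:
        return []
    # Escape keywords so they are matched literally, not as regex syntax.
    pattern = re.compile("|".join(re.escape(k) for k in keywords), re.IGNORECASE)
    fragments = []
    for match in pattern.finditer(text):
        start = max(0, match.start() - context)
        end = min(len(text), match.end() + context)
        fragments.append(text[start:end].strip())
    return fragments
```

A fixed character window is the simplest choice. Splitting on sentence boundaries would give cleaner fragments but is harder to do reliably on historical OCR text.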
You can also run a full-text search over the whole collection by querying the Elasticsearch index spanishflu:
curl "http://search.coronawhy.org/spanishflu/_search?pretty=true&q=*"
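The same query can be issued from Python. This is a minimal sketch using only the standard library and the Elasticsearch URI search parameters q, size, and pretty. The helper names build_search_url and search are illustrative, and the fields inside each returned document are not documented here, so inspect a response before relying on any of them.

```python
from urllib.parse import urlencode

# Endpoint taken from the curl example above.
SEARCH_ENDPOINT = "http://search.coronawhy.org/spanishflu/_search"


def build_search_url(query, size=10):
    """Build a URI search request for the spanishflu index."""
    params = {"q": query, "size": size, "pretty": "true"}
    return SEARCH_ENDPOINT + "?" + urlencode(params)


def search(query, size=10):
    """Run the query and return the decoded JSON response (needs network access)."""
    import json
    from urllib.request import urlopen

    with urlopen(build_search_url(query, size)) as response:
        return json.load(response)
```

In a standard Elasticsearch response the matching documents are listed under hits.hits, so search("griep")["hits"]["hits"] would be the place to look first.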