Runs on Linux and macOS using Python 3.9.5
- CaNarEx environment
cd CaNarEx
python3 -m venv venv_canarex
source venv_canarex/bin/activate
pip install -r requirements.txt
- Use CaNarEx environment
- Run split_sentences_trf.py (data already provided)
python 1.split_sentences_trf.py
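Before running the scripts, it can help to confirm that the active interpreter is the one from the virtual environment and that its version is close to the one stated above. The following is a minimal sketch, not part of the repository: the function names `in_virtualenv`, `version_matches`, and `describe_environment` are hypothetical, and comparing only the major and minor version (3.9) is an assumption.

```python
import sys

# Version stated in this README; only major.minor is compared (an assumption).
EXPECTED_VERSION = (3, 9)


def in_virtualenv() -> bool:
    """Return True when the interpreter is running inside a venv."""
    # In an environment created with `python3 -m venv`, sys.prefix points at
    # the venv while sys.base_prefix points at the base installation.
    return sys.prefix != sys.base_prefix


def version_matches(expected=EXPECTED_VERSION, actual=None) -> bool:
    """Compare major.minor of an interpreter version with the expected one."""
    actual = sys.version_info[:2] if actual is None else tuple(actual)[:2]
    return actual == tuple(expected)


def describe_environment() -> str:
    """Summarise the interpreter state as a short human-readable string."""
    version = ".".join(str(part) for part in sys.version_info[:3])
    venv = "active" if in_virtualenv() else "not active"
    match = "matches" if version_matches() else "differs from"
    return f"Python {version} ({match} 3.9), virtual environment {venv}"


if __name__ == "__main__":
    print(describe_environment())
```

Running this after `source venv_canarex/bin/activate` should report the virtual environment as active; a mismatched version is only a warning sign, not proof that the scripts will fail.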
Using SpanBERT
- Download https://github.com/mandarjoshi90/coref and follow the installation instructions from "Jonathan K. Kummerfeld's notebook" ('spanbert_base') into the coref_env environment
- Install the following packages into coref_env:
pip install tokenization
pip install sacremoses
- Run coreference resolution
python 2.coref_bert.py
- Use CaNarEx environment
- Run run_canarex.py
python 3.run_canarex.py
- Run clustering.py
python 4.clustering.py
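The numbered scripts above run in order, and the coreference step uses a different environment from the rest. The sketch below shows one way to express that ordering in a small driver; it is not part of the repository. The names `PIPELINE`, `build_commands`, and `run_pipeline` are hypothetical, the interpreter paths are placeholders that depend on how each environment was created, and running `4.clustering.py` in the CaNarEx environment is an assumption.

```python
import subprocess
from pathlib import Path

# Script names come from the steps above. The environment for each script
# follows the "Use ... environment" bullets; step 4 is assumed to use CaNarEx.
PIPELINE = [
    ("canarex", "1.split_sentences_trf.py"),
    ("coref", "2.coref_bert.py"),
    ("canarex", "3.run_canarex.py"),
    ("canarex", "4.clustering.py"),
]


def build_commands(interpreters, pipeline=PIPELINE):
    """Return one [interpreter, script] command per step, in order."""
    commands = []
    for env_name, script in pipeline:
        if env_name not in interpreters:
            raise KeyError(f"no interpreter configured for {env_name!r}")
        commands.append([str(interpreters[env_name]), script])
    return commands


def run_pipeline(interpreters, workdir=".", dry_run=True):
    """Print each step and, unless dry_run is set, execute it."""
    commands = build_commands(interpreters)
    for command in commands:
        print("$", " ".join(command))
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit code,
            # so later steps do not run on incomplete output.
            subprocess.run(command, cwd=Path(workdir), check=True)
    return commands


if __name__ == "__main__":
    # Hypothetical interpreter paths; adjust to where the environments live.
    run_pipeline({
        "canarex": "venv_canarex/bin/python",
        "coref": "coref_env/bin/python",
    })
```

With `dry_run=True` (the default here) the driver only prints the commands, which makes it easy to check the order and interpreters before running anything.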
The evaluation folder contains Jupyter notebook code that generates synthetic test data for narrative time-series clustering.
- Environment: follow the setup steps from relatio: https://github.com/relatio-nlp/relatio
- Relatio folder provided: modified to add document IDs to the generated output.
python 5.run_relatio.py
- SpanBERT: Improving pre-training by representing and predicting spans.
- Simple BERT Models for Relation Extraction and Semantic Role Labeling.
- Making monolingual sentence embeddings multilingual using knowledge distillation.
- Fast interpolation-based t-SNE for improved visualization of single-cell RNA-seq data.