An assistive writing tool to analyze linguistic and cultural variation across communities
Please run the following:
conda create -n message python=3.8
conda activate message
pip install -r requirements.txt
Follow the instructions from the public BLM Twitter dataset to download tweets using our filtered tweet IDs. This yields a smaller dataset containing ~200K pro-BLM tweets and ~100K anti-BLM tweets. The preprocessing code and data are here. After that, move the dataset to ./data/blm_alm/raw/ so that you have the following two files: pro_blm_200k.txt and anti_blm_100k.txt.
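Before moving on, it can help to confirm that both files are in place. The snippet below is a minimal sketch and not part of this repository: the helper name `check_raw_data` is hypothetical, and the line counts correspond to tweet counts only if each file stores one tweet per line, which is an assumption.

```python
from pathlib import Path

# File names taken from the dataset instructions above.
EXPECTED_FILES = ("pro_blm_200k.txt", "anti_blm_100k.txt")


def check_raw_data(raw_dir="./data/blm_alm/raw"):
    """Return {file name: line count}; raise if an expected file is missing.

    Hypothetical helper for illustration. A line count equals a tweet
    count only under the assumption of one tweet per line.
    """
    raw_path = Path(raw_dir)
    counts = {}
    for name in EXPECTED_FILES:
        file_path = raw_path / name
        if not file_path.is_file():
            raise FileNotFoundError(f"Missing dataset file: {file_path}")
        # Stream the file so large datasets are not loaded into memory.
        with file_path.open(encoding="utf-8") as handle:
            counts[name] = sum(1 for _ in handle)
    return counts
```

Calling `check_raw_data()` from the repository root returns a dictionary of line counts, or raises `FileNotFoundError` naming the first missing file.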
cd semantic_shift
# download BERTweet to your local machine
python download_bertweet.py
sh ./bash_scripts/compute_semantic_shifts.sh
Check the notebook to see the analysis.
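For readers new to the idea, here is a minimal sketch of one common way to score semantic shift between two communities: the cosine distance between a word's averaged contextual embeddings in each corpus. This is an illustration only. Whether compute_semantic_shifts.sh uses this exact measure is an assumption, and the function names below are hypothetical rather than taken from this repository.

```python
import math


def mean_vector(vectors):
    """Average a non-empty list of equal-length vectors component-wise."""
    count = len(vectors)
    return [sum(component) / count for component in zip(*vectors)]


def cosine_distance(u, v):
    """Return 1 - cosine similarity for two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def prototype_shift(embeddings_a, embeddings_b):
    """Score the shift of one word between two corpora.

    Each argument is a list of embeddings of the same word, one per
    occurrence in the corresponding corpus (for example, pro-BLM and
    anti-BLM tweets). Hypothetical helper for illustration.
    """
    return cosine_distance(mean_vector(embeddings_a), mean_vector(embeddings_b))
```

A score near 0 means the word is used in similar contexts by both communities; larger scores suggest diverging usage.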
cd ideology-alignment
sh train_script.sh
Check the notebook to see the analysis.
This repository builds on UiO-UvA at SemEval-2020 Task 1 and Aligning Multidimensional Worldviews and Discovering Ideological Differences.