CDQA-project

The file named ori_pqal.json in the data folder is actually the cleaned version of the dataset. It keeps the original name only to stay compatible with other code.

Commands

To change which model is used, make sure to change both the --model_name and --dte_lookup_table_fp args. All of the various KGE params should be usable together.

Note: --gpus expects consecutive GPU IDs that always start at 0. These IDs are relative to the visible devices, as defined by an export CUDA_VISIBLE_DEVICES=... command. Even if you want to use, say, GPUs 0 and 1, I believe it is preferable to explicitly run export CUDA_VISIBLE_DEVICES=0,1. So, to use GPUs whose IDs do not start at 0, such as GPUs 2 and 3, run export CUDA_VISIBLE_DEVICES=2,3 and pass --gpus 0 1 to run_modeling.py. An example .sh script is given below.

#!/bin/bash

export CUDA_VISIBLE_DEVICES=2,3
python run_modeling.py --gpus 0 1
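
The ID mapping described above can be sketched in a few lines of Python. This is only an illustration: logical_to_physical is a hypothetical helper that is not part of this repo, and it assumes CUDA_VISIBLE_DEVICES holds comma-separated integer indices (not GPU UUIDs).

```python
from typing import List


def logical_to_physical(visible_devices: str, gpu_ids: List[int]) -> List[int]:
    """Map the IDs passed to --gpus onto the physical GPU indices they refer to.

    Hypothetical helper for illustration only; assumes `visible_devices` is the
    value of CUDA_VISIBLE_DEVICES written as comma-separated integers.
    """
    # Position i in CUDA_VISIBLE_DEVICES becomes logical device i.
    physical = [int(d) for d in visible_devices.split(",") if d.strip()]
    for gpu_id in gpu_ids:
        if not 0 <= gpu_id < len(physical):
            raise ValueError(
                f"--gpus ID {gpu_id} is out of range for {len(physical)} visible device(s)"
            )
    return [physical[gpu_id] for gpu_id in gpu_ids]


if __name__ == "__main__":
    # export CUDA_VISIBLE_DEVICES=2,3 together with --gpus 0 1
    print(logical_to_physical("2,3", [0, 1]))  # prints [2, 3]
```

With CUDA_VISIBLE_DEVICES=2,3, asking for --gpus 2 3 would be out of range, which is why the args must be 0 1.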

Note: to have the script run in the background and output to a file, use a bash script akin to the below:

#!/bin/bash

export CUDA_VISIBLE_DEVICES=2,3
nohup python run_modeling.py --gpus 0 1 > "run_model.out" &

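The output is written to the file whenever the stdout buffer is flushed rather than line by line, because Python block-buffers stdout when it is redirected to a file. Running python -u (or setting PYTHONUNBUFFERED=1) makes the output appear immediately. A convenient way to check the file frequently is watch -n 5 "tail -n 30 run_model.out".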
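
If the output file lags behind the run, one likely cause is that Python buffers stdout when it is redirected to a file. A minimal sketch of a logging helper that flushes after every message is shown below; log_progress is a hypothetical name and is not something the training scripts in this repo are known to define.

```python
import sys
from typing import TextIO


def log_progress(message: str, stream: TextIO = sys.stdout) -> None:
    """Write one line and flush right away so it reaches a redirected file.

    Hypothetical helper for illustration only.
    """
    print(message, file=stream, flush=True)


if __name__ == "__main__":
    log_progress("epoch 1 done")
```

Flushing on every call costs a little throughput, so it fits per-epoch or per-N-steps messages better than per-batch ones.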

Note: If you are training 2+ models at the same time, make sure that no two training scripts are given the same --port arg (doesn't apply to baseline models).
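
One way to avoid port clashes is to ask the OS for a currently unused port and pass it as the --port arg. The sketch below assumes the training script only needs a free TCP port on the local machine; find_free_port is a hypothetical helper, not part of this repo.

```python
import socket


def find_free_port(host: str = "127.0.0.1") -> int:
    """Return a TCP port that is unused on `host` at the time of the call.

    Hypothetical helper for illustration only.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        # Binding to port 0 lets the OS pick any free port.
        sock.bind((host, 0))
        return sock.getsockname()[1]


if __name__ == "__main__":
    print(find_free_port())
```

The port is released when the function returns, so another process could grab it before the training script does. For a handful of concurrent runs, picking distinct fixed ports by hand works just as well.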

BERT Baseline

python covid_qa_baseline.py --model_name "phiyodr/bert-base-finetuned-squad2" \
                            --dte_lookup_table_fp "DTE_to_phiyodr_bert-base-finetuned-squad2.pkl" \
                            --max_len 384 \
                            --n_stride 196

Vanilla BERT Fine-tuning

export CUDA_VISIBLE_DEVICES=0,1
python run_modeling.py --batch_size 40 \
                       --model_name "phiyodr/bert-base-finetuned-squad2" \
                       --dte_lookup_table_fp "DTE_to_phiyodr_bert-base-finetuned-squad2.pkl" \
                       --lr 3e-5 \
                       --n_epochs 2 \
                       --max_len 384 \
                       --n_stride 196 \
                       --warmup_proportion 0.1 \
                       --n_neg_records 2 \
                       --gpus 0 1 \
                       --seed 16 \
                       --port 42069

BERT+KGE-Replace Fine-tuning

export CUDA_VISIBLE_DEVICES=0,1
python run_modeling.py --batch_size 40 \
                       --model_name "phiyodr/bert-base-finetuned-squad2" \
                       --dte_lookup_table_fp "DTE_to_phiyodr_bert-base-finetuned-squad2.pkl" \
                       --lr 3e-5 \
                       --n_epochs 3 \
                       --max_len 384 \
                       --n_stride 196 \
                       --warmup_proportion 0.1 \
                       --use_kge T \
                       --n_neg_records 3 \
                       --gpus 0 1 \
                       --seed 16 \
                       --port 42069

BERT+Random-KGE-Replace Fine-tuning

export CUDA_VISIBLE_DEVICES=0,1
python run_modeling.py --batch_size 40 \
                       --model_name "phiyodr/bert-base-finetuned-squad2" \
                       --dte_lookup_table_fp "DTE_to_phiyodr_bert-base-finetuned-squad2.pkl" \
                       --lr 3e-5 \
                       --n_epochs 2 \
                       --max_len 384 \
                       --n_stride 196 \
                       --warmup_proportion 0.1 \
                       --use_kge T \
                       --random_kge T \
                       --n_neg_records 2 \
                       --gpus 0 1 \
                       --seed 16 \
                       --port 42069

BERT+KGE-Concat Fine-tuning

export CUDA_VISIBLE_DEVICES=0,1
python run_modeling.py --batch_size 40 \
                       --model_name "phiyodr/bert-base-finetuned-squad2" \
                       --dte_lookup_table_fp "DTE_to_phiyodr_bert-base-finetuned-squad2.pkl" \
                       --lr 3e-5 \
                       --n_epochs 3 \
                       --max_len 384 \
                       --n_stride 196 \
                       --warmup_proportion 0.1 \
                       --use_kge T \
                       --concat_kge T \
                       --n_neg_records 5 \
                       --gpus 0 1 \
                       --seed 16 \
                       --port 42069

