DAMF - Moral Foundations Inference with Domain Adapting Ability

Overview

This method aims to detect the moral foundations expressed in textual data.

Many ground-truth datasets with moral annotations currently exist. They differ in data collection method, domain, topics, instructions given to annotators, and more. Simply aggregating such heterogeneous datasets during training can yield models that generalize poorly. DAMF is a data fusion framework for training on multiple heterogeneous datasets that improves both performance and generalizability. The model uses domain adversarial training to align the datasets in feature space and a weighted loss function to deal with label shift.
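This README does not spell out DAMF's exact loss weighting. As a rough illustration only, the sketch below assumes multi-label targets (one 0/1 entry per moral foundation class) and uses a common per-class scheme: each class's positive term is scaled by its negative-to-positive ratio, the same idea as the pos_weight argument of PyTorch's BCEWithLogitsLoss. It only counters skewed class frequencies and is not necessarily the scheme DAMF uses against label shift between domains. The functions pos_weights and weighted_bce are hypothetical and not part of the DAMF package.

```python
import math
from typing import List, Sequence


def pos_weights(labels: Sequence[Sequence[int]]) -> List[float]:
    """Per-class weight = (#negatives / #positives) over multi-hot labels.

    Rarer classes get larger weights, so missing them costs more.
    """
    n = len(labels)
    n_classes = len(labels[0])
    weights = []
    for c in range(n_classes):
        pos = sum(row[c] for row in labels)
        # Classes with no positive examples fall back to weight 1.0.
        weights.append((n - pos) / pos if pos > 0 else 1.0)
    return weights


def weighted_bce(probs: Sequence[Sequence[float]],
                 targets: Sequence[Sequence[int]],
                 weights: Sequence[float],
                 eps: float = 1e-7) -> float:
    """Mean binary cross-entropy with each class's positive term scaled."""
    total = 0.0
    for p_row, t_row in zip(probs, targets):
        for p, t, w in zip(p_row, t_row, weights):
            p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
            total += -(w * t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / (len(probs) * len(weights))
```

In a real training loop the probabilities would come from the model's sigmoid outputs, and the weights would be computed once from the training labels.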

Install

Clone the repository:

git clone https://github.com/fionasguo/DAMF.git

Set up the package:

python setup.py install

Dependencies

Tested with Python 3.9.13.

All dependencies are listed in requirements.txt and are installed automatically when running setup.py.

Use the code

Run the example code

To train (and test):

Put the data (e.g. xxxxx.csv) in the data folder.

Put the pretrained language model file (used to initialize DAMF) in the trained_models folder.

Edit the config file accordingly (see Config file below for details).

If all the data is in a single CSV file, the program automatically splits it into train/val/test sets. Run:

python3 train_and_test.py -m train_test -c config -i data/all_mf_data.csv -o outputs

If the data is a directory containing a separate CSV file for each of the train/val/test sets, run:

python3 train_and_test.py -m train_test -c config -i data/mf_datasets -o outputs
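The automatic split for the single-file case can be pictured with the stdlib-only sketch below. The 80/10/10 ratios, the fixed seed, and the function names split_rows and load_csv are assumptions for illustration; train_and_test.py may split differently (for example, per domain), and the CSV column layout DAMF expects is not shown here.

```python
import csv
import random
from typing import Dict, List, Sequence, Tuple


def load_csv(path: str) -> List[dict]:
    """Read a CSV file into a list of row dicts keyed by column name."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def split_rows(rows: Sequence[dict],
               ratios: Tuple[float, float, float] = (0.8, 0.1, 0.1),
               seed: int = 0) -> Dict[str, List[dict]]:
    """Shuffle rows reproducibly and cut them into train/val/test lists."""
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # same seed -> same split
    n = len(shuffled)
    n_train = int(n * ratios[0])
    n_val = int(n * ratios[1])
    return {
        "train": shuffled[:n_train],
        "val": shuffled[n_train:n_train + n_val],
        "test": shuffled[n_train + n_val:],  # remainder goes to test
    }
```

For example, split_rows(load_csv("data/all_mf_data.csv")) would return a dict with "train", "val", and "test" lists of row dicts. Fixing the seed keeps the split reproducible across runs.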

To test using a trained DAMF model:

Put the data (e.g. xxxxx.csv) in the data folder.

Put the trained DAMF model file in the trained_models folder (e.g. trained_models/ckpt).

Edit the config file accordingly.

Run:

python3 train_and_test.py -m test -c config -i data/all_mf_data.csv -o outputs -t trained_models/ckpt

Command line arguments:

-m: mode, options: train, test, or train_test
-c: path to the config file
-i: input data, either a single CSV file containing all train, val, and test data, or a directory containing a separate CSV file for each of the train/val/test sets
-o: output dir, e.g. ./outputs
-t: test_model, an already trained DAMF model file for testing
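For orientation, here is a hypothetical argparse parser that mirrors the documented short flags, plus a helper that tells the two forms of -i apart. The dest names and the input_kind helper are assumptions; the real train_and_test.py may define its arguments differently.

```python
import argparse
import os


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented -m/-c/-i/-o/-t flags."""
    parser = argparse.ArgumentParser(description="Sketch of the DAMF CLI")
    parser.add_argument("-m", dest="mode", required=True,
                        choices=["train", "test", "train_test"], help="mode")
    parser.add_argument("-c", dest="config", required=True,
                        help="path to the config file")
    parser.add_argument("-i", dest="input", required=True,
                        help="single CSV file or directory of per-split CSV files")
    parser.add_argument("-o", dest="output", required=True,
                        help="output directory")
    parser.add_argument("-t", dest="test_model", default=None,
                        help="already trained DAMF model file for testing")
    return parser


def input_kind(path: str) -> str:
    """Classify -i as one CSV to be split, or a directory of per-split CSVs."""
    if os.path.isdir(path):
        return "directory"
    if path.lower().endswith(".csv"):
        return "single_csv"
    raise ValueError(f"expected a .csv file or a directory, got {path!r}")
```

Parsing the test command from above, build_parser().parse_args(["-m", "test", "-c", "config", "-i", "data/all_mf_data.csv", "-o", "outputs", "-t", "trained_models/ckpt"]) yields mode "test" and test_model "trained_models/ckpt".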

Config file

See the file config in the repository for an example config file.

All arguments:

pretrained_dir: str, pretrained language model used to initialize the tokenizer and the feature encoder, e.g. 'trained_models/bert-base-uncased'
mf_model_dir: (optional) str, path to a previously trained DAMF model, can be used for testing, e.g. 'trained_models/ckpt'
domain_adapt: bool, whether to use domain adversarial training
transformation: bool, whether to include a linear transformation module to facilitate domain-invariant feature generation
reconstruction: bool, whether to include a reconstruction module to keep the feature encoder from being overly corrupted by adversarial training
semi_supervised: bool, whether to use semi-supervised training where labeled source data and unlabeled target data are both used for training
weighted_loss: bool, whether to use a weighted loss to mitigate label shift
aflite: bool, whether to run AFLite as a preprocessing step to filter out data points that might contain spurious correlations
train_domain: str or list of str, the source domain(s), e.g. 'MFTC' or ['MFTC','congress']
test_domain: str or list of str, the target domain(s), e.g. ['congress']
n_mf_classes: int, number of output classes: 10 (all moral foundation classes), 5 (virtues and vices not distinguished), or 2 (moral vs immoral)
lr: float, initial learning rate
alpha $\alpha$, beta $\beta$: float, parameters of the learning-rate decay function $lr = lr_{init} / (1 + \alpha \cdot p)^{\beta}$, where $p = (\text{current epoch} - \text{num\_no\_adv}) / \text{total epochs}$
batch_size: int
num_epoch: int
dropout_rate: float
lambda_trans: float, regularization coefficient for the transformation layer term in the loss function
lambda_rec: float, regularization for reconstruction layer
gamma $\gamma$: float, rate at which lambda_domain (the regularization coefficient for the domain classifier loss) is updated over epochs: $\lambda_{domain} = 2 / (1 + e^{-\gamma \cdot p}) - 1$, with $p$ as defined for alpha and beta
num_no_adv: int (>0), number of initial epochs without adversarial training; after these epochs, the domain classifier is enabled by setting lambda_domain > 0 in the loss
seed: int
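The learning-rate decay and the lambda_domain schedule described above can be sketched as plain Python functions. The names progress, lr_at, and lambda_domain are hypothetical. One assumption goes beyond the formulas: p is clamped at 0 during the first num_no_adv epochs, which keeps lr at its initial value and lambda_domain at 0 until adversarial training starts (without the clamp, a negative $1 + \alpha \cdot p$ raised to a fractional $\beta$ would be undefined).

```python
import math


def progress(curr_epoch: int, num_no_adv: int, total_epoch: int) -> float:
    """p = (curr_epoch - num_no_adv) / total_epoch, clamped at 0 (assumption)."""
    return max(0.0, (curr_epoch - num_no_adv) / total_epoch)


def lr_at(lr_init: float, alpha: float, beta: float, p: float) -> float:
    """Decayed learning rate: lr_init / (1 + alpha * p) ** beta."""
    return lr_init / (1 + alpha * p) ** beta


def lambda_domain(gamma: float, p: float) -> float:
    """Domain-loss coefficient: 2 / (1 + exp(-gamma * p)) - 1, in [0, 1)."""
    return 2 / (1 + math.exp(-gamma * p)) - 1
```

Since $2/(1 + e^{-\gamma p}) - 1 = \tanh(\gamma p / 2)$, lambda_domain rises smoothly from 0 toward 1 as training progresses, and a larger gamma makes the ramp steeper.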
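The three n_mf_classes settings can be read as different granularities of the same annotations. The sketch below assumes the ten classes are the virtue and vice of each of the five Moral Foundations Theory foundations, and reads "moral vs immoral" as virtue vs vice; the label names, the FOUNDATIONS table, and the collapse function are hypothetical and not taken from the DAMF code.

```python
from typing import Dict, List, Tuple

# Hypothetical label names: each foundation with its (virtue, vice) pair.
# DAMF's data files may name or order these differently.
FOUNDATIONS: Dict[str, Tuple[str, str]] = {
    "care": ("care", "harm"),
    "fairness": ("fairness", "cheating"),
    "loyalty": ("loyalty", "betrayal"),
    "authority": ("authority", "subversion"),
    "purity": ("purity", "degradation"),
}

# Reverse lookup: fine-grained label -> (foundation, is_virtue).
LABEL_INFO: Dict[str, Tuple[str, bool]] = {}
for _foundation, (_virtue, _vice) in FOUNDATIONS.items():
    LABEL_INFO[_virtue] = (_foundation, True)
    LABEL_INFO[_vice] = (_foundation, False)


def collapse(active: List[str], n_mf_classes: int) -> List[str]:
    """Map one example's 10-class labels to 10, 5, or 2 classes."""
    if n_mf_classes not in (10, 5, 2):
        raise ValueError("n_mf_classes must be 10, 5, or 2")
    out = set()
    for label in active:
        foundation, is_virtue = LABEL_INFO[label]  # KeyError on unknown labels
        if n_mf_classes == 10:
            out.add(label)
        elif n_mf_classes == 5:
            out.add(foundation)  # virtue and vice merged
        else:
            # Assumption: "moral vs immoral" means virtue vs vice.
            out.add("moral" if is_virtue else "immoral")
    return sorted(out)
```

For example, an example labeled care, harm, and betrayal collapses to the foundations care and loyalty when n_mf_classes is 5.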

