Source code and data (raw and preprocessed) for training multimodal approaches to predict the impression given by politicians in TV debates.
https://www.ukp.tu-darmstadt.de
Project maintainer: Pedro Santos (@pedrobisp)
This repository contains experimental software and is published for the sole purpose of giving additional background details.
The transcripts and impression scores are available upon request. If you use this dataset in your work, please cite the following paper in your publication:
Nagel, F., Maurer, M., & Reinemann, C. (2012). Is there a visual dominance in political communication? How verbal, visual, and vocal communication shape viewers’ impressions of political candidates. Journal of Communication, 62, 833-850.
To obtain these resources, please contact the project maintainer: https://www.informatik.tu-darmstadt.de/ukp/ukp_home/staff_ukp/detailseite_mitarbeiter_1_42304.en.jsp
Due to legal restrictions, we cannot redistribute the debate recording. However, the debate is publicly available on YouTube: https://youtu.be/Hybsgj1MIZ4
We used Python 3 in our experiments, so we recommend using it to reproduce them. We also recommend working inside a virtual environment. The packages used in our experiments can be installed with the following command:
(venv) pip3 install -r requirements.txt
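If you have not set up a virtual environment yet, you can create and activate one first (a minimal sketch, assuming a Python 3 interpreter available as python3) and then run the pip command above:

python3 -m venv venv
source venv/bin/activate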
In addition, we used the following third-party open-source software for preprocessing the debate video file:
- FFmpeg - https://www.ffmpeg.org/
- openSMILE - https://www.audeering.com/technology/opensmile/ (if you cannot compile the latest version on Ubuntu, try the workaround in naxingyu/opensmile#2 (comment))
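On Debian/Ubuntu systems, FFmpeg can usually be installed from the package manager, and it is worth verifying that both tools are reachable from the command line before continuing (the name and location of the openSMILE binary depend on how you built or installed it):

sudo apt-get install ffmpeg
ffmpeg -version
./SMILExtract -h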
The debate is segmented into turns, where each turn consists of one politician talking without a major interruption by the other politician or by one of the journalists; small interruptions may still occur. The first preprocessing step is therefore to segment the debate video file into turns. Please make sure that FFmpeg is installed and callable from the command line, then run the following script to segment the debate video:
$(venv) python preprocessing/turns_segmentation.py resources/turns_segmentation.yaml
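Internally, the segmentation presumably amounts to cutting the recording at the turn boundaries given in the transcript. If you ever need to extract a single turn by hand, an FFmpeg call of roughly this form does it (the file names and timestamps below are only placeholders):

ffmpeg -i debate.mp4 -ss 00:01:23 -to 00:02:45 -c copy turn_001.mp4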
The YAML file "turns_segmentation.yaml" specifies the input video file, the textual transcript file, the output folder for the segmented turns, and the input content response measurement (CRM) file containing the average impression scores together with other manually annotated information (speaker, type of argumentation being used, and so on). Some turns are shorter than 3 seconds and should not be used; please run the following script to delete these short turns:
$(venv) python preprocessing/delete_small_turns.py resources/delete_small_turns.yaml
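To double-check which clips fall under the 3-second threshold, you can list the duration of each turn file with ffprobe, which ships with FFmpeg (the folder name and file extension are placeholders for whatever the segmentation step produced):

for f in turns/*.mp4; do
  echo "$f: $(ffprobe -v error -show_entries format=duration -of default=noprint_wrappers=1:nokey=1 "$f")"
done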
To extract the Mel-frequency cepstral coefficients (MFCCs), run the following script:
$(venv) python preprocessing/extract_mfcc.py resources/mfcc.yaml
The openSMILE configuration file is in the resources folder; please use this configuration file. The input folder should be the output folder generated by the turns_segmentation.py script. Do not forget to specify the path to the openSMILE binary in the YAML file.
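The extract_mfcc.py script invokes openSMILE for you; for troubleshooting, a direct call with a configuration file typically looks like this (the paths are placeholders, and the -I/-O options are only available if the configuration file defines them, as the standard openSMILE configs do):

./SMILExtract -C resources/<mfcc_config>.conf -I turn_001.wav -O turn_001_mfcc.csv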
To extract the visual features, run the following script:
$(venv) python preprocessing/extract_visual_features.py resources/visualfeatures.yaml
In the YAML file, you should specify the local path of the FeatureExtraction binary from OpenFace. The input folder is again the output folder generated by the turns_segmentation.py script.
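As with the audio features, the script wraps the OpenFace tool; to test your OpenFace build on a single turn first, FeatureExtraction can also be run directly (input and output names are placeholders):

./FeatureExtraction -f turn_001.mp4 -out_dir openface_output/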
We used fastText pretrained word embeddings: https://github.com/facebookresearch/fastText/blob/master/pretrained-vectors.md
For aligning the modalities, run the following script:
$(venv) python preprocessing/multimodal_io.py resources/multimodal_io.yaml
Please specify in the YAML file the path to the pretrained word embeddings, the OOV list, the folder containing the segmented turns, and the path of the output pickle file containing the aligned modalities.
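The pretrained vectors themselves can be downloaded from the page linked above, e.g. the Wikipedia-based vectors for the language of the transcripts (the URL below is only an example; check the pretrained-vectors page for the exact link you need):

wget https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.de.vec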
To reproduce the results reported in the paper, you have to create the YAML files for each fold and modality combination. Run the following script to do so:
$(venv) python preprocessing/create_yaml_files.py resources/${modality}.yaml
The following YAML files are available in the resources folder: text.yaml, speech.yaml, vision.yaml, text_speech.yaml, text_vision.yaml, speech_vision.yaml, all_modalities.yaml.
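To generate the fold-specific YAML files for every modality combination in one go, you can simply loop over the files listed above (adjust the list if you only need a subset):

for modality in text speech vision text_speech text_vision speech_vision all_modalities; do
  python preprocessing/create_yaml_files.py resources/${modality}.yaml
done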
After creating the YAML files for each fold, run the following script to perform leave-one-turn-out cross-validation:
$(venv) sh run_scripts.sh $modality_folder
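To run the cross-validation for every modality in sequence, you can loop over the generated modality folders (the folder layout below is an assumption; point the loop at wherever create_yaml_files.py wrote its output):

for modality_folder in folds/*/; do
  sh run_scripts.sh "$modality_folder"
done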