Our project applies a specific algorithm from a research paper to predict Facial Action Unit (FAU) codes. Although the original paper focuses on still images, we've adapted the algorithm to predict FAUs in videos, running face detection and prediction on each frame individually.
For our implementation, we use Stage 2 of the ResNet50-based architecture. Note that modifying the parameters may lead to errors due to the specialized nature of our implementation.
This video is sourced from the Self-Stimulatory Behavior Dataset (SSBD) [link], which focuses on autism-related behaviors. Specifically, it depicts a boy exhibiting arm-flapping behavior.
v_ArmFlapping_01.mp4
The output video contains the facial action units (FAUs) predicted by the paper's model, along with a bounding box and an emotion label on the face. The emotion is derived from the most significant FAU combinations, using an FAU threshold of > 0.2.
demo_v_ArmFlapping_01_output.mp4
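The AU-to-emotion step can be sketched as follows. The specific combinations below (e.g. AU6 + AU12 for happiness) are illustrative EMFACS-style examples, not the exact table used in this repo:

```python
# Hypothetical AU-combination table (EMFACS-style); the repo's actual
# mapping may differ. An emotion fires when all of its AUs exceed the
# threshold used in this project (0.2).
EMOTION_AUS = {
    "happiness": ["AU6", "AU12"],
    "surprise": ["AU1", "AU2", "AU5", "AU26"],
    "sadness": ["AU1", "AU4", "AU15"],
}

def emotion_from_aus(au_scores: dict, threshold: float = 0.2) -> str:
    """Return the first emotion whose AUs are all above the threshold."""
    for emotion, aus in EMOTION_AUS.items():
        if all(au_scores.get(au, 0.0) > threshold for au in aus):
            return emotion
    return "neutral"

emotion_from_aus({"AU6": 0.5, "AU12": 0.7})  # -> "happiness"
```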
Built with Python 3.12.

| Python | Status |
| --- | --- |
| 3.9 | ❌ Fail |
| 3.10 | ✅ Pass |
| 3.11 | ✅ Pass |
| 3.11.5 | ✅ Pass |
Please install ffmpeg on your machine before using this package: link
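A quick way to confirm ffmpeg is reachable before running the pipeline (`ffmpegcv` shells out to the ffmpeg binary) is to check your PATH from Python:

```python
import shutil

def ffmpeg_available() -> bool:
    """Return True if the ffmpeg binary is discoverable on PATH."""
    return shutil.which("ffmpeg") is not None

print("ffmpeg found" if ffmpeg_available() else "ffmpeg missing - install it first")
```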
Download both of these checkpoints and place them in MEGraphAU > OpenGraphAU > checkpoints.
Your folder structure should look something like this.
Alternatively, you may run `python download_checkpoints.py`; if that does not work, please download the files manually.
```python
# Import libraries
import json

import cv2
import ffmpegcv
from ultralytics import YOLO

from MEGraphAU.OpenGraphAU.predict import predict
from MEGraphAU.OpenGraphAU.utils import Image, draw_text

video_path = "videos/v_ArmFlapping_01.mp4"
cap = cv2.VideoCapture(video_path)
fps = cap.get(cv2.CAP_PROP_FPS)
output_frames = []
results = {}
yolo = YOLO("yolov8n-face.pt")

# Read the video frame by frame.
while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break
    frame_number = cap.get(cv2.CAP_PROP_POS_FRAMES)
    current_time = frame_number / fps

    # Detect faces in the current frame.
    detections = yolo.predict(frame, conf=0.40, iou=0.3)
    for detection in detections:
        for box in detection.boxes:
            x1, y1, x2, y2 = (int(v) for v in box.xyxy[0])
            # Crop the face region and predict its action units.
            face_crop = frame[y1:y2, x1:x2]
            infostr_aus, pred = predict(Image.fromarray(face_crop))
            # Draw the AU labels onto the frame and store the result,
            # keyed by the frame's timestamp in seconds.
            res, f = draw_text(frame, list(infostr_aus), pred, ((x1, y1), (x2, y2)))
            results[current_time] = res
            frame = cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 0, 255), 2)
    output_frames.append(frame)
    if cv2.waitKey(25) & 0xFF == ord('q'):
        break
cap.release()

# OPTIONAL: Save the annotated output video.
output_video = ffmpegcv.VideoWriter(f"{video_path[:-4]}_output.mp4", None, fps)
for of in output_frames:
    output_video.write(of)
output_video.release()

# OPTIONAL: Save the FAU results, keyed by timestamp (seconds).
with open(f"{video_path[:-4]}_output.json", "w") as f:
    json.dump(results, f)
```
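One thing to note when reloading the saved results: the timestamps used as dictionary keys are floats in memory, but JSON object keys must be strings, so they come back as strings after a round-trip. A self-contained illustration:

```python
import json

# Float keys are coerced to strings by json.dumps, so look entries up
# by the string form of the timestamp after loading.
results = {0.0: "AU6 + AU12", 0.033: "AU12"}
loaded = json.loads(json.dumps(results))
print(sorted(loaded))  # -> ['0.0', '0.033']
```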
If this code or method helps you in your research, please cite the following papers:
```bibtex
@inproceedings{luo2022learning,
  title     = {Learning Multi-dimensional Edge Feature-based AU Relation Graph for Facial Action Unit Recognition},
  author    = {Luo, Cheng and Song, Siyang and Xie, Weicheng and Shen, Linlin and Gunes, Hatice},
  booktitle = {Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, {IJCAI-22}},
  pages     = {1239--1246},
  year      = {2022}
}

@article{song2022gratis,
  title   = {Gratis: Deep learning graph representation with task-specific topology and multi-dimensional edge features},
  author  = {Song, Siyang and Song, Yuxin and Luo, Cheng and Song, Zhiyuan and Kuzucu, Selim and Jia, Xi and Guo, Zhijiang and Xie, Weicheng and Shen, Linlin and Gunes, Hatice},
  journal = {arXiv preprint arXiv:2211.12482},
  year    = {2022}
}
```