This repository was created as part of the CogSys Master's project module (PM2: Project in Machine Learning; Multimodal Dialogue in Human-Robot Interaction), which I took in the Summer Semester 2022 at the University of Potsdam, Germany.
The repository only contains the sub-tasks I was responsible for, which were to 1) preprocess the textual data (TakeCV and Survey), 2.2) build verbal classifiers that match a textual description to a pentomino piece, and 2.3) combine the results from the vision model and the language model. The resulting accuracy scores of the two types of classifiers (Naive Bayes and LSTM) are reported.
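As a rough illustration of the verbal-classification sub-task, the sketch below matches a textual description to a pentomino piece with a simple Naive Bayes pipeline. The example data and variable names are hypothetical stand-ins for the preprocessed TakeCV/Survey description–label pairs, not the exact setup in the notebooks.

```python
# Minimal sketch of a Naive Bayes verbal classifier, assuming the preprocessed
# data provides (description, piece-label) pairs; the data below is hypothetical.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical training pairs: a verbal description and the pentomino piece it refers to.
descriptions = ["the red cross in the corner", "take the long blue bar", "the yellow L shape"]
piece_labels = ["X", "I", "L"]

nb_classifier = make_pipeline(CountVectorizer(), MultinomialNB())
nb_classifier.fit(descriptions, piece_labels)

# Predict which piece a new description most likely refers to.
print(nb_classifier.predict(["the blue bar on the left"]))
```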
I was part of Group D: Language and Vision. Our goal was to build a multimodal model that correctly detects the pentomino piece a human participant describes verbally in a real-time visual scene and sends the pick-up coordinates to the robot arm. An overview of the project is as follows:
- Corpora
  - TakeCV
  - Survey
  - Augmented data
- Experiments
  - Vision
    - Fast R-CNN
    - YOLO
    - Grabbing Point
  - Language
    - Naive Bayes
    - CNNs
    - LSTMs
  - Combining LV Models
The project consists of four notebooks; the data itself is not uploaded.
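The combination step (sub-task 2.3) can be thought of as fusing per-piece scores from the vision and language models and handing over the winning piece's pick-up coordinate to the robot arm. The snippet below is a minimal sketch under that assumption; the function and variable names are hypothetical, not the exact interface used in the notebooks.

```python
# Sketch of combining language and vision outputs, assuming both models return
# per-piece probabilities and the vision model also provides pick-up coordinates.
import numpy as np

def combine_predictions(language_probs, vision_probs, coordinates):
    """Fuse per-piece probabilities from both modalities and return the
    index and pick-up coordinate of the most likely piece."""
    fused = np.asarray(language_probs) * np.asarray(vision_probs)
    best_piece = int(np.argmax(fused))
    return best_piece, coordinates[best_piece]

# Hypothetical scores for three candidate pieces in the scene.
language_probs = [0.1, 0.7, 0.2]              # from the verbal classifier
vision_probs = [0.3, 0.5, 0.2]                # from the object detector
coordinates = [(12, 40), (55, 18), (80, 66)]  # detected piece centres

piece_idx, pickup_xy = combine_predictions(language_probs, vision_probs, coordinates)
print(f"Pick piece {piece_idx} at {pickup_xy}")
```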