This package provides the PyTorch implementation and the vision-based tactile sensor simulator for our AAAI 2021 paper. The tactile simulator is based on PyBullet and provides the simulation of the Semi-transparent Tactile Sensor (STS).
The recommended way to install the package and all of its dependencies is inside a virtual environment:
git clone https://github.com/SAIC-MONTREAL/multimodal-dynamics.git
cd multimodal-dynamics
pip install -e .
The sub-package tact_sim provides the components required for visuotactile simulation of the STS sensor and is implemented in PyBullet. The simulation is vision-based and is not meant to be a physically accurate model of contacts or soft-body dynamics.
To run an example script of an object falling onto the sensor, use:
python tact_sim/examples/demo.py --show_image --object winebottle
This loads the object model from graphics/objects and renders the resulting visual and tactile images.
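The core idea of the vision-based simulation is to drop an object onto a flat surface and render the contact region with a camera that stands in for the sensor-internal camera. The snippet below is a minimal, self-contained sketch of that idea using the standard PyBullet API; the object, camera pose, and image size are illustrative and do not correspond to the actual tact_sim implementation.

```python
import pybullet as p
import pybullet_data

# Minimal sketch of the vision-based idea: drop an object onto a flat surface
# and render the contact region with a camera. The camera pose, object, and
# image size here are illustrative and are not the tact_sim defaults.
p.connect(p.DIRECT)                                   # headless; use p.GUI to visualize
p.setAdditionalSearchPath(pybullet_data.getDataPath())
p.setGravity(0, 0, -9.81)

p.loadURDF("plane.urdf")                              # stands in for the sensor surface
p.loadURDF("duck_vhacd.urdf", basePosition=[0, 0, 0.3])

for _ in range(240):                                  # let the object fall and settle
    p.stepSimulation()

# Camera aimed at the contact region, acting as the sensor-internal camera.
view = p.computeViewMatrix(cameraEyePosition=[0, 0, 0.5],
                           cameraTargetPosition=[0, 0, 0.0],
                           cameraUpVector=[0, 1, 0])
proj = p.computeProjectionMatrixFOV(fov=60, aspect=1.0, nearVal=0.01, farVal=2.0)
width, height, rgb, depth, seg = p.getCameraImage(128, 128, viewMatrix=view,
                                                  projectionMatrix=proj)
p.disconnect()
```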
The example scripts with the name format experiments/exp_{ID}_{task}.py were used to generate the dataset of our AAAI 2021 paper. To run them, you need the ShapeNetSem dataset installed on your machine.
Follow the steps below to download and prepare the ShapeNetSem dataset:
- Register and get access to ShapeNetSem.
- Only the OBJ and texture files are needed. Download models-OBJ.zip and models-textures.zip.
- Download metadata.csv and categories.synset.csv.
- Unzip the compressed files and move the contents of models-textures.zip to models-OBJ/models:
.
└── ShapeNetSem
    ├── categories.synset.csv
    ├── metadata.csv
    └── models-OBJ
        └── models
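Once the files are unpacked as above, a quick sanity check of the layout can save a failed run later. This is an illustrative helper, not part of the repository; the dataset path is a placeholder.

```python
from pathlib import Path

# Illustrative check (not part of the repo): verify the expected ShapeNetSem layout.
root = Path("/path/to/ShapeNetSem")   # hypothetical location of the dataset
for rel in ["metadata.csv", "categories.synset.csv", "models-OBJ/models"]:
    print(rel, "OK" if (root / rel).exists() else "MISSING")
```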
To run the data collection scripts, use:
python experiments/exp_{ID}_{task}.py --logdir {path_to_logdir} --dataset_dir {path_to_ShapeNetSem} --category "WineBottle, Camera" --show_image
For the list of object classes suitable for these experiments, see tact_sim/config.py.
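If you want to check which ShapeNetSem models a given --category value will match before launching a long data collection run, a quick filter over metadata.csv can help. The sketch below is not part of the repository's loading code; it assumes the standard ShapeNetSem metadata.csv with fullId and category columns, where category holds comma-separated names.

```python
import csv
from pathlib import Path

# Illustrative helper (not from the repo): list ShapeNetSem model IDs whose
# category field matches any of the requested categories. Assumes metadata.csv
# has "fullId" and "category" columns, with comma-separated category names.
def models_in_categories(shapenet_dir, categories):
    wanted = {c.strip().lower() for c in categories}
    matches = []
    with open(Path(shapenet_dir) / "metadata.csv", newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            cats = {c.strip().lower() for c in row.get("category", "").split(",")}
            if cats & wanted:
                matches.append(row["fullId"])
    return matches

if __name__ == "__main__":
    ids = models_in_categories("/path/to/ShapeNetSem", ["WineBottle", "Camera"])
    print(f"{len(ids)} matching models")
```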
Once you have collected the dataset, you can start training the multimodal "resting state predictor" dynamics model, as described in the paper, using:
python main.py --dataset-path {absolute_path_dataset} --problem-type seq_modeling --input-type visuotactile --model-name cnn-mvae --use-pose
This trains the MVAE model that fuses the visual, tactile, and pose modalities into a shared latent space.
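For readers unfamiliar with MVAE-style fusion: each modality encoder typically produces a Gaussian posterior over the shared latent space, and the per-modality posteriors are combined with a product of experts. The sketch below illustrates only that fusion step in PyTorch; the shapes, names, and standalone function are illustrative and are not taken from this repository's model code.

```python
import torch

def product_of_experts(mus, logvars, eps=1e-8):
    """Fuse per-modality Gaussian posteriors (plus a standard-normal prior
    expert) into a single Gaussian. mus and logvars are lists of
    (batch, latent_dim) tensors, one per modality."""
    # Prepend the prior expert N(0, I).
    mus = [torch.zeros_like(mus[0])] + list(mus)
    logvars = [torch.zeros_like(logvars[0])] + list(logvars)

    precisions = [torch.exp(-lv) + eps for lv in logvars]        # 1 / sigma^2
    fused_var = 1.0 / torch.stack(precisions).sum(dim=0)
    fused_mu = fused_var * torch.stack(
        [m * p for m, p in zip(mus, precisions)]).sum(dim=0)
    return fused_mu, torch.log(fused_var)

# Illustrative usage with dummy encoder outputs for three modalities.
batch, latent = 4, 32
vis_mu, vis_lv = torch.randn(batch, latent), torch.randn(batch, latent)
tac_mu, tac_lv = torch.randn(batch, latent), torch.randn(batch, latent)
pose_mu, pose_lv = torch.randn(batch, latent), torch.randn(batch, latent)
mu, logvar = product_of_experts([vis_mu, tac_mu, pose_mu], [vis_lv, tac_lv, pose_lv])
z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)          # reparameterization
```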
To train the resting state predictor for a single modality (e.g., tactile or visual only), use:
python main.py --dataset-path {absolute_path_dataset} --problem-type seq_modeling --input-type visual --model-name cnn-vae
To train a standard one-step dynamics model, pass dyn_modeling as the --problem-type argument.
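The difference between the two problem types comes down to how training pairs are formed from a recorded trajectory: the resting state predictor maps an initial observation to the final, settled observation, while a one-step dynamics model maps each frame to its successor. Below is a minimal sketch of that pairing with hypothetical frame arrays; it is not the repository's data loader.

```python
import numpy as np

# Hypothetical trajectory: T observations of an object falling onto the sensor.
# frames[t] could be a visuotactile image (or its encoding) at time t.
frames = np.random.rand(20, 64, 64, 3)

# Resting-state prediction (seq_modeling): pair the initial frame of a
# trajectory with its final, settled frame.
resting_pairs = [(frames[0], frames[-1])]

# One-step dynamics (dyn_modeling): pair each frame with the next one.
one_step_pairs = [(frames[t], frames[t + 1]) for t in range(len(frames) - 1)]
```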
Check this video for a demo of the experiments:
This work by SECA is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
If you use this code in your research, please cite:
@article{rezaei2021learning,
  title={Learning Intuitive Physics with Multimodal Generative Models},
  author={Rezaei-Shoshtari, Sahand and Hogan, Francois Robert and Jenkin, Michael and Meger, David and Dudek, Gregory},
  journal={arXiv preprint arXiv:2101.04454},
  year={2021}
}
@inproceedings{hogan2021seeing,
  title={Seeing Through your Skin: Recognizing Objects with a Novel Visuotactile Sensor},
  author={Hogan, Francois R and Jenkin, Michael and Rezaei-Shoshtari, Sahand and Girdhar, Yogesh and Meger, David and Dudek, Gregory},
  booktitle={Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision},
  pages={1218--1227},
  year={2021}
}