Official implementation of the CVPR 2022 paper "Finding Fallen Objects Via Asynchronous Audio-Visual Integration".
Download the dataset from here and extract it into the project root.
The dataset sub-directory contains the information needed to load each case into our environment. The .wav files within it are the recorded audio of the object falling in each case.
The perception sub-directory contains additional information that is helpful when using our environment. Each .json file contains the following fields for its case:
- position ($x, y, z$): the position of the fallen object relative to the initial state of the agent. The $y$-axis is the vertical direction, and $(0, 0, 1)$ is the facing direction of the agent.
- name: the name of the fallen object. The same name refers to the exact same object model.
- category: the category of the fallen object. Each category may contain multiple different object models.
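The following is a minimal sketch for reading one of these .json files and turning position into a ground-plane distance and bearing from the agent's starting pose. It assumes that position is stored either as a list [x, y, z] or as a dict with x, y and z keys (the exact JSON layout is an assumption), and the helper names load_case_info and horizontal_offset are hypothetical. Because y is vertical, only x and z enter the distance. A bearing of 0 degrees means straight ahead along (0, 0, 1); whether positive angles point left or right depends on the coordinate handedness, which is not stated here.

```python
import json
import math


def load_case_info(json_path):
    """Read one perception .json file (hypothetical helper).

    Assumes "position" is either a 3-element list [x, y, z] or a dict
    with "x", "y", "z" keys; "name" and "category" are copied through.
    """
    with open(json_path) as f:
        info = json.load(f)
    pos = info["position"]
    if isinstance(pos, dict):
        pos = [pos["x"], pos["y"], pos["z"]]
    x, y, z = (float(v) for v in pos)
    return {"position": (x, y, z), "name": info["name"], "category": info["category"]}


def horizontal_offset(position):
    """Distance and bearing of the object in the ground (x-z) plane.

    y is vertical, so it is ignored. The bearing is measured from the
    initial facing direction (0, 0, 1), in degrees.
    """
    x, _, z = position
    distance = math.hypot(x, z)
    bearing_deg = math.degrees(math.atan2(x, z))
    return distance, bearing_deg
```

For example, you could loop over the .json files of the perception sub-directory with pathlib's glob("*.json") and call load_case_info on each one (the directory path after extraction is assumed).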
The environment is built on TDW (ThreeDWorld). We tested it with version 1.8.29; you can download TDW_Linux.tar.gz for that version from here.
Follow this to install the NVIDIA driver and an X server on your Linux server.
If you need to run the environment in Docker, you also need to install nvidia-docker by following this.
After downloading TDW_Linux.tar.gz, extract it into the docker directory. The TDW executable should then be located at docker/TDW/TDW.x86_64.
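A small sanity check for the extraction step, assuming the project root is the working directory. The helper name tdw_executable_ok is hypothetical; it only verifies that docker/TDW/TDW.x86_64 exists and has its executable bit set, and it does not launch TDW.

```python
import os
from pathlib import Path


def tdw_executable_ok(docker_dir="docker"):
    """Return True if <docker_dir>/TDW/TDW.x86_64 exists and is executable."""
    exe = Path(docker_dir) / "TDW" / "TDW.x86_64"
    return exe.is_file() and os.access(exe, os.X_OK)
```

If this returns False, check whether the archive was extracted one directory too deep or too shallow.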
To set up the tdw environment:
conda create -n tdw
conda activate tdw
pip install gym pyastar magnebot==1.3.2 tdw==1.8.29
To set up the planner environment:
conda create -n planner
conda activate planner
pip install librosa scikit-image pyastar2d docker-compose tdw
pip install 'git+https://github.com/facebookresearch/detectron2.git'
cd env/openai_baselines
pip install -e .
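To confirm that the pinned versions actually ended up in the active conda environment, you can run a check like the one below. It is a sketch using the standard library's importlib.metadata (Python 3.8+); the helper check_versions and the TDW_ENV dictionary are hypothetical and simply mirror the pip commands above, using distribution names rather than import names.

```python
from importlib.metadata import PackageNotFoundError, version


def check_versions(expected):
    """Compare installed distributions against expected versions.

    expected maps a distribution name to a version string, or to None
    when any installed version is acceptable. Returns {name: problem}
    for every mismatch; an empty dict means everything matched.
    """
    problems = {}
    for pkg, want in expected.items():
        try:
            have = version(pkg)
        except PackageNotFoundError:
            problems[pkg] = "not installed"
            continue
        if want is not None and have != want:
            problems[pkg] = f"found {have}, expected {want}"
    return problems


# Hypothetical expectations for the tdw environment, taken from the pip command above.
TDW_ENV = {"tdw": "1.8.29", "magnebot": "1.3.2", "gym": None, "pyastar": None}
```

After conda activate tdw, calling print(check_versions(TDW_ENV)) should print an empty dict if the installation matches.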
You can then launch the environment via
conda activate tdw
python interface.py --display=<display> --split=<split> --port=<port>
You can use the docker/test.py script to validate the installation in either case (with or without Docker). Launch the environment with port 2590, or edit the port in the test script.
The environment writes some information to env_log/ after each case.
obs contains the following entries:
- rgb, depth: the RGB or depth image captured by the agent in the current frame
- camera_matrix: the camera matrix of the captured RGB and depth images
- agent ($x, y, z, fx, fy, fz$): $(x, y, z)$ denotes the current location of the agent, and $(fx, fy, fz)$ denotes its current facing direction
- FOV: field of view
- audio: the audio recorded when the object falls. It is a byte array made of the bytes of the .wav file, right-padded with 1s.
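Because the audio entry is padded with 1s, stripping trailing 1 bytes could also cut off real audio data that happens to end in 0x01. A safer sketch, assuming the array starts with a standard RIFF/WAVE header, reads the RIFF chunk size from bytes 4 to 8 and keeps exactly that many bytes plus the 8-byte header. The helper names decode_audio and read_wav are hypothetical, and the conversion for non-bytes input assumes the array holds integer byte values (for example a list or a NumPy uint8 array).

```python
import io
import struct
import wave


def decode_audio(audio):
    """Recover the original .wav bytes from the padded obs["audio"] array."""
    if isinstance(audio, (bytes, bytearray)):
        raw = bytes(audio)
    else:
        # e.g. a list or NumPy array of integer byte values
        raw = bytes(int(b) for b in audio)
    if raw[:4] != b"RIFF" or raw[8:12] != b"WAVE":
        raise ValueError("audio does not start with a RIFF/WAVE header")
    # Bytes 4..8 hold a little-endian uint32: total file size minus 8.
    riff_size = struct.unpack("<I", raw[4:8])[0]
    return raw[: 8 + riff_size]


def read_wav(wav_bytes):
    """Return (sample_rate, n_channels, sample_width, frames) via the stdlib wave module."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as w:
        frames = w.readframes(w.getnframes())
        return w.getframerate(), w.getnchannels(), w.getsampwidth(), frames
```

The recovered bytes can also be written to disk or passed to an audio library such as librosa (installed in the planner environment) through an io.BytesIO wrapper.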
info contains the following entries:
- scene_info: a dict representing the name of the case
- status: (of type magnebot.ActionStatus) the result of the last action, e.g. success or a collision
- finish: whether the task has succeeded
Use the following numbers for actions:
- 0: move forward
- 1: turn left
- 2: turn right
- 3: move camera up
- 4: move camera down
- 5: claim that the target is in view within the threshold distance
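To avoid magic numbers in agent code, the action table can be wrapped in an IntEnum. Only the integer values come from the list above; the class name Action and its member names are hypothetical. Since IntEnum members are ints, they should be accepted wherever a plain action number is, assuming the environment does not check for an exact int type.

```python
from enum import IntEnum


class Action(IntEnum):
    """Discrete actions of the environment (member names are illustrative)."""
    MOVE_FORWARD = 0
    TURN_LEFT = 1
    TURN_RIGHT = 2
    CAMERA_UP = 3
    CAMERA_DOWN = 4
    CLAIM = 5  # target is in view within the threshold distance
```

For example, a list like [Action.CLAIM] * num_processes can be passed wherever a list of action numbers is expected.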
To run multiple environments in parallel (e.g. for training), we borrowed code from openai/baselines (slightly modified), so you can run:
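from env.envs import make_vec_envs
# num_processes, log_dir, device, observation_space and action_space must already be defined;
# replace <port> with the first port to use and <displays> with a list of X displays
envs = make_vec_envs('find_fallen-v0', num_processes, log_dir, device, True, spaces=(observation_space, action_space), port=<port>, displays=<displays>, split='train')
obs, info = envs.reset()
# send action 5 (claim) to every environment
obs, reward, done, info = envs.step([5 for _ in range(num_processes)])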
Note: when a case is done, the obs and info returned by step are already the initial observation and info of the next case.
The environments use the port numbers [port, port + num_processes) and the X displays in displays, which should be a list of strings such as [":4", ":5"].
A single X display can be shared by multiple instances simultaneously, so displays can be shorter than num_processes.
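As an illustration of how the ports and displays fit together, the sketch below enumerates one (port, display) pair per worker. The port range follows the half-open interval [port, port + num_processes) described above; the round-robin reuse of displays is an assumption, since only the sharing of displays is documented, not how workers are distributed over them. The helper name assign_ports_and_displays is hypothetical.

```python
def assign_ports_and_displays(port, num_processes, displays):
    """List the (port, X display) pair each worker would use.

    Ports: port, port + 1, ..., port + num_processes - 1.
    Displays: reused round-robin when there are fewer displays than
    workers (assumed, not taken from make_vec_envs).
    """
    if num_processes < 1:
        raise ValueError("num_processes must be at least 1")
    if not displays:
        raise ValueError("at least one X display is required")
    return [(port + i, displays[i % len(displays)]) for i in range(num_processes)]
```

With port=2590, three workers and displays [":4", ":5"], this yields ports 2590 to 2592, with the first and third workers sharing display :4; make sure none of those ports is already in use.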
We provide the code of our modular planner in baseline/planner.
You can download the pretrained modular models here and place them in <project root>/pretrained.
Then run the planner with the following commands (replace :4 :5 with your available X displays):
conda activate planner
python baseline/planner/main_planner.py --displays :4 :5
You can evaluate the results (SR, SPL, SNA) by putting the eval.py script into the env_log folder and running
python eval.py
You can replace "non_distractor" with "distractor" to evaluate the distractor setting.
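For reference, the sketch below computes the three metrics using their common definitions in the embodied navigation literature: SR is the fraction of successful episodes, SPL weights each success by shortest path length over max(actual path length, shortest path length), and SNA does the same with the minimum number of actions over max(actions taken, minimum actions). This is not taken from eval.py and may differ from how that script reads env_log or handles edge cases; the Episode fields are hypothetical, and the denominators assume shortest_path > 0 and min_actions >= 1.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Episode:
    success: bool         # S_i: whether the case was solved
    path_length: float    # p_i: distance the agent actually travelled
    shortest_path: float  # l_i: shortest distance to a valid goal (> 0)
    num_actions: int      # n_i: actions the agent took
    min_actions: int      # n*_i: minimum actions needed (>= 1)


def _check(episodes: Sequence[Episode]) -> None:
    if not episodes:
        raise ValueError("no episodes to evaluate")


def success_rate(episodes: Sequence[Episode]) -> float:
    _check(episodes)
    return sum(e.success for e in episodes) / len(episodes)


def spl(episodes: Sequence[Episode]) -> float:
    """Success weighted by (normalized inverse) Path Length."""
    _check(episodes)
    return sum(
        e.success * e.shortest_path / max(e.path_length, e.shortest_path)
        for e in episodes
    ) / len(episodes)


def sna(episodes: Sequence[Episode]) -> float:
    """Success weighted by (normalized inverse) Number of Actions."""
    _check(episodes)
    return sum(
        e.success * e.min_actions / max(e.num_actions, e.min_actions)
        for e in episodes
    ) / len(episodes)
```

Failed episodes contribute 0 to SPL and SNA, and a successful episode that follows an optimal path and action count contributes 1.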