TEACh Two-Agent Task Completion (TATC) Challenge

Task-driven Embodied Agents that Chat

Aishwarya Padmakumar*, Jesse Thomason*, Ayush Shrivastava, Patrick Lange, Anjali Narayan-Chen, Spandana Gella, Robinson Piramuthu, Gokhan Tur, Dilek Hakkani-Tur

TEACh (Task-driven Embodied Agents that Chat) is a set of episodes in which a human Commander, who knows what task needs to be accomplished and where objects in the scene are located, works with a human Follower, who controls a virtual agent in the scene, to accomplish household chores. The Commander and Follower communicate via natural language text. TEACh enables researchers to study strategies for natural language cooperation, instruction following from egocentric vision, question and answer generation, and partner modeling. We believe these skills will be necessary for real-world language-driven robotics applications.

The code and model weights are licensed under the MIT License (see SOFTWARELICENSE), images are licensed under Apache 2.0 (see IMAGESLICENSE) and other data files are licensed under CDLA-Sharing 1.0 (see DATALICENSE). Please include appropriate licensing and attribution when using our data and code, and please cite our paper.

Note: The TEACh repository at https://github.com/alexa/teach is centered around the EDH and TFD challenges, in which the agent must learn to produce actions from dialogue history alone. This repository is adapted for the TATC challenge, in which two agents, a Commander and a Follower, collaborate via natural language dialogue to solve the tasks.

Prerequisites

  • python3 >=3.7,<=3.8
  • python3.x-dev, example: sudo apt install python3.8-dev
  • tmux, example: sudo apt install tmux
  • xorg, example: sudo apt install xorg openbox
  • ffmpeg, example: sudo apt install ffmpeg
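
On a Debian/Ubuntu host where Python 3.8 is available from the standard package archives (e.g. Ubuntu 20.04), the system packages above can be installed in one step; a minimal sketch:

sudo apt update
# python3.8-venv is an assumption: it is not listed above, but the virtual-environment step below needs it on Ubuntu
sudo apt install -y python3.8 python3.8-dev python3.8-venv tmux xorg openbox ffmpeg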

Installation

pip install -r requirements.txt
pip install -e .

Downloading the dataset

Run the following script:

sh download_data.sh

This downloads and extracts the archive files (games.tar.gz and meta_data.tar.gz) into the default directory, /tmp/teach-dataset. To download the data used for the EDH challenge, please refer to the README at https://github.com/alexa/teach.
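
If the archives ever need to be re-extracted manually (for example, after copying them to another machine), a minimal sketch, assuming both archives have been placed in /tmp/teach-dataset:

tar -xzf /tmp/teach-dataset/games.tar.gz -C /tmp/teach-dataset
tar -xzf /tmp/teach-dataset/meta_data.tar.gz -C /tmp/teach-dataset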

Remote Server Setup

If running on a remote server without a display, the following setup is needed to run episode replay, model inference, or any model training that invokes the simulator (student forcing / RL).

Start an X-server

tmux
sudo python ./bin/startx.py

Detach from the tmux session (CTRL+B, then D). All other commands should be run in the main terminal or in separate tmux sessions.
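
Before running replay or inference, you can optionally confirm the X server is reachable; a quick check, assuming the default display :0 started by startx.py and that xdpyinfo (from the x11-utils package) is installed:

# Assumes startx.py brought up display :0 and that x11-utils (xdpyinfo) is available
DISPLAY=:0 xdpyinfo > /dev/null && echo "X server is reachable on display :0"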

Replaying episodes

Most users should not need to do this since we provide this output in meta_data.tar.gz.

The following steps can be used to read a .json file of a gameplay session, replay it in the AI2-THOR simulator, and, at each time step, save the egocentric observations of the Commander and the Driver (the Follower in the paper). The replay also saves the target object panel and mask seen by the Commander, and the difference between the current and initial state.

To replay a single episode locally, or in a new tmux session / the main terminal of a remote headless server:

teach_replay \
--game_fn /path/to/game/file \
--write_frames_dir /path/to/desired/output/images/dir \
--write_frames \
--write_states \
--status-out-fn /path/to/desired/output/status/file.json

Note that --status-out-fn must end in .json. Also note that, by default, the script will not replay sessions for which an output subdirectory already exists under --write_frames_dir. Additionally, if the file passed to --status-out-fn already exists, the script will try to resume the sessions not marked as replayed in that file, and it will error out if the status file and the output directories disagree on which sessions have previously been played. It is recommended to use a new --write_frames_dir and a new --status-out-fn for additional runs that are not intended to resume a previous one.
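
For example, a second run that should start fresh rather than resume the first one would point both of those flags at new locations (the run2 paths below are placeholders):

teach_replay \
--game_fn /path/to/game/file \
--write_frames_dir /path/to/desired/output/images/dir_run2 \
--write_frames \
--write_states \
--status-out-fn /path/to/desired/output/status/file_run2.json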

To replay all episodes in a folder locally, or in a new tmux session / the main terminal of a remote headless server:

teach_replay \
--game_dir /path/to/dir/containing/.game.json/files \
--write_frames_dir /path/to/desired/output/images/dir \
--write_frames \
--write_states \
--num_processes 50 \
--status-out-fn /path/to/desired/output/status/file.json

To generate a video, additionally specify --create_video. Note that for images to be saved, --write_frames must be specified and --write_frames_dir must be provided. For state changes to be saved, --write_states must be specified and --write_frames_dir must be provided.
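
For instance, a single-episode replay that also produces a video might look like the following sketch (paths are placeholders):

teach_replay \
--game_fn /path/to/game/file \
--write_frames_dir /path/to/desired/output/images/dir \
--write_frames \
--write_states \
--create_video \
--status-out-fn /path/to/desired/output/status/file.json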

Training

We provide an implementation of a baseline seq2seq attention model from the ALFRED repository, adapted for the TATC benchmark. Note that we have removed files not used when running seq2seq on TEACh, and many files have been significantly modified.

Below are instructions for training and evaluating the commander and driver seq2seq models. If running on a laptop, it may be desirable to mimic the folder structure of the TEACh dataset but use only a small number of games from each split, along with their corresponding images and EDH instances.

Set some useful environment variables. Optionally, you can copy these export statements over to a bash script and source it before training.

export TEACH_DATA=/tmp/teach-dataset
export TEACH_ROOT_DIR=/path/to/teach/repo
export TEACH_LOGS=/path/to/store/checkpoints
export VENV_DIR=/path/to/folder/to/store/venv
export TEACH_SRC_DIR=$TEACH_ROOT_DIR/src/teach
export INFERENCE_OUTPUT_PATH=/path/to/store/inference/execution/files
export MODEL_ROOT=$TEACH_SRC_DIR/modeling
export ET_ROOT=$TEACH_SRC_DIR/modeling/models/ET
export SEQ2SEQ_ROOT=$TEACH_SRC_DIR/modeling/models/seq2seq_attn
export PYTHONPATH="$TEACH_SRC_DIR:$MODEL_ROOT:$ET_ROOT:$SEQ2SEQ_ROOT:$PYTHONPATH"

Create a virtual environment

python3 -m venv $VENV_DIR/teach_env
source $VENV_DIR/teach_env/bin/activate
cd $TEACH_ROOT_DIR
pip install --upgrade pip 
pip install -r requirements.txt

Download the ET pretrained checkpoints for the Faster R-CNN and Mask R-CNN models

wget http://pascal.inrialpes.fr/data2/apashevi/et_checkpoints.zip
unzip et_checkpoints.zip
mv pretrained $TEACH_LOGS/
rm et_checkpoints.zip

Perform data preprocessing (this extracts image features and does some processing of the game JSONs). This step is optional, as we already provide a preprocessed version of the dataset; however, the command is included here for those who want to perform additional preprocessing.

python -m modeling.datasets.create_dataset \
    with args.visual_checkpoint=$TEACH_LOGS/pretrained/fasterrcnn_model.pth \
    args.data_input=games \
    args.task_type=game \
    args.data_output=tatc_dataset \
    args.vocab_path=None

Note: If running on laptop on a small subset of the data, use args.vocab_path=$MODEL_ROOT/vocab/human.vocab and add args.device=cpu.
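
Putting that note together with the command above, a laptop-scale preprocessing run might look like the following sketch (only the two arguments from the note are changed):

python -m modeling.datasets.create_dataset \
    with args.visual_checkpoint=$TEACH_LOGS/pretrained/fasterrcnn_model.pth \
    args.data_input=games \
    args.task_type=game \
    args.data_output=tatc_dataset \
    args.vocab_path=$MODEL_ROOT/vocab/human.vocab \
    args.device=cpu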

Train the commander and driver models (adjust the seq2seq.epochs value in this command to set the desired number of training epochs). Also see modeling/exp_configs.py and modeling/models/seq2seq_attn/configs.py for additional training parameters. You can also run python -m modeling.train -h to list the parameters and their usage.

agent=driver   # set to driver or commander
CUDA_VISIBLE_DEVICES=0 python -m modeling.train \
    with exp.model=seq2seq_attn \
    exp.name=seq2seq_attn_${agent} \
    exp.data.train=tatc_dataset \
    exp.agent=${agent} \
    seq2seq.epochs=20 \
    seq2seq.batch=8 \
    seq2seq.seed=2 \
    seq2seq.resume=False

Note: If running on laptop on a small subset of the data, add exp.device=cpu and exp.num_workers=1
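
Combining that note with the command above, a CPU-only training run on a small subset might look like the following sketch (the CUDA_VISIBLE_DEVICES prefix is dropped since no GPU is used; you may also want to lower seq2seq.epochs and seq2seq.batch):

agent=driver   # set to driver or commander
python -m modeling.train \
    with exp.model=seq2seq_attn \
    exp.name=seq2seq_attn_${agent} \
    exp.data.train=tatc_dataset \
    exp.agent=${agent} \
    exp.device=cpu \
    exp.num_workers=1 \
    seq2seq.epochs=20 \
    seq2seq.batch=8 \
    seq2seq.seed=2 \
    seq2seq.resume=False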

Copy the dataset vocabulary and parameter files to the model folders so that training information does not need to be accessed at inference time.

cp $TEACH_DATA/tatc_dataset/data.vocab $TEACH_LOGS/seq2seq_attn_commander
cp $TEACH_DATA/tatc_dataset/params.json $TEACH_LOGS/seq2seq_attn_commander
cp $TEACH_DATA/tatc_dataset/data.vocab $TEACH_LOGS/seq2seq_attn_driver
cp $TEACH_DATA/tatc_dataset/params.json $TEACH_LOGS/seq2seq_attn_driver

Evaluate the trained models

cd $TEACH_ROOT_DIR
python src/teach/cli/inference.py \
    --model_module teach.inference.seq2seq_model \
    --model_class Seq2SeqModel \
    --data_dir $TEACH_DATA \
    --output_dir $INFERENCE_OUTPUT_PATH/inference__teach_tatc \
    --split valid_seen \
    --metrics_file $INFERENCE_OUTPUT_PATH/metrics__teach_tatc.json \
    --seed 0 \
    --commander_model_dir $TEACH_LOGS/seq2seq_attn_commander \
    --driver_model_dir $TEACH_LOGS/seq2seq_attn_driver \
    --object_predictor $TEACH_LOGS/pretrained/maskrcnn_model.pth \
    --device cpu

Note about model I/O

Driver

Driver actions:

  • Navigation, Interaction, Text

Driver inputs:

  • Driver current frame
  • Driver pose
  • Driver action history
  • Driver observation history
  • Dialogue history

Driver outputs:

  • Navigation or interaction action
  • (x, y) coordinates of object or object class selection

Commander

Commander actions:

  • OpenProgressCheck, SearchObject, SearchOid, Text

Commander inputs:

  • Commander current frame
  • Commander pose
  • Commander action history
  • Driver pose history
  • Driver observation history
  • Dialogue history
  • Action dependent:
    • OpenProgressCheck: progress check output
    • SearchObject: target mask and target object frame
    • SearchOid: target mask and target object frame

Note: The Commander does not get the task as input. It needs to first call OpenProgressCheck to see the task and then communicate the task description to the Driver. Additionally, the Commander does not have direct access to the Driver's action history; it needs to infer whether the Driver's actions succeeded from the Driver's visual observations (e.g., if the frame does not change between timesteps, the last action likely failed).
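
As a rough offline illustration of that last point, two consecutive saved Driver frames can be compared using the ffmpeg prerequisite; an average PSNR of inf means the frames are identical, which suggests the previous action had no visible effect. This is only a sketch of the idea (the frame filenames are placeholders), not something the baseline models actually do:

# Hypothetical frame filenames; prints a PSNR summary line, with average:inf for identical frames
ffmpeg -i driver_frame_t.png -i driver_frame_t_plus_1.png -lavfi psnr -f null - 2>&1 | grep PSNR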

Commander outputs:

  • Action / Text
  • Action dependent:
    • SearchObject: Object id
    • SearchOid: Object name

FAQ

Submission

Coming soon!

Security

See CONTRIBUTING for more information.

License

The code is licensed under the MIT License (see SOFTWARELICENSE), images are licensed under Apache 2.0 (see IMAGESLICENSE) and other data files are licensed under CDLA-Sharing 1.0 (see DATALICENSE).
