Embodied Red-Teaming (ERT) is an automated red-teaming evaluation method for benchmarking language-conditioned robot models, including robotic foundation models. See the paper here: https://arxiv.org/pdf/2411.18676.
This repository will be updated soon with code to reproduce the CALVIN, RLBench, and OpenVLA experiments.
For now, you can use generate_instructions.py to generate instructions with ERT.