Language-Guided Socially Aware Embodied Navigation

Achieving socially aware navigation remains an open challenge in robotics, and it is a key step toward seamlessly integrating embodied agents into dynamic human environments. Despite recent advances, existing methods do not adequately cover the wide span of human social conventions encountered when navigating dense environments. Building on the foundation laid by NaviSTAR [1] and the MuSoHu dataset [2], this paper introduces a novel approach to social robot navigation using language-guided methods.

Our methodology is driven by a multi-modal model that uses Visual-Language Models (VLMs) to decode high-level text commands and subtle social cues from human interactions. Unlike conventional navigation systems, our approach embraces the complexity of human-environment interaction by leveraging a diverse array of sensors and datasets. In particular, we augment our training data with the SCAND dataset [3], enriching the model's understanding of socially compliant navigation behaviors.
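
As a rough illustration of how a VLM can turn a camera frame plus a high-level instruction into a socially aware textual cue, the sketch below queries an off-the-shelf BLIP-2 checkpoint via Hugging Face Transformers. The model choice, prompt wording, and image path are assumptions for illustration only and are not taken from this repository's pipeline.

```python
# Hypothetical sketch: querying an off-the-shelf VLM (BLIP-2 here) with a camera
# frame and a high-level navigation instruction. The checkpoint, prompt, and file
# name are illustrative assumptions, not this repository's actual configuration.
from PIL import Image
from transformers import Blip2Processor, Blip2ForConditionalGeneration

processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
model = Blip2ForConditionalGeneration.from_pretrained("Salesforce/blip2-opt-2.7b")

frame = Image.open("frame.png")  # e.g. an RGB frame from a MuSoHu/SCAND-style log
prompt = ("Question: The robot was told 'pass the group on their left without "
          "cutting through'. What should it do next? Answer:")

inputs = processor(images=frame, text=prompt, return_tensors="pt")
generated = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(generated[0], skip_special_tokens=True))
```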

Central to our methodology is the use of VLMs, which enables the system to decode intricate social cues and high-level textual commands, elevating the robot's understanding of and responsiveness to its social surroundings. This is complemented by adaptability to a wide range of user preferences and navigation styles, achieved through instruction tuning, which keeps robotic actions closely aligned with human expectations and improves the quality of interaction. Our work lays a foundation for future navigation systems that allow robots to navigate and interact fluently in the nuanced, ever-changing landscape of human social settings.
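
To make the instruction-tuning idea concrete, the sketch below shows one way a preference-conditioned training example could be represented before fine-tuning. The schema, field names, and example values are hypothetical and do not reflect the repository's actual data format.

```python
# Hypothetical sketch of an instruction-tuning record pairing a user preference
# with an observed scene and a preferred navigation behaviour. All field names
# and values are illustrative assumptions, not the repository's schema.
from dataclasses import dataclass, asdict
import json


@dataclass
class SocialNavInstructionExample:
    instruction: str       # natural-language preference or command from the user
    observation: str       # textual summary of the scene (e.g. produced by the VLM)
    preferred_action: str  # target behaviour the tuned policy should output


example = SocialNavInstructionExample(
    instruction="Keep extra distance from people who are talking to each other.",
    observation="Two pedestrians stand face to face near the center of the corridor.",
    preferred_action="Slow down and arc around the pair, keeping about 1.5 m of clearance.",
)

# Serialize to the JSON-lines style commonly used for instruction-tuning corpora.
print(json.dumps(asdict(example)))
```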

Citations:

[1] Wang, Weizheng, et al. "NaviSTAR: Socially Aware Robot Navigation with Hybrid Spatio-Temporal Graph Transformer and Preference Learning." arXiv preprint arXiv:2304.05979 (2023).
[2] Nguyen, Duc M., et al. "Toward Human-Like Social Robot Navigation: A Large-Scale, Multi-Modal, Social Human Navigation Dataset." arXiv preprint arXiv:2303.14880 (2023).
[3] Karnan, Haresh, et al. "Socially Compliant Navigation Dataset (SCAND): A Large-Scale Dataset of Demonstrations for Social Navigation." IEEE Robotics and Automation Letters 7.4 (2022): 11807-11814.
