sim · medium · manipulation-data · metric: varies

VLSA: Vision-Language-Action Models with Plug-and-Play Safety Constraint Layer

Description

Vision-Language-Action (VLA) models have demonstrated remarkable capabilities in generalizing across diverse robotic manipulation tasks. However, deploying these models in unstructured environments remains challenging due to the critical need for simultaneous task compliance and safety assurance, particularly in preventing potential collisions during physical interactions. In this work, we introduce a Vision-Language-Safe Action (VLSA) architecture, named AEGIS, which contains a plug-and-play safety constraint layer.
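The abstract describes the safety constraint layer only at a high level. As a rough illustration of what a plug-and-play layer of this kind can look like, the Python sketch below post-processes a policy's proposed end-effector displacement so that the next position keeps a minimum clearance from known obstacles. This is not the paper's method: the class name, the `min_clearance` parameter, and the simple line-search back-off are all hypothetical choices made for the example.

```python
# Illustrative sketch only; names and logic are assumptions, not AEGIS itself.
import numpy as np


class SafetyConstraintLayer:
    """Post-hoc filter wrapped around a VLA policy's action output.

    If the commanded end-effector displacement would bring the gripper closer
    than `min_clearance` to any known obstacle, the displacement is scaled back
    until the clearance constraint holds.
    """

    def __init__(self, obstacles, min_clearance=0.05):
        self.obstacles = np.asarray(obstacles, dtype=float)  # (N, 3) obstacle centers
        self.min_clearance = float(min_clearance)

    def __call__(self, ee_position, action):
        ee_position = np.asarray(ee_position, dtype=float)
        delta = np.asarray(action, dtype=float)
        next_pos = ee_position + delta
        # Distance from the predicted next position to the nearest obstacle.
        if np.linalg.norm(self.obstacles - next_pos, axis=1).min() >= self.min_clearance:
            return delta  # already safe: pass the action through unchanged
        # Back off along the commanded direction until the clearance holds.
        # (A real constraint layer might instead solve a small QP or use a
        # learned barrier function; this line search is just for illustration.)
        for scale in np.linspace(1.0, 0.0, 21):
            candidate = ee_position + scale * delta
            if np.linalg.norm(self.obstacles - candidate, axis=1).min() >= self.min_clearance:
                return scale * delta
        return np.zeros_like(delta)  # no safe scaling found: stop the motion


if __name__ == "__main__":
    safety = SafetyConstraintLayer(obstacles=[[0.4, 0.0, 0.1]], min_clearance=0.05)
    raw_action = np.array([0.1, 0.0, 0.0])  # displacement proposed by the VLA policy
    safe_action = safety(ee_position=[0.3, 0.0, 0.1], action=raw_action)
    print("raw:", raw_action, "filtered:", safe_action)
```

Because the filter touches only the action output, it can in principle be attached to any pretrained VLA policy without retraining, which is the sense in which such a layer is "plug-and-play".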

Source

http://arxiv.org/abs/2512.11891v1