sim · medium · quadruped · metric: varies

Preventing Robotic Jailbreaking via Multimodal Domain Adaptation

Description

Large Language Models (LLMs) and Vision-Language Models (VLMs) are increasingly deployed in robotic environments but remain vulnerable to jailbreaking attacks that bypass safety mechanisms and drive unsafe or physically harmful behavior in the real world. Data-driven defenses such as jailbreak classifiers show promise, yet they struggle to generalize in domains where specialized datasets are scarce, limiting their effectiveness in robotics and other safety-critical contexts. To address this gap, …

Source

http://arxiv.org/abs/2509.23281v1