sim · medium · manipulation · metric: varies

ReViP: Mitigating False Completion in Vision-Language-Action Models with Vision-Proprioception Rebalance

Description

Vision-Language-Action (VLA) models have advanced robotic manipulation by combining vision, language, and proprioception to predict actions. However, previous methods fuse proprioceptive signals directly with vision-language features, resulting in a state-dominant bias and false completions despite visible execution failures. We systematically analyze this failure mode and attribute it to modality imbalance, in which policies rely too heavily on internal state progression and underuse visual evidence.

Source

http://arxiv.org/abs/2601.16667v3