sim · medium · manipulation · metric: varies
ReViP: Mitigating False Completion in Vision-Language-Action Models with Vision-Proprioception Rebalance
Description
Vision-Language-Action (VLA) models have advanced robotic manipulation by combining vision, language, and proprioception to predict actions. However, prior methods fuse proprioceptive signals directly with vision-language features, which produces a state-dominant bias and false completions despite visible execution failures. We systematically analyze this failure mode and attribute it to modality imbalance, in which policies rely too heavily on internal state progression and underuse visual evidence…
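To make "state-dominant bias" concrete, here is a minimal toy sketch under stated assumptions. It assumes "direct fusion" means plain concatenation of vision-language and proprioceptive feature vectors, and it uses a hypothetical linear action head plus a zero-ablation probe to estimate how much the predicted action depends on each modality. The names `fuse_direct`, `linear_policy`, and `modality_reliance` are illustrative only; this is not ReViP's rebalancing method or its evaluation metric.

```python
"""Hypothetical toy sketch: direct (concatenation) fusion of vision-language
and proprioceptive features, plus a zero-ablation probe of modality reliance.
Not the ReViP method; all names here are illustrative assumptions."""
from typing import Dict, List

Vector = List[float]


def fuse_direct(vl_features: Vector, proprio: Vector) -> Vector:
    # Assumed "direct fusion": concatenate the two feature vectors unchanged.
    return vl_features + proprio


def linear_policy(weights: List[Vector], x: Vector) -> Vector:
    # Toy linear action head: one weight row per action dimension.
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def l2(a: Vector, b: Vector) -> float:
    # Euclidean distance between two action vectors.
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) ** 0.5


def modality_reliance(weights: List[Vector], vl: Vector, proprio: Vector) -> Dict[str, float]:
    """Share of the action change caused by zeroing out each modality."""
    full = linear_policy(weights, fuse_direct(vl, proprio))
    no_vl = linear_policy(weights, fuse_direct([0.0] * len(vl), proprio))
    no_prop = linear_policy(weights, fuse_direct(vl, [0.0] * len(proprio)))
    d_vl = l2(full, no_vl)      # how far the action moves without vision-language input
    d_prop = l2(full, no_prop)  # how far the action moves without proprioception
    total = d_vl + d_prop
    if total == 0.0:
        return {"vision_language": 0.0, "proprioception": 0.0}
    return {"vision_language": d_vl / total, "proprioception": d_prop / total}


if __name__ == "__main__":
    # Hypothetical weights that favor the two proprioceptive inputs.
    w = [[0.1, 0.1, 2.0, 2.0]]
    print(modality_reliance(w, vl=[1.0, 1.0], proprio=[1.0, 1.0]))
    # proprioception share is roughly 0.95: the action barely reacts to vision.
```

In this toy setting, a policy whose weights favor proprioception keeps producing nearly the same action even when the vision-language input is removed, which is one simple way to picture a policy "relying on internal state progression" rather than on what it sees.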