sim · medium · manipulation · metric: varies
Improving Robustness of Vision-Language-Action Models by Restoring Corrupted Visual Inputs
Description
Vision-Language-Action (VLA) models have emerged as a dominant paradigm for generalist robotic manipulation, unifying perception and control within a single end-to-end architecture. Despite their success in controlled environments, however, reliable real-world deployment is severely hindered by their fragility to visual disturbances. While existing literature extensively addresses physical occlusions caused by scene geometry, a critical failure mode remains largely unexplored: image corruptions. These s
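To make "image corruptions" concrete, the sketch below applies a synthetic corruption (zero-mean Gaussian sensor noise) to a camera frame; this is a minimal illustration of the kind of degradation a restoration front-end would need to undo, not the benchmark's actual corruption pipeline. The function name, noise level, and test frame are assumptions for illustration.

```python
import numpy as np

def corrupt_gaussian(img: np.ndarray, sigma: float = 25.0, seed: int = 0) -> np.ndarray:
    """Add zero-mean Gaussian noise (std=sigma) to a uint8 image of shape (H, W, C)."""
    rng = np.random.default_rng(seed)
    noisy = img.astype(np.float32) + rng.normal(0.0, sigma, size=img.shape)
    # Clip back to the valid pixel range and restore the original dtype.
    return np.clip(noisy, 0, 255).astype(np.uint8)

# A flat gray test frame stands in for a robot camera observation.
frame = np.full((64, 64, 3), 128, dtype=np.uint8)
noisy = corrupt_gaussian(frame, sigma=25.0)
print(noisy.shape, noisy.dtype)
```

Other common corruption families in robustness studies (blur, JPEG artifacts, low-light) follow the same pattern: a deterministic or stochastic transform applied to the observation before the policy sees it.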