sim · medium · manipulation · metric · varies

SA-VLA: Spatially-Aware Flow-Matching for Vision-Language-Action Reinforcement Learning

Description

Vision-Language-Action (VLA) models exhibit strong generalization in robotic manipulation, yet reinforcement learning (RL) fine-tuning often degrades robustness under spatial distribution shifts. For flow-matching VLA policies, this degradation is closely associated with the erosion of spatial inductive bias during RL adaptation, as sparse rewards and spatially agnostic exploration increasingly favor short-horizon visual cues. To address this issue, we propose SA-VLA, a spatially-aware

Source

http://arxiv.org/abs/2602.00743v1