sim · medium · robotics · metric · varies

TAG: Target-Agnostic Guidance for Stable Object-Centric Inference in Vision-Language-Action Models

Description

Vision-Language-Action (VLA) policies have shown strong progress in mapping language instructions and visual observations to robotic actions, yet their reliability degrades in cluttered scenes with distractors. By analyzing failure cases, we find that many errors arise not from infeasible motions but from instance-level grounding failures: the policy often produces a plausible grasp trajectory that lands slightly off-target or even on the wrong object instance. To address this issue, we pr

Source

http://arxiv.org/abs/2603.24584v1