sim · medium · grasping · metric · varies

PAC Bench: Do Foundation Models Understand Prerequisites for Executing Manipulation Policies?

Description

Vision-Language Models (VLMs) are increasingly pivotal for generalist robot manipulation, enabling tasks such as physical reasoning, policy generation, and failure detection. However, their proficiency in these high-level applications often assumes a deep understanding of low-level physical prerequisites, a capability that remains largely unverified. For robots to perform actions reliably, they must comprehend intrinsic object properties (e.g., material, weight), action affordances (e.g., graspa

Source

http://arxiv.org/abs/2506.23725v1