sim · medium · grasping · metric · varies

PhysToolBench: Benchmarking Physical Tool Understanding for MLLMs

Description

The ability to use, understand, and create tools is a hallmark of human intelligence, enabling sophisticated interaction with the physical world. For any general-purpose intelligent agent to achieve true versatility, it must also master these fundamental skills. While modern Multimodal Large Language Models (MLLMs) leverage their extensive common knowledge for high-level planning in embodied AI and in downstream Vision-Language-Action (VLA) models, the extent of their true understanding of physi

Source

http://arxiv.org/abs/2510.09507v1