sim · medium · grasping · metric: varies

DexVLG: Dexterous Vision-Language-Grasp Model at Scale

Description

As large models gain traction, vision-language-action (VLA) systems are enabling robots to tackle increasingly complex tasks. However, limited by the difficulty of data collection, progress has mainly focused on controlling simple gripper end-effectors. There is little research on functional grasping with large models for human-like dexterous hands. In this paper, we introduce DexVLG, a large Vision-Language-Grasp model for Dexterous grasp pose prediction aligned with language instructions, using single-view RGBD input.
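To make the stated interface concrete, below is a minimal, hypothetical Python sketch of the input/output signature such a vision-language-grasp model implies: a language instruction plus a single-view RGBD observation in, a dexterous-hand grasp pose (wrist pose plus joint angles) out. Every class, function, and parameter here (`RGBDObservation`, `DexterousGraspPose`, `predict_grasp`, the 16-joint hand) is an illustrative assumption, not the paper's API.

```python
# Hypothetical I/O sketch for a vision-language-grasp model like DexVLG.
# Names and shapes are assumptions for illustration, not the paper's API.
from dataclasses import dataclass
import numpy as np


@dataclass
class RGBDObservation:
    """Single-view observation: color image plus aligned depth map."""
    rgb: np.ndarray    # uint8, shape (H, W, 3)
    depth: np.ndarray  # float32 meters, shape (H, W)


@dataclass
class DexterousGraspPose:
    """A grasp for a multi-fingered hand: wrist pose plus joint angles."""
    wrist_position: np.ndarray     # (3,) translation in the camera frame
    wrist_orientation: np.ndarray  # (4,) unit quaternion (x, y, z, w)
    joint_angles: np.ndarray       # (J,) one angle per hand joint


def predict_grasp(instruction: str, obs: RGBDObservation) -> DexterousGraspPose:
    """Placeholder inference; a trained model would replace this stub."""
    num_joints = 16  # e.g. a 16-DoF dexterous hand (assumption)
    return DexterousGraspPose(
        wrist_position=np.zeros(3),
        wrist_orientation=np.array([0.0, 0.0, 0.0, 1.0]),
        joint_angles=np.zeros(num_joints),
    )


if __name__ == "__main__":
    obs = RGBDObservation(
        rgb=np.zeros((480, 640, 3), dtype=np.uint8),
        depth=np.ones((480, 640), dtype=np.float32),
    )
    grasp = predict_grasp("grab the mug by its handle", obs)
    print(grasp.wrist_position, grasp.joint_angles.shape)
```

The instruction-conditioned signature is what distinguishes this setting from plain grasp detection: the same observation should yield different grasp poses under different part-level instructions.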

Source

http://arxiv.org/abs/2507.02747v1