sim · medium · grasping · metric · varies
Graph-Fused Vision-Language-Action for Policy Reasoning in Multi-Arm Robotic Manipulation
Description
Acquiring dexterous robotic skills from human video demonstrations remains a significant challenge, largely because conventional approaches rely on low-level trajectory replication, which often fails to generalize across varying objects, spatial layouts, and manipulator configurations. To address this limitation, we introduce Graph-Fused Vision-Language-Action (GF-VLA), a unified framework that enables dual-arm robotic systems to perform task-level reasoning and execution directly from RGB-D human demon