← Back to Benchmarks
simmediumhumanoidmetric · varies
HumanoidVLM: Vision-Language-Guided Impedance Control for Contact-Rich Humanoid Manipulation
Description
Humanoid robots must adapt their contact behavior to diverse objects and tasks, yet most controllers rely on fixed, hand-tuned impedance gains and gripper settings. This paper introduces HumanoidVLM, a vision-language driven retrieval framework that enables the Unitree G1 humanoid to select task-appropriate Cartesian impedance parameters and gripper configurations directly from an egocentric RGB image. The system couples a vision-language model for semantic task inference with a FAISS-based Retr