sim · medium · manipulation-data · metric: varies
Learning Humanoid End-Effector Control for Open-Vocabulary Visual Loco-Manipulation
Description
Visual loco-manipulation of arbitrary objects in the wild with humanoid robots requires accurate end-effector (EE) control and a generalizable understanding of the scene from visual inputs (e.g., RGB-D images). Existing approaches rely on real-world imitation learning and generalize poorly because large-scale training datasets are difficult to collect. This paper presents a new paradigm, HERO, for object loco-manipulation with humanoid robots that combines the strong generali