sim · medium · sim-to-real · metric: varies

Now You See That: Learning End-to-End Humanoid Locomotion from Raw Pixels

Description

Achieving robust vision-based humanoid locomotion remains challenging due to two fundamental issues: the sim-to-real gap introduces significant perception noise that degrades performance on fine-grained tasks, and training a unified policy across diverse terrains is hindered by conflicting learning objectives. To address these challenges, we present an end-to-end framework for vision-driven humanoid locomotion. For robust sim-to-real transfer, we develop a high-fidelity depth sensor simulation t

Source

http://arxiv.org/abs/2602.06382v1