← Back to Benchmarks
simmediumlocomotionmetric · varies
DoublyAware: Dual Planning and Policy Awareness for Temporal Difference Learning in Humanoid Locomotion
Description
Achieving robust robot learning for humanoid locomotion is a fundamental challenge in model-based reinforcement learning (MBRL), where environmental stochasticity and randomness can hinder efficient exploration and learning stability. The environmental, so-called aleatoric, uncertainty can be amplified in high-dimensional action spaces with complex contact dynamics, and further entangled with epistemic uncertainty in the models during learning phases. In this work, we propose DoublyAware, an unc