sim · medium · locomotion · metric · varies
Risk-Aware Reinforcement Learning with Bandit-Based Adaptation for Quadrupedal Locomotion
Description
In this work, we study risk-aware reinforcement learning for quadrupedal locomotion. Our approach trains a family of risk-conditioned policies using a Conditional Value-at-Risk (CVaR) constrained policy optimization technique that provides improved stability and sample efficiency. At deployment, we adaptively select the best-performing policy from this family using a multi-armed bandit framework that relies only on observed episodic returns, without any privileged environment information,
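
The deployment-time selection described above can be illustrated with a small sketch. The description does not say which bandit algorithm is used, so this example assumes UCB1, treating each risk-conditioned policy as one arm and using only the observed episodic return as feedback. The names `UCB1PolicySelector`, `run_episode`, and `adapt`, the risk levels, and the synthetic return model are all hypothetical stand-ins, not the authors' implementation. UCB1's confidence bonus also assumes returns on roughly a unit scale, so real returns would likely need normalizing.

```python
import math
import random


class UCB1PolicySelector:
    """UCB1 bandit over a fixed set of policies; one arm per policy (assumed algorithm)."""

    def __init__(self, num_policies, exploration=2.0):
        self.counts = [0] * num_policies   # episodes run with each policy
        self.means = [0.0] * num_policies  # running mean return per policy
        self.exploration = exploration

    def select(self):
        # Try every policy once before relying on confidence bounds.
        for arm, n in enumerate(self.counts):
            if n == 0:
                return arm
        total = sum(self.counts)
        scores = [
            mean + math.sqrt(self.exploration * math.log(total) / n)
            for mean, n in zip(self.means, self.counts)
        ]
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, arm, episodic_return):
        # Incremental mean update from the observed episodic return only.
        self.counts[arm] += 1
        self.means[arm] += (episodic_return - self.means[arm]) / self.counts[arm]


def run_episode(risk_level, rng):
    """Hypothetical stand-in for a rollout: a noisy return that peaks at risk level 0.5."""
    return -abs(risk_level - 0.5) + rng.gauss(0.0, 0.05)


def adapt(risk_levels, episodes=300, seed=0):
    """Run the bandit loop for a number of episodes and return the selector."""
    rng = random.Random(seed)
    selector = UCB1PolicySelector(len(risk_levels))
    for _ in range(episodes):
        arm = selector.select()
        selector.update(arm, run_episode(risk_levels[arm], rng))
    return selector
```

Calling `adapt([0.1, 0.5, 0.9])` returns a selector whose `counts` show how often each policy was deployed. In this synthetic setup the selector concentrates its episodes on the policy with the highest mean return, without ever observing the environment parameters that determine it.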