sim · medium · locomotion · metric · varies

Scalable Multi-Objective Robot Reinforcement Learning through Gradient Conflict Resolution

Description

Reinforcement Learning (RL) robot controllers usually aggregate many task objectives into one scalar reward. While large-scale proximal policy optimisation (PPO) has enabled impressive results such as robust robot locomotion in the real world, many tasks still require careful reward tuning and are brittle to local optima. Tuning cost and sub-optimality grow with the number of objectives, limiting scalability. Modelling reward vectors and their trade-offs can address these issues; however, multi-

Source

http://arxiv.org/abs/2509.14816v1