sim · medium · humanoid · metric · varies

Towards Bridging the Gap between Large-Scale Pretraining and Efficient Finetuning for Humanoid Control

Description

Reinforcement learning (RL) is widely used for humanoid control, with on-policy methods such as Proximal Policy Optimization (PPO) enabling robust training via large-scale parallel simulation and, in some cases, zero-shot deployment to real robots. However, the low sample efficiency of on-policy algorithms limits safe adaptation to new environments. Although off-policy RL and model-based RL have shown improved sample efficiency, the gap between large-scale pretraining and efficient finetuning on …
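The description names PPO without spelling out its objective. For reference, the sketch below computes the standard clipped surrogate objective from the original PPO formulation. It is not the method proposed in this work, and the function name `ppo_clipped_surrogate` and the default `clip_eps=0.2` are illustrative assumptions. The probability ratio is only trustworthy for samples drawn from a policy close to the current one. That is why PPO typically reuses each batch for just a few epochs before discarding it, which is the sample-efficiency cost the description contrasts with off-policy and model-based RL.

```python
from typing import Sequence


def ppo_clipped_surrogate(
    ratios: Sequence[float],
    advantages: Sequence[float],
    clip_eps: float = 0.2,  # illustrative default, not taken from the source
) -> float:
    """Mean PPO clipped surrogate objective (to be maximized).

    ratios[i] = pi_new(a_i | s_i) / pi_old(a_i | s_i) for a sample collected
    by pi_old; advantages[i] is the advantage estimate for that sample.
    """
    if not ratios or len(ratios) != len(advantages):
        raise ValueError("ratios and advantages must be non-empty and equal length")

    total = 0.0
    for r, a in zip(ratios, advantages):
        # Restrict the ratio to [1 - eps, 1 + eps] so one update cannot move far
        # from the data-collecting policy.
        clipped_r = min(max(r, 1.0 - clip_eps), 1.0 + clip_eps)
        # Pessimistic bound: keep the smaller of the unclipped and clipped terms.
        total += min(r * a, clipped_r * a)
    return total / len(ratios)


if __name__ == "__main__":
    # First sample is clipped at 1.2; second gives min(-0.5, -0.8) = -0.8.
    print(ppo_clipped_surrogate([1.5, 0.5], [1.0, -1.0]))  # approximately 0.2
```

Taking the minimum means clipping only ever removes incentive to push the ratio further in the direction the advantage favors. It never rewards moving outside the trust interval.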

Source

http://arxiv.org/abs/2601.21363v3