sim · medium · manipulation-data · metric · varies

Posterior Behavioral Cloning: Pretraining BC Policies for Efficient RL Finetuning

Description

Standard practice across domains from robotics to language is to first pretrain a policy on a large-scale demonstration dataset, and then finetune this policy, typically with reinforcement learning (RL), in order to improve performance on deployment domains. This finetuning step has proved critical in achieving human or super-human performance, yet while much attention has been given to developing more effective finetuning algorithms, little attention has been given to ensuring the pretrained po

Source

http://arxiv.org/abs/2512.16911v1