sim · medium · robotics · metric · varies

Large Reward Models: Generalizable Online Robot Reward Generation with Vision-Language Models

Description

Reinforcement Learning (RL) has shown great potential in refining robotic manipulation policies, yet its efficacy remains strongly bottlenecked by the difficulty of designing generalizable reward functions. In this paper, we propose a framework for online policy refinement by adapting foundation VLMs into online reward generators. We develop a robust, scalable reward model based on a state-of-the-art VLM, trained on a large-scale, multi-source dataset encompassing real-world robot trajectories,

Source

http://arxiv.org/abs/2603.16065v2