sim · medium · policy-learning · metric · varies

Optimizing Neurorobot Policy under Limited Demonstration Data through Preference Regret

Description

Robot reinforcement learning from demonstrations (RLfD) assumes that expert data is abundant, an assumption that rarely holds in the real world because demonstrations are scarce and costly to collect. Furthermore, imitation learning algorithms assume that the data is independent and identically distributed; this assumption breaks down at test time, where small errors accumulate and compound along a trajectory and degrade performance. We address these issues by introducing the "master your own expertise" (MYOE) fra

Source

http://arxiv.org/abs/2604.03523v1