sim · medium · policy-learning · metric · varies
Optimizing Neurorobot Policy under Limited Demonstration Data through Preference Regret
Description
Robot reinforcement learning from demonstrations (RLfD) assumes that expert data is abundant, which is usually unrealistic in the real world given data scarcity and high collection costs. Furthermore, imitation learning algorithms assume that the data is independently and identically distributed, which ultimately degrades performance as small errors accumulate and compound within test-time trajectories. We address these issues by introducing the "master your own expertise" (MYOE) fra