sim · medium · robotics · metric · varies

GIFT: Generalizing Intent for Flexible Test-Time Rewards

Description

Robots learn reward functions from user demonstrations, but these rewards often fail to generalize to new environments. This failure occurs because learned rewards latch onto spurious correlations in training data rather than the underlying human intent that demonstrations represent. Existing methods leverage visual or semantic similarity to improve robustness, yet these surface-level cues often diverge from what humans actually care about. We present Generalizing Intent for Flexible Test-Time Rewards (GIFT) …

Source

http://arxiv.org/abs/2603.22574v1