sim · medium · grasping · metric · varies

Choose What to Observe: Task-Aware Semantic-Geometric Representations for Visuomotor Policy

Description

Visuomotor policies learned from demonstrations often overfit to nuisance visual factors in raw RGB observations, resulting in brittle behavior under appearance shifts such as background changes and object recoloring. We propose a task-aware observation interface that canonicalizes visual input into a shared representation, improving robustness to out-of-distribution (OOD) appearance changes without modifying or fine-tuning the policy. Given an RGB image and an open-vocabulary specification of t

Source

http://arxiv.org/abs/2603.07875v1