sim · medium · manipulation · metric · varies
Grounding Robot Generalization in Training Data via Retrieval-Augmented VLMs
Description
Recent work on robot manipulation has advanced policy generalization to novel scenarios. However, it is often difficult to characterize how far a given evaluation setting actually departs from the training distribution of the policy under test, and therefore what kind of generalization it measures. To work towards more precise evaluation of generalization in robotics, we propose RADAR, a scalable framework that directly compares test-time evaluation tasks to policy training data to determine what form of policy generalization each task requires.