← Back to Benchmarks
simmediummanipulation-datametric · varies

Show, Don't Tell: Detecting Novel Objects by Watching Human Videos

Description

How can a robot quickly identify and recognize new objects shown to it during a human demonstration? Existing closed-set object detectors frequently fail at this because the objects are out-of-distribution. While open-set detectors (e.g., VLMs) sometimes succeed, they often require expensive and tedious human-in-the-loop prompt engineering to uniquely recognize novel object instances. In this paper, we present a self-supervised system that eliminates the need for tedious language descriptions an

Source

http://arxiv.org/abs/2603.12751v1