sim · medium · manipulation-data · metric: varies

See Once, Then Act: Vision-Language-Action Model with Task Learning from One-Shot Video Demonstrations

Description

Developing robust and general-purpose manipulation policies represents a fundamental objective in robotics research. While Vision-Language-Action (VLA) models have demonstrated promising capabilities for end-to-end robot control, existing approaches still exhibit limited generalization to tasks beyond their training distributions. In contrast, humans possess remarkable proficiency in acquiring novel skills by simply observing others performing them once. Inspired by this capability, we propose V

Source

http://arxiv.org/abs/2512.07582v1