sim · medium · manipulation · metric · varies

Trustworthy Evaluation of Robotic Manipulation: A New Benchmark and AutoEval Methods

Description

Driven by the rapid evolution of Vision-Action and Vision-Language-Action models, imitation learning has significantly advanced robotic manipulation capabilities. However, evaluation methodologies have lagged behind, hindering the establishment of Trustworthy Evaluation for these behaviors. Current paradigms rely on binary success rates, failing to address the critical dimensions of trust: Source Authenticity (i.e., distinguishing genuine policy behaviors from human teleoperation) and Execution

Source

http://arxiv.org/abs/2601.18723v1