sim · medium · manipulation · metric · varies

RoboInter: A Holistic Intermediate Representation Suite Towards Robotic Manipulation

Description

Advances in large vision-language models (VLMs) have stimulated growing interest in vision-language-action (VLA) systems for robot manipulation. However, existing manipulation datasets remain costly to curate, highly embodiment-specific, and insufficient in coverage and diversity, thereby hindering the generalization of VLA models. Recent approaches attempt to mitigate these limitations via a plan-then-execute paradigm, where high-level plans (e.g., subtasks, trace) are first generated and subse

Source

http://arxiv.org/abs/2602.09973v1