sim · medium · manipulation · metric · varies
Rethinking the Practicality of Vision-language-action Model: A Comprehensive Benchmark and An Improved Baseline
Description
Vision-Language-Action (VLA) models have emerged as generalist robotic agents. However, existing VLAs are hindered by excessive parameter scales, prohibitive pre-training requirements, and limited applicability to diverse embodiments. To improve the practicality of VLAs, we propose a comprehensive benchmark and an improved baseline. First, we propose CEBench, a new benchmark that spans diverse embodiments in both simulation and the real world and takes domain randomization into account. We collect