sim · medium · manipulation · metric: varies

AIR-VLA: Vision-Language-Action Systems for Aerial Manipulation

Description

While Vision-Language-Action (VLA) models have achieved remarkable success in ground-based embodied intelligence, their application to Aerial Manipulation Systems (AMS) remains a largely unexplored frontier. The defining characteristics of AMS pose severe challenges to existing VLA paradigms, which were designed for static or 2D mobile bases. These characteristics include floating-base dynamics, strong coupling between the UAV (unmanned aerial vehicle) and the manipulator, and the multi-step, long-horizon nature of operational tasks. To bridge this gap …

Source

http://arxiv.org/abs/2601.21602v2