sim · medium · manipulation · metric: varies

DynamicVLA: A Vision-Language-Action Model for Dynamic Object Manipulation

Description

Manipulating dynamic objects remains an open challenge for Vision-Language-Action (VLA) models, which, despite strong generalization in static manipulation, struggle in dynamic scenarios requiring rapid perception, temporal anticipation, and continuous control. We present DynamicVLA, a framework for dynamic object manipulation that integrates temporal reasoning and closed-loop adaptation through three key designs: 1) a compact 0.4B VLA using a convolutional vision encoder for spatially efficient …
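The abstract attributes VLA failures on moving objects to the need for rapid perception, temporal anticipation, and continuous control. The toy 1-D simulation below is a hypothetical illustration of that claim only, not DynamicVLA's architecture, controller, or evaluation protocol. A gripper with a capped per-step speed chases an object drifting at constant velocity, comparing a plan made once from the first observation, a plan refreshed from a new observation every step, and a refreshed plan that also predicts the object's next position. All names and numbers (`simulate`, `obj_vel`, `max_step`, the three modes) are invented for the example.

```python
# Hypothetical toy example (not DynamicVLA's method): why re-observing and
# anticipating matter when the manipulated object keeps moving.

def simulate(mode: str, steps: int = 20, obj_vel: float = 0.05,
             max_step: float = 0.1) -> float:
    """Return the final |gripper - object| gap after `steps` control steps.

    mode: "open_loop"   -> plan once from the first observation
          "closed_loop" -> re-observe the object every step
          "anticipate"  -> re-observe and aim where the object will be next step
    """
    if mode not in ("open_loop", "closed_loop", "anticipate"):
        raise ValueError(f"unknown mode: {mode}")
    gripper, obj = 0.0, 1.0
    target = obj  # open-loop target stays frozen at the first observation
    for _ in range(steps):
        if mode == "closed_loop":
            target = obj  # fresh observation each step
        elif mode == "anticipate":
            target = obj + obj_vel  # constant-velocity prediction, one step ahead
        # Move toward the target, capped by the per-step speed limit.
        gripper += max(-max_step, min(max_step, target - gripper))
        obj += obj_vel  # the object keeps moving while the arm acts
    return abs(gripper - obj)


for mode in ("open_loop", "closed_loop", "anticipate"):
    print(f"{mode:12s} final gap = {simulate(mode):.3f}")
```

With these made-up settings the open-loop controller reaches the object's initial position and then stalls while the object drifts away, the closed-loop controller ends within one step of travel of the object, and the anticipating controller closes the gap almost entirely.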

Source

http://arxiv.org/abs/2601.22153v1