sim · medium · manipulation · metric · varies

StreamVLA: Breaking the Reason-Act Cycle via Completion-State Gating

Description

Long-horizon robotic manipulation requires bridging the gap between high-level planning (System 2) and low-level control (System 1). Current Vision-Language-Action (VLA) models often entangle these processes, performing redundant multimodal reasoning at every timestep, which leads to high latency and goal instability. To address this, we present StreamVLA, a dual-system architecture that unifies textual task decomposition, visual goal imagination, and continuous action generation within a single

Source

http://arxiv.org/abs/2602.01100v2