sim · medium · manipulation-data · metric · varies

E-VLA: Event-Augmented Vision-Language-Action Model for Dark and Blurred Scenes

Description

Robotic Vision-Language-Action (VLA) models generalize well for open-ended manipulation, but their perception is fragile under sensing-stage degradations such as extreme low light, motion blur, and black clipping. We present E-VLA, an event-augmented VLA framework that improves manipulation robustness when conventional frame-based vision becomes unreliable. Instead of reconstructing images from events, E-VLA directly leverages motion and structural cues in event streams to preserve semantic perc

Source

http://arxiv.org/abs/2604.04834v1