← Back to Benchmarks
simmediummobile-manipulationmetric · varies

NanoVLA: Routing Decoupled Vision-Language Understanding for Nano-sized Generalist Robotic Policies

Description

Vision-language-action (VLA) models have significantly advanced robotic manipulation by integrating vision-language models (VLMs), and action decoders into a unified architecture. However, their deployment on resource-constrained edge devices, such as mobile robots or embedded systems (e.g., Jetson Orin Nano), remains challenging due to high computational demands, especially in real-world scenarios where power, latency, and computational resources are critical. To close this gap, we introduce Na

Source

http://arxiv.org/abs/2510.25122v1