← Back to Benchmarks
simmediummobile-manipulationmetric · varies
NanoVLA: Routing Decoupled Vision-Language Understanding for Nano-sized Generalist Robotic Policies
Description
Vision-language-action (VLA) models have significantly advanced robotic manipulation by integrating vision-language models (VLMs), and action decoders into a unified architecture. However, their deployment on resource-constrained edge devices, such as mobile robots or embedded systems (e.g., Jetson Orin Nano), remains challenging due to high computational demands, especially in real-world scenarios where power, latency, and computational resources are critical. To close this gap, we introduce Na