sim · medium · manipulation · metric · varies

Shallow-π: Knowledge Distillation for Flow-based VLAs

Description

The growing demand for real-time robotic deployment necessitates fast, on-device inference for vision-language-action (VLA) models. Within the VLA literature, efficiency has been extensively studied at the token level, for example through visual token pruning. In contrast, systematic transformer layer reduction has received limited attention and, to the best of our knowledge, has not been explored for flow-based VLA models under knowledge distillation. In this work, we propose Shallow-π, a principled kn

Source

http://arxiv.org/abs/2601.20262v1