sim · medium · navigation · metric · varies
Enhancing Lightweight Vision Language Models through Group Competitive Learning for Socially Compliant Navigation
Description
Social robot navigation requires a sophisticated integration of scene semantics and human social norms. Scaling up Vision Language Models (VLMs) generally improves reasoning and decision-making capabilities for socially compliant navigation. However, increased model size incurs substantial computational overhead, limiting suitability for real-time robotic deployment. Conversely, lightweight VLMs enable efficient inference but often exhibit weaker reasoning and decision-making performance in soci