UniGround: Universal 3D Visual Grounding via Training-Free Scene Parsing

Description

Understanding and localizing objects in complex 3D environments from natural language descriptions, known as 3D Visual Grounding (3DVG), is a foundational challenge in embodied AI, with broad implications for robotics, augmented reality, and human-machine interaction. Large-scale pre-trained foundation models have driven significant progress on this front, enabling open-vocabulary 3DVG that allows systems to locate arbitrary objects in a given scene. However, their reliance on pre-trained models

Source

http://arxiv.org/abs/2603.08131v1