Enhancing Geometric Perception in VLMs via Translator-Guided Reinforcement Learning

Description

Vision-language models (VLMs) often struggle with geometric reasoning due to their limited perception of fundamental diagram elements. To tackle this challenge, we introduce GeoPerceive, a benchmark comprising diagram instances paired with domain-specific language (DSL) representations, along with an efficient automatic data generation pipeline. This design allows geometric perception to be evaluated in isolation from reasoning. To exploit the data provided by GeoPerceive for enhanc

Source

http://arxiv.org/abs/2602.22703v1