sim · medium · offline-rl · metric: varies

SpatialTraceGen: High-Fidelity Traces for Efficient VLM Spatial Reasoning Distillation

Description

While Vision-Language Models (VLMs) excel in many areas, they struggle with complex spatial reasoning, which requires problem decomposition and strategic tool use. Fine-tuning smaller, more deployable models offers an efficient path to strong performance, but this is hampered by a major bottleneck: the absence of high-quality, step-by-step reasoning data. To address this data-efficiency gap, we introduce SpatialTraceGen, a framework to distill the reasoning processes of a large teacher model int

Source

http://arxiv.org/abs/2511.00054v1