sim · medium · navigation · metric · varies
TimeSpot: Benchmarking Geo-Temporal Understanding in Vision-Language Models in Real-World Settings
Description
Geo-temporal understanding, the ability to infer location, time, and contextual properties from visual input alone, underpins applications such as disaster management, traffic planning, embodied navigation, world modeling, and geography education. Although recent vision-language models (VLMs) have advanced image geo-localization using cues like landmarks and road signs, their ability to reason about temporal signals and physically grounded spatial cues remains limited. To address this gap, we in