← Back to Benchmarks
simmediumvision-robotmetric · varies

LatentAM: Real-Time, Large-Scale Latent Gaussian Attention Mapping via Online Dictionary Learning

Description

We present LatentAM, an online 3D Gaussian Splatting (3DGS) mapping framework that builds scalable latent feature maps from streaming RGB-D observations for open-vocabulary robotic perception. Instead of distilling high-dimensional Vision-Language Model (VLM) embeddings using model-specific decoders, LatentAM proposes an online dictionary learning approach that is both model-agnostic and pretraining-free, enabling plug-and-play integration with different VLMs at test time. Specifically, our appr

Source

http://arxiv.org/abs/2602.12314v1