MOON3.0: Reasoning-aware Multimodal Representation Learning for E-commerce Product Understanding

Description

With the rapid growth of e-commerce, exploring general representations rather than task-specific ones has attracted increasing attention. Although recent multimodal large language models (MLLMs) have driven significant progress in product understanding, they are typically employed as feature extractors that implicitly encode product information into global embeddings, thereby limiting their ability to capture fine-grained attributes. Therefore, we argue that leveraging the reasoning capabilities

Source

http://arxiv.org/abs/2604.00513v2