← Back to Benchmarks
simmediumvision-robotmetric · varies

TrajPred: Trajectory-Conditioned Joint Embedding Prediction for Surgical Instrument-Tissue Interaction Recognition in Vision-Language Models

Description

Recognizing instruments' interactions with tissues is essential for building context-aware AI assistants in robotic surgery. Vision-language models (VLMs) have opened a new avenue for surgical perception and achieved better generalization on a wide range of tasks compared to conventional task-specific deep learning approaches. However, their performance on instrument--tissue interaction recognition remains limited, largely due to two challenges: (1) many models do not effectively leverage tempor

Source

http://arxiv.org/abs/2603.06999v3