sim · medium · vision-robot · metric: varies
TrajPred: Trajectory-Conditioned Joint Embedding Prediction for Surgical Instrument-Tissue Interaction Recognition in Vision-Language Models
Description
Recognizing instruments' interactions with tissues is essential for building context-aware AI assistants in robotic surgery. Vision-language models (VLMs) have opened a new avenue for surgical perception and generalize better across a wide range of tasks than conventional task-specific deep learning approaches. However, their performance on instrument-tissue interaction recognition remains limited, largely due to two challenges: (1) many models do not effectively leverage tempor