Surgical Triplet Recognition via Diffusion Model
arXiv (2024)
Abstract
Surgical triplet recognition is an essential building block for
next-generation context-aware operating rooms. The goal is to identify the
combinations of instruments, verbs, and targets present in surgical video
frames. In this paper, we propose DiffTriplet, a new generative framework for
surgical triplet recognition based on a diffusion model, which predicts
surgical triplets via iterative denoising. To handle the challenge of triplet
association, we introduce two designs in our diffusion framework:
association learning and association guidance. During training, we optimize the
model in the joint space of triplets and individual components to capture the
dependencies among them. At inference, we integrate association constraints
into each update of the iterative denoising process, which refines the triplet
prediction using the information of individual components. Experiments on the
CholecT45 and CholecT50 datasets show that the proposed method achieves new
state-of-the-art performance for surgical triplet recognition.
Our code will be released.
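
The inference-time idea described above, refining the triplet estimate with component information at every denoising update, can be illustrated with a toy sketch. This is not the paper's formulation: the product-of-components compatibility score, the linear blending with a `guidance` weight, the simple interpolation schedule, and all names (`association_score`, `guided_denoise`, `toy_denoiser`) are assumptions made purely for illustration, and the learned denoising network is replaced by a fixed stand-in.

```python
import numpy as np

def association_score(triplets, p_inst, p_verb, p_target):
    """Compatibility of each triplet class with the component predictions.

    Assumed form: product of the probabilities of its instrument, verb, and target.
    """
    i, v, t = triplets[:, 0], triplets[:, 1], triplets[:, 2]
    return p_inst[i] * p_verb[v] * p_target[t]

def guided_denoise(x_T, denoiser, score, steps=10, guidance=0.5):
    """Toy iterative denoising with an association constraint applied at every update."""
    x = x_T
    for step in reversed(range(steps)):
        # Stand-in for the learned network's estimate of the clean triplet vector.
        x0_hat = denoiser(x, step)
        # Assumed guidance rule: pull the estimate toward component-consistent triplets.
        x0_hat = (1.0 - guidance) * x0_hat + guidance * score
        # Simple interpolation schedule (an assumption); the last step returns x0_hat.
        alpha = 1.0 / (step + 1)
        x = alpha * x0_hat + (1.0 - alpha) * x
    return x

# Hypothetical setup: 3 triplet classes given as (instrument, verb, target) indices.
triplets = np.array([[0, 0, 0], [0, 1, 1], [1, 0, 1]])
p_inst = np.array([0.9, 0.1])
p_verb = np.array([0.9, 0.1])
p_target = np.array([0.9, 0.1])

# Fixed "network" output that, on its own, prefers triplet class 1.
network_guess = np.array([0.5, 0.6, 0.4])
toy_denoiser = lambda x, step: network_guess

rng = np.random.default_rng(0)
x_T = rng.normal(size=3)  # start from Gaussian noise
score = association_score(triplets, p_inst, p_verb, p_target)

guided = guided_denoise(x_T, toy_denoiser, score, guidance=0.5)
unguided = guided_denoise(x_T, toy_denoiser, score, guidance=0.0)
print(np.argmax(unguided), np.argmax(guided))
```

In this toy setup the unguided run follows the stand-in network and picks triplet class 1, while the guided run switches to class 0, the only class whose instrument, verb, and target are all supported by the component predictions. The sketch covers only the guidance step at inference; the joint training over triplets and components (association learning) is not modeled here.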