TUNeS: A Temporal U-Net with Self-Attention for Video-based Surgical Phase Recognition

CoRR(2023)

引用 0|浏览6
暂无评分
摘要
To enable context-aware computer assistance in the operating room of the future, cognitive systems need to understand automatically which surgical phase is being performed by the medical team. The primary source of information for surgical phase recognition is typically video, which presents two challenges: extracting meaningful features from the video stream and effectively modeling temporal information in the sequence of visual features. For temporal modeling, attention mechanisms have gained popularity due to their ability to capture long-range dependencies. In this paper, we explore design choices for attention in existing temporal models for surgical phase recognition and propose a novel approach that uses attention more effectively: TUNeS, an efficient and simple temporal model that incorporates self-attention at the core of a convolutional U-Net structure. In addition, we propose to train the feature extractor, a standard CNN, together with an LSTM on preferably long video segments, i.e., with long temporal context. In our experiments, all temporal models performed better on top of feature extractors that were trained with longer temporal context. On these contextualized features, TUNeS achieves state-of-the-art results on the Cholec80 dataset.
更多
查看译文
关键词
recognition,u-net,self-attention,video-based
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要