A Multi-Scale Attention Framework for Automated Polyp Localization and Keyframe Extraction From Colonoscopy Videos

IEEE TRANSACTIONS ON AUTOMATION SCIENCE AND ENGINEERING (2023)

Abstract
Colonoscopy video acquisition has increased tremendously for retrospective analysis, comprehensive inspection, and polyp detection in the diagnosis of colorectal cancer (CRC). However, extracting meaningful clinical information from colonoscopy videos requires an enormous amount of review time, which places a considerable burden on surgeons. To reduce this manual effort, we propose the first end-to-end automated multi-stage deep learning framework for extracting an adequate number of clinically significant frames, i.e., keyframes, from colonoscopy videos. The proposed framework comprises multiple stages that employ different deep learning models to select keyframes: high-quality, non-redundant polyp frames that capture multiple views of each polyp. In one of the stages, we also propose a novel multi-scale attention-based model, YcOLOn, for polyp localization, which generates regions of interest (ROIs) and prediction scores crucial for obtaining keyframes. We further designed a GUI application for navigating the different stages. Extensive evaluation in real-world scenarios, involving patient-wise and cross-dataset validations, shows the efficacy of the proposed approach. The framework removes 96.3% and 94.02% of frames, reduces detection processing time by 38.28% and 59.99%, and increases mAP by 2% and 5% on the SUN database and CVC-VideoClinicDB, respectively. The source code is available at https://github.com/Vanshali/KeyframeExtraction

Note to Practitioners: The widespread acceptance of colonoscopy as a gold standard for CRC screening is constrained by the massive amount of data recorded during the procedure, which must be reviewed manually. Such manual review is burdensome and prone to human diagnostic error. This article proposes an automated framework to extract keyframes (important frames) from colonoscopy videos that efficiently represent the clinically relevant information captured in the video streams. This is achieved by automatically removing uninformative and highly correlated frames, which add nothing to the clinical findings. The approach ensures diversity among keyframes and provides clinicians with multiple views of each polyp for easier resection. In addition, the proposed multi-scale attention-based model improves polyp localization performance, which further helps refine the keyframe selection process. Comprehensive experimental results corroborate that discarding insignificant frames can enhance polyp detection and localization performance while reducing computational requirements. The study estimates a 30% to 60% time saving for clinicians during video screening. In clinical practice, the proposed automated framework and our GUI would enable surgeons to better visualize the essential data with minimal manual intervention and assist in precise polyp resection.
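To make the staged design concrete, the sketch below illustrates one plausible realization of the pipeline: filter out uninformative frames, keep only frames where a polyp localizer fires with sufficient confidence, and suppress near-duplicate views so the surviving keyframes span multiple perspectives. This is a minimal sketch, not the paper's implementation: the sharpness and histogram-correlation heuristics, all thresholds, and the `detector` callable (a stand-in for a localizer such as YcOLOn, assumed here to return a confidence score and a bounding box per frame) are illustrative assumptions.

```python
# Illustrative sketch of a multi-stage keyframe-extraction pipeline.
# All heuristics, thresholds, and the `detector` interface below are
# hypothetical placeholders, not the paper's actual models.
import cv2


def is_informative(frame, blur_threshold=100.0):
    """Stage 1 (assumed): reject blurry/uninformative frames using the
    variance of the Laplacian as a simple sharpness proxy."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    return cv2.Laplacian(gray, cv2.CV_64F).var() > blur_threshold


def frame_similarity(a, b):
    """Redundancy measure (assumed): grayscale histogram correlation
    in [-1, 1]; values near 1 indicate near-identical frames."""
    ha = cv2.calcHist([cv2.cvtColor(a, cv2.COLOR_BGR2GRAY)],
                      [0], None, [64], [0, 256])
    hb = cv2.calcHist([cv2.cvtColor(b, cv2.COLOR_BGR2GRAY)],
                      [0], None, [64], [0, 256])
    return cv2.compareHist(ha, hb, cv2.HISTCMP_CORREL)


def extract_keyframes(video_path, detector,
                      score_threshold=0.5, similarity_threshold=0.9):
    """End-to-end sketch: filter uninformative frames, keep frames where
    the detector localizes a polyp confidently, and drop near-duplicates
    so the kept keyframes cover multiple views of the polyp."""
    keyframes = []
    cap = cv2.VideoCapture(video_path)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if not is_informative(frame):
            continue  # Stage 1: remove uninformative frames
        score, box = detector(frame)  # hypothetical localizer interface
        if score < score_threshold:
            continue  # Stage 2: keep only confident polyp frames
        if keyframes and frame_similarity(frame, keyframes[-1][0]) > similarity_threshold:
            continue  # Stage 3: suppress redundant, near-identical views
        keyframes.append((frame, score, box))
    cap.release()
    return keyframes
```

The sequential duplicate check against only the most recent keyframe is one simple design choice; comparing against all kept keyframes, or clustering frames before selection, would trade extra computation for stronger diversity guarantees.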
Keywords
Keyframe extraction, colonoscopy videos, polyp detection, polyp localization