Cross-Attention Based Multi-Resolution Feature Fusion Model for Self-Supervised Cervical OCT Image Classification

IEEE/ACM Transactions on Computational Biology and Bioinformatics (2023)

Abstract
Cervical cancer seriously endangers the health of the female reproductive system and can even be life-threatening in severe cases. Optical coherence tomography (OCT) is a non-invasive, real-time, high-resolution imaging technology for cervical tissues. However, because interpreting cervical OCT images is a knowledge-intensive, time-consuming task, it is hard to acquire a large number of high-quality labeled images quickly, which poses a major challenge for supervised learning. In this study, we introduce the vision Transformer (ViT) architecture, which has recently achieved impressive results in natural image analysis, into the classification task of cervical OCT images. Our work aims to develop a computer-aided diagnosis (CADx) approach based on a self-supervised ViT-based model to classify cervical OCT images effectively. We leverage masked autoencoders (MAE) to perform self-supervised pre-training on cervical OCT images, giving the proposed classification model better transfer learning ability. In the fine-tuning process, the ViT-based classification model extracts multi-scale features from OCT images of different resolutions and fuses them with a cross-attention module. Ten-fold cross-validation on an OCT image dataset from a multi-center clinical study of 733 patients in China shows that our model achieved an AUC of 0.9963 ± 0.0069, with 95.89 ± 3.30% sensitivity and 98.23 ± 1.36% specificity, outperforming state-of-the-art classification models based on Transformers and convolutional neural networks (CNNs) in the binary task of detecting high-risk cervical diseases, including high-grade squamous intraepithelial lesion (HSIL) and cervical cancer. Furthermore, with a cross-shaped voting strategy, our model achieved a sensitivity of 92.06% and a specificity of 95.56% on an external validation dataset of 288 three-dimensional (3D) OCT volumes from 118 Chinese patients at a different hospital; this result met or exceeded the average of four medical experts who have used OCT for over one year. In addition to promising classification performance, our model can detect and visualize local lesions using the attention map of the standard ViT model, providing good interpretability for gynecologists to locate and diagnose possible cervical diseases.
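
The abstract's core architectural idea is fusing features from ViT branches that see the same OCT image at different resolutions via cross-attention. As a rough illustration only, the PyTorch sketch below shows one common form of such a block, in which the class (CLS) token of one resolution branch attends to the patch tokens of the other (in the spirit of CrossViT-style fusion); all names, dimensions, and the CLS-as-query design are our assumptions, not the authors' implementation.

import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    """Illustrative cross-attention fusion of two resolution branches.

    The CLS token of branch A queries the full token sequence of
    branch B, letting branch A absorb multi-scale information.
    """

    def __init__(self, dim: int = 768, num_heads: int = 8):
        super().__init__()
        self.norm_q = nn.LayerNorm(dim)
        self.norm_kv = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, tokens_a: torch.Tensor, tokens_b: torch.Tensor) -> torch.Tensor:
        # tokens_*: (batch, 1 + num_patches, dim); index 0 is the CLS token.
        query = self.norm_q(tokens_a[:, :1])        # branch A's CLS token
        kv = self.norm_kv(tokens_b)                 # branch B's full sequence
        fused_cls, _ = self.attn(query, kv, kv)     # cross-attention
        # Residual connection, then re-attach branch A's patch tokens.
        return torch.cat([tokens_a[:, :1] + fused_cls, tokens_a[:, 1:]], dim=1)

# Usage sketch with made-up shapes: a 224x224 branch (196 patches) and a
# 384x384 branch (576 patches), both tokenized with 16x16 patches.
fusion = CrossAttentionFusion(dim=768, num_heads=8)
tokens_low = torch.randn(2, 1 + 196, 768)
tokens_high = torch.randn(2, 1 + 576, 768)
fused = fusion(tokens_low, tokens_high)             # shape (2, 197, 768)

In practice a block like this is typically applied symmetrically (each branch queries the other) and the fused CLS tokens feed the classification head; those details are not specified in the abstract.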
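The external validation aggregates per-image predictions into one decision per 3D OCT volume via a "cross-shaped voting strategy," which the abstract does not detail. One plausible reading is that 2D frames sampled from each volume are classified individually and the volume label is decided by majority vote; the minimal sketch below encodes only that assumption (the frame-sampling pattern and the 0.5 threshold are hypothetical).

import torch

def volume_vote(frame_logits: torch.Tensor, threshold: float = 0.5) -> int:
    """Aggregate per-frame binary logits into one volume-level label.

    frame_logits: (num_frames, 2) classifier outputs for the frames
    sampled from one 3D volume (how frames are sampled is assumed here).
    Returns 1 (high-risk) if at least `threshold` of the frames vote high-risk.
    """
    frame_preds = frame_logits.argmax(dim=1)            # 0 = low-risk, 1 = high-risk
    high_risk_fraction = frame_preds.float().mean().item()
    return int(high_risk_fraction >= threshold)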
Keywords
Cervical cancer, cross-attention, optical coherence tomography, self-supervised learning, vision transformer