Encoding laparoscopic image to words using vision transformer for distortion classification and ranking in laparoscopic videos

Multimedia Tools and Applications (2024)

Abstract
Laparoscopic videos are tools surgeons use when inserting narrow tubes into the abdomen, which avoids large skin incisions. The videos captured by the camera are prone to distortions such as uneven illumination, motion blur, defocus blur, smoke, and noise, all of which degrade visual quality. Automatically detecting and identifying these distortions is important for enhancing the quality of laparoscopic videos and avoiding errors during surgery. Video quality assessment comprises two stages: classifying the distortions affecting the video frames to identify their types, and ranking the distortions to estimate their intensity levels. The laparoscopic video dataset generated for the ICIP 2020 challenge was used to train, validate, and test the proposed solution. The dataset is difficult because it has five distortion categories and four severity levels; its most challenging aspect is that multiple distortion categories can occur in one video. This paper addresses the multi-label distortion classification and ranking problem and aims to improve the performance of distortion classification solutions. A vision transformer, a deep learning model, was used to extract informative features by transferring learned representations from the general domain to the medical domain (laparoscopic videos). Six parallel multilayer perceptron (MLP) classifiers were attached to the vision transformer for distortion classification and ranking. The experiments showed that the proposed solution outperforms existing distortion classification methods in terms of average accuracy (89.7
Key words
Distortion classification, Distortion ranking, Laparoscopic video, Multi-label classification, Transfer learning, Vision transformer, Video quality assessment
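
The abstract describes six parallel MLP classifiers attached to a vision transformer, but does not say what each classifier predicts. The sketch below is one possible reading, not the authors' implementation: it assumes one head scores the presence of each of the five distortions (multi-label) and five further heads each pick one of four severity levels. The embedding is a plain list standing in for vision transformer features, the weights are random and untrained, and all names (`build_heads`, `predict`, the 1 to 4 level numbering) are hypothetical.

```python
import math
import random

# Five distortion categories and four severity levels, as stated in the abstract.
DISTORTIONS = ["uneven_illumination", "motion_blur", "defocus_blur", "smoke", "noise"]
NUM_LEVELS = 4


def make_mlp(in_dim, hidden_dim, out_dim, rng):
    """Create random weights for a one-hidden-layer MLP (untrained placeholder)."""
    def layer(n_in, n_out):
        return [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return {"w1": layer(in_dim, hidden_dim), "w2": layer(hidden_dim, out_dim)}


def mlp_forward(mlp, x):
    """Apply linear -> ReLU -> linear and return raw logits."""
    hidden = [max(0.0, sum(w * v for w, v in zip(row, x))) for row in mlp["w1"]]
    return [sum(w * h for w, h in zip(row, hidden)) for row in mlp["w2"]]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def build_heads(embed_dim, hidden_dim=16, seed=0):
    """Assumed layout: 1 presence head + 5 per-distortion severity heads = 6 heads."""
    rng = random.Random(seed)
    heads = {"presence": make_mlp(embed_dim, hidden_dim, len(DISTORTIONS), rng)}
    for name in DISTORTIONS:
        heads[name] = make_mlp(embed_dim, hidden_dim, NUM_LEVELS, rng)
    return heads


def predict(heads, embedding, threshold=0.5):
    """Return {distortion: severity level 1..4} for every distortion judged present."""
    presence_logits = mlp_forward(heads["presence"], embedding)
    result = {}
    for name, logit in zip(DISTORTIONS, presence_logits):
        # Multi-label decision: each distortion is thresholded independently.
        if sigmoid(logit) >= threshold:
            level_logits = mlp_forward(heads[name], embedding)
            result[name] = level_logits.index(max(level_logits)) + 1
    return result
```

All heads read the same embedding, so a frame with several simultaneous distortions can receive several labels, each with its own severity estimate. In practice the embedding would come from a pretrained vision transformer and the heads would be trained on the labeled frames.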