Keep Your Eye on the Best: Contrastive Regression Transformer for Skill Assessment in Robotic Surgery

IEEE Robotics and Automation Letters (2023)

Abstract
This letter proposes a novel video-based contrastive regression architecture, Contra-Sformer, for automated surgical skill assessment in robot-assisted surgery. The framework is structured to capture the differences in surgical performance between a test video and a reference video that represents optimal surgical execution. A feature extractor combining a spatial component (ResNet-18), supervised at the frame level with gesture labels, and a temporal component (TCN) generates spatio-temporal feature matrices of the test and reference videos. These are then fed into an action-aware Transformer with multi-head attention that produces inter-video contrastive features at the frame level, representative of the skill similarity or deviation between the two videos. Moments of sub-optimal performance can be identified and temporally localized in the obtained feature vectors, which are ultimately used to regress the manually assigned skill scores. Validated on the JIGSAWS dataset, Contra-Sformer achieves competitive performance (Spearman correlation of 0.65-0.89), with a normalized mean absolute error between 5.8% and 13.4% on all tasks and across validation setups.
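The two reported metrics can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names are hypothetical, and normalizing the mean absolute error by the score range (`score_max - score_min`) is an assumption, since the abstract does not state how the error is normalized.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(pred, target):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(pred), average_ranks(target))


def normalized_mae(pred, target, score_min, score_max):
    """MAE as a percentage of the score range (assumed normalization)."""
    mae = mean(abs(p - t) for p, t in zip(pred, target))
    return 100.0 * mae / (score_max - score_min)


# Illustrative values only; these are not results from the paper.
predicted = [14.0, 22.5, 9.0, 27.0]
annotated = [15.0, 20.0, 10.0, 28.0]
rho = spearman(predicted, annotated)
nmae = normalized_mae(predicted, annotated, score_min=6, score_max=30)
```

Spearman correlation measures whether the predicted scores preserve the ranking of the annotated ones, while the normalized error measures how far the predictions are from the annotations in absolute terms, which is why the two are reported together.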
Keywords
Feature extraction, Task analysis, Surgery, Kinematics, Training, Needles, Transformers, Computer vision for medical robotics, deep learning methods, surgical skill assessment, contrastive regression