Improving Image Representations via MoCo Pre-training for Multimodal CXR Classification

MEDICAL IMAGE UNDERSTANDING AND ANALYSIS, MIUA 2022(2022)

Abstract
Multimodal learning, here defined as learning from multiple input data types, has exciting potential for healthcare. However, current techniques rely on large multimodal datasets being available, which is rarely the case in the medical domain. In this work, we focus on improving the extracted image features which are fed into multimodal image-text Transformer architectures, evaluating on a medical multimodal classification task with dual inputs of chest X-ray images (CXRs) and the indication text passages in the corresponding radiology reports. We demonstrate that self-supervised Momentum Contrast (MoCo) pre-training of the image representation model on a large set of unlabelled CXR images improves multimodal performance compared to supervised ImageNet pre-training. MoCo shows a 0.6% absolute improvement in AUROC-macro, when considering the full MIMIC-CXR training set, and 5.1% improvement when limiting to 10% of the training data. To the best of our knowledge, this is the first demonstration of MoCo image pre-training for multimodal learning in medical imaging.
Keywords
Multimodal learning, Multimodal BERT, Image representation, Self-supervised image pre-training, CXR classification
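The two core mechanics of MoCo referenced in the abstract are a momentum-updated key encoder and a contrastive (InfoNCE) loss over a queue of negative keys. The sketch below is not from the paper; it is a minimal NumPy illustration of those two updates, with all function names, the momentum value `m=0.999`, and the temperature `tau=0.07` taken as illustrative defaults rather than the authors' configuration.

```python
import numpy as np

def momentum_update(theta_q, theta_k, m=0.999):
    # MoCo's key-encoder update: theta_k <- m * theta_k + (1 - m) * theta_q.
    # theta_q / theta_k stand in for the query/key encoder parameters.
    return m * theta_k + (1 - m) * theta_q

def info_nce_loss(q, k_pos, queue, tau=0.07):
    # q: (d,) query embedding; k_pos: (d,) positive key;
    # queue: (K, d) negative keys. All embeddings assumed L2-normalised.
    l_pos = q @ k_pos                        # scalar positive logit
    l_neg = queue @ q                        # (K,) negative logits
    logits = np.concatenate(([l_pos], l_neg)) / tau
    logits -= logits.max()                   # numerical stability
    return -np.log(np.exp(logits[0]) / np.exp(logits).sum())
```

When the query matches its positive key and the queued negatives are dissimilar, the loss is near zero; the slow momentum update keeps the key encoder (and hence the queued keys) consistent across training steps.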