Med-UniC: Unifying Cross-Lingual Medical Vision-Language Pre-Training by Diminishing Bias
arXiv (Cornell University), 2023
Abstract
The scarcity of data presents a critical obstacle to the efficacy of medical
vision-language pre-training (VLP). A potential solution lies in the combination
of datasets from various language communities. Nevertheless, the main challenge
stems from the complexity of integrating diverse syntax and semantics,
language-specific medical terminology, and culture-specific implicit knowledge.
Therefore, one crucial aspect to consider is the presence of community bias
caused by different languages. This paper presents a novel framework named
Unifying Cross-Lingual Medical Vision-Language Pre-Training (Med-UniC),
designed to integrate multimodal medical data from the two most prevalent
languages, English and Spanish. Specifically, we propose Cross-lingual Text
Alignment Regularization (CTR) to explicitly unify cross-lingual semantic
representations of medical reports originating from diverse language
communities. CTR is optimized through latent language disentanglement, so that
the optimization objective does not depend on negative samples, thereby
significantly mitigating the bias that arises when assigning positive-negative
sample pairs among highly similar medical reports. Furthermore, it ensures that the
cross-lingual representation is not biased toward any specific language
community. Med-UniC achieves superior performance across 5 medical image tasks
and 10 datasets encompassing over 30 diseases, offering a versatile framework
for unifying multi-modal medical data within diverse linguistic communities.
The experimental outcomes highlight the presence of community bias in
cross-lingual VLP. Reducing this bias enhances the performance not only in
vision-language tasks but also in uni-modal visual tasks.
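For intuition, below is a minimal sketch of what a negative-sample-free cross-lingual alignment objective of this kind might look like. The function name ctr_loss, the Barlow Twins-style cross-correlation formulation, and the lambda_offdiag weight are illustrative assumptions for this sketch, not the authors' exact implementation: paired English and Spanish report embeddings are aligned dimension-wise while the remaining latent dimensions are decorrelated, so no negative pairs are required.

```python
import torch

def ctr_loss(z_en, z_es, lambda_offdiag=5e-3, eps=1e-6):
    """Negative-sample-free alignment between paired English and Spanish
    report embeddings of shape [batch, dim] (hypothetical sketch)."""
    # Standardize each embedding dimension across the batch.
    z_en = (z_en - z_en.mean(0)) / (z_en.std(0) + eps)
    z_es = (z_es - z_es.mean(0)) / (z_es.std(0) + eps)

    n = z_en.shape[0]
    # Cross-correlation matrix between the two language views.
    c = (z_en.T @ z_es) / n

    # Diagonal terms -> 1: align paired reports across languages.
    # Off-diagonal terms -> 0: decorrelate (disentangle) the remaining
    # latent dimensions, removing the need for negative samples.
    diag = torch.diagonal(c)
    on_diag = (diag - 1).pow(2).sum()
    off_diag = (c - torch.diag(diag)).pow(2).sum()
    return on_diag + lambda_offdiag * off_diag

# Example usage with random stand-in embeddings:
z_en = torch.randn(32, 128)  # English report embeddings
z_es = torch.randn(32, 128)  # paired Spanish report embeddings
loss = ctr_loss(z_en, z_es)
```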
Keywords
medical,med-unic,cross-lingual,vision-language,pre-training