A Token-wise Graph-based Framework for Multimodal Named Entity Recognition.

ICME (2023)

Abstract
Multimodal Named Entity Recognition (MNER) on social media posts is a prominent but challenging task. Most existing MNER methods fail to effectively exploit the visual information in the image, and multimodal interaction and alignment remain unsettled. In this paper, we propose a novel token-wise graph-based framework for the MNER task. Specifically, a token-wise image processing scheme is established. A multi-modal graph is constructed from the textual tokens derived from BERT and the visual tokens derived from SwinT. The multi-modal graph is then fed into a multi-layer Transformer-based module for intra- and inter-modal information fusion. In addition, multiple contrastive learning objectives are devised to perform global and local alignment between textual and visual nodes. Experimental results on two benchmark multimodal datasets indicate that our model achieves state-of-the-art performance on MNER tasks.
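
To make the global alignment idea concrete, here is a minimal sketch of a symmetric contrastive (InfoNCE-style) loss between one pooled text vector and one pooled image vector per post. This is an illustration only, not the paper's implementation: the abstract does not specify the loss form, the pooling, or the temperature, so the function names (`info_nce`, `symmetric_contrastive_loss`), the cosine similarity, and the default `temperature=0.07` are all assumptions.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def info_nce(anchors, candidates, temperature=0.07):
    """Mean cross-entropy where anchors[i] should match candidates[i].

    Assumption: similarity is cosine similarity divided by a temperature;
    the other items in the batch serve as negatives.
    """
    a = [l2_normalize(v) for v in anchors]
    c = [l2_normalize(v) for v in candidates]
    total = 0.0
    for i, ai in enumerate(a):
        logits = [sum(x * y for x, y in zip(ai, cj)) / temperature for cj in c]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]
    return total / len(a)


def symmetric_contrastive_loss(text_vecs, image_vecs, temperature=0.07):
    """Average of the text-to-image and image-to-text InfoNCE terms."""
    return 0.5 * (
        info_nce(text_vecs, image_vecs, temperature)
        + info_nce(image_vecs, text_vecs, temperature)
    )


# Toy batch: two posts, 2-d pooled embeddings (hypothetical values).
text = [[1.0, 0.0], [0.0, 1.0]]
image_aligned = [[0.9, 0.1], [0.1, 0.9]]
image_swapped = [[0.1, 0.9], [0.9, 0.1]]

loss_aligned = symmetric_contrastive_loss(text, image_aligned)
loss_swapped = symmetric_contrastive_loss(text, image_swapped)
```

In this toy batch the loss is lower when each text vector is paired with its matching image vector than when the pairs are swapped. A local (token-to-patch) variant could apply the same function to node-level vectors, though how the paper pairs textual and visual nodes is not stated in the abstract.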
Keywords
Multimodal Named Entity Recognition, Information Extraction, Contrastive Learning, Social Media