HORUS-NER: A Multimodal Named Entity Recognition Framework for Noisy Data

ADVANCES IN INTELLIGENT DATA ANALYSIS XIX, IDA 2021 (2021)

Abstract
Recent work based on deep learning achieves state-of-the-art (SOTA) performance in the named entity recognition (NER) task. However, the performance of such models still drops drastically on noisy data (e.g., social media, search engines) compared to the formal domain (e.g., newswire). Designing and exploring new methods and architectures is therefore necessary to overcome current challenges. In this paper, we shift the focus of existing solutions to an entirely different perspective: we investigate the potential of embedding word-level features extracted from images and news. We performed a comprehensive study to validate the hypothesis that images and news (obtained from an external source) may boost the task on noisy data. When our proposed architecture is used: (1) we beat SOTA in precision with simple CRF models; (2) the overall performance of decision-tree-based models can be drastically improved; (3) our approach outperforms off-the-shelf models for this task; (4) images and text consistently increase recall across different datasets for SOTA models, but at the cost of precision. All experiment configurations, data, and models are publicly available to the research community at horus-ner.org
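The idea of embedding word-level features from external sources can be illustrated with a minimal sketch. It is not the HORUS-NER implementation: the scorer functions, the feature names (`img_*`, `txt_*`), and the stub scores below are all hypothetical, standing in for whatever image and news classifiers produce per-token signals. The sketch only shows how such signals could be appended to ordinary lexical features before training a sequence model such as a CRF.

```python
from typing import Callable, Dict, List, Union

Feature = Union[str, bool, float]
# Hypothetical scorer type: maps a token to per-class affinity scores.
Scorer = Callable[[str], Dict[str, float]]


def lexical_features(tokens: List[str], i: int) -> Dict[str, Feature]:
    """Standard surface features for the token at position i."""
    tok = tokens[i]
    return {
        "lower": tok.lower(),
        "is_title": tok.istitle(),
        "is_upper": tok.isupper(),
        "has_digit": any(c.isdigit() for c in tok),
        "prev_lower": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next_lower": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }


def token_features(tokens: List[str], i: int,
                   image_scorer: Scorer, text_scorer: Scorer) -> Dict[str, Feature]:
    """Lexical features plus (assumed) image- and text-derived scores."""
    feats = lexical_features(tokens, i)
    for label, score in image_scorer(tokens[i]).items():
        feats[f"img_{label}"] = score   # hypothetical feature name
    for label, score in text_scorer(tokens[i]).items():
        feats[f"txt_{label}"] = score   # hypothetical feature name
    return feats


def sentence_features(tokens: List[str],
                      image_scorer: Scorer, text_scorer: Scorer) -> List[Dict[str, Feature]]:
    """One feature dict per token in the sentence."""
    return [token_features(tokens, i, image_scorer, text_scorer)
            for i in range(len(tokens))]


# Stub scorers with made-up values, used only to make the sketch runnable.
def stub_image_scorer(token: str) -> Dict[str, float]:
    if token.lower() == "paris":
        return {"PER": 0.0, "LOC": 0.9, "ORG": 0.1}
    return {"PER": 0.0, "LOC": 0.0, "ORG": 0.0}


def stub_text_scorer(token: str) -> Dict[str, float]:
    if token.lower() == "paris":
        return {"PER": 0.1, "LOC": 0.8, "ORG": 0.1}
    return {"PER": 0.0, "LOC": 0.0, "ORG": 0.0}


feats = sentence_features(["visiting", "paris", "2day"],
                          stub_image_scorer, stub_text_scorer)
```

Each sentence becomes a list of per-token feature dictionaries, a shape that CRF toolkits such as sklearn-crfsuite accept as training input; the external scores simply sit alongside the lexical features.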
Keywords
Named entity recognition, WNUT, Noisy text, Information retrieval, Images, Text, Multi-modal