Is Context All You Need? Non-contextual vs Contextual Multiword Expressions Detection

COMPUTATIONAL SCIENCE - ICCS 2022, PT I (2022)

Abstract
Effective methods for detecting multiword expressions are important for many technologies related to Natural Language Processing. Most contemporary methods are based on the sequence labeling scheme, while traditional methods use statistical measures. Our approach integrates the concepts behind both. In this paper, we present a novel weakly supervised method for extracting multiword expressions that focuses on their behaviour in various contexts. The method uses a lexicon of Polish multiword units as the reference knowledge base and leverages neural language modelling with deep learning architectures. It does not need a corpus annotated specifically for the task. The only required components are a lexicon of multiword units, a large corpus, and a general contextual embeddings model. Compared to the method based on non-contextual embeddings, we obtain gains of 15 percentage points in the macro F1-score for both classes and 30 percentage points in the F1-score for the incorrect multiword expressions. The proposed method can be applied to other languages quite easily.
Keywords
Natural Language Processing, Multiword expressions, Detection of multiword expressions, Contextual embeddings
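
The abstract reports results as a macro F1-score over both classes and as an F1-score for the incorrect class alone. As a reminder of how these two figures relate, the following is a minimal sketch in plain Python. The class names `correct` and `incorrect`, the helper functions, and the toy labels are assumptions made for illustration only; they are not the paper's data or its evaluation code.

```python
from typing import Sequence


def f1_for_class(y_true: Sequence[str], y_pred: Sequence[str], label: str) -> float:
    """F1-score treating `label` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    if tp == 0:
        # No true positives: precision and recall are zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str], labels: Sequence[str]) -> float:
    """Unweighted mean of the per-class F1-scores."""
    return sum(f1_for_class(y_true, y_pred, label) for label in labels) / len(labels)


# Toy, made-up labels (hypothetical, not taken from the paper).
LABELS = ["correct", "incorrect"]
y_true = ["correct", "correct", "correct", "incorrect", "incorrect", "correct"]
y_pred = ["correct", "correct", "incorrect", "incorrect", "correct", "correct"]

f1_incorrect = f1_for_class(y_true, y_pred, "incorrect")
f1_macro = macro_f1(y_true, y_pred, LABELS)
print(f"F1 (incorrect class): {f1_incorrect:.3f}")
print(f"Macro F1 (both classes): {f1_macro:.3f}")
```

Because the macro average weights both classes equally, a large gain on the rarer or harder class (here, the incorrect expressions) moves the macro score by only half of that gain, which is consistent with the 30-point and 15-point figures being reported side by side.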