Keyword Extraction from Biomedical Documents Using Deep Contextualized Embeddings

INISTA (2021)

Abstract
Due to the rapidly increasing number of biomedical publications, it has become challenging to keep up with scientific articles and new developments. Keywords in scientific articles give readers a quick understanding of the text and summarize its main points. When a biomedical article provides no keywords, or its keywords do not sufficiently express the content of the text, automatic keyword extraction systems are needed. This paper addresses keyword extraction as a sequence labeling task in which words are represented as deep contextualized embeddings. We predict the keyword tags of the sequence labeling task by fine-tuning XLNet and BERT-based models such as BERT, BioBERT, SciBERT, and RoBERTa. Unlike rule-based methods, the proposed method requires no additional dictionaries, and unlike traditional machine learning methods, it requires no feature extraction. Performance evaluation on the benchmark dataset for biomedical keyword extraction shows that domain-specific contextualized embeddings (BioBERT, SciBERT) achieve state-of-the-art results compared with general-domain embeddings (BERT, RoBERTa, XLNet) and unsupervised methods.
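Framing keyword extraction as sequence labeling means converting each document's gold keywords into one tag per token for training, and converting the model's predicted tags back into keyphrases at inference time. The abstract does not name the tagging scheme, so the following is a minimal sketch assuming a BIO scheme (B = beginning of a keyphrase, I = inside, O = outside) and plain whitespace tokenization. The helper names `bio_encode` and `bio_decode` are hypothetical, and the fine-tuned model itself (BERT, BioBERT, SciBERT, RoBERTa, or XLNet), including the alignment of sub-word pieces to words, is not shown.

```python
from typing import List


def bio_encode(tokens: List[str], keyphrases: List[List[str]]) -> List[str]:
    """Hypothetical helper: turn gold keyphrases into per-token BIO tags.

    Matching is case-insensitive; a span already tagged by an earlier
    keyphrase is not overwritten.
    """
    tags = ["O"] * len(tokens)
    lowered = [t.lower() for t in tokens]
    for phrase in keyphrases:
        n = len(phrase)
        if n == 0:
            continue
        target = [p.lower() for p in phrase]
        for start in range(len(tokens) - n + 1):
            free = all(tag == "O" for tag in tags[start:start + n])
            if lowered[start:start + n] == target and free:
                tags[start] = "B"
                for i in range(start + 1, start + n):
                    tags[i] = "I"
    return tags


def bio_decode(tokens: List[str], tags: List[str]) -> List[str]:
    """Hypothetical helper: collect predicted BIO spans back into keyphrases.

    An "I" tag with no open span (an invalid sequence) is dropped.
    """
    phrases: List[str] = []
    current: List[str] = []
    for token, tag in zip(tokens, tags):
        if tag == "B":
            if current:  # a new keyphrase starts right after another one
                phrases.append(" ".join(current))
            current = [token]
        elif tag == "I" and current:
            current.append(token)
        else:  # "O" or a stray "I" closes any open span
            if current:
                phrases.append(" ".join(current))
            current = []
    if current:
        phrases.append(" ".join(current))
    return phrases


if __name__ == "__main__":
    tokens = "Deep contextualized embeddings improve keyword extraction from biomedical text".split()
    gold = [["contextualized", "embeddings"], ["keyword", "extraction"]]
    tags = bio_encode(tokens, gold)
    print(tags)
    print(bio_decode(tokens, tags))
```

In a real pipeline the tags produced by `bio_encode` would serve as token-classification labels during fine-tuning, and `bio_decode` would be applied to the model's predicted tags. Keeping decoding lenient (dropping stray "I" tags) is one design choice among several; the paper may handle invalid tag sequences differently.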
Keywords
keyword extraction, deep contextualized embeddings, sequence labeling, medical informatics, natural language processing