Contextual Word Embedding for Biomedical Knowledge Extraction: a Rapid Review and Case Study

Dinithi Vithanage, Ping Yu, Lei Wang, Chao Deng

Journal of Healthcare Informatics Research (2024)

Abstract
Recent advancements in natural language processing (NLP), particularly contextual word embedding models, have improved knowledge extraction from biomedical and healthcare texts. However, comprehensive research comparing these models remains limited. This study conducts a scoping review and compares the performance of the major contextual word embedding models for biomedical knowledge extraction. From 26 articles retrieved from Scopus, PubMed, PubMed Central, and Google Scholar between 2017 and 2021, 18 notable contextual word embedding models were identified. These include ELMo, BERT, BioBERT, BlueBERT, CancerBERT, DDS-BERT, RuBERT, LABSE, EhrBERT, MedBERT, Clinical BERT, Clinical BioBERT, Discharge Summary BERT, Discharge Summary BioBERT, GPT, GPT-2, GPT-3, and GPT2-Bio-Pt. A case study compared the performance of six representative models—ELMo, BERT, BioBERT, BlueBERT, Clinical BioBERT, and GPT-3—across text classification, named entity recognition, and question answering. The evaluation used datasets comprising biomedical text from tweets, NCBI, and PubMed, as well as clinical notes sourced from two electronic health record datasets. Performance was measured with accuracy and F1 score. The results of this case study reveal that BioBERT performs best in analyzing biomedical text, while Clinical BioBERT excels in analyzing clinical notes. These findings offer crucial insights into word embedding models for researchers, practitioners, and stakeholders applying NLP to biomedical and clinical document analysis.
Keywords
Knowledge extraction,Natural language processing,Contextual word embedding models,Biomedical text,Electronic health records