SurgicBERTa: a pre-trained language model for procedural surgical language

INTERNATIONAL JOURNAL OF DATA SCIENCE AND ANALYTICS (2023)

Abstract
Pre-trained language models are now ubiquitous in natural language processing, being successfully applied to many different tasks and in several real-world applications. However, even though there is a wealth of high-quality written material on surgery, and the scientific community has shown a growing interest in applying natural language processing techniques to surgery, a pre-trained language model specific to the surgical domain is still missing. The creation and public release of such a model would support numerous useful clinical applications. For example, it could enhance existing surgical knowledge bases employed for task automation, or assist medical students in summarizing complex surgical descriptions. For this reason, in this paper, we introduce SurgicBERTa, a pre-trained language model specific to the English surgical language, i.e., the language used in the surgical domain. SurgicBERTa has been obtained from RoBERTa through continued pre-training with the masked language modeling objective on 300k sentences taken from English surgical books and papers, for a total of 7 million words. By publicly releasing SurgicBERTa, we make available a resource built from the content collected in many high-quality surgical books, online textual resources, and academic papers. We performed several assessments in order to evaluate SurgicBERTa, comparing it with the general-domain RoBERTa. First, we intrinsically assessed the model in terms of perplexity, accuracy, and evaluation loss resulting from the continued pre-training on the masked language modeling task. Then, we extrinsically evaluated SurgicBERTa on several downstream tasks, namely (i) procedural sentence detection, (ii) procedural knowledge extraction, (iii) ontological information discovery, and (iv) surgical terminology acquisition. Finally, we conducted a qualitative analysis of SurgicBERTa, showing that it encodes substantial surgical knowledge that could be used to enrich existing state-of-the-art surgical knowledge bases or to support surgical knowledge extraction. All the assessments show that SurgicBERTa handles surgical language better than a general-purpose pre-trained language model such as RoBERTa, and therefore can be effectively exploited in many computer-assisted applications in the surgical domain.
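For readers who want to see what the domain-adaptive training described above looks like in practice, below is a minimal sketch of continued pre-training of RoBERTa with the masked language modeling objective, using the Hugging Face transformers and datasets libraries. The corpus file name (surgical_corpus.txt), the output directory, and all hyperparameters are illustrative assumptions for this sketch, not the authors' actual configuration.

```python
# Minimal sketch: continued (domain-adaptive) pre-training of RoBERTa with the
# masked language modeling (MLM) objective on a surgical-text corpus.
# File names and hyperparameters below are assumptions, not the paper's setup.
from datasets import load_dataset
from transformers import (
    DataCollatorForLanguageModeling,
    RobertaForMaskedLM,
    RobertaTokenizerFast,
    Trainer,
    TrainingArguments,
)

# Start from the general-domain RoBERTa checkpoint.
tokenizer = RobertaTokenizerFast.from_pretrained("roberta-base")
model = RobertaForMaskedLM.from_pretrained("roberta-base")

# One surgical sentence per line; "surgical_corpus.txt" is a hypothetical file.
dataset = load_dataset("text", data_files={"train": "surgical_corpus.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

# Dynamic masking of 15% of tokens, the standard MLM setting for RoBERTa.
collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm=True, mlm_probability=0.15
)

args = TrainingArguments(
    output_dir="surgicberta",        # assumed output directory
    num_train_epochs=3,              # assumed; not stated in the abstract
    per_device_train_batch_size=16,  # assumed
    learning_rate=5e-5,              # assumed
)

trainer = Trainer(
    model=model,
    args=args,
    data_collator=collator,
    train_dataset=tokenized["train"],
)
trainer.train()
trainer.save_model("surgicberta")
```

Once trained, the kind of qualitative probing the abstract mentions can be done with a fill-mask pipeline, e.g. `pipeline("fill-mask", model="surgicberta")("The surgeon ligated the <mask> artery.")`, and comparing the top predictions against those of the original roberta-base checkpoint gives a quick sense of how much surgical terminology the adapted model has absorbed.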
Keywords
Transformers, Language models, Natural language processing, Medicine