Keyphrase Identification Using Minimal Labeled Data with Hierarchical Context and Transfer Learning

medRxiv: the preprint server for health sciences (2023)

Abstract
Interoperable clinical decision support system (CDSS) rules offer a pathway to interoperability, a well-recognized challenge in health information technology. Building an ontology facilitates the creation of such rules, and the ontology can be built by identifying keyphrases (KP) in the existing literature. However, identifying KP to label the data requires human expertise, consensus, and contextual understanding. This paper presents a semi-supervised framework for the CDSS sub-domain that uses minimal labeled data and is based on hierarchical attention over the documents fused with domain adaptation approaches, and it evaluates the effectiveness of KP identification with this framework. From a semi-supervised learning perspective, our methodology outperforms prior neural architectures by learning with document-level context, using no explicit hand-crafted features, transferring knowledge from models pre-trained on an unlabeled corpus, and then fine-tuning on a smaller set of gold standard-labeled data. To the best of our knowledge, this is the first functional framework for identifying KP in the CDSS sub-domain that is trained on limited labeled data. It contributes to general natural language processing (NLP) architectures in areas such as clinical NLP, where manual data labeling is challenging.

### Competing Interest Statement

The authors have declared no competing interest.

### Funding Statement

The work was supported by the National Institute of General Medical Sciences of the National Institutes of Health under Award Number R01GM138589 and partially under P20GM121342.

### Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals. Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes

I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable. Yes

CDSS data were retrieved from PubMed in MEDLINE format, and all research articles in the CDSS sub-domain with a valid PMID were retained. The data are freely available to download with valid credentials to the PubMed APIs.

Abbreviations
NLP: Natural language processing
CDSS: Clinical decision support system
HDE: Human domain expert
BiLSTM: Bidirectional long short-term memory
BiLM: Bidirectional language model
CRF: Conditional random field
GS: Gold standard
KP: Keyphrase
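KP identification of the kind described in the abstract is commonly framed as token-level sequence labeling, where a tagger such as a BiLSTM-CRF assigns each token a tag and contiguous tagged spans are read off as keyphrases. The sketch below illustrates only that final decoding step. The B/I/O tagging scheme, the function name `extract_keyphrases`, and the example sentence are assumptions made for illustration; the abstract does not state which tagging scheme the authors use.

```python
from typing import List


def extract_keyphrases(tokens: List[str], tags: List[str]) -> List[str]:
    """Collect contiguous B/I spans into keyphrase strings.

    Assumes a B/I/O scheme (hypothetical here): "B" opens a keyphrase,
    "I" continues the open one, and "O" marks a token outside any keyphrase.
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")

    phrases: List[str] = []
    current: List[str] = []
    for token, tag in zip(tokens, tags):
        if tag == "B":
            # A new keyphrase starts; flush any span that was still open.
            if current:
                phrases.append(" ".join(current))
            current = [token]
        elif tag == "I" and current:
            # Continue the currently open keyphrase.
            current.append(token)
        else:
            # "O", or a stray "I" with no open span: close any open span.
            if current:
                phrases.append(" ".join(current))
            current = []
    if current:
        phrases.append(" ".join(current))
    return phrases


# Illustrative input only; these tags are hand-written, not model output.
tokens = ["Clinical", "decision", "support", "rules", "improve", "interoperability"]
tags = ["B", "I", "I", "O", "O", "B"]
keyphrases = extract_keyphrases(tokens, tags)
print(keyphrases)
```

In a full pipeline the `tags` list would come from the trained tagger rather than being written by hand; the decoding logic stays the same.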
Keywords
minimal labeled data, transfer learning, hierarchical context