The role of natural language processing techniques versus conventional methods to gain ICI safety insights from unstructured EHR data.

Matthew S. Block, Hannah Barman, Sriram Venkateswaran, Antonio Santo, Unice Yoo, Eli Silvert, G. Scott Chandler, Tyler Wagner, Rajat Mohindra

JCO Global Oncology (2023)

Abstract
136 Background: Studies using International Classification of Diseases (ICD) codes have aimed to characterize the safety profile of cancer therapies in the real-world data setting, but this approach has limitations, including an inability to illustrate the patient journey or offer insight into drug-adverse event (AE) causality. Using data from a cohort of ~9,000 patients treated with immune checkpoint inhibitors (ICI) at Mayo Clinic, we recently demonstrated the feasibility of applying Augmented Curation (AC), a natural language processing (NLP)-based innovation, to unstructured electronic health record (EHR) data to detect and characterize immune-related AEs (irAEs). While the ability to identify and extract clinical details from EHRs has been demonstrated previously, our approach focuses on applying these techniques to identify implied textual causality of drug-event pairs. This study aims to build upon our prior findings and compare the accuracy, speed, and resource savings enabled by AC analysis versus manual review of patient notes.

Methods: We compared the prevalence of adrenal insufficiency, colitis, and hypophysitis (A.C.H.) among ICI patients at Mayo Clinic using ICD codes and AC, which leverages SciBERT to perform scientific NLP tasks. AC models trained to identify drug-adverse event relationships were used to assess irAEs and subsequent treatments and demonstrated strong performance, with F1 scores of 0.85. Manual curation of relevant clinical data, i.e., medications and adverse events, was employed to create a “gold standard” A.C.H. cohort. The time required for manual review was documented and compared with the time required for AC. Corticosteroid/immunosuppressant use and ICI discontinuations were used as proxies for severity.

Results: In our cohort, 540 unique patients experienced at least one of the A.C.H. irAEs. A.C.H. patients received corticosteroids for their respective irAE 79% of the time and discontinued the ICI due to the irAE 7.7% of the time. A second-line (2L) immunosuppressant was used to treat 6.1% of colitis patients. As in the previous study, AC showed higher sensitivity, PPV, and NPV than ICD codes alone within the A.C.H. cohort. Sentence extraction and entity classification using AC models on all ~9,000 patients took ~10 minutes, whereas manual review of this cohort would take roughly 9 weeks (~10 minutes per patient).

Conclusions: The results of this study reinforce that AC models are a valuable tool for comprehensively extracting information and imputing relationships between entities, such as therapies and AEs. The AC models demonstrated better accuracy and much greater speed than manual review of patient EHRs. Leveraging AC models makes it possible to work with unstructured clinical text married to structured EHR data at scale while minimizing the need for manual curation.
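
As an illustration of the comparison reported above, the following is a minimal sketch of how sensitivity, PPV, NPV, and F1 could be computed for a patient-flagging method against a manually curated gold standard. It is not the authors' pipeline: the function names `screening_metrics` and `manual_review_weeks`, the `hours_per_week` parameter, and the toy patient IDs are all hypothetical. The second function reproduces the review-time arithmetic; the abstract does not say how "roughly 9 weeks" was derived, and the default of 168 hours per week (continuous review) is only one assumption that happens to be consistent with it.

```python
from math import nan


def screening_metrics(flagged, gold, cohort):
    """Score a set of flagged patient IDs against a gold-standard set.

    Hypothetical helper for illustration; assumes flagged and gold
    are both subsets of cohort.
    """
    flagged, gold, cohort = set(flagged), set(gold), set(cohort)
    tp = len(flagged & gold)             # flagged and truly affected
    fp = len(flagged - gold)             # flagged but not affected
    fn = len(gold - flagged)             # affected but missed
    tn = len(cohort - (flagged | gold))  # correctly left unflagged

    def ratio(num, den):
        # Return NaN instead of raising when a denominator is empty.
        return num / den if den else nan

    return {
        "sensitivity": ratio(tp, tp + fn),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
    }


def manual_review_weeks(n_patients, minutes_per_patient, hours_per_week=168):
    """Convert per-patient review time into weeks of review effort.

    hours_per_week=168 assumes round-the-clock review (an assumption,
    not a figure stated in the abstract).
    """
    return n_patients * minutes_per_patient / 60 / hours_per_week


# Toy data invented for illustration only (not from the study).
cohort = range(1, 11)
gold = {1, 2, 3, 4}
icd_flagged = {1, 2, 5}
ac_flagged = {1, 2, 3, 5}

for name, flagged in [("ICD", icd_flagged), ("AC", ac_flagged)]:
    print(name, screening_metrics(flagged, gold, cohort))

# 9,000 patients at 10 minutes each is 1,500 hours of review.
print(round(manual_review_weeks(9000, 10), 1))
```

With the continuous-time assumption, 9,000 patients at 10 minutes each comes to about 8.9 weeks; passing `hours_per_week=40` instead would give 37.5 working weeks, so the stated figure depends on which convention is used.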
Keywords
unstructured EHR data, natural language processing techniques, ICI safety insights, natural language processing