OA08.04 Validation of Scalable, Automated Data Extraction in an Advanced Lung Cancer Patient Population

Journal of Thoracic Oncology (2021)

Abstract
Manual extraction from electronic health records (EHRs) is currently the standard approach for accessing real-world healthcare data but can be time-consuming and challenging to maintain over time. Automated data extraction using natural language processing (NLP) is emerging as a viable method of data extraction from structured and unstructured fields of EHRs. While the speed of NLP-based data extraction is established, some question the validity of the extracted data. This study compares the accuracy of, and concordance between, manual and NLP-extracted data from EHRs of patients with advanced lung cancer (aLC).

EHRs of 1209 patients with aLC were screened using the AI engine, DARWEN™, to identify a subset of 333 patients diagnosed and treated with systemic therapy at Princess Margaret Cancer Centre in Toronto between January 2015 and December 2017. Full feature models were run on all 333 patients to extract data from EHRs, from which 100 patients were randomly selected for manual data extraction by two trained abstractors to validate against NLP-extracted data. An expert adjudicator reviewed inconsistencies between manual and NLP-extracted results, and the adjudicated values served as the gold standard when calculating accuracy and concordance.

NLP-extracted data from EHRs proved to be accurate and concordant with manual extraction methods (Table 1). Features with lower syntactic and semantic variation such as patient demographics (i.e., age and sex), characteristics (i.e., histologic subtype and comorbid conditions), and treatment details were reported with high accuracy and concordance. These tend to be the cases where manual reviewers would agree. Conversely, features with richer syntactic and semantic variation requiring deeper clinical interpretation had slightly lower accuracy with NLP extraction and, typically, with manual review. Because biomarker testing and reporting are documented in varying ways, extracting these data can be challenging.
While NLP detection of biomarker testing was highly accurate and concordant, detection of results was more variable. NLP outperformed manual extraction in identifying metastatic sites with the exception of lung and lymph node metastases, which was due to analogous terms used in radiology reports that were not applied to variable definitions used to train DARWEN™.

Table 1. Accuracy and concordance between manual and NLP data extraction.

Feature                                     Accuracy (%), NLP   Accuracy (%), Manual   Concordance (%)
Date of birth                               100                 99                     99
Sex                                         100                 100                    100
Date of Stage IV diagnosis (+/- 30 days)    94.0                83.0                   77.0
ECOG PS at Stage IV diagnosis               93.0                78.0                   71.0
Smoking status                              88.0                94.0                   82.0
Histologic subtype                          98.0                98.0                   96.0
First line treatment type                   95.0-99.0           96.0-100               92.0-99.0
Treatment type (Any line)                   94.0-99.0           84.0-98.0              83.0-96.0
Biomarker Testing Performed                 98.0-99.0           97.0-100               96.0-98.0
Biomarker Status (Positive or Negative)     86.2-100            94.7-100               86.2-100
Metastatic Sites of Disease                 66.0-99.0           71.0-100               58.0-99.0
Immunosuppressive medications               80.0-100            86.0-100               76.0-100
Comorbidities                               96.0-100            96.0-100               93.0-100

The use of NLP technology in oncology provides an opportunity for real-world evidence studies at a larger scale than ever before. NLP was not only faster than manual extraction but, for many features, was also more accurate than a traditional manual approach, demonstrating the advances of modern NLP techniques as a scalable alternative to manual extraction.
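
The abstract does not spell out how accuracy and concordance were computed. As a minimal sketch, assuming accuracy is the percentage of patients for whom a method matches the adjudicated gold standard and concordance is the percentage for whom the NLP and manual values match each other, the two metrics could be calculated as below. The function name `agreement_rate` and the five-patient smoking-status labels are hypothetical illustrations, not study data or part of DARWEN™.

```python
from typing import Hashable, Sequence


def agreement_rate(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Return the percentage of positions at which two label sequences agree."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    if not a:
        raise ValueError("sequences must not be empty")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# Hypothetical smoking-status labels for five patients (not study data).
gold = ["former", "current", "never", "former", "never"]    # adjudicated values
nlp = ["former", "current", "never", "current", "never"]    # NLP-extracted
manual = ["former", "never", "never", "former", "never"]    # manually abstracted

# Assumed definitions: accuracy compares a method with the gold standard,
# concordance compares the two extraction methods with each other.
nlp_accuracy = agreement_rate(nlp, gold)
manual_accuracy = agreement_rate(manual, gold)
concordance = agreement_rate(nlp, manual)

print(f"NLP accuracy:    {nlp_accuracy:.1f}%")
print(f"Manual accuracy: {manual_accuracy:.1f}%")
print(f"Concordance:     {concordance:.1f}%")
```

Under these assumed definitions, concordance can fall below both accuracies when the two methods err on different patients, which matches the pattern in several rows of the reported table.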
Key words
automated data extraction, lung cancer, validation