JSONize: A Scalable Machine Learning Pipeline to Model Medical Notes as Semi-structured Documents.

AMIA Joint Summits on Translational Science proceedings. AMIA Joint Summits on Translational Science(2020)

Abstract
The Department of Veterans Affairs (VA) archives one of the largest corpora of clinical notes in its corporate data warehouse as unstructured text. Unstructured text readily supports keyword searches and regular expressions, but these simple searches often cannot express the more complex queries that need to be run on notes. For example, a researcher may want all notes that record a Duke Treadmill Score of less than five, or all notes on people who smoke more than one pack per day. Range queries like these, among others, can be supported by modelling text as semi-structured documents. In this paper, we implement a scalable machine learning pipeline that models plain medical text as useful semi-structured documents. We improve on existing models, achieving an F1-score of 0.912, and scale our methods to the entire VA corpus.
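As a toy illustration of why a semi-structured representation enables range queries that keyword search cannot express, the sketch below turns each note into a JSON-like dict with numeric fields and then filters on those fields. The regular-expression patterns, field names (`duke_treadmill_score`, `packs_per_day`), helper functions (`jsonize`, `range_query`), and sample notes are all hypothetical; the paper uses a machine learning pipeline rather than hand-written regexes, so this shows only the idea of querying extracted structure, not the authors' method.

```python
import re

# Hypothetical free-text notes (invented for illustration, not from the VA corpus).
NOTES = [
    "Pt reports smoking 2 packs per day. Duke Treadmill Score: 3.",
    "Former smoker. Duke Treadmill Score: 7.",
    "Smokes 0.5 packs per day. No stress test performed.",
]

# Hypothetical extraction rules: a simple stand-in for a learned model.
PATTERNS = {
    "duke_treadmill_score": re.compile(r"Duke Treadmill Score:\s*(-?\d+(?:\.\d+)?)"),
    "packs_per_day": re.compile(r"(\d+(?:\.\d+)?)\s*packs? per day"),
}


def jsonize(note: str) -> dict:
    """Model one note as a semi-structured document: raw text plus typed fields."""
    doc = {"text": note}
    for field, pattern in PATTERNS.items():
        match = pattern.search(note)
        if match:
            doc[field] = float(match.group(1))
    return doc


def range_query(docs, field, predicate):
    """Return the docs that have a numeric `field` satisfying `predicate`."""
    return [d for d in docs if field in d and predicate(d[field])]


docs = [jsonize(n) for n in NOTES]
low_dts = range_query(docs, "duke_treadmill_score", lambda v: v < 5)
heavy_smokers = range_query(docs, "packs_per_day", lambda v: v > 1)
print(len(low_dts), len(heavy_smokers))  # prints: 1 1
```

A keyword search for "Duke Treadmill Score" would return both the first and second notes; once the score is stored as a number, the `< 5` condition can be evaluated directly.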
Keywords
model medical notes, scalable machine learning pipeline, machine learning, semi-structured