Cluster Analysis of Low-Dimensional Medical Concept Representations from Electronic Health Records.
International Conference on Health Information Science (HIS)(2022)
摘要
The study of existing links among different types of medical concepts can support research on optimal pathways for the treatment of human diseases. Here, we present a clustering analysis of medical concept learned representations generated from MIMIC-IV, an open dataset of de-identified digital health records. Patient’s trajectory information were extracted in chronological order to generate +500k sequence-like data structures, which were fed to a word2vec model to automatically learn concept representations. As a result, we obtained concept embeddings that describe diagnostics, procedures, and medications in a continuous low-dimensional space. A quantitative evaluation of the embeddings shows the significant power of the extracted embeddings on predicting exact labels of diagnoses, procedures, and medications for a given patient trajectory, achieving top-10 and top-30 accuracy over 47% and 66%, respectively, for all the dimensions evaluated. Moreover, clustering analyses of medical concepts after dimensionality reduction with t-SNE and UMAP techniques show that similar diagnoses (and procedures) are grouped together matching the categories of ICD-10 codes. However, the distribution by categories is not as evident if PCA or SVD are employed, indicating that the relationships among concepts are highly non-linear. This highlights the importance of non-linear models, such as those provided by deep learning, to capture the complex relationships of medical concepts.
更多查看译文
AI 理解论文
溯源树
样例
![](https://originalfileserver.aminer.cn/sys/aminer/pubs/mrt_preview.jpeg)
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要