Shared Differential Expression-Based Distance Reflects Global Cell Type Relationships in Single-Cell RNA Sequencing Data

Journal of Computational Biology (2022)

Abstract
Unsupervised cell clustering on the basis of meaningful biological variation in single-cell RNA sequencing (scRNA seq) data has received significant attention because it helps identify ontological subpopulations within the data. A key step in the clustering process is to compute distances between cells under a specified distance measure. Although a particular distance measure may successfully separate cells into biologically relevant clusters, it may fail to retain the global structure of the data, such as the relative similarity between cell clusters. In this article, we modify a biologically motivated distance measure, SIDEseq, for use in aggregate comparisons of cell types in large single-cell assays, and we demonstrate that, across simulated and real scRNA seq data, the resulting distance matrix retains global cell type relationships more consistently than distance measures commonly used for scRNA seq clustering. We call the modified distance measure "SIDEREF." We explore spectral dimension reduction of the SIDEREF distance matrix as a means of noise filtering, similar to principal components analysis applied directly to expression data. We use a summary measure of relative cell type distances to better display the relationships between cell groups. SIDEREF visualizations reflect global structures in the data more consistently than those of other commonly considered distance measures. We use relative cell type distances and the SIDEREF distance measure to uncover compositional differences between annotated leukocyte cell groups in a compendium of Mus musculus scRNA seq assays comprising 12 tissues. SIDEREF and the associated analysis are openly available on GitHub.
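To make the idea of summarizing relative cell type distances concrete, the following is a minimal sketch that averages cell-to-cell distances within and between annotated groups. The abstract does not specify the paper's summary measure, so the plain group mean used here is an assumption, and the function name `group_mean_distances`, the toy distance matrix, and the labels "T" and "B" are hypothetical. Any precomputed cell-cell distance matrix could take the place of the toy one.

```python
from collections import defaultdict
from itertools import combinations_with_replacement


def group_mean_distances(dist, labels):
    """Average cell-to-cell distances within and between labelled groups.

    dist   : square symmetric matrix (list of lists) of cell-cell distances
    labels : one group label per cell, in the same order as the rows of dist

    NOTE: a plain mean is an illustrative assumption, not the paper's measure.
    """
    # Collect the cell indices that belong to each group.
    members = defaultdict(list)
    for idx, label in enumerate(labels):
        members[label].append(idx)

    summary = {}
    for a, b in combinations_with_replacement(sorted(members), 2):
        # Skip self-pairs (i == j), whose distance is trivially zero.
        pairs = [dist[i][j] for i in members[a] for j in members[b] if i != j]
        mean = sum(pairs) / len(pairs) if pairs else 0.0
        # Store both orderings so lookups do not depend on label order.
        summary[(a, b)] = summary[(b, a)] = mean
    return summary


# Toy data: four cells and two hypothetical cell types.
toy_dist = [
    [0.0, 1.0, 4.0, 5.0],
    [1.0, 0.0, 3.0, 4.0],
    [4.0, 3.0, 0.0, 2.0],
    [5.0, 4.0, 2.0, 0.0],
]
toy_labels = ["T", "T", "B", "B"]
toy_summary = group_mean_distances(toy_dist, toy_labels)
```

Comparing a between-group mean against the two within-group means gives a rough sense of how separated two cell types are relative to their internal spread, which is the kind of global relationship the abstract is concerned with.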
Keywords
clustering, differential expression, distance, global structure, scRNA seq