Uncertainty quantification of reference based cellular deconvolution algorithms

Epigenetics (2022)

Abstract
The majority of epigenetic epidemiology studies to date have generated genome-wide profiles from bulk tissues (e.g. whole blood); however, these are vulnerable to confounding from variation in cellular composition. Proxies for cellular composition can be mathematically derived from the bulk tissue profiles using a deconvolution algorithm; however, there is no method to assess the validity of these estimates for a dataset where the true cellular proportions are unknown. In this study, we describe, validate and characterise a sample-level accuracy metric for derived cellular heterogeneity variables. The CETYGO score captures the deviation between a sample's DNA methylation (DNAm) profile and its expected profile given the estimated cellular proportions and cell type reference profiles. We demonstrate that the CETYGO score consistently distinguishes inaccurate and incomplete deconvolutions when applied to reconstructed whole blood profiles. By applying our novel metric to > 6,300 empirical whole blood profiles, we find that the accuracy of estimated cellular composition is influenced by both technical and biological variation. In particular, we show that when using the standard reference panel for whole blood, less accurate estimates are generated for females, neonates, older individuals and smokers. Our results highlight the utility of a metric to assess the accuracy of cellular deconvolution, and describe how it can enhance studies of DNA methylation that are reliant on statistical proxies for cellular heterogeneity. To facilitate incorporating our methodology into existing pipelines, we have made it freely available as an R package ().

Competing Interest Statement: The authors have declared no competing interest.
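The idea behind the score described in the abstract can be illustrated with a minimal sketch. It is not the R package's implementation: the abstract only says the score "captures the deviation" between the observed and expected profiles, so the use of root mean square error here is an assumption, and the function names and toy values are hypothetical.

```python
import math


def expected_profile(reference, proportions):
    """Weighted sum of cell-type reference profiles, one value per site."""
    n_sites = len(reference[0])
    return [
        sum(p * profile[i] for p, profile in zip(proportions, reference))
        for i in range(n_sites)
    ]


def deviation_score(observed, reference, proportions):
    """Deviation between an observed bulk profile and the profile expected
    from the estimated proportions. RMSE is an assumed choice of measure."""
    expected = expected_profile(reference, proportions)
    squared = [(o - e) ** 2 for o, e in zip(observed, expected)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical toy data: 2 cell types, 4 sites, methylation values in [0, 1].
reference = [
    [0.9, 0.1, 0.5, 0.2],
    [0.1, 0.8, 0.5, 0.6],
]

# Reconstruct a bulk profile from known proportions, then score two estimates.
bulk = expected_profile(reference, [0.7, 0.3])
good = deviation_score(bulk, reference, [0.7, 0.3])  # correct proportions
poor = deviation_score(bulk, reference, [0.3, 0.7])  # swapped proportions

print(f"score with correct proportions: {good:.4f}")
print(f"score with swapped proportions: {poor:.4f}")
```

In this toy setting the score is zero when the estimated proportions match those used to build the bulk profile and grows as the estimate drifts away, which is the behaviour the abstract relies on to flag inaccurate deconvolutions.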
Keywords
DNA methylation, epigenetic epidemiology, Illumina 450K array, Illumina EPIC array, cellular heterogeneity