
OCRFinder: a noise-tolerance machine learning method for accurately estimating open chromatin regions

Jiayi Ren, Yuqian Liu, Xiaoyan Zhu, Xuwen Wang, Yifei Li, Yuxin Liu, Wenqing Hu, Xuanping Zhang, Jiayin Wang

Frontiers in Genetics (2023)

Abstract
Open chromatin regions (OCRs) are genomic regions associated with basic cellular physiological activities, and chromatin accessibility is reported to affect gene expression and function. A basic computational problem is to estimate OCRs efficiently, which could facilitate both genomic and epigenetic studies. Currently, ATAC-seq and cfDNA-seq (plasma cell-free DNA sequencing) are two popular strategies for detecting OCRs. Because cfDNA-seq can obtain more biomarkers in one round of sequencing, it is considered more effective and convenient. However, when processing cfDNA-seq data, the dynamically variable chromatin accessibility makes it quite difficult to obtain training data containing pure OCRs or non-OCRs, which leads to a label-noise problem for both feature-based and learning-based approaches. In this paper, we propose a learning-based OCR estimation approach with a noise-tolerant design. The proposed approach, named OCRFinder, combines an ensemble learning framework with a semi-supervised strategy to avoid overfitting to noisy labels, that is, regions falsely labeled as OCRs or as non-OCRs. Compared with different noise-control strategies and state-of-the-art approaches, OCRFinder achieved higher accuracy and sensitivity in the experiments. In addition, OCRFinder also performs well in the ATAC-seq and DNase-seq comparison experiments.
Keywords
cell-free DNA, cfDNA, open chromatin region, noisy label learning, chromatin accessibility, sequencing data analyses