Toward a Dynamic Threshold for Quality Score Distortion in Reference-Based Alignment.

JOURNAL OF COMPUTATIONAL BIOLOGY(2020)

引用 4|浏览8
暂无评分
摘要
The intrinsic high-entropy sequence metadata, known as quality scores, is largely the cause of the substantial size of sequence data files. Yet, there is no consensus on a viable reduction of the resolution of the quality score scale, arguably because of collateral side effects. In this article, we leverage on the penalty functions of HISAT2 aligner to rebin the quality score scale in such a way as to avoid any impact on sequence alignment, identifying alongside a distortion threshold for "safe" quality score representation. We tested our findings on whole-genome and RNA-seq data, and contrasted the results with three methods for lossy compression of the quality scores.
更多
查看译文
关键词
lossy compression,quality scores,sequence alignment
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要