Dividing out quantification uncertainty allows efficient assessment of differential transcript expression

bioRxiv (2023)

Abstract
A major challenge in the analysis of RNA-seq data at the transcript level is accounting for the variability introduced during quantification of RNA sequencing reads. This variability is due to the high levels of sequence similarity among transcripts annotated to the same genomic locus and the mapping ambiguity resulting from the assignment of sequence reads to such transcripts. The quantification uncertainty associated with transcript-level estimated counts is intractable to measure analytically but represents an extra source of variation that seriously compromises differential transcript expression (DTE) analyses if standard statistical methods developed for gene-level analyses are used. Bootstrap counts, as provided by popular RNA-seq quantification tools, allow one to estimate the quantification uncertainty and account for its effect in DTE analyses. We present catchSalmon and catchKallisto, two functions included in the R/Bioconductor package edgeR, that estimate the transcript-level quantification uncertainty, here termed mapping ambiguity overdispersion, using bootstrap counts. We discuss how the mapping ambiguity overdispersion can be effectively removed from the data in transcript-level analyses via count scaling, an approach that reduces the estimated counts obtained from quantification tools to effective count sizes that reflect their true precision. The count scaling approach allows users to perform DTE analyses efficiently within the edgeR framework. A comprehensive simulation study and a DTE analysis of human lung adenocarcinoma cell lines are presented to illustrate the benefits of accounting for the mapping ambiguity overdispersion in transcript-level RNA-seq data analyses.

Competing Interest Statement
The authors have declared no competing interest.
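The count scaling idea described in the abstract can be illustrated with a minimal Python sketch. It is not the edgeR implementation: the function names `mapping_ambiguity_overdispersion` and `scale_counts` are hypothetical, and the estimator shown here (bootstrap variance divided by bootstrap mean, pooled across samples and floored at 1) is a simplifying assumption. The catchSalmon and catchKallisto functions in edgeR may differ in detail, for example in how the per-transcript estimates are moderated.

```python
from statistics import mean, variance


def mapping_ambiguity_overdispersion(bootstrap_counts):
    """Estimate one transcript's overdispersion from bootstrap counts.

    bootstrap_counts: one list of bootstrap counts per sample.
    Assumption for this sketch: with no mapping ambiguity the bootstrap
    counts behave like Poisson draws, so variance / mean is close to 1.
    """
    stat = 0.0  # pooled sum of (B - 1) * variance / mean over samples
    df = 0      # pooled degrees of freedom
    for boots in bootstrap_counts:
        m = mean(boots)
        if len(boots) > 1 and m > 0:
            stat += (len(boots) - 1) * variance(boots) / m
            df += len(boots) - 1
    if df == 0:
        return 1.0  # nothing to estimate from; leave counts unscaled
    # Floor at 1 so that scaling can only shrink counts, never inflate them.
    return max(1.0, stat / df)


def scale_counts(counts, overdispersion):
    """Divide estimated counts by the overdispersion to get effective counts."""
    return [c / overdispersion for c in counts]


# Hypothetical transcript: two samples, four bootstrap resamples each.
boots = [[50, 150, 100, 100], [80, 120, 100, 100]]
od = mapping_ambiguity_overdispersion(boots)
effective = scale_counts([100, 100], od)
```

In this toy input the bootstrap counts vary far more than Poisson counts with mean 100 would, so the estimated overdispersion is well above 1 and the effective counts come out much smaller than the estimated counts of 100. A transcript whose bootstrap counts are no more variable than Poisson would get an overdispersion of 1 and keep its counts unchanged.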