Exploring a large cancer cell line RNA-sequencing dataset with k-mers

biorxiv(2024)

引用 0|浏览4
暂无评分
摘要
Analyzing the immense diversity of RNA isoforms in large RNA-seq repositories requires laborious data processing using specialized tools. Indexing techniques based on k-mers have previously been effective at searching for RNA sequences across thousands of RNA-seq libraries but falling short of enabling direct RNA quantification. We show here that RNAs queried in the form of k-mer sets can be quantified in seconds, with a precision akin to that of conventional RNA quantification methods. We showcase several applications by exploring an index of the Cancer Cell Line Encyclopedia (CCLE) collection consisting of 1019 RNA-seq samples. Non-reference RNA sequences such as RNAs harboring driver mutations and fusions, splicing isoforms or RNAs derived from repetitive elements, can be retrieved with high accuracy. Moreover, we show that k-mer indexing offers a powerful means to reveal variant RNAs induced by specific gene alterations, for instance in splicing factors. A web server allows public queries in CCLE and other indexes: https://transipedia.fr. Code is provided to allow users to set up their own server from any RNA-seq dataset. ### Competing Interest Statement The authors have declared no competing interest.
更多
查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要