Abstract 5464: Determining accuracy of RNA sequencing data for gene expression profiling of single samples

Holly C. Beale, Jacquelyn M. Roger, Matthew A. Cattle, Liam T. McKay, Katrina Learned, Geoff Lyle, Ellen T. Kephart, Rob Currie, Du Linh Lam, Lauren Sanders, Jacob Pfeil, John Vivian, Isabel Bjork, Sofie R. Salama, David Haussler, Olena M. Vaske

Clinical Research (Excluding Clinical Trials) (2020)


Abstract

Gene expression analysis of single samples shows increasing promise for clinical applications. However, obtaining high-quality RNA from a human tumor sample can be challenging because medical, surgical, and pathological requirements often lead to sparse or degraded RNA. The variability in RNA quality makes it difficult to define input sample requirements, which are needed to calculate the sensitivity, specificity, and reference ranges required for a Clinical Laboratory Improvement Amendments (CLIA)-approved test. Clinical analysis of a single RNA sequencing (RNA-Seq) dataset for the purpose of gene expression profiling involves not only the patient's sample but also a comparison cohort. We use 12,236 total tumor samples and require at least 20 samples for within-disease comparisons. Many of these samples do not have associated metadata about the quality of the sample, so we have prioritized quality measures that can be derived from the sequence data alone. To characterize the variability present in RNA-Seq datasets, we analyzed paired-end Illumina RNA-Seq data from 1088 tumor samples from 29 data providers. We categorized reads based on where and how well they map to the genome, as well as by their PCR duplicate status. We defined reference ranges for five types of reads found in sequencing data: unmapped (0-13%); multi-mapped (2-15%); mapped duplicate (2-66%); mapped non-exonic (0-26%); and mapped, exonic, non-duplicate (MEND, 27-76%). Only 64% of the 1088 tumor samples had read type fractions within the reference ranges. Of the remainder, most exceeded the reference ranges for more than one type of read. We then measured the relationship of sensitivity and specificity to input MEND read depth. We subsampled 5 deeply sequenced samples. With each subsample, we identified exceptionally highly expressed genes and samples with similar gene expression profiles.
With subsampling to 20 million MEND reads, we detected over-expressed genes (“up-outlier” genes) with a median sensitivity of 96.1% and specificity of 99.8%; sample similarity had 96.6% sensitivity and 100.0% specificity. We estimate that a sample sequenced to a depth of 70 million total reads will typically have sufficient data for the up-outlier and sample-similarity gene expression analysis assays described here. With this analysis, we have identified a conservative approach to measuring the quality of RNA-Seq read data, which can then be used to define the sensitivity and specificity of single-sample assays to support their ultimate clinical adoption.

Citation Format: Holly C. Beale, Jacquelyn M. Roger, Matthew A. Cattle, Liam T. McKay, Katrina Learned, Geoff Lyle, Ellen T. Kephart, Rob Currie, Du Linh Lam, Lauren Sanders, Jacob Pfeil, John Vivian, Isabel Bjork, Sofie R. Salama, David Haussler, Olena M. Vaske. Determining accuracy of RNA sequencing data for gene expression profiling of single samples [abstract]. In: Proceedings of the Annual Meeting of the American Association for Cancer Research 2020; 2020 Apr 27-28 and Jun 22-24. Philadelphia (PA): AACR; Cancer Res 2020;80(16 Suppl):Abstract nr 5464.
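
The five reference ranges reported in the abstract lend themselves to a simple per-sample check. The following is a minimal sketch, not the authors' pipeline: the function names, the dictionary keys, the choice to treat the range bounds as inclusive, and the example read counts are all assumptions made for illustration. Only the percentage ranges themselves come from the abstract.

```python
# Reference ranges (percent of total reads) as reported in the abstract.
# The key names are hypothetical labels chosen for this sketch.
REFERENCE_RANGES = {
    "unmapped": (0, 13),
    "multi_mapped": (2, 15),
    "mapped_duplicate": (2, 66),
    "mapped_non_exonic": (0, 26),
    "mend": (27, 76),  # mapped, exonic, non-duplicate
}


def read_type_percentages(counts):
    """Convert per-type read counts into percentages of the total."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {read_type: 100.0 * n / total for read_type, n in counts.items()}


def out_of_range_types(percentages):
    """Return the read types whose percentage falls outside its range.

    Bounds are treated as inclusive, which is an assumption of this sketch.
    """
    flagged = []
    for read_type, (low, high) in REFERENCE_RANGES.items():
        value = percentages[read_type]
        if not (low <= value <= high):
            flagged.append(read_type)
    return flagged


# Hypothetical read counts for one sample (70 million reads in total).
example_counts = {
    "unmapped": 3_500_000,
    "multi_mapped": 7_000_000,
    "mapped_duplicate": 28_000_000,
    "mapped_non_exonic": 10_500_000,
    "mend": 21_000_000,
}

percentages = read_type_percentages(example_counts)
flagged = out_of_range_types(percentages)
print("within reference ranges" if not flagged else f"out of range: {flagged}")
```

A sample is counted as within the reference ranges only when the returned list is empty, which mirrors the abstract's observation that most out-of-range samples exceeded the range for more than one read type.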


Keywords

gene expression profiling, gene expression, RNA, single samples
