Predicting gene expression from plasma cell-free DNA using both the fragment length and fragment position

Cancer Research (2019)

Abstract
The ability to use a blood sample to determine the transcriptional state of the cells that release DNA into a patient's bloodstream may be helpful in a variety of clinical applications. Here we present a case study of a gene expression prediction model that uses cell-free DNA (cfDNA) fragment coverage data generated by high-throughput sequencing to predict which genes are highly or lowly expressed in the cells contributing to that cfDNA. We evaluated a number of models, including a convolutional neural network that takes cfDNA fragment information (the density of both fragment midpoint and fragment length by genomic position) over a transcription start site (TSS) as input and outputs a predicted probability that the gene is highly expressed in cfDNA-producing cells. When we trained the convolutional model on a set of 554 genes with TSSs that were either constitutively expressed or unexpressed across leukocyte samples from the NIH Roadmap Epigenome Mapping Consortium, we achieved an AUC of ~0.97 in cross-validation. With other models and splits of the data, we observed AUCs ranging from 0.95 to 0.99 on this gene expression task. Next, we asked whether this trained model could answer specific clinical questions. For example, we hypothesized that colon gene expression profiles would have an increased influence in colorectal cancer (CRC) patients with a higher fraction of circulating tumor DNA. To test this hypothesis, we applied our models to a set of genes with colon-specific expression, which generated, for each sample, a list of probabilities that each gene was expressed. We then applied simple models to these lists of probabilities to predict whether a patient had CRC or was healthy. Across many of the models we tested, this yielded cross-validation AUCs between 0.85 and 0.95 in differentiating healthy patients from CRC patients with a tumor fraction over 5%. These results suggest a path forward for modeling transcriptional states using cfDNA sequencing data, which will enable greater insights from cfDNA that could augment those provided by other analytes.

Citation Format: John A. St John, Erik Gafni, Brandon White, Ajay Kannan, Loren Hansen, Artur Jaroszewicz, Anshul Kundaje, Nathan Boley. Predicting gene expression from plasma cell-free DNA using both the fragment length and fragment position [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2019; 2019 Mar 29-Apr 3; Atlanta, GA. Philadelphia (PA): AACR; Cancer Res 2019;79(13 Suppl):Abstract nr 4349.
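The model input described in the abstract (the density of fragment midpoints and fragment lengths by genomic position around a TSS) can be pictured as a two-dimensional count matrix. The following is a minimal sketch of one way such a matrix might be built, not the authors' implementation. The function name `fragment_matrix`, the window size, the length range, and the bin widths are all illustrative assumptions that the abstract does not specify, and gene strand is ignored for simplicity.

```python
from typing import List, Tuple


def fragment_matrix(
    fragments: List[Tuple[int, int]],
    tss: int,
    flank: int = 1000,    # assumed: bases kept on each side of the TSS
    min_len: int = 50,    # assumed: shortest fragment length kept
    max_len: int = 350,   # assumed: exclusive upper bound on fragment length
    pos_bin: int = 50,    # assumed: width of a position bin, in bases
    len_bin: int = 25,    # assumed: width of a length bin, in bases
) -> List[List[int]]:
    """Count fragments by (length bin, midpoint-position bin) around a TSS.

    `fragments` holds half-open (start, end) coordinates on the same
    chromosome as `tss`. Rows index fragment length, columns index the
    midpoint position relative to the left edge of the window.
    """
    n_pos = (2 * flank) // pos_bin
    n_len = (max_len - min_len) // len_bin
    matrix = [[0] * n_pos for _ in range(n_len)]

    window_start = tss - flank
    for start, end in fragments:
        length = end - start
        midpoint = (start + end) // 2
        offset = midpoint - window_start
        # Skip fragments whose midpoint falls outside the TSS window.
        if not 0 <= offset < 2 * flank:
            continue
        # Skip fragments outside the modeled length range.
        if not min_len <= length < max_len:
            continue
        matrix[(length - min_len) // len_bin][offset // pos_bin] += 1
    return matrix


# Toy fragments around a hypothetical TSS at position 10,000.
example_fragments = [
    (9900, 10067),   # 167 bp, midpoint just upstream of the TSS
    (9900, 10067),   # duplicate coordinates, counted again
    (10500, 10660),  # 160 bp, midpoint downstream of the TSS
    (5000, 5167),    # midpoint outside the window, skipped
    (10000, 10500),  # 500 bp, outside the length range, skipped
]
example_matrix = fragment_matrix(example_fragments, tss=10000)
```

With these assumed defaults the result is a 12 x 40 grid of counts. A matrix of this shape, normalized as appropriate, is the kind of input a convolutional classifier could consume to output a per-gene expression probability.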