High tissue-specificity of lncRNAs maximises the prediction of tissue of origin of circulating DNA

bioRxiv (2023)

Abstract
Several studies have made it possible to envision a translational application of plasma DNA sequencing in cancer diagnosis and monitoring. However, the extremely low concentration of circulating tumour DNA (ctDNA) fragments among the total cell-free DNA (cfDNA) remains a formidable challenge, and statistical models have yet to improve enough to be of practical use. In this study, we appraised the predictive value of a variety of binary classification models based on cfDNA sequencing, using fragmentation features extracted around transcription start sites (TSSs). We investigated (1) features summarising mapped fragment density around each TSS, (2) long non-coding RNA (lncRNA) genes versus coding genes and (3) selection criteria for generating the gene classes to be assigned by the model. Given that, in healthy samples, most of the cfDNA comes from lymphomyeloid lineages, we could identify the model parametrisation with the best accuracy in those lineages using publicly available cfDNA datasets from healthy individuals. Our results show that (1) the way tissue-specific gene classes are defined matters more than which fragmentation features are included, and (2) in particular, lncRNAs are more tissue-specific than coding genes and stand out in terms of both sensitivity and specificity.
Competing Interest Statement
The authors have declared no competing interest.