Classification of RNA-Sequencing Data Via Poisson and Negative Binomial Linear Discriminant Analyses: A Methodological Study

Turkiye Klinikleri Journal of Biostatistics (2023)

Abstract
Objective: Microarray and RNA sequencing (RNA-Seq) technologies are frequently employed in genetic data analysis for detecting disease-associated genes, identifying cancer subtypes, and enabling molecular diagnosis. While numerous methods have been proposed for classification problems using microarray data, few methods have been developed for classifying RNA-Seq data. This study aims to compare the performance of novel methods developed for RNA-Seq data on three distinct real-life datasets. Material and Methods: Cervical cancer, Alzheimer's disease, and kidney cancer RNA-Seq data were utilized in this study. The data were divided into training (70%) and test (30%) sets. Various preprocessing steps, such as normalization, power transformation, and variance filtering, were applied to the data. The Poisson Linear Discriminant Analysis (PLDA) and Negative Binomial Linear Discriminant Analysis (NBLDA) models were used for classification, and their predictive performances were compared. Results: Among the three datasets, the Alzheimer's data exhibited the lowest level of dispersion, while the cervical cancer data had the highest overdispersion. The NBLDA model demonstrated superior classification performance compared to the PLDA model. In cases of mild-to-moderate overdispersion, the predictive performance of the PLDA model improved when power transformation was applied, resulting in performance similar to that of the NBLDA model. Conclusion: PLDA and NBLDA are two novel and promising techniques for classifying RNA-Seq data. The performance of these models is influenced by the degree of overdispersion. In cases of high overdispersion, the NBLDA model is recommended.
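To make the workflow in the abstract more concrete, the following is a minimal sketch of a 70%/30% split followed by a simplified Poisson discriminant classifier. It is an illustration under stated assumptions, not the study's implementation: the function names (`split_indices`, `fit_plda`, `predict_plda`), the smoothing constant `beta`, and the toy counts are all hypothetical, and the sketch omits the power transformation, variance filtering, and any shrinkage of class effects. It assumes the usual Poisson log-linear form in which the count for sample i and gene j in class k has mean s_i * g_j * d_kj.

```python
import math
import random


def split_indices(n, train_frac=0.7, seed=0):
    """Shuffle sample indices and split them into train/test parts."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = int(round(train_frac * n))
    return idx[:cut], idx[cut:]


def fit_plda(X, y, beta=1.0):
    """Fit a simplified Poisson discriminant model (assumption: no shrinkage).

    X is a list of samples, each a list of non-negative gene counts.
    """
    n, p = len(X), len(X[0])
    total = sum(sum(row) for row in X)
    size = [sum(row) / total for row in X]                      # s_i
    gene = [sum(X[i][j] for i in range(n)) for j in range(p)]   # g_j
    d, log_prior = {}, {}
    for k in sorted(set(y)):
        members = [i for i in range(n) if y[i] == k]
        s_k = sum(size[i] for i in members)
        # Class effect d_kj: observed vs. expected class total, smoothed by beta.
        d[k] = [
            (sum(X[i][j] for i in members) + beta) / (s_k * gene[j] + beta)
            for j in range(p)
        ]
        log_prior[k] = math.log(len(members) / n)
    return {"total": total, "gene": gene, "d": d, "log_prior": log_prior}


def predict_plda(model, x):
    """Return the class with the largest Poisson discriminant score."""
    s_new = sum(x) / model["total"]  # size factor of the new sample
    best_class, best_score = None, -math.inf
    for k, d_k in model["d"].items():
        score = model["log_prior"][k]
        for x_j, g_j, d_kj in zip(x, model["gene"], d_k):
            score += x_j * math.log(d_kj) - s_new * g_j * d_kj
        if score > best_score:
            best_class, best_score = k, score
    return best_class


# Toy counts (made up for illustration): gene 0 is high in class 0,
# gene 1 is high in class 1, gene 2 is uninformative.
X_toy = [[50, 5, 20], [60, 4, 22], [55, 6, 18],
         [5, 50, 20], [4, 60, 21], [6, 55, 19]]
y_toy = [0, 0, 0, 1, 1, 1]
toy_model = fit_plda(X_toy, y_toy)
```

A negative binomial variant would add a per-gene dispersion term to the score; as the dispersion approaches zero, the two scores coincide, which is consistent with the abstract's observation that the models perform similarly under mild overdispersion.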