Applying virtual sample generation and ensemble modeling for improving the spectral diagnosis of cancer

Hui Chen, Chao Tan, Zan Lin, Maoxian Chen, Bin Cheng

Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy (2024)

Abstract
Cancer diagnosis plays a key role in facilitating treatment and improving patient survival rates. Combining near-infrared (NIR) spectroscopy with data-driven algorithms offers a rapid and cost-effective approach to this task. Because clinical cases are limited, tumor samples are usually fewer than normal samples, so the resulting dataset exhibits class imbalance, which can seriously degrade the performance of diagnostic models. To deal with class imbalance and improve sensitivity, this work investigates the feasibility of combining NIR spectroscopy with virtual sample generation (VSG) and an ensemble strategy for developing diagnostic models. Based on preliminary experiments, several learning algorithms, such as discriminant analysis (DA) and partial least squares-discriminant analysis (PLS-DA), are selected for constructing prediction models. Three VSG algorithms are used in the experiments: synthetic minority oversampling technique (SMOTE), Borderline-SMOTE, and adaptive synthetic sampling (ADASYN). A fixed subset of 27 cancer samples and 54 normal samples is held out as the test set. Three training sets, containing 5, 10, or 25 minority-class samples together with 54 majority-class samples, are used for model development. The experimental results indicate that, overall, with the PLS-DA algorithm all VSG approaches significantly improve the sensitivity of cancer diagnosis for every training set regardless of the number of minority samples, with ADASYN performing best. This suggests that the integration of NIR, PLS-DA, and ADASYN is a promising toolkit for developing diagnostic methods.
Key words
Cancer, Diagnosis, Virtual sample generation, Ensemble, NIR spectroscopy
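
The virtual sample generation step named in the abstract can be illustrated with a minimal SMOTE-style sketch. It is not the authors' implementation: the function name `smote_sketch`, the random stand-in "spectra", the 100-variable spectrum length, the neighbor count `k=3`, and the choice to oversample up to a 1:1 class ratio are all assumptions made for illustration. Only the class sizes (5 minority and 54 majority training samples) come from the abstract. Each virtual sample is placed at a random point on the line segment between a minority sample and one of its nearest minority neighbors.

```python
import numpy as np

def smote_sketch(X_min, n_new, k=3, seed=0):
    """Generate n_new virtual minority samples by SMOTE-style interpolation.

    Hypothetical helper for illustration; not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    n = X_min.shape[0]
    k = min(k, n - 1)  # cannot have more neighbors than other samples
    # Pairwise Euclidean distances among minority samples
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)  # exclude each sample from its own neighbors
    neighbors = np.argsort(d, axis=1)[:, :k]
    # Pick a random base sample and one of its k nearest neighbors
    base = rng.integers(0, n, size=n_new)
    nb = neighbors[base, rng.integers(0, k, size=n_new)]
    # Interpolate at a random position between base and neighbor
    gap = rng.random((n_new, 1))
    return X_min[base] + gap * (X_min[nb] - X_min[base])

# Stand-in data (assumption): random vectors in place of real NIR spectra
rng = np.random.default_rng(42)
X_major = rng.normal(0.0, 1.0, size=(54, 100))  # 54 majority-class samples
X_minor = rng.normal(0.5, 1.0, size=(5, 100))   # 5 minority-class samples

# Assumption: oversample the minority class up to the majority class size
X_virtual = smote_sketch(X_minor, n_new=len(X_major) - len(X_minor))
X_balanced_minor = np.vstack([X_minor, X_virtual])
print(X_balanced_minor.shape)  # (54, 100)
```

Borderline-SMOTE and ADASYN differ mainly in how base samples are chosen (favoring minority samples near the class boundary or in harder-to-learn regions) rather than in the interpolation itself.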