Identification of critical SARS-CoV-2 amino acids associated with COVID-19 hospitalization rate using machine learning and statistical modeling: An observational study in the United States

Infection, Genetics and Evolution: Journal of Molecular Epidemiology and Evolutionary Genetics in Infectious Diseases (2023)

Abstract
Background: The COVID-19 pandemic has pushed many medical systems to the verge of collapse over the last two years. Virus mutation was one of the important factors affecting COVID-19 infection severity and hospitalizations. Although more than ten thousand SARS-CoV-2 mutations have been reported since the beginning of the pandemic, only a small percentage are likely to affect the virus phenotype and change its severity. Identifying which amino acids have the greatest impact on the COVID-19 hospitalization rate is therefore an important research question.

Methods: This observational study used the COVID-19 case hospitalization ratio (CHR) to represent virus severity with respect to hospitalization. The database consists of daily state-level epidemiological and genomic sequencing data from the United States, covering the period from the Alpha wave to the first Omicron wave. The critical amino acids that most affected the CHR were determined using four types of models: extreme gradient boosting decision trees (XGBoost), artificial neural networks (ANNs), logistic regression, and Lasso regression.

Results: The XGBoost, ANN, logistic regression, and Lasso regression models all produced excellent results: the mean squared error (MSE) of every state-level model on the testing dataset did not exceed 0.0008. Based on the importance ranking of all covariates, the critical amino acids most affecting the CHR were identified as T19, L24, P25, P26, A27, A67, H69, V70, T95, G142, V143, Y145, E156, F157, N211, L212, V213, R214, D215, G339, R346, S373, L452, S477, T478, E484, N501, A570, P681, and T716.

Conclusion: This study identified the critical amino acids most likely to affect the hospitalization rate, allowing public health workers to monitor these high-risk amino acids and raise an alarm immediately when more severe mutations occur. The methodology and results may also be extended to other regions.
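The abstract does not describe the feature encoding, hyperparameters, or importance measures, so the following is only a minimal sketch of one of the four listed approaches (Lasso regression) under stated assumptions. It assumes, hypothetically, that each row is one state-day, that each feature is the fraction of sequenced genomes carrying a given residue at a site, and that covariates are ranked by the absolute size of their Lasso coefficients. The site names (`site_1` to `site_5`), the penalty `lam`, the train/test split, and all data are synthetic and illustrative; they are not the study's data, covariates, or results. The Lasso is fitted with plain cyclic coordinate descent so the example needs only the standard library.

```python
import random


def case_hospitalization_ratio(hospitalizations, cases):
    """CHR: hospitalizations divided by cases over the same reporting window."""
    if cases <= 0:
        raise ValueError("cases must be positive")
    return hospitalizations / cases


def soft_threshold(rho, lam):
    """Proximal operator of the L1 penalty."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def fit_lasso(X, y, lam, n_iter=200):
    """Cyclic coordinate descent for (1/2n)||y - Xw - b||^2 + lam * ||w||_1."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    # Center features and target so the intercept b is not penalized.
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    resid = [v - y_mean for v in y]  # residual while all weights are zero
    col_ss = [sum(r[j] ** 2 for r in Xc) / n for j in range(p)]
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            if col_ss[j] == 0.0:
                continue
            # Correlation of column j with the partial residual (w_j added back).
            rho = sum(Xc[i][j] * (resid[i] + Xc[i][j] * w[j]) for i in range(n)) / n
            new_w = soft_threshold(rho, lam) / col_ss[j]
            delta = new_w - w[j]
            for i in range(n):
                resid[i] -= Xc[i][j] * delta
            w[j] = new_w
    b = y_mean - sum(w[j] * x_mean[j] for j in range(p))
    return w, b


def predict(X, w, b):
    return [b + sum(wj * xj for wj, xj in zip(w, row)) for row in X]


def mean_squared_error(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rank_covariates(names, w):
    """Order covariates by |coefficient|, largest first."""
    return sorted(zip(names, w), key=lambda nw: abs(nw[1]), reverse=True)


# Synthetic state-day data (NOT the study's data): 5 hypothetical sites,
# each feature is the fraction of sequences carrying a given residue.
rng = random.Random(0)
names = [f"site_{k}" for k in range(1, 6)]
X, y = [], []
for _ in range(200):
    row = [rng.random() for _ in names]
    # Only site_1 (raises CHR) and site_3 (lowers CHR) carry signal here.
    true_chr = 0.05 + 0.03 * row[0] - 0.02 * row[2] + rng.gauss(0, 0.002)
    cases = rng.randint(1000, 5000)
    hosp = max(0, round(cases * true_chr))
    X.append(row)
    y.append(case_hospitalization_ratio(hosp, cases))

# Fit on the first 150 rows, evaluate MSE on the held-out 50.
w, b = fit_lasso(X[:150], y[:150], lam=2e-4)
test_mse = mean_squared_error(y[150:], predict(X[150:], w, b))
ranking = rank_covariates(names, w)
print(f"test MSE = {test_mse:.2e}")
for name, coef in ranking:
    print(f"{name}: {coef:+.4f}")
```

In this toy setup the L1 penalty drives the coefficients of the uninformative sites toward zero, so ranking by coefficient magnitude surfaces the two sites that actually drive the synthetic CHR. Tree-based or neural models such as XGBoost and ANNs would need a different importance measure (for example, gain-based or permutation importance), which the abstract does not specify.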
Keywords
COVID-19, Case hospitalization ratio, SARS-CoV-2 amino acid mutation