Forecasting dominance of SARS-CoV-2 lineages by anomaly detection using deep AutoEncoders.

bioRxiv : the preprint server for biology(2024)

引用 0|浏览1
暂无评分
摘要
The coronavirus disease of 2019 (COVID-19) pandemic is characterized by sequential emergence of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) variants and lineages outcompeting previously circulating ones because of, among other factors, increased transmissibility and immune escape1-3. We devised an unsupervised deep learning AutoEncoder for viral genomes anomaly detection to predict future dominant lineages (FDLs), i.e., lineages or sublineages comprising ≥10% of viral sequences added to the GISAID database on a given week4. The algorithm was trained and validated by assembling global and country-specific data sets from 16,187,950 Spike protein sequences sampled between December 24th, 2019, and November 8th, 2023. The AutoEncoder flags low frequency FDLs (0.01% - 3%), with median lead times of 4-16 weeks. Over time, positive predictive values oscillate, decreasing linearly with the number of unique sequences per data set, showing average performance up to 30 times better than baseline approaches. The B.1.617.2 vaccine reference strain was flagged as FDL when its frequency was only 0.01%, more than one year earlier of being considered for an updated COVID-19 vaccine. Our AutoEncoder, applicable in principle to any pathogen, also pinpoints specific mutations potentially linked to increased fitness, and may provide significant insights for the optimization of public health pre-emptive intervention strategies.
更多
查看译文
关键词
anomaly detection,sars-cov
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要