Discovering and overcoming the bias in neoantigen identification by unified machine learning models

Ziting Zhang, Wenxu Wu, Lei Wei, Xiaowo Wang

bioRxiv (2024)

Abstract
Neoantigens, formed by genetic mutations in tumor cells, are abnormal peptides that can trigger immune responses. Precisely identifying neoantigens among vast numbers of mutations is key to tumor immunotherapy design. The neoantigen immune process has three main steps: binding with MHCs, extracellular presentation, and induction of immunogenicity. Various machine learning methods have been developed to predict the probability of one of these three events, but the overall accuracy of neoantigen identification remains far from satisfactory. To gain a systematic understanding of the key factors in neoantigen identification, we developed ImmuBPI, a unified transformer-based machine learning framework that comprises three tasks and achieves state-of-the-art performance. Through cross-task model interpretation, we discovered an underestimated data bias in immunogenicity prediction, which has skewed the discriminatory boundaries of current machine learning models. We designed a mutual information-based debiasing strategy that performed well on immunogenicity prediction for mutation variants, a task where current methods fall short. Clustering immunogenic peptides with the debiased representations uncovered unique preferences for biophysical properties, such as hydrophobicity and polarity. These observations complement the prior understanding that accurate neoantigen prediction is constrained by limited data, and they highlight the necessity of bias control. We expect this study to provide novel and insightful perspectives for neoantigen prediction methods and to benefit future neoantigen-mediated immunotherapy designs.

Competing Interest Statement: The authors have declared no competing interest.