Weakly supervised identification and generation of adaptive immune receptor sequences associated with immune disease status

bioRxiv (2023)

Abstract
Adaptive immune receptor (AIR) repertoires carry immune signals as sequence motif imprints of past and present encounters with antigen (immune status). Machine learning (ML)-based identification and generation of antigen-specific immune receptors is potentially of immense value for public health. The ideal training data for such ML tasks would be AIR datasets in which each sequence is labeled with its cognate antigen. However, given current technological constraints, sequence-labeled datasets are scarce, whereas repertoire-labeled datasets are abundant: these are AIR repertoire datasets in which only the repertoire as a whole, not the individual AIRs, is labeled. There is therefore an unmet need for an ML approach that enables predictive identification and generation of novel disease-specific AIR sequences using exclusively repertoire-level immune status information. To address this need, we developed AIRRTM, an end-to-end generative model that combines an encoder-decoder architecture with Topic Modeling (TM) and requires only repertoire-labeled AIR sequencing data as input. We validated AIRRTM's capacity to identify and generate novel disease-associated receptors on several ground-truth synthetic datasets with increasingly complex immune signals, as well as on experimental data. AIRRTM broadens the discovery space for immunotherapeutics by enabling the exploitation of large-scale and broadly available immune repertoire data previously deemed largely unsuitable for this task.

Competing Interest Statement: V.G. declares advisory board positions in aiNET GmbH, Enpicom B.V, Absci, Omniscope, and Diagonal Therapeutics. V.G. is a consultant for Adaptyv Biosystems, Specifica Inc, Roche/Genentech, immunai, Proteinea and LabGenius.
Keywords
adaptive immune receptor sequences, weakly