The application of Large Language Models to the phenotype-based prioritization of causative genes in rare disease patients

medRxiv (Cold Spring Harbor Laboratory) (2023)

Abstract
Computational methods for identifying gene-disease associations can use both genomic and phenotypic information to prioritize genes and variants that may be associated with genetic diseases. Phenotype-based methods commonly rely on comparing phenotypes observed in a patient with a database of genotype-to-phenotype associations using a measure of semantic similarity, and are primarily limited by the quality and completeness of this database as well as the quality of phenotypes assigned to a patient. Genotype-to-phenotype associations used by these methods are largely derived from literature and coded using phenotype ontologies. Large Language Models (LLMs) have been trained on large amounts of text and have shown their potential to answer complex questions across multiple domains. Here, we demonstrate that LLMs can prioritize disease-associated genes as well as, or better than, dedicated bioinformatics methods relying on calculated phenotype similarity. The LLMs use only natural language information as background knowledge and do not require ontology-based phenotyping or structured genotype-to-phenotype knowledge. We use a cohort of undiagnosed patients with rare diseases and show that LLMs can be used to provide diagnostic support that helps in identifying plausible candidate genes.

### Competing Interest Statement

The authors have declared no competing interest.

### Funding Statement

This work has been supported by funding from King Abdullah University of Science and Technology (KAUST) Office of Sponsored Research (OSR) under Award Nos. URF/1/4355-01-01, URF/1/4675-01-01, URF/1/4697-01-01, URF/1/5041-01-01, REI/1/5334-01-01, FCC/1/1976-46-01, and FCC/1/1976-34-01. This work was supported by the SDAIA-KAUST Center of Excellence in Data Science and Artificial Intelligence (SDAIA-KAUST AI).

### Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

Approval: This study was approved by the Institutional Bioethics Committee (IBEC) at King Abdullah University of Science and Technology under approval numbers 18IBEC10 and 22IBEC069, and the Institutional Review Board (IRB) at King Saud University under approval number 18/0093/IRB.

Compliance: All methods were carried out in accordance with the guidelines and regulations laid out by the institutional bioethics committees, the Declaration of Helsinki, and applicable laws and regulations governing research involving human subjects.

Informed consent: Informed consent was obtained from all participants or their legal guardians.

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.

Yes

All data produced in the present work are contained in the manuscript and methods to produce all data are available online at https://github.com/bio-ontology-research-group/LLM_GenePrioritization
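
For readers unfamiliar with the phenotype-based baseline that the abstract contrasts with LLMs, the following is a minimal sketch of similarity-based gene ranking. It is not the authors' method or code. It assumes plain Jaccard overlap between term sets as a crude stand-in for an ontology-aware semantic similarity measure, and the gene names, phenotype labels, and function names (`jaccard`, `rank_genes`) are hypothetical placeholders chosen for illustration only.

```python
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Set overlap in [0, 1]; a simplified stand-in for semantic similarity."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_genes(
    patient: Set[str], gene_to_phenotypes: Dict[str, Set[str]]
) -> List[Tuple[str, float]]:
    """Score every gene against the patient's phenotypes, best match first."""
    scored = [
        (gene, jaccard(patient, terms))
        for gene, terms in gene_to_phenotypes.items()
    ]
    # Sort by descending score; break ties alphabetically for determinism.
    return sorted(scored, key=lambda item: (-item[1], item[0]))


# Hypothetical toy database: placeholder genes and phenotype labels,
# not real genotype-to-phenotype associations.
GENE_TO_PHENOTYPES = {
    "GENE_A": {"P1", "P2", "P3"},
    "GENE_B": {"P3", "P4"},
    "GENE_C": {"P5"},
}
PATIENT = {"P1", "P2", "P4"}

ranking = rank_genes(PATIENT, GENE_TO_PHENOTYPES)
for gene, score in ranking:
    print(f"{gene}\t{score:.2f}")
```

A ranking of this kind can only be as good as the association database and the phenotypes assigned to the patient, which is the limitation the abstract names for this family of methods.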