5. Overcoming challenges in semantic alignment of therapeutics knowledge using TheraPy

Cancer Genetics (2023)

Abstract
Genomic medicine pipelines incorporate knowledge extracted from an increasing number of publicly available databases. Unfortunately, integration of data about drugs and other therapeutics remains challenging, as these concepts are often ambiguous and inconsistently described. Semantic alignment of this knowledge would enable richer genomic annotation and clinical decision support, but is hampered by incongruities regarding data structure and scope between knowledge sources. Individual therapeutics may be associated with a variety of related concepts, from developmental identifiers and IUPAC chemical structure names to active ingredients or brand names. Ambiguities are compounded by differing approaches to classification and structure employed by knowledge resources. For example, while many sources curate knowledge at a single conceptual level, like a 'chemical' (e.g. ChEMBL, ChemIDplus) or a 'ligand' (e.g. IUPHAR's Guide to Pharmacology), others abstract therapeutics to a 'drug' or 'drug product' (DrugBank), and only specific products are designated as legally prescribable by regulators (as recorded in sources like Drugs@FDA). Finally, few sources provide semantically rich linkages, and those that do are often scoped to specific medical subfields (e.g. NCI Thesaurus). TheraPy, our open-source drug normalization tool, performs concept alignment using resource-provided cross-references and active ingredient annotations. This enables concept unification across disparate conceptual levels (e.g. chemical to drug product), bypassing limitations of domain-specific vocabularies and conceptual references that hinder approaches based on string matching alone. We review how our aggregation approach across 16,000 distinct concepts from 9 resources enables a high-performance therapeutic concept normalization API.
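
The abstract does not give implementation details, so the following is only a minimal sketch of how alignment through cross-references and active ingredient annotations might work in principle. It is not TheraPy's code or API. All record identifiers, field names (`xrefs`, `active_ingredient`), and function names are hypothetical, and the union-find grouping is a technique chosen for illustration. Records that reference each other, directly or through an active ingredient, are merged into one group, so a brand-name product resolves to the same concept as its underlying chemical.

```python
from collections import defaultdict

# Hypothetical records: identifiers, labels, and links are illustrative
# placeholders, not real database entries.
RECORDS = [
    {"id": "chem:1", "label": "examplinib", "xrefs": ["drug:A"]},
    {"id": "drug:A", "label": "Examplinib", "xrefs": ["ont:X"]},
    {"id": "ont:X", "label": "Examplinib", "xrefs": []},
    {"id": "product:9", "label": "BRANDNAME", "xrefs": [],
     "active_ingredient": "drug:A"},
    {"id": "chem:2", "label": "otherol", "xrefs": []},
]


def find(parent, x):
    """Return the representative of x's group, shortening the path as we go."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def union(parent, a, b):
    """Merge the groups containing a and b."""
    ra, rb = find(parent, a), find(parent, b)
    if ra != rb:
        # Keep the lexicographically smaller id as the representative.
        parent[max(ra, rb)] = min(ra, rb)


def build_groups(records):
    """Group records linked by cross-references or active ingredients."""
    parent = {r["id"]: r["id"] for r in records}
    for r in records:
        links = list(r["xrefs"])
        if r.get("active_ingredient"):
            links.append(r["active_ingredient"])
        for target in links:
            if target in parent:  # ignore links to unknown records
                union(parent, r["id"], target)
    groups = defaultdict(set)
    for cid in parent:
        groups[find(parent, cid)].add(cid)
    return dict(groups)


def normalize(query, records, groups):
    """Return the sorted ids of the group matching a label or id, else None."""
    q = query.strip().lower()
    for r in records:
        if q in (r["id"].lower(), r["label"].lower()):
            for members in groups.values():
                if r["id"] in members:
                    return sorted(members)
    return None
```

In this toy data, calling `normalize("brandname", RECORDS, build_groups(RECORDS))` returns the chemical, drug, ontology, and product identifiers together, even though the brand name shares no string with the chemical name. Label matching alone would not connect them.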
Keywords
therapeutics knowledge,semantic alignment,therapy