A system for phenotype harmonization in the NHLBI Trans-Omics for Precision Medicine (TOPMed) Program
bioRxiv (Cold Spring Harbor Laboratory) (2021)
Abstract
Genotype-phenotype association studies often combine phenotype data from multiple studies to increase power. Harmonizing these data usually requires substantial effort because of heterogeneity in phenotype definitions, study design, data collection procedures, and data set organization. Here we describe a centralized system for phenotype harmonization that includes input from phenotype domain and study experts, quality control, documentation, reproducible results, and data sharing mechanisms. This system was developed for the National Heart, Lung, and Blood Institute’s Trans-Omics for Precision Medicine (TOPMed) program, which is generating genomic and other omics data for >80 studies with extensive phenotype data. To date, 63 phenotypes have been harmonized across thousands of participants from up to 17 TOPMed studies per phenotype. We discuss the challenges faced in this undertaking and how they were addressed. The harmonized phenotype data and associated documentation have been submitted to National Institutes of Health data repositories for controlled access by the scientific community. We also provide materials to facilitate future harmonization efforts by the community, which include (1) the code used to generate the 63 harmonized phenotypes, enabling others to reproduce, modify, or extend these harmonizations to additional studies; and (2) results of labeling thousands of phenotype variables with controlled vocabulary terms.
### Competing Interest Statement
Adrienne Stilp receives funding from Seven Bridges Genomics to develop tools for the NHLBI BioData Catalyst consortium. Bruce Psaty serves on the Steering Committee of the Yale Open Data Access Project funded by Johnson & Johnson. Pradeep Natarajan received grant support from Amgen, Apple, and Boston Scientific, and consulting fees from Apple, all unrelated to the present work. Stella Aslibekyan is currently employed by and holds equity in 23andMe, Inc. May Montasser receives funding from Regeneron Pharmaceutical Inc. unrelated to this work.
### Abbreviations
DCC
: Data Coordinating Center
dbGaP
: database of Genotypes and Phenotypes
JSON
: JavaScript Object Notation
LDL-C
: Low-density lipoprotein cholesterol
NHLBI
: National Heart, Lung, and Blood Institute
NIH
: National Institutes of Health
QC
: quality control
TOPMed
: Trans-Omics for Precision Medicine
UMLS
: Unified Medical Language System
WG
: Working Group
Keywords
phenotype harmonization,precision medicine,trans-omics