ADSP Whole Genome Sequencing (WGS) Release 3 data update from Genome Center for Alzheimer’s Disease

Alzheimer's & Dementia (2023)

Abstract

Background: The Genome Center for Alzheimer’s Disease (GCAD) coordinates the integration and meta‐analysis of all available Alzheimer’s disease (AD) relevant whole genome sequencing (WGS) data, with the goal of identifying genetic variants that confer AD risk or protection and, eventually, therapeutic targets. The WGS datasets are generated through a collaboration between scientists from the Alzheimer’s Disease Sequencing Project (ADSP) and GCAD. To minimize the data heterogeneity introduced by different sequencing protocols and machines, GCAD processes all samples with identical pipelines and performs quality assurance (QA) checks.

Methods: Raw sequencing data (FASTQs or BAMs) were aligned to GRCh38/hg38 with BWA, and variant calling and joint genotyping were performed with GATK. In addition, Smoove, Manta and Strelka were applied to generate structural variant (SV) calls per sample. QA checks covering sex, contamination and genotype concordance, together with the ADSP QC protocol, were used to evaluate the quality of samples and variants. To make the large joint‐genotyped VCF files easier to access and use, we introduced a compact version that stores only variant information and sample genotypes.

Results: We dropped 235 (1.3%) samples that had poor coverage (<20x) or failed QA checks, and we flagged 173 (1.0%) samples of borderline quality. The resulting dataset (ADSP Release 3, 2021) includes 16,905 genomes from 17 diverse cohorts spanning 3 major ethnicities: 10,651 Non‐Hispanic Whites, 3,212 Hispanics and 2,874 African Americans. The data are deeply sequenced (average genome coverage: >30x). All samples’ CRAMs, gVCFs from GATK, and VCFs from the three SV callers were deposited into the NIAGADS Data Sharing Service (DSS) (https://dss.niagads.org/) for public distribution. In addition, joint‐genotype VCFs are available in both compact and QC versions. The joint‐genotype VCF contains >206M bi‐allelic single‐nucleotide variants, 16M bi‐allelic indels and 28M multi‐allelic variants, with 96% of variants remaining after stringent QC.

Conclusion: The ADSP and GCAD generate high‐quality genotype calls and SV calls. The project is currently processing ∼37,000 WGS samples sequenced primarily through the ADSP Follow‐Up Study, which will contain a more ancestrally diverse set of populations. We anticipate this 2022 release will continue to benefit the research community studying AD genetics.
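The abstract does not specify the layout of the compact VCF beyond "variant info and sample genotypes only". As an illustration only, the idea can be sketched as dropping every per-sample FORMAT subfield except the genotype (GT). The function name `compact_vcf_line` and the choice to retain the INFO column unchanged are assumptions made for this sketch, not a description of the GCAD pipeline.

```python
def compact_vcf_line(line: str) -> str:
    """Reduce one VCF line to its site columns plus per-sample GT only.

    Hypothetical helper: the real GCAD compact format may differ.
    """
    line = line.rstrip("\n")
    if line.startswith("#"):
        return line  # meta and header lines pass through unchanged
    fields = line.split("\t")
    if len(fields) < 10:
        return line  # sites-only record: no FORMAT or sample columns
    format_keys = fields[8].split(":")
    if "GT" not in format_keys:
        return line  # nothing to compact without a genotype subfield
    gt_index = format_keys.index("GT")
    fields[8] = "GT"
    # Keep only the genotype string for every sample column.
    fields[9:] = [sample.split(":")[gt_index] for sample in fields[9:]]
    return "\t".join(fields)


record = "chr1\t100\t.\tA\tG\t50\tPASS\tAC=1\tGT:AD:DP\t0/1:10,5:15\t0/0:20,0:20"
print(compact_vcf_line(record))
```

On a joint-genotyped file with thousands of samples, most of each record consists of per-sample annotations such as allele depths, so keeping only GT shrinks the record considerably while preserving the genotype matrix.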
Keywords
alzheimers, genome center, data update, sequencing