FM3VCF: A Software Library for Accelerating the Loading of Large VCF Files in Genotype Data Analyses

Zhentao Zuo, Qi Li, Zhuo Li, Meng Huang, Tianyan You

bioRxiv (Cold Spring Harbor Laboratory), 2023

Abstract

Background: The growing size of genotype datasets has made loading VCF files a computational bottleneck in many analyses, including genotype imputation and genome-wide association studies (GWAS). To address this, we developed FM3VCF (fast M3VCF), a software library that uses multiple CPU threads to accelerate this step.

Findings: FM3VCF can convert VCF files into M3VCF[1], the dedicated data format of MINIMAC4[1], and can efficiently read and parse VCF files. Compared with m3vcftools[1], FM3VCF compresses VCF files to M3VCF format approximately 20 times faster. For reading compressed VCF files, including both decompression and parsing, FM3VCF is approximately 3 times faster than HTSlib[2]. FM3VCF is written in C and is open source; it is available from https://github.com/Oliver-111/m3vcf under the MIT/BSD license.

Conclusion: FM3VCF accelerates the loading of large VCF files in genotype data analyses. By making full use of multiple CPU threads, it can substantially reduce the computational burden of a range of genomic analyses.
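The abstract does not describe FM3VCF's internals, so the following is only a minimal sketch of the general idea of spreading VCF parsing across CPU threads, not FM3VCF's actual implementation or API. It first scans an already-decompressed VCF body once to record where each data line starts, then hands each POSIX thread a contiguous block of lines whose GT calls it writes into a shared allele matrix. The names `parse_vcf_parallel`, `Job`, `parse_line` and `worker` are hypothetical, and the sketch assumes header lines are already stripped, FORMAT is exactly "GT", calls are diploid with single-digit alleles (e.g. "0|1" or "0/1"), and no values are missing.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical work item: one contiguous range of VCF data lines. */
typedef struct {
    const char **lines;   /* start of each data line in the text buffer */
    size_t first, last;   /* half-open range [first, last) for this thread */
    int n_samples;
    unsigned char *haps;  /* n_lines x (2 * n_samples) allele matrix */
    int spawned;          /* 1 if a thread was created for this job */
} Job;

/* Read the GT calls of one well-formed line (assumptions as stated above). */
static void parse_line(const char *p, int n_samples, unsigned char *out) {
    for (int tabs = 0; tabs < 9 && *p; ++p)
        if (*p == '\t') ++tabs;                        /* skip CHROM..FORMAT */
    for (int s = 0; s < n_samples; ++s) {
        out[2 * s]     = (unsigned char)(p[0] - '0');  /* first allele  */
        out[2 * s + 1] = (unsigned char)(p[2] - '0');  /* second allele */
        if (s + 1 < n_samples)
            p = strchr(p, '\t') + 1;                   /* next sample column */
    }
}

static void *worker(void *arg) {
    Job *job = arg;
    size_t width = 2 * (size_t)job->n_samples;
    /* Each job owns distinct rows of haps, so no locking is needed. */
    for (size_t i = job->first; i < job->last; ++i)
        parse_line(job->lines[i], job->n_samples, job->haps + i * width);
    return NULL;
}

/* Returns the number of data lines parsed and stores a malloc'd allele
   matrix in *haps_out (caller frees). On allocation failure it returns 0
   and *haps_out stays NULL. */
size_t parse_vcf_parallel(const char *text, int n_samples, int n_threads,
                          unsigned char **haps_out) {
    assert(n_samples > 0);
    *haps_out = NULL;
    if (n_threads < 1) n_threads = 1;

    /* 1. Serial pass: record where each line starts (a cheap scan). */
    size_t n = 0, cap = 1024;
    const char **lines = malloc(cap * sizeof *lines);
    if (!lines) return 0;
    for (const char *p = text; *p; ) {
        if (n == cap) {
            const char **grown = realloc(lines, (cap *= 2) * sizeof *lines);
            if (!grown) { free(lines); return 0; }
            lines = grown;
        }
        lines[n++] = p;
        const char *nl = strchr(p, '\n');
        if (!nl) break;
        p = nl + 1;
    }

    /* 2. Parallel pass: each thread parses one contiguous block of lines. */
    unsigned char *haps = malloc(n * 2 * (size_t)n_samples + 1);
    pthread_t *tids = malloc((size_t)n_threads * sizeof *tids);
    Job *jobs = malloc((size_t)n_threads * sizeof *jobs);
    if (!haps || !tids || !jobs) {
        free(haps); free(tids); free(jobs); free(lines);
        return 0;
    }
    size_t chunk = (n + (size_t)n_threads - 1) / (size_t)n_threads;
    for (int t = 0; t < n_threads; ++t) {
        size_t first = (size_t)t * chunk, last = first + chunk;
        if (first > n) first = n;
        if (last > n) last = n;
        jobs[t] = (Job){lines, first, last, n_samples, haps, 0};
        jobs[t].spawned = pthread_create(&tids[t], NULL, worker, &jobs[t]) == 0;
        if (!jobs[t].spawned)
            worker(&jobs[t]);          /* fall back to parsing on this thread */
    }
    for (int t = 0; t < n_threads; ++t)
        if (jobs[t].spawned)
            pthread_join(tids[t], NULL);

    free(tids); free(jobs); free(lines);
    *haps_out = haps;
    return n;
}
```

Build with `cc -pthread`. Splitting by line blocks keeps the threads independent because each writes only its own rows; a real loader would also need to parallelize decompression of the compressed input (for example BGZF blocks), which this sketch leaves out.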
Keywords
large fm3vcf files