Genozip Dual-Coordinate VCF format enables efficient genomic analyses and alleviates liftover limitations

bioRxiv (2023)

Abstract
We introduce Dual Coordinate VCF (DVCF), a file format that records genomic variants against two different reference genomes simultaneously and is fully compliant with the current VCF specification. As implemented in the Genozip platform, DVCF enables bioinformatics pipelines to seamlessly operate across two coordinate systems by leveraging the system most advantageous to each pipeline step, simplifying bioinformatics workflows and reducing file generation and the associated data storage burden. Moreover, our benchmarking of Genozip DVCF shows that it produces more complete, less erroneous, and less biased translations across coordinate systems than two widely used alternative tools (i.e., LiftoverVcf and CrossMap).
Availability and Implementation
An open source (GPL) version of Genozip containing DVCF functionality but not compression functionality, and which includes scripts for reproducing the benchmarks presented here, is available at . Documentation is available at .
Competing Interest Statement
The authors have declared no competing interest.
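As a rough illustration of the dual-coordinate idea described in the abstract, the Python sketch below keeps one variant with coordinates in two reference systems and renders it in whichever system a pipeline step asks for. It is not Genozip's implementation and does not reproduce the actual DVCF encoding; the names `Coord`, `DualVariant`, and `render`, the `primary`/`secondary` labels, and all coordinate values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Coord:
    """Position of a variant in one reference genome (hypothetical structure)."""
    chrom: str
    pos: int  # 1-based position, as in VCF
    ref: str  # reference allele in this coordinate system


@dataclass(frozen=True)
class DualVariant:
    """One variant carried with coordinates in two reference systems."""
    alt: str
    primary: Coord
    secondary: Optional[Coord]  # None when no mapping to the second reference exists


def render(variant: DualVariant, system: str = "primary") -> Optional[str]:
    """Return the first five VCF columns in the requested coordinate system.

    Returns None if the variant has no coordinate in that system.
    """
    if system not in ("primary", "secondary"):
        raise ValueError(f"unknown coordinate system: {system}")
    coord = variant.primary if system == "primary" else variant.secondary
    if coord is None:
        return None
    # Columns: CHROM, POS, ID (unset), REF, ALT
    return "\t".join([coord.chrom, str(coord.pos), ".", coord.ref, variant.alt])
```

The point of the sketch is that both coordinate sets travel with a single record, so a step that needs the second reference can ask for that view without a separate lifted-over file being generated and stored.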
Key words
efficient genomic analyses, liftover limitations, dual-coordinate