
Study on reference-based FASTQ genome sequences compression.

International Conference on Bioinformatics and Intelligent Computing (BIC) (2022)

Abstract
As the cost of genome sequencing falls, the volume of genomic data being generated creates a storage problem. Specialized compression of FASTQ files still leaves much work to be done. This paper explores a reference-based lossless compression algorithm for genome sequences in FASTQ format. We propose a compression scheme based on longest matching, using an FMD-index to support exact-match searching. The scheme also exploits reverse-complementary sequences and describes insertion, deletion, and substitution operations compactly to further improve the compression ratio. Compared with the experimental results of five compressors on seven sets of genome data, the proposed algorithm significantly improves FASTQ compression ratios and is competitive in running time.
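The idea of reference-based longest matching with reverse-complement support can be illustrated with a minimal sketch. This is not the paper's implementation: it replaces the FMD-index with a naive `str.find` search, stores mismatching bases as single literals instead of explicit insertion, deletion, and substitution operations, and ignores FASTQ headers and quality scores. All function names and the `min_match` threshold are hypothetical, chosen only for illustration.

```python
# Illustrative sketch only: naive search stands in for the FMD-index,
# and every name here is hypothetical rather than taken from the paper.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Complement each base and reverse the string (other symbols, e.g. N, are kept)."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_match(reference, read, start):
    """Return (position, length) of the longest prefix of read[start:] found in reference."""
    best_pos, best_len = -1, 0
    length = 1
    while start + length <= len(read):
        pos = reference.find(read[start:start + length])
        if pos < 0:
            break
        best_pos, best_len = pos, length
        length += 1
    return best_pos, best_len


def encode(reference, read, min_match=4):
    """Greedily turn a read into match tokens ("M", pos, len) and literal tokens ("L", base)."""
    tokens = []
    i = 0
    while i < len(read):
        pos, length = longest_match(reference, read, i)
        if length >= min_match:
            tokens.append(("M", pos, length))
            i += length
        else:
            tokens.append(("L", read[i]))
            i += 1
    return tokens


def decode(reference, tokens):
    """Rebuild the encoded string from match and literal tokens."""
    parts = []
    for token in tokens:
        if token[0] == "M":
            _, pos, length = token
            parts.append(reference[pos:pos + length])
        else:
            parts.append(token[1])
    return "".join(parts)


def compress_read(reference, read, min_match=4):
    """Encode the read on both strands and keep whichever needs fewer tokens."""
    forward = encode(reference, read, min_match)
    reverse = encode(reference, reverse_complement(read), min_match)
    if len(reverse) < len(forward):
        return "-", reverse
    return "+", forward


def decompress_read(reference, strand, tokens):
    """Invert compress_read, undoing the reverse complement when needed."""
    seq = decode(reference, tokens)
    return reverse_complement(seq) if strand == "-" else seq


if __name__ == "__main__":
    reference = "ACGTTGCAAGGCTTAACCGGTA"
    read = reverse_complement(reference[5:17])  # a read taken from the opposite strand
    strand, tokens = compress_read(reference, read)
    print(strand, tokens)
    print(decompress_read(reference, strand, tokens) == read)
```

A read sampled from the opposite strand collapses to a single match token once its reverse complement is tried, which is the intuition behind using reverse-complementary sequences. An index such as the FMD-index serves the same purpose as the naive search here, but supports exact-match queries on both strands far more efficiently.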