
Context Based Compression Of Fastq Data

2016 IEEE International Symposium on Circuits and Systems (ISCAS)(2016)

Abstract
With advances in Next Generation Sequencing (NGS) technologies, the amount of genomic data produced is growing exponentially. Efficient compression is therefore vital for archiving, retrieving and transferring raw sequencing data. NGS data, consisting of sequences together with their associated quality scores and sequence identifiers, is stored in the FASTQ format. In this paper, we present lossless methodologies for compressing each component of a FASTQ file. The proposed system uses prediction by partial matching (PPM) to build higher-order context models, which drive adaptive arithmetic coding (AAC) of the FASTQ data. We also analyze key AAC parameters to further improve the system's overall compression. We compare the proposed system with existing benchmarks on a sample data set of six standard FASTQ files. The proposed method achieves gains of up to 15% over the best existing benchmark. We also show that it reduces overall storage space across the sample data set.
Keywords
next generation sequencing technologies, genomic data compression, raw sequencing data transfer, raw sequencing data retrieval, raw sequencing data archiving, NGS data, sequence information, quality scores, sequence identifiers, FASTQ format, lossless methodologies, FASTQ file, partial matching methods, PPM, FASTQ data compression, adaptive arithmetic coding, AAC, storage space, context based compression