Aird-ComboComp: A combinable compressor framework with a dynamic-decider for lossy mass spectrometry data compression

bioRxiv (Cold Spring Harbor Laboratory), 2023

Abstract
Mass spectrometry (MS) data volumes are growing as ion acquisition rates improve and mass spectrometers become more accurate. However, mzML, the most widely used data format, does not take advantage of modern compression methods or offer improved read performance. Several compression algorithms have been proposed in recent years. They weigh a number of factors, including numerical precision, metadata read strategies, and compression performance. Because their compression ratios are limited, high-throughput MS data files remain quite large, and the resulting bandwidth and memory requirements severely limit MS data analysis in cloud and mobile computing. ComboComp is a comprehensive improvement to the Aird data format. Instead of applying a general-purpose compressor directly, ComboComp combines two integer compressors with four general-purpose compressors and uses a dynamic decider to select the best combination, achieving the most balanced compression ratio among the many compressors available. ComboComp can be extended seamlessly with new integer and general-purpose compressors, making it an evolving compression framework. Its improved compression ratio and decoding speed greatly reduce the cost of data exchange and real-time decompression and lower the hardware requirements of MS data analysis. Analyzing MS data on IoT devices can be useful for real-time quality control, decentralized analysis, collaborative auditing, and other scenarios. We tested ComboComp on 11 datasets generated by commonly used MS instruments. Compared with Aird-ZDPD, ComboComp reduces compressed size by an average of 12.9% and increases decompression speed by an average of 27.1%, while its average compression time is almost the same as that of ZDPD. The high compression ratio and decoding speed make the Aird format effective for data analysis on devices with small memory, which will enable MS data to be processed routinely even on IoT devices in the future.
We provide SDKs in three languages (Java, C#, and Python) that offer optimized interfaces for the various acquisition modes. All SDKs are available on GitHub: https://github.com/CSi-Studio/Aird-SDK
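The combine-and-select strategy described in the abstract can be illustrated with a small sketch. This is not the ComboComp implementation: the two integer encoders (fixed-width packing and zigzag delta varint), the stand-in general-purpose compressors from the Python standard library (zlib, bz2, lzma, plus a pass-through), the fixed m/z precision of 10^4, and all function names are hypothetical choices made for illustration. The decider here simply tries every integer/general pair and keeps the smallest output; the paper's dynamic decider may weigh other factors, such as decoding speed.

```python
import bz2
import lzma
import struct
import zlib
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-ins: not necessarily the codecs used by ComboComp.
IntEncoder = Callable[[List[int]], bytes]
ByteCompressor = Callable[[bytes], bytes]


def to_integers(mz: List[float], precision: int = 10_000) -> List[int]:
    """Scale m/z values to integers at a fixed (assumed) precision."""
    return [round(x * precision) for x in mz]


def encode_fixed32(values: List[int]) -> bytes:
    """Pack each value as a 4-byte little-endian unsigned integer."""
    return struct.pack(f"<{len(values)}I", *values)


def encode_delta_varint(values: List[int]) -> bytes:
    """Delta-encode, zigzag-map signed deltas, then emit LEB128-style varints."""
    out = bytearray()
    prev = 0
    for v in values:
        d = v - prev
        prev = v
        z = d * 2 if d >= 0 else -d * 2 - 1  # zigzag: small |d| -> small z
        while True:
            byte = z & 0x7F
            z >>= 7
            if z:
                out.append(byte | 0x80)  # continuation bit: more bytes follow
            else:
                out.append(byte)
                break
    return bytes(out)


INTEGER: Dict[str, IntEncoder] = {
    "fixed32": encode_fixed32,
    "delta-varint": encode_delta_varint,
}

GENERAL: Dict[str, ByteCompressor] = {
    "none": lambda b: b,
    "zlib": zlib.compress,
    "bz2": bz2.compress,
    "lzma": lzma.compress,
}


def choose_combination(
    values: List[int],
    int_codecs: Dict[str, IntEncoder] = INTEGER,
    gen_codecs: Dict[str, ByteCompressor] = GENERAL,
) -> Tuple[str, str, bytes]:
    """Try every (integer, general) pair and keep the smallest payload."""
    best = None
    for int_name, int_enc in int_codecs.items():
        encoded = int_enc(values)
        for gen_name, gen_comp in gen_codecs.items():
            payload = gen_comp(encoded)
            if best is None or len(payload) < len(best[2]):
                best = (int_name, gen_name, payload)
    return best


if __name__ == "__main__":
    mz = [100.0 + i * 0.0137 for i in range(2000)]
    int_name, gen_name, payload = choose_combination(to_integers(mz))
    print(int_name, gen_name, len(payload))
```

Because this decider measures actual output sizes, supporting a new codec only means adding an entry to `INTEGER` or `GENERAL`, which loosely mirrors the extensibility the abstract describes.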
Keywords
combinable compressor framework, mass spectrometry, Aird-ComboComp, dynamic decider