Re-enabling high-speed caching for LSM-trees.

arXiv: Data Structures and Algorithms (2016)

Abstract
The LSM-tree has been widely used in cloud computing systems by Google, Facebook, and Amazon to achieve high performance for write-intensive workloads. However, in an LSM-tree, random key-value queries can experience long latency and low throughput because compaction, a basic operation in the algorithm, interferes with caching. The LSM-tree relies on frequent compaction operations to merge data into a sorted structure. After a compaction, the original data are reorganized and written to other locations on the disk. As a result, the cached data are invalidated, since the addresses they are referenced by have changed, causing serious performance degradation. We propose dLSM to re-enable high-speed caching during intensive writes. dLSM is an LSM-tree with a compaction buffer on the disk, which works as a cushion to minimize the cache invalidation caused by compactions. The compaction buffer maintains a series of snapshots of the frequently compacted data, which represent a consistent view of the corresponding data in the underlying LSM-tree. Because they are updated at a much lower rate than that of compactions, data in the compaction buffer are almost stationary. In dLSM, an object is referenced by the disk address of the corresponding block, either in the compaction buffer for frequently compacted data or in the underlying LSM-tree for infrequently compacted data. Thus, hot objects can be effectively kept in the cache without harmful invalidations. With the help of a small on-disk compaction buffer, dLSM achieves high query performance by enabling effective caching, while retaining all the merits of the LSM-tree for write-intensive data processing. We have implemented dLSM based on LevelDB. Our evaluations show that with a standard DRAM cache, dLSM can achieve a 5--8x performance improvement over an LSM-tree with the same cache on HDD storage.
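To make the abstract's lookup path concrete, the sketch below illustrates the idea of keying the block cache by a (mostly stable) disk address and resolving hot keys through the compaction buffer before falling back to the underlying LSM-tree. It is a minimal illustration, not the paper's implementation: the types BlockAddress, Index, DLsmReader, and ReadBlockFromDisk are hypothetical stand-ins for the roles described in the abstract.

```cpp
// Minimal sketch of a dLSM-style read path (assumed names, not the paper's code).
#include <cstdint>
#include <optional>
#include <string>
#include <unordered_map>

// A block is identified by its on-disk location; the block cache is keyed by it.
struct BlockAddress {
    uint64_t file_id;
    uint64_t offset;
    bool operator==(const BlockAddress& o) const {
        return file_id == o.file_id && offset == o.offset;
    }
};
struct BlockAddressHash {
    size_t operator()(const BlockAddress& a) const {
        return std::hash<uint64_t>()(a.file_id) ^ (std::hash<uint64_t>()(a.offset) << 1);
    }
};

using Block = std::string;  // stand-in for a decoded data block

class DLsmReader {
public:
    std::optional<Block> Get(const std::string& key) {
        // 1. Resolve the key to a disk address. Frequently compacted (hot) data
        //    are resolved through the compaction buffer, whose snapshots change
        //    far less often than the files of the underlying LSM-tree.
        BlockAddress addr;
        if (!compaction_buffer_index_.Lookup(key, &addr) &&
            !lsm_tree_index_.Lookup(key, &addr)) {
            return std::nullopt;  // key not present
        }
        // 2. Because the cache is keyed by this mostly stable address, hot
        //    blocks survive compactions of the underlying tree.
        auto it = block_cache_.find(addr);
        if (it != block_cache_.end()) return it->second;

        Block block = ReadBlockFromDisk(addr);
        block_cache_.emplace(addr, block);
        return block;
    }

private:
    // Hypothetical index interface; in the paper these roles are played by the
    // on-disk compaction buffer and the LevelDB-style LSM-tree, respectively.
    struct Index {
        bool Lookup(const std::string& /*key*/, BlockAddress* /*addr*/) { return false; }
    };
    Block ReadBlockFromDisk(const BlockAddress&) { return Block(); }

    Index compaction_buffer_index_;
    Index lsm_tree_index_;
    std::unordered_map<BlockAddress, Block, BlockAddressHash> block_cache_;
};
```

The point of the sketch is the ordering of the two lookups: hot data hit the near-stationary compaction buffer first, so their cached blocks are rarely invalidated, while cold data still follow the ordinary LSM-tree path.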