Data Prefetching for Large Tiered Storage Systems

2017 IEEE International Conference on Data Mining (ICDM) (2017)

Abstract
In multi-tier storage systems with large amounts of data, most of the data is stored on inexpensive, slower tiers such as cloud or tape to achieve cost savings. The trade-off is that retrieving data from the slower tiers incurs high latency. It would therefore be beneficial to proactively prefetch data from slower tiers to faster tiers by predicting future data accesses. State-of-the-art access prediction methods typically record the access history of individual files, data objects, or data segments. However, in systems with large amounts of infrequently accessed (cold) data, file-level access history is often unavailable for much of the data because it is accessed so rarely. In this paper, we extract information from file metadata to predict file accesses in a storage system. The proposed method rests on the hypothesis that users and applications access stored data within a given context, and that this context, and therefore the set of files likely to be accessed, can be identified by detecting access patterns in file metadata. As an application, we consider the LOFAR radio telescope's long-term archive, where access patterns are learned from a rich set of metadata and then used to predict likely future accesses by astronomers.
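The hypothesis in the abstract, that files sharing a metadata context tend to be accessed together, can be illustrated with a minimal sketch. This is not the paper's method: exact matching on a few metadata fields stands in for the learned access patterns, and the field names (`project`, `observation`), the file identifiers, and the helper functions are all hypothetical rather than taken from the LOFAR archive schema.

```python
from collections import defaultdict


def build_context_index(catalog, context_keys):
    """Map each metadata context (a tuple of field values) to the files sharing it."""
    index = defaultdict(set)
    for file_id, metadata in catalog.items():
        context = tuple(metadata.get(key) for key in context_keys)
        index[context].add(file_id)
    return index


def prefetch_candidates(accessed_id, catalog, index, context_keys, fast_tier):
    """Return files in the same context as the accessed file that are not yet on the fast tier."""
    context = tuple(catalog[accessed_id].get(key) for key in context_keys)
    same_context = index.get(context, set())
    return sorted(same_context - {accessed_id} - fast_tier)


# Hypothetical catalog: file id -> metadata record (assumed fields, for illustration only).
catalog = {
    "f1": {"project": "P1", "observation": "obs-A"},
    "f2": {"project": "P1", "observation": "obs-A"},
    "f3": {"project": "P1", "observation": "obs-B"},
    "f4": {"project": "P2", "observation": "obs-A"},
}
CONTEXT_KEYS = ("project", "observation")

index = build_context_index(catalog, CONTEXT_KEYS)

# An access to f1 suggests prefetching the other file from the same project and observation.
candidates = prefetch_candidates("f1", catalog, index, CONTEXT_KEYS, fast_tier=set())
```

Because the index is built from metadata alone, a cold file with no access history of its own can still be selected for prefetching, which is the gap the abstract identifies in history-based predictors.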
Key words
access prediction, caching, machine learning, archive, LOFAR