Quick Generation of SSD Performance Models Using Machine Learning

IEEE Transactions on Emerging Topics in Computing (2021)

Abstract
Increasing usage of Solid-State Drives (SSDs) has greatly boosted the performance of storage backends. SSDs perform many internal processes such as out-of-place writes, wear-leveling, and garbage collection. These operations are complex and not well documented, which makes it difficult to create accurate SSD simulators. Our survey indicates that, aside from complex configuration, available SSD simulators do not support both sync and discard requests. Past performance models also ignore the long-term effect of I/O requests on SSD performance, which has been demonstrated to be significant. In this article, we utilize a machine-learning-based methodology that extracts history-aware features at low cost to train SSD performance models that predict request response times. A key goal of our work is to achieve real-time or near-real-time feature extraction and practical training times, so our work can be considered as part of solutions that perform online or periodic characterization, such as adaptive storage algorithms. Thus, we extract features from individual read, write, sync, and discard I/O requests and use structures such as exponentially decaying counters to track past activity using $O(1)$ memory and processing cost. To make our methodology accessible and usable in real-world online scenarios, we focus on machine learning models that can be trained quickly on a single machine. To massively reduce processing and memory cost, we utilize feature selection to reduce feature count by up to 63%, allowing a feature extraction rate of 313,000 requests per second using a single thread. Our dataset contains 580M requests taken from 35 workloads. We experiment with three families of machine learning models: a) decision trees, b) ensemble methods utilizing decision trees, and c) Feedforward Neural Networks (FNN).
Based on these experiments, FNN achieves an average $R^2$ score of 0.72, compared to 0.61 and 0.45 for the Random Forest and Bagging models, respectively, where $R^2 \in (-\infty, 1]$ and a value of 1 indicates a perfect fit. However, while the random forest model has lower accuracy, it uses general-purpose processing hardware and can be trained much faster, making it viable for use in online scenarios.
Keywords
Performance prediction, solid-state drives, machine learning, neural networks