File Access Characteristics of Deep Learning Workloads and Cache-Friendly Data Management

2023 10th International Conference on Electrical Engineering, Computer Science and Informatics (EECSI)(2023)

Abstract
Recently, the size of datasets used in deep learning has grown rapidly, and accessing file data can significantly degrade training performance. To quantify this, we analyze the file access characteristics of deep learning workloads and find that they differ markedly from those of traditional workloads. Specifically, during deep learning training, all file data are accessed in random order, which makes it difficult to improve file access performance through caching. To cope with this situation, we present a cache-friendly file data management policy that accelerates data access in deep learning. Unlike conventional training, which shuffles the entire dataset every epoch, our policy defines a shuffling unit called a bundle, improving the spatial locality of file access without compromising the model's training efficiency. We also improve the temporal locality of data access by arranging bundles in an alternating order each epoch. Experimental results show that our data management policy improves the file cache miss ratio by 17.0% and reduces training execution time by 24.7% when accessing file data in deep learning.
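The bundle idea described above can be illustrated with a minimal Python sketch. This is not the authors' implementation; the function name, the fixed bundle size, and the use of simple order reversal to realize the "alternating order" of bundles are assumptions made for illustration only.

```python
import random

def bundle_epoch_order(num_samples, bundle_size, epoch, seed=0):
    """Illustrative sketch of bundle-based shuffling (not the paper's code).

    Samples are grouped into contiguous bundles. Shuffling happens only
    within each bundle, so consecutive reads stay in a small file region
    (spatial locality). The bundle sequence is reversed on alternate
    epochs, so the bundles read last in one epoch are read first in the
    next, while they may still be cached (temporal locality).
    """
    rng = random.Random(seed + epoch)
    # Partition sample indices into contiguous bundles.
    bundles = [list(range(i, min(i + bundle_size, num_samples)))
               for i in range(0, num_samples, bundle_size)]
    # Randomize order only inside each bundle.
    for b in bundles:
        rng.shuffle(b)
    # Alternate the bundle traversal direction each epoch (one possible
    # realization of "alternating order").
    if epoch % 2 == 1:
        bundles.reverse()
    return [idx for b in bundles for idx in b]
```

For example, with 10 samples and a bundle size of 4, epoch 0 reads a permutation of samples 0-3 first, while epoch 1 starts from the last bundle (samples 8-9), which was read most recently.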
Keywords
File access, deep learning, file cache, dataset, data management