Parallel I/O Optimizations for Scalable Deep Learning
2017 IEEE 23rd International Conference on Parallel and Distributed Systems (ICPADS)
Abstract
As deep learning systems continue to grow in importance, researchers have been analyzing approaches to make such systems efficient and scalable on high-performance computing platforms. As computational parallelism increases, however, data I/O becomes the major bottleneck limiting the overall system scalability. In this paper, we continue our efforts to improve LMDB, the I/O subsystem of the Caffe deep learning framework. In a previous paper, we presented LMDBIO, an optimized I/O plugin for Caffe that takes into account the data access pattern of Caffe in order to vastly improve I/O performance. Nevertheless, LMDBIO's optimizations, which we henceforth call LMM (localized mmap), are limited to intranode performance and do little to minimize the I/O inefficiencies in distributed-memory environments. In this paper, we propose LMDBIO-DM, an enhanced version of LMDBIO-LMM that optimizes the I/O access of Caffe in distributed-memory environments. We present several sophisticated data I/O techniques that allow for significant improvement in such environments. Our experimental results show that LMDBIO-DM can improve the overall execution time of Caffe by more than 30-fold compared with LMDB and by 2-fold compared with LMDBIO-LMM.
Key words
Scalable deep learning, Caffe, LMDB, I/O subsystem