CHEAPS2AGA: Bounding Space Usage in Variance-Reduced Stochastic Gradient Descent over Streaming Data and Its Asynchronous Parallel Variants

International Conference on Algorithms and Architectures for Parallel Processing (2020)

Abstract
Stochastic Gradient Descent (SGD) is widely used to train machine learning models over large datasets, yet its slow convergence rate can be a bottleneck. As a remarkable family of variance-reduction techniques, memory algorithms such as SAG and SAGA have been proposed to accelerate the convergence of SGD. However, these algorithms must store a correction term for every training data point in memory. This unbounded space usage is impractical for modern large-scale applications, especially when data points arrive over time (referred to as streaming data in this paper). To overcome this weakness, this paper investigates methods that bound the space usage of the state-of-the-art family of variance-reduced stochastic gradient descent algorithms over streaming data, and presents CHEAPS2AGA. At each model-update step, the key idea of CHEAPS2AGA is to always reserve N randomly sampled data points, while re-using information about past stochastic gradients across all observed data points within limited space. In addition, training an accurate model over streaming data requires the algorithm to be time-efficient. To accelerate the training phase, CHEAPS2AGA employs a lock-free data structure to insert new data points and remove unused ones in parallel, and updates the model parameters without any locking. We conduct comprehensive experiments comparing CHEAPS2AGA with prior related algorithms suited to streaming data. The results demonstrate the practical competitiveness of CHEAPS2AGA in terms of scalability and accuracy.
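To make the bounded-memory idea concrete, the following is a minimal, single-threaded sketch of a SAGA-style update restricted to a fixed-size reservoir of N sampled points. It is an illustration under stated assumptions, not the paper's algorithm: the names (BoundedSagaSketch, reservoir_size, step_size, grad_fn) are invented for this example, and CHEAPS2AGA's lock-free data structure and lock-free parallel parameter updates are not modeled here.

```python
import numpy as np

# Illustrative sketch only (not the paper's implementation): a single-threaded,
# bounded-memory, SAGA-style update over a fixed-size reservoir of streaming
# data points. All names here are hypothetical; CHEAPS2AGA additionally uses a
# lock-free data structure and lock-free parallel updates, omitted in this sketch.

class BoundedSagaSketch:
    def __init__(self, dim, reservoir_size, step_size, grad_fn):
        self.w = np.zeros(dim)          # model parameters
        self.N = reservoir_size         # hard bound on stored points / gradients
        self.eta = step_size
        self.grad_fn = grad_fn          # grad_fn(w, x, y) -> per-point gradient
        self.points = []                # reserved (x, y) pairs, at most N of them
        self.stored_grads = []          # one stored gradient per reserved point
        self.grad_avg = np.zeros(dim)   # average of the stored gradients
        self.seen = 0                   # number of streamed points observed so far

    def observe(self, x, y):
        """Reservoir sampling: keep at most N uniformly sampled points."""
        self.seen += 1
        if len(self.points) < self.N:
            self.points.append((x, y))
            self.stored_grads.append(self.grad_fn(self.w, x, y))
            self.grad_avg = np.mean(self.stored_grads, axis=0)
        else:
            j = np.random.randint(self.seen)
            if j < self.N:              # replace a reserved point uniformly at random
                g = self.grad_fn(self.w, x, y)
                self.grad_avg += (g - self.stored_grads[j]) / self.N
                self.points[j] = (x, y)
                self.stored_grads[j] = g

    def step(self):
        """One SAGA-style update using a point drawn from the reserved sample."""
        i = np.random.randint(len(self.points))
        x, y = self.points[i]
        g_new = self.grad_fn(self.w, x, y)
        # Variance-reduced direction: fresh gradient - stored gradient + stored average.
        self.w -= self.eta * (g_new - self.stored_grads[i] + self.grad_avg)
        self.grad_avg += (g_new - self.stored_grads[i]) / len(self.points)
        self.stored_grads[i] = g_new


if __name__ == "__main__":
    # Toy usage: least-squares regression over a synthetic stream.
    rng = np.random.default_rng(0)
    w_true = rng.normal(size=5)

    def lsq_grad(w, x, y):
        # Gradient of 0.5 * (w.x - y)^2 with respect to w.
        return (w @ x - y) * x

    model = BoundedSagaSketch(dim=5, reservoir_size=100, step_size=0.05,
                              grad_fn=lsq_grad)
    for _ in range(10000):
        x = rng.normal(size=5)
        model.observe(x, w_true @ x)    # stream in one point, then take one step
        model.step()
    print("parameter error:", np.linalg.norm(model.w - w_true))
```

The sketch keeps exactly N stored gradients regardless of how many points have streamed by, which is the bounded-space property the abstract describes; the per-step update is the standard SAGA correction applied to a point drawn from the reserved sample.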
Keywords
streaming data, bounding space usage, variance-reduced stochastic gradient descent