Distributed Deep Learning on Heterogeneous Computing Resources Using Gossip Communication

Large-Scale Scientific Computing (LSSC 2019), 2020

Abstract
As deep neural networks have grown in usage, their structures have naturally evolved, increasing in size and complexity. With current networks often containing millions of parameters and hundreds of layers, there have been many attempts to leverage the capabilities of various high-performance computing architectures. Most approaches focus on parameter servers, fixed communication networks, or exploiting particular capabilities of specific computational resources. However, few experiments have been made under relaxed communication consistency requirements or with a dynamic, adaptive way of exchanging information.

Gossip communication is a peer-to-peer communication approach that can minimize the overall data traffic between computational agents by providing a weaker guarantee on data consistency: eventual consistency. In this paper, we present a framework for gossip-based communication suitable for heterogeneous computing resources and apply it to parallel deep learning with artificial neural networks. We present different approaches to gossip-based communication in a heterogeneous computing environment consisting of CPUs and MIC-based co-processors, and implement gossiping via both shared and distributed memory. We also provide a simple approach to load balancing in a heterogeneous computing environment that proves efficient for parallel deep neural network training.

Further, we explore several approaches to communication exchange and resource allocation for parallel deep learning on heterogeneous computing resources, and evaluate their effect on the convergence of the distributed neural network.
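To make the gossip idea concrete, below is a minimal sketch of pairwise gossip averaging of model parameters between workers. All names here (Worker, gossip_step, local_sgd_step) are hypothetical illustrations of the general technique, not the authors' implementation; a real heterogeneous deployment would exchange tensors over MPI or shared memory between CPU and co-processor ranks rather than in-process objects.

import random
import numpy as np

class Worker:
    """Hypothetical worker holding a local replica of the model parameters."""

    def __init__(self, wid, n_params):
        self.wid = wid
        self.params = np.random.randn(n_params)  # local model replica

    def local_sgd_step(self, grad, lr=0.01):
        # Each worker trains independently on its own data shard.
        self.params -= lr * grad

    def gossip_step(self, peers):
        # Pick one random peer and average parameters pairwise.
        # Repeated rounds drive all replicas toward consensus
        # (eventual consistency) without a central parameter server.
        peer = random.choice([p for p in peers if p.wid != self.wid])
        avg = 0.5 * (self.params + peer.params)
        self.params = avg.copy()
        peer.params = avg.copy()

# Usage: a few gossip rounds shrink the spread between replicas.
workers = [Worker(i, n_params=4) for i in range(4)]
for _ in range(20):
    for w in workers:
        w.gossip_step(workers)
print(np.std([w.params for w in workers], axis=0))  # approaches zero

Because each round touches only one peer pair, total traffic stays low and no worker blocks on a global barrier, which is the property the paper exploits when mixing computational resources of different speeds.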
Key words
Deep learning, Gossip communication, Heterogeneous high-performance computing