Model-Parallel Model Selection for Deep Learning Systems
International Conference on Management of Data(2021)
摘要
ABSTRACTAs deep learning becomes more expensive, both in terms of time and compute, inefficiencies in machine learning training prevent practical usage of state-of-the-art models for most users. The newest model architectures are simply too large to be fit onto a single processor. To address the issue, many ML practitioners have turned to model parallelism as a method of distributing the computational requirements across several devices. Unfortunately, the sequential nature of neural networks causes very low efficiency and device utilization in model parallel training jobs. We propose a new form of "shard parallelism" combining task parallelism and model parallelism, and package it into a framework we name Hydra. Hydra recasts the problem of model parallelism in the multi-model context to produce a fine-grained parallel workload of independent model shards, rather than independent models. This new parallel design promises dramatic speedups relative to the traditional model parallelism paradigm.
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要