Preliminary Performance Analysis of Distributed DNN Training with Relaxed Synchronization
IEICE Transactions on Electronics (2021)
Abstract
The scalability of distributed DNN training can be limited by the slowdown of specific processes caused by unexpected hardware failures. We propose a dynamic process exclusion technique that maximizes training throughput. Our evaluation with 32 processes running ResNet-50 shows that the proposed technique reduces slowdown by 12.5% to 50% without accuracy loss by excluding the slow processes.
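The core idea, excluding processes that fall behind so the remaining workers are not stalled at synchronization, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the median-based threshold, the `tolerance` value, and the function name are all assumptions for demonstration.

```python
# Hypothetical sketch of dynamic slow-process exclusion (illustrative only,
# not the paper's actual algorithm or code).
from statistics import median

def select_active_workers(step_times, tolerance=1.5):
    """Return indices of workers to keep in the next synchronization round.

    step_times: per-worker wall-clock time (seconds) of the last training step.
    tolerance:  a worker is excluded when its step time exceeds
                tolerance * median(step_times); the 1.5 factor is an
                illustrative assumption, not a value from the paper.
    """
    m = median(step_times)
    return [i for i, t in enumerate(step_times) if t <= tolerance * m]

# Worker 2 has stalled (e.g., a degraded device) and is excluded, so the
# remaining workers can synchronize gradients without waiting for it.
times = [0.21, 0.20, 1.30, 0.22]
active = select_active_workers(times)
```

In a real setup this decision would be re-evaluated periodically so that a temporarily slow worker can rejoin once its step time recovers, which is what makes the exclusion dynamic rather than permanent.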
Keywords
relaxed synchronization, dynamic performance optimization, distributed deep learning, approximate computing