Preliminary Performance Analysis of Distributed DNN Training with Relaxed Synchronization

IEICE TRANSACTIONS ON ELECTRONICS (2021)

Abstract
The scalability of distributed DNN training can be limited by the slowdown of specific processes due to unexpected hardware failures. We propose a dynamic process exclusion technique that maximizes training throughput. Our evaluation using 32 processes with ResNet-50 shows that the proposed technique reduces slowdown by 12.5% to 50% without accuracy loss by excluding the slow processes.
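The core idea of dynamic process exclusion can be illustrated with a minimal sketch, assuming a straggler-detection policy of our own choosing (the paper does not specify one here): each synchronous step is bounded by the slowest participating worker, so dropping workers whose step time exceeds a threshold relative to the median shortens the step. The function names and the 1.5x-median threshold are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch of dynamic process exclusion for synchronous training.
# Assumption: a worker is "slow" if its last step time exceeds
# slow_factor * median step time; the paper's actual criterion may differ.
from statistics import median


def select_active_workers(step_times, slow_factor=1.5):
    """Return indices of workers fast enough to stay in the sync group."""
    threshold = slow_factor * median(step_times)
    return [i for i, t in enumerate(step_times) if t <= threshold]


def sync_step_time(step_times, active):
    """A synchronous step is bounded by the slowest participating worker."""
    return max(step_times[i] for i in active)


# Example: one straggler among 8 workers (seconds per step).
times = [0.10, 0.11, 0.10, 0.12, 0.10, 0.45, 0.11, 0.10]
active = select_active_workers(times)
print(active)                                      # straggler (index 5) excluded
print(sync_step_time(times, range(len(times))))    # full sync waits 0.45 s
print(sync_step_time(times, active))               # after exclusion: 0.12 s
```

Under this toy policy, excluding the one straggler cuts the per-step time from 0.45 s to 0.12 s, which is the kind of throughput recovery the evaluation quantifies.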
Keywords
relaxed synchronization, dynamic performance optimization, distributed deep learning, approximate computing