Experimental Assessment of Reversibility-Aware Deep Reinforcement Learning for Optical Data Center Network Reconfiguration

2023 International Conference on Optical Network Design and Modeling (ONDM), 2023

Abstract
Communication-intensive distributed machine learning (DML) workloads and other emerging applications can suffer from a traffic-topology mismatch in traditional data center networks. Reconfiguring the logical network topology can alleviate this degradation. However, how to dynamically reconfigure the logical topology and steer bandwidth, using a control plane that adapts efficiently to current data center traffic patterns without considerable overhead, remains an open question. This paper presents a reversibility-aware deep reinforcement learning (RA-DRL) algorithm for optical switch reconfiguration in data center networks and validates it on an experimental testbed. Using the testbed, we show that appropriate optical switch reconfiguration, whether driven by a baseline DRL method or by the RA-DRL method, can improve the training performance of DML workloads under network congestion. More importantly, by incorporating the concept of reversibility into the training of the DRL agent, we demonstrate a 5x decrease in training time for a distributed computer-vision application and an improvement in convergence time of up to 64%.