Data-Driven Rate Control for RDMA Networks: A Lightweight Online Learning Approach

2023 IEEE 43rd International Conference on Distributed Computing Systems (ICDCS) (2023)

Abstract
Link speed in datacenter networks (DCNs) keeps growing rapidly, so an increasingly large portion of network flows become short flows that can finish within one round-trip time (RTT). This phenomenon makes many existing congestion control schemes ineffective, because they adjust the sending rate iteratively over multiple rounds based on the latest congestion feedback. We find that the representative DCQCN scheme for RDMA exhibits substantial performance degradation when there are many short flows, and this is especially true in High Performance Computing (HPC) scenarios, where most Message Passing Interface (MPI) messages are small. In this paper, we propose a data-driven rate control framework that learns from long-term online data about past rate control decisions via a lightweight online learning technique, Multi-Armed Bandit (MAB), which has a provable performance guarantee. Using the framework, we devise a rate control scheme named Dolce-RC, which dynamically controls rate increase and reduction by learning from online data. We implement Dolce-RC in commodity smart NICs and show via testbed experiments and large-scale simulations that, compared to DCQCN, Dolce-RC reduces the average completion time of MPI messages by up to 68% while requiring no modification to switches.
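To make the bandit idea concrete, here is a minimal sketch of how a Multi-Armed Bandit could pick a sending rate from past outcomes. It is not Dolce-RC: the abstract does not specify the bandit algorithm, the arms, or the reward, so the sketch assumes the standard UCB1 rule, a hypothetical set of candidate rates (`CANDIDATE_RATES`, as fractions of line rate), and a toy reward model (`simulated_reward`) in which a flow earns its rate when the rate fits the available capacity and earns nothing otherwise.

```python
import math
import random

# Hypothetical arms: candidate sending rates as a fraction of line rate.
CANDIDATE_RATES = [0.25, 0.5, 0.75, 1.0]


class UCB1:
    """Standard UCB1 bandit for rewards in [0, 1] (an assumed choice)."""

    def __init__(self, n_arms):
        self.counts = [0] * n_arms   # times each arm was played
        self.means = [0.0] * n_arms  # running mean reward per arm

    def select(self):
        # Play every arm once before using confidence bounds.
        for arm, count in enumerate(self.counts):
            if count == 0:
                return arm
        total = sum(self.counts)
        scores = [
            mean + math.sqrt(2.0 * math.log(total) / count)
            for mean, count in zip(self.means, self.counts)
        ]
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, arm, reward):
        # Incremental update of the running mean for the played arm.
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]


def simulated_reward(rate, rng):
    """Toy model (assumption): capacity is uniform in [0.4, 0.8]."""
    capacity = rng.uniform(0.4, 0.8)
    # Reward equals the achieved rate; overshooting capacity yields 0.
    return rate if rate <= capacity else 0.0


def run_simulation(rounds, seed):
    """Run the bandit for `rounds` flows and return per-arm play counts."""
    rng = random.Random(seed)
    bandit = UCB1(len(CANDIDATE_RATES))
    for _ in range(rounds):
        arm = bandit.select()
        bandit.update(arm, simulated_reward(CANDIDATE_RATES[arm], rng))
    return bandit.counts


if __name__ == "__main__":
    counts = run_simulation(rounds=5000, seed=0)
    for rate, count in zip(CANDIDATE_RATES, counts):
        print(f"rate {rate:.2f}: chosen {count} times")
```

Under this toy model the expected reward is highest for the 0.5 arm, so the learner concentrates its choices there over time while still occasionally probing the other rates. A real rate controller would derive its reward from measured feedback such as congestion notifications or completion times rather than from a simulated capacity.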