Apodotiko: Enabling Efficient Serverless Federated Learning in Heterogeneous Environments
arXiv (2024)
Abstract
Federated Learning (FL) is an emerging machine learning paradigm that enables
the collaborative training of a shared global model across distributed clients
while keeping the data decentralized. Recent works on designing systems for
efficient FL have shown that utilizing serverless computing technologies,
particularly Function-as-a-Service (FaaS) for FL, can enhance resource
efficiency, reduce training costs, and alleviate the complex infrastructure
management burden on data holders. However, current serverless FL systems still
suffer from the presence of stragglers, i.e., slow clients that impede the
collaborative training process. While strategies aimed at mitigating stragglers
in these systems have been proposed, they overlook the diverse hardware
resource configurations among FL clients. To this end, we present Apodotiko, a
novel asynchronous training strategy designed for serverless FL. Our strategy
incorporates a scoring mechanism that evaluates each client's hardware capacity
and dataset size to intelligently prioritize and select clients for each
training round, thereby minimizing the effects of stragglers on system
performance. We comprehensively evaluate Apodotiko across diverse datasets,
considering a mix of CPU and GPU clients, and compare its performance against
five other FL training strategies. Results from our experiments demonstrate
that Apodotiko outperforms other FL training strategies, achieving an average
speedup of 2.75x and a maximum speedup of 7.03x. Furthermore, our strategy
significantly reduces cold starts by a factor of four on average, demonstrating
suitability in serverless environments.
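
The abstract describes the scoring mechanism only at a high level: it weighs each client's hardware capacity and dataset size when choosing participants for a round. As a rough illustration of that idea, and not the formula Apodotiko actually uses, the sketch below ranks clients by an estimated local-epoch time. The `Client` fields, the `samples_per_second` capacity proxy, and the `estimated_epoch_seconds` and `select_clients` helpers are all hypothetical names introduced here for illustration.

```python
from __future__ import annotations

import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Client:
    client_id: str
    samples_per_second: float  # assumed proxy for hardware capacity (CPU vs. GPU)
    dataset_size: int          # number of local training samples


def estimated_epoch_seconds(client: Client) -> float:
    """Hypothetical score: estimated time for one local epoch (lower is faster)."""
    return client.dataset_size / client.samples_per_second


def select_clients(clients: list[Client], k: int) -> list[Client]:
    """Return the k clients with the smallest estimated epoch time, fastest first."""
    return heapq.nsmallest(k, clients, key=estimated_epoch_seconds)


if __name__ == "__main__":
    pool = [
        Client("cpu-small", samples_per_second=100.0, dataset_size=10_000),
        Client("gpu-large", samples_per_second=2_000.0, dataset_size=40_000),
        Client("cpu-large", samples_per_second=100.0, dataset_size=40_000),
    ]
    for c in select_clients(pool, k=2):
        print(c.client_id, estimated_epoch_seconds(c))
```

A pure "fastest first" rule like this one would starve slow clients of participation, so a practical strategy would likely combine such a score with some form of randomization or fairness weighting; the abstract does not say how Apodotiko handles this.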