Efficient Data Distribution Estimation for Accelerated Federated Learning
CoRR (2024)
Abstract
Federated Learning (FL) is a privacy-preserving machine learning paradigm
in which a global model is trained in situ across a large number of distributed
edge devices. These systems often comprise millions of user devices, and
only a subset of the available devices can be used for training in each epoch.
Designing a device selection strategy is challenging, given that devices are
highly heterogeneous in both their system resources and their training data. This
heterogeneity makes device selection crucial for timely model convergence
and sufficient model accuracy. To tackle the FL client heterogeneity problem,
various client selection algorithms have been developed, showing promising
improvements in model convergence and accuracy. In this work,
we study the overhead of client selection algorithms in a large-scale FL
environment. We then propose an efficient data distribution summary calculation
algorithm that reduces this overhead in a real-world large-scale FL environment.
Our evaluation shows that the proposed solution achieves up to a 30x
reduction in data summary time and up to a 360x reduction in clustering time.