Toward a Personalized Clustered Federated Learning: A Speech Recognition Case Study

IEEE Internet of Things Journal (2023)

Abstract
Most speech recognition systems rely on cloud computing for model training and updates. Speech data is personally identifiable information (PII) and carries personal, privacy-sensitive, and regulated content. Relying on centralized servers or third parties puts this confidential data at risk of privacy breaches. Consequently, privacy concerns and strict regulations (e.g., the EU's General Data Protection Regulation (GDPR), California's CCPA, and the Privacy Act in Australia) limit the availability of large data sets. This scarcity is particularly pronounced for less-represented languages such as Persian, and it hampers innovation and data-driven product development. To overcome both data scarcity and privacy concerns, we propose, for the first time, a novel federated learning (FL) solution for Persian Spoken Isolated Digit Recognition. The proposed technique bridges the gap between privacy and utility by enabling model training on decentralized data sets stored on edge devices or servers, without exchanging the data. Non-independent and identically distributed (non-IID) data, such as unique speaker accents, poses a challenge in speech recognition, especially in an FL setup, yet existing techniques and methodologies have largely overlooked it. To address this, we present a personalized clustered FL (PCFL) approach that exploits similarities among the clients' private data distributions while capturing the distinctive characteristics of each client's data when training models. The experimental results show that the proposed solution significantly addresses privacy concerns while incurring a negligible performance loss compared with centralized model training techniques.
Keywords
Artificial intelligence (AI), federated learning (FL), Internet of Things (IoT), personalization, privacy-preserving machine learning (PPML), speech recognition
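
The abstract does not specify how PCFL groups clients or personalizes their models, so the following is only a minimal sketch of the general idea of clustered, personalized federated averaging, not the authors' algorithm. It assumes each client sends a flat parameter vector, that clients are grouped greedily by cosine similarity of those vectors, and that personalization is a simple interpolation between the local and the cluster model. All function names and the `threshold` and `alpha` parameters are hypothetical.

```python
import numpy as np


def fed_avg(updates, sizes):
    """Average client parameter vectors, weighted by local data set size."""
    sizes = np.asarray(sizes, dtype=float)
    stacked = np.stack(updates)
    return (stacked * sizes[:, None]).sum(axis=0) / sizes.sum()


def cluster_clients(updates, threshold=0.5):
    """Greedy grouping (an assumption, not the paper's method): a client joins
    the first cluster whose first member has cosine similarity >= threshold."""
    eps = 1e-12  # guards against division by zero for all-zero vectors
    clusters = []
    for i, u in enumerate(updates):
        for members in clusters:
            ref = updates[members[0]]
            cos = float(u @ ref) / (np.linalg.norm(u) * np.linalg.norm(ref) + eps)
            if cos >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def personalize(cluster_model, local_model, alpha=0.5):
    """Blend the shared cluster model with the client's own parameters."""
    return alpha * local_model + (1.0 - alpha) * cluster_model


def pcfl_round(updates, sizes, threshold=0.5, alpha=0.5):
    """One illustrative round: cluster, aggregate per cluster, personalize."""
    clusters = cluster_clients(updates, threshold)
    personalized = [None] * len(updates)
    for members in clusters:
        model = fed_avg([updates[i] for i in members],
                        [sizes[i] for i in members])
        for i in members:
            personalized[i] = personalize(model, updates[i], alpha)
    return clusters, personalized


if __name__ == "__main__":
    # Toy parameter vectors: clients 0 and 1 are similar, client 2 differs.
    updates = [np.array([1.0, 0.0]), np.array([0.9, 0.1]), np.array([0.0, 1.0])]
    clusters, models = pcfl_round(updates, sizes=[10, 10, 5])
    print(clusters)
    print(models)
```

Only parameter vectors are exchanged in this sketch; the raw speech data never leaves the client, which is the property the abstract attributes to FL. A real system would train acoustic models locally and would likely use a more robust clustering criterion than the greedy rule assumed here.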