Quality Control in Remote Speech Data Collection

IEEE Journal of Selected Topics in Signal Processing (2019)

Abstract
There is a need for algorithms that can automatically control the quality of remotely collected speech databases by detecting potential outliers that deserve further investigation. In this paper, a simple and effective approach for identifying outliers in a speech database is proposed. Using the deterministic minimum covariance determinant (DetMCD) algorithm to estimate the mean and covariance of the speech data in the mel-frequency cepstral domain, this approach flags as potential outliers those observations whose statistical distance in the feature space from the central location of the data exceeds a predefined threshold. DetMCD is a computationally efficient algorithm that provides a highly robust estimate of the mean and covariance of multivariate data even when 50% of the data are outliers. Experimental results on 8 different speech databases with manually inserted outliers show the effectiveness of the proposed method for outlier detection in speech databases. Moreover, applying the proposed method to a remotely collected Parkinson's voice database shows that the outliers that are part of the database are detected with 97.4% accuracy, which significantly reduces the effort required to manually control the quality of the database.
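The idea of flagging observations by their robust statistical distance can be illustrated with a minimal sketch. This is not the paper's DetMCD implementation: it is a simplified, pure-Python stand-in that starts from a deterministic median/MAD-based subset and then runs concentration steps (repeatedly re-fitting the mean and covariance on the h closest points). It assumes two-dimensional toy features in place of real MFCC vectors, omits the consistency correction usually applied to the MCD covariance, and uses hypothetical names throughout (`robust_outliers`, `h_fraction`, `quantile`). The threshold is the chi-square quantile with 2 degrees of freedom, which has the closed form -2 ln(1 - p).

```python
import math
import random
import statistics


def mean_cov_2d(points):
    """Sample mean and 2x2 covariance (sxx, sxy, syy) of (x, y) pairs."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def sq_mahalanobis(p, mean, cov):
    """Squared Mahalanobis distance via the closed-form 2x2 inverse."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = p[0] - mean[0], p[1] - mean[1]
    return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det


def robust_outliers(points, h_fraction=0.75, quantile=0.975, max_steps=50):
    """Flag points whose robust squared distance exceeds a chi-square cutoff.

    Simplified MCD-style estimate (an assumption, not the paper's DetMCD):
    a deterministic median/MAD start followed by concentration steps.
    """
    n = len(points)
    h = int(h_fraction * n)  # size of the subset assumed to be clean

    # Deterministic start: rank points by scaled distance from the medians.
    med = [statistics.median(p[k] for p in points) for k in range(2)]
    mad = [statistics.median(abs(p[k] - med[k]) for p in points)
           for k in range(2)]
    start = [sum(((p[k] - med[k]) / mad[k]) ** 2 for k in range(2))
             for p in points]
    subset = sorted(range(n), key=lambda i: start[i])[:h]

    # Concentration steps: re-fit on the h points closest to the current fit.
    for _ in range(max_steps):
        mean, cov = mean_cov_2d([points[i] for i in subset])
        dist = [sq_mahalanobis(p, mean, cov) for p in points]
        new_subset = sorted(range(n), key=lambda i: dist[i])[:h]
        if set(new_subset) == set(subset):
            break
        subset = new_subset

    # Chi-square quantile for 2 degrees of freedom (closed form).
    threshold = -2.0 * math.log(1.0 - quantile)
    flagged = [i for i in range(n) if dist[i] > threshold]
    return flagged, dist


# Toy data: 80 "clean" observations plus 20 inserted outliers (indices 80-99).
rng = random.Random(0)
inliers = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(80)]
outliers = [(rng.gauss(8.0, 0.5), rng.gauss(8.0, 0.5)) for _ in range(20)]
data = inliers + outliers
flagged, distances = robust_outliers(data)
```

Because the covariance is fitted only on the most central points and no consistency factor is applied, this sketch tends to flag a few clean observations as well; in a quality-control setting that is usually acceptable, since flagged recordings are meant to be reviewed manually rather than discarded automatically.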
Keywords
Databases, Signal processing algorithms, Data collection, Speech recognition, Speech processing, Approximation algorithms, Quality control