Ensemble Bag-of-Audio-Words Representation Improves Paralinguistic Classification Accuracy

IEEE/ACM Transactions on Audio, Speech, and Language Processing (2021)

Abstract
A recently introduced, effective feature extraction technique for computational paralinguistics is that of Bag-of-Audio-Words (BoAW), where we cluster the frame-level training vectors and represent each speech utterance based on the clusters to which its frames are assigned. Over the past few years, several improvements have been proposed for the original BoAW approach, but none of them has examined the impact of the stochastic nature of the clustering step. In this study we demonstrate experimentally that the random factor present in the BoAW clustering step is indeed propagated into the subsequent classification step, eventually leading to suboptimal classification performance. As a solution, we propose to train an ensemble of classifiers; that is, we repeat the BoAW codebook selection step several times, train a separate classifier model for each of the resulting BoAW representations, and combine their predictions. Our results, obtained on three different paralinguistic datasets, demonstrate that this ensemble technique makes the whole paralinguistic classification process more robust and improves classification performance. On the iHEARu-EAT corpus, we achieved the highest Unweighted Average Recall score reported so far.
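The ensemble idea described in the abstract can be illustrated with a minimal sketch. Everything in it beyond the overall procedure is an assumption made for illustration and is not taken from the paper: the codebook is chosen by randomly sampling training frames, the classifier is a simple nearest-centroid stand-in, predictions are combined by summing per-class scores, and all function names, parameter values, and the toy data are hypothetical. Only the Python standard library is used so that the example stays self-contained.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def sample_codebook(frames, size, rng):
    """Assumed codebook selection: pick `size` training frames at random."""
    return rng.sample(frames, size)


def boaw_histogram(utterance, codebook):
    """Normalized count of how often each codeword is the nearest one to a frame."""
    counts = [0] * len(codebook)
    for frame in utterance:
        nearest = min(range(len(codebook)),
                      key=lambda k: sq_dist(frame, codebook[k]))
        counts[nearest] += 1
    return [c / len(utterance) for c in counts]


def fit_centroids(histograms, labels):
    """Stand-in classifier (an assumption): the mean histogram of each class."""
    centroids = {}
    for label in sorted(set(labels)):
        rows = [h for h, y in zip(histograms, labels) if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def train_ensemble(train_utts, train_labels, n_models=5, codebook_size=8, seed=0):
    """Repeat codebook selection n_models times; train one model per codebook."""
    rng = random.Random(seed)
    all_frames = [frame for utt in train_utts for frame in utt]
    models = []
    for _ in range(n_models):
        codebook = sample_codebook(all_frames, codebook_size, rng)
        hists = [boaw_histogram(utt, codebook) for utt in train_utts]
        models.append((codebook, fit_centroids(hists, train_labels)))
    return models


def predict_ensemble(models, utterance):
    """Combine the models by summing per-class scores (negative distances)."""
    totals = {}
    for codebook, centroids in models:
        hist = boaw_histogram(utterance, codebook)
        for label, centroid in centroids.items():
            totals[label] = totals.get(label, 0.0) - sq_dist(hist, centroid)
    return max(totals, key=totals.get)


def make_toy_utterance(center, rng, n_frames=20, dim=2):
    """Hypothetical toy data: Gaussian frames around `center` in every dimension."""
    return [[rng.gauss(center, 1.0) for _ in range(dim)] for _ in range(n_frames)]


# Toy usage: two well-separated classes, ten training utterances each.
data_rng = random.Random(42)
train_utts = ([make_toy_utterance(0.0, data_rng) for _ in range(10)]
              + [make_toy_utterance(5.0, data_rng) for _ in range(10)])
train_labels = ["low"] * 10 + ["high"] * 10
models = train_ensemble(train_utts, train_labels)
print(predict_ensemble(models, make_toy_utterance(5.0, data_rng)))
```

Because each ensemble member uses a different random codebook, any single model's output depends on the seed; combining the members is what is meant to reduce that dependence. A clustering-based codebook (for example k-means with different initializations) or majority voting could be substituted without changing the overall structure.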
Keywords
Computational paralinguistics, classification, Bag-of-Audio-Words representation, ensemble learning