Explainable Stuttering Recognition Using Axial Attention.

Yu Ma, Yuting Huang, Kaixiang Yuan, Guangzhe Xuan, Yongzi Yu, Hengrui Zhong, Rui Li, Jian Shen, Kun Qian, Bin Hu, Björn W. Schuller, Yoshiharu Yamamoto

ICIC (3) (2023)

Abstract
Stuttering is a complex speech disorder that disrupts the flow of speech. Recognizing persons who stutter (PWS) and understanding their significant struggles are crucial. With advancements in computer vision, deep neural networks offer potential for recognizing stuttering events through image-based features. In this paper, we extract Wavelet Transformation (WT) and Histogram of Oriented Gradients (HOG) image features from audio signals. We also generate explainable images using Gradient-weighted Class Activation Mapping (Grad-CAM) as input for our final recognition model, an axial attention-based EfficientNetV2 trained on the Kassel State of Fluency Dataset (KSoF) to perform 8-class recognition. In our experiments, the model achieved a relative increase of 4.4% in unweighted average recall (UAR) over the ComParE 2022 baseline, demonstrating that the axial attention-based EfficientNetV2, combined with the explainable input, can detect and recognize multiple types of stuttering.
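The abstract reports its result as a relative increase in unweighted average recall (UAR). As a rough illustration of what that metric measures, the sketch below computes UAR as the plain mean of per-class recalls and expresses a change against a baseline as a relative percentage. The function names, the toy labels, and the numbers are all hypothetical and are not taken from the paper or from KSoF.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls, so every class counts equally
    regardless of how many samples it has."""
    total = defaultdict(int)    # samples per true class
    correct = defaultdict(int)  # correctly predicted samples per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


def relative_increase_percent(new, baseline):
    """Relative change of `new` over `baseline`, in percent."""
    return (new - baseline) / baseline * 100.0


# Hypothetical toy labels for two imbalanced classes (not real data).
y_true = ["class_a", "class_a", "class_a", "class_a", "class_b", "class_b"]
y_pred = ["class_a", "class_a", "class_a", "class_b", "class_b", "class_a"]

# Per-class recalls are 3/4 and 1/2, so UAR is their mean.
uar = unweighted_average_recall(y_true, y_pred)
print(f"UAR = {uar:.3f}")
```

Because each class contributes equally, UAR is less sensitive to class imbalance than plain accuracy, which is presumably why it suits a multi-class stuttering task where some event types are rare.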
Keywords
explainable stuttering recognition, axial attention