Automatic spoken language identification using MFCC based time series features

MULTIMEDIA TOOLS AND APPLICATIONS (2022)

Abstract
Spoken Language Identification (SLID) is a well-researched field and an established first step in multilingual speech recognition systems. With the rise of ASR technologies in recent years, its importance has only grown. In this work, we propose a model for recognizing Indian and foreign languages. To make the model robust to everyday noise, we augment our data with noise of varying loudness taken from diverse environments. From the MFCC time series of this augmented data, we extract aggregated macro-level features and perform feature selection using the FRESH (FeatuRe Extraction based on Scalable Hypothesis tests) algorithm, which yields a set of features relevant to the problem. This filtered set is used to train an Artificial Neural Network. The model is then tested on three standard datasets. First, six languages are selected from the IIT-M IndicTTS speech database, and an accuracy of 99.93% is obtained. Second, the IIIT-H Indic speech database, consisting of seven languages, is used, and an accuracy of 99.94% is recorded. Last, eight languages from the VoxForge dataset are used, and we achieve an accuracy of 98.43%. These results lead us to believe that the features are suitable for capturing language-specific characteristics of speech. Hence, we propose that they can be used as standard features for the task of SLID. The source code of the present work is available at https://github.com/rahamansaif/LID-using-time-series-MFCC .
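
The pipeline described in the abstract (noise augmentation, aggregation of MFCC time series into macro-level features, and FRESH-style relevance filtering) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: all function names are hypothetical, mixing at a fixed signal-to-noise ratio is an assumed way of varying noise loudness, the five summary statistics stand in for the much larger feature set the paper extracts, and the p-values passed to the Benjamini-Yekutieli step are assumed to come from per-feature hypothesis tests that are not shown. MFCC extraction and the neural network are omitted.

```python
import math
from statistics import mean, pstdev


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech, scaled so the speech-to-noise power ratio is snr_db.

    Assumption: the abstract only says "noise of varying loudness"; controlling
    loudness through an SNR in decibels is one common way to do that.
    """
    p_speech = mean(s * s for s in speech)
    p_noise = mean(n * n for n in noise)
    scale = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [s + scale * n for s, n in zip(speech, noise)]


def aggregate_features(mfcc):
    """Collapse an MFCC time series into one flat dict of summary features.

    mfcc is a list of frames, each frame a list of coefficients. Each
    coefficient is treated as its own time series across frames.
    """
    feats = {}
    for k, series in enumerate(zip(*mfcc)):
        feats[f"mfcc{k}__mean"] = mean(series)
        feats[f"mfcc{k}__std"] = pstdev(series)
        feats[f"mfcc{k}__min"] = min(series)
        feats[f"mfcc{k}__max"] = max(series)
        feats[f"mfcc{k}__abs_energy"] = sum(v * v for v in series)
    return feats


def benjamini_yekutieli(p_values, alpha=0.05):
    """Return the feature names kept by the Benjamini-Yekutieli procedure.

    p_values maps feature name -> p-value from a per-feature relevance test
    (the tests themselves are outside this sketch).
    """
    m = len(p_values)
    c_m = sum(1.0 / i for i in range(1, m + 1))  # harmonic correction term
    ranked = sorted(p_values.items(), key=lambda kv: kv[1])
    cutoff = 0
    for rank, (_, p) in enumerate(ranked, start=1):
        if p <= rank * alpha / (m * c_m):
            cutoff = rank  # keep the largest rank that passes
    return [name for name, _ in ranked[:cutoff]]
```

In practice, the selected feature names would define the input vector for the classifier; libraries such as tsfresh package the extraction and FRESH selection steps together.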
Keywords
Spoken language identification, Indian languages, Mel frequency cepstral coefficients, Time series features, Feature selection, FRESH algorithm, Artificial neural network