CAM: Uninteresting Speech Detector

Weiyi Lu, Yi Xu, Peng Yang, Belinda Zeng

INTERSPEECH (2020)

Abstract
Voice assistants such as Siri and Alexa usually adopt a pipeline to process users' utterances, which generally includes transcribing the audio into text, understanding the text, and finally responding to the user. One potential issue is that some utterances contain no interesting speech and are thus not worth processing through the entire pipeline. Examples of uninteresting utterances include those that have too much noise or contain no intelligible speech. It is therefore desirable to have a model that filters out such utterances before they are ingested for downstream processing, thus saving system resources. To this end, we propose the Combination of Audio and Metadata (CAM) detector to identify utterances that contain only uninteresting speech. Our experimental results show that the CAM detector considerably outperforms both an audio model alone and a metadata model alone, which demonstrates the effectiveness of the proposed system.
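The filtering idea in the abstract can be illustrated with a minimal sketch. The abstract does not say how CAM combines its audio and metadata signals, so the weighted average below is a stand-in chosen purely for illustration, not the authors' method. The names `Utterance`, `combined_score`, and `filter_utterances`, the weight, and the threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    # Hypothetical per-utterance scores in [0, 1]: each is one model's
    # estimated probability that the utterance is uninteresting.
    audio_score: float
    metadata_score: float


def combined_score(u: Utterance, audio_weight: float = 0.5) -> float:
    """Weighted average of the two scores (an assumed fusion rule)."""
    return audio_weight * u.audio_score + (1.0 - audio_weight) * u.metadata_score


def filter_utterances(utterances, threshold: float = 0.8):
    """Keep only utterances whose combined score is below the threshold,
    so the rest never reach the downstream pipeline."""
    return [u for u in utterances if combined_score(u) < threshold]


if __name__ == "__main__":
    batch = [
        Utterance(audio_score=0.90, metadata_score=0.95),  # both models flag it
        Utterance(audio_score=0.10, metadata_score=0.20),  # clearly interesting
        Utterance(audio_score=0.90, metadata_score=0.30),  # models disagree
    ]
    kept = filter_utterances(batch)
    print(f"kept {len(kept)} of {len(batch)} utterances")
```

In this sketch an utterance is dropped only when the combined evidence is strong, which is one way a second signal could change a decision that a single model would make on its own.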
Key words
audio event detection, acoustic scene classification