MDRT: Multi-Domain Synthetic Speech Localization

IEEE International Conference on Acoustics, Speech, and Signal Processing (2024)

Abstract
With recent advancements in generating synthetic speech, tools to generate high-quality synthetic speech impersonating any human speaker are easily available. Several incidents report misuse of high-quality synthetic speech for spreading misinformation and for large-scale financial frauds. Many methods have been proposed for detecting synthetic speech; however, there is limited work on localizing the synthetic segments within the speech signal. In this work, our goal is to localize the synthetic speech segments in a partially synthetic speech signal. Most existing methods for synthetic speech localization obtain features from either the time domain waveform or the spectrogram representation of the speech signal. In this work, we propose Multi-Domain ResNet Transformer (MDRT) that obtains multi-domain features from both the time domain and the spectrogram representation of a speech signal to localize synthetic speech segments. MDRT uses transformer neural networks to obtain multi-domain features and processes them using a ResNet-style neural network. We use the PartialSpoof dataset to examine the performance of MDRT on localizing synthetic speech segments of varying duration. Our results show that MDRT performs better than several existing synthetic speech localization methods.
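
To make the localization task concrete, here is a minimal sketch of how per-frame predictions could be turned into time-stamped synthetic segments. It is not the MDRT model or the authors' post-processing. The function name `frames_to_segments`, the 0.5 threshold, and the 20 ms frame hop are illustrative assumptions that do not come from the abstract.

```python
from typing import List, Sequence, Tuple


def frames_to_segments(
    frame_scores: Sequence[float],
    threshold: float = 0.5,      # assumed decision threshold (hypothetical)
    hop_seconds: float = 0.02,   # assumed frame hop of 20 ms (hypothetical)
) -> List[Tuple[float, float]]:
    """Merge consecutive frames scored as synthetic into (start, end) segments.

    frame_scores holds one synthetic-speech score per frame, where a higher
    value means "more likely synthetic". Returned times are in seconds.
    """
    segments: List[Tuple[float, float]] = []
    start = None  # index of the first frame of the currently open segment

    for i, score in enumerate(frame_scores):
        is_synthetic = score >= threshold
        if is_synthetic and start is None:
            start = i  # open a new synthetic segment
        elif not is_synthetic and start is not None:
            # close the open segment at the start of this bona fide frame
            segments.append((start * hop_seconds, i * hop_seconds))
            start = None

    if start is not None:
        # the signal ended while a synthetic segment was still open
        segments.append((start * hop_seconds, len(frame_scores) * hop_seconds))

    return segments
```

For example, calling `frames_to_segments(scores)` on the per-frame output of any localization model gives a list of intervals that can be compared against ground-truth segment boundaries.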