Advancing Accessibility: Voice Cloning and Speech Synthesis for Individuals with Speech Disorders
CoRR (2024)
Abstract
Neural Text-to-speech (TTS) synthesis is a powerful technology that can
generate speech using neural networks. One of the most remarkable features of
TTS synthesis is its capability to produce speech in the voice of different
speakers. This paper introduces voice cloning and speech synthesis
(https://pypi.org/project/voice-cloning/), an open-source Python package for
helping individuals with speech disorders communicate more effectively, as
well as for professionals seeking to integrate voice cloning or speech
synthesis capabilities into their projects. The package aims to generate
synthetic speech that sounds like an individual's natural voice; it is not
intended to replace the natural human voice. The architecture of the system
comprises a speaker verification system, a synthesizer, a vocoder, and a noise
reduction module.
The speaker verification system is trained on a varied set of speakers to
achieve optimal generalization performance without relying on transcriptions.
The synthesizer is trained on both audio and transcriptions and generates a
Mel spectrogram from text, and the vocoder converts the generated Mel
spectrogram into the corresponding audio signal. The audio signal is then
processed by a noise reduction algorithm to eliminate unwanted noise and
enhance speech clarity. The performance of synthesized speech from seen and
unseen speakers is then evaluated using subjective and objective measures such
as Mean Opinion Score (MOS), Gross Pitch Error (GPE), and Spectral Distortion
(SD). The
model can create speech in distinct voices by including speaker characteristics
that are chosen randomly.
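
The evaluation measures named above can be illustrated with a minimal sketch. It assumes commonly used definitions, which the abstract itself does not spell out: GPE as the fraction of frames voiced in both signals whose pitch deviates by more than 20% from the reference, SD as the root-mean-square log-spectral difference in dB averaged over frames, and MOS as the arithmetic mean of listener ratings. The function names and the 20% tolerance are illustrative choices and are not taken from the voice-cloning package.

```python
import math


def gross_pitch_error(ref_f0, syn_f0, tolerance=0.2):
    """Fraction of frames voiced in both tracks whose F0 is off by more than
    `tolerance` (relative to the reference). Unvoiced frames are marked 0.
    The 20% default is an assumed, commonly used threshold."""
    both_voiced = [(r, s) for r, s in zip(ref_f0, syn_f0) if r > 0 and s > 0]
    if not both_voiced:
        return 0.0
    gross = sum(1 for r, s in both_voiced if abs(s - r) > tolerance * r)
    return gross / len(both_voiced)


def spectral_distortion_db(ref_power, syn_power, eps=1e-12):
    """Log-spectral distance in dB, averaged over frames.
    Each argument is a list of frames; each frame is a list of power values.
    `eps` guards against log of zero."""
    per_frame = []
    for ref_frame, syn_frame in zip(ref_power, syn_power):
        squared = [
            (10.0 * math.log10((r + eps) / (s + eps))) ** 2
            for r, s in zip(ref_frame, syn_frame)
        ]
        per_frame.append(math.sqrt(sum(squared) / len(squared)))
    return sum(per_frame) / len(per_frame)


def mean_opinion_score(ratings):
    """Arithmetic mean of listener ratings (typically on a 1 to 5 scale)."""
    return sum(ratings) / len(ratings)
```

In practice the F0 tracks and power spectra would come from a pitch tracker and a short-time Fourier transform applied to time-aligned reference and synthesized audio; those steps are omitted here.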