An End to End Bilingual TTS System for Fongbe and Yoruba

Abstract

This paper presents an end-to-end bilingual text-to-speech (TTS) system for Yoruba and Fongbe based on FastSpeech 2, a non-autoregressive model. Starting from this baseline, a simple concatenation of speaker, language, and phoneme embeddings is used as input to the encoder and the decoder. The model was trained on a multi-speaker dataset collected for both languages. Two types of input were used: a phoneme representation shared between the two languages and a language-specific phoneme representation. Experiments comparing the two input representations showed that the shared representation gives smoother results. With either input set, however, the proposed model was able to synthesize speech in each language with voice cloning ability. The model generates waveforms of good quality, with high fidelity and naturalness, in both languages. The proposed bilingual system was also compared with the same model trained on a monolingual dataset, and the comparison showed that the bilingual dataset yields more accurate results.
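The two input representations and the embedding concatenation described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy phoneme strings, the language tags `yo` and `fon`, the embedding sizes, and the helper names `build_vocab`, `make_table`, and `encoder_input` are hypothetical, and random tables stand in for learned embeddings. It is not the authors' implementation.

```python
import random

def build_vocab(utterances, language_specific):
    """Map phoneme symbols to integer ids.

    utterances: list of (language, [phoneme, ...]) pairs.
    If language_specific is True, each symbol is tagged with its language,
    so the same phoneme in two languages gets two different ids.
    """
    vocab = {}
    for lang, phonemes in utterances:
        for p in phonemes:
            key = f"{lang}:{p}" if language_specific else p
            vocab.setdefault(key, len(vocab))
    return vocab

def make_table(n_rows, dim, rng):
    """Random stand-in for a learned embedding table (assumption)."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n_rows)]

def encoder_input(phoneme_ids, speaker_id, lang_id, tables):
    """Concatenate phoneme, speaker, and language embeddings at every position."""
    phon, spk, lng = tables
    return [phon[i] + spk[speaker_id] + lng[lang_id] for i in phoneme_ids]

# Toy data: these phoneme strings are illustrative, not from the paper's dataset.
utterances = [("yo", ["b", "a", "t", "a"]), ("fon", ["t", "a", "n"])]

shared = build_vocab(utterances, language_specific=False)
specific = build_vocab(utterances, language_specific=True)

rng = random.Random(0)
tables = (
    make_table(len(shared), 4, rng),  # phoneme embeddings, dimension 4
    make_table(2, 3, rng),            # two hypothetical speakers, dimension 3
    make_table(2, 2, rng),            # two languages, dimension 2
)

# Encode the second (Fongbe-tagged) toy utterance with the shared inventory.
ids = [shared[p] for p in utterances[1][1]]
frames = encoder_input(ids, speaker_id=1, lang_id=1, tables=tables)
print(len(shared), len(specific), len(frames), len(frames[0]))
```

With the shared inventory, a phoneme that occurs in both languages maps to a single id, so its embedding is trained on data from both; with the language-specific inventory, the same symbol gets one id per language. In both cases the speaker and language vectors are appended unchanged to every phoneme position.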
