An End to End Bilingual TTS System for Fongbe and Yoruba

dc.contributor.author: BOCO, Charbel Arnaud Cedrique Y.
dc.contributor.author: DAGBA, THÉOPHILE KOMLAN
dc.date.accessioned: 2026-06-02T16:06:57Z
dc.date.available: 2026-06-02T16:06:57Z
dc.date.issued: 2022
dc.description.abstract: This paper presents an end-to-end bilingual TTS system for Yoruba and Fongbe based on FastSpeech 2, a non-autoregressive model. Starting from this baseline, a simple concatenation of speaker, language, and phoneme embeddings was used as input to the encoder and the decoder. The model was trained on a multi-speaker dataset collected for both languages. Two types of input were used: a phoneme representation shared between the two languages and a language-specific phoneme representation. Experiments comparing the two input representations showed that the shared phoneme representation gives smoother results. With either input set, however, the proposed model was able to synthesize speech in each language with voice-cloning ability. The model produces good-quality speech waveforms with high fidelity and naturalness in both languages. The proposed bilingual system was also compared with the same model trained on a monolingual dataset, showing that the bilingual dataset yields more accurate results.
dc.identifier.doi: 10.1007/978-3-031-16210-7
dc.identifier.other: BECDB-12043
dc.identifier.uri: https://dspace.uac.bj/handle/123456789/10417
dc.language.iso: fr
dc.relation.ispartof: Advances in Computational Collective Intelligence
dc.subject: Bilingual text-to-speech · African language · Tonal language
dc.title: An End to End Bilingual TTS System for Fongbe and Yoruba
dc.type: Article
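
The abstract describes the model input as a simple concatenation of speaker, language, and phoneme embeddings. The following is a minimal sketch of that idea only, not the authors' implementation: the table sizes, embedding dimensions, random initialization, and the names `make_table` and `build_encoder_input` are all hypothetical, and it assumes the shared phoneme representation (one phoneme inventory for both languages).

```python
import random


def make_table(num_rows, dim, rng):
    """Return a lookup table of num_rows random vectors of length dim."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_rows)]


def build_encoder_input(phoneme_ids, speaker_id, language_id, tables):
    """Concatenate phoneme, speaker, and language embeddings per position."""
    speaker_vec = tables["speaker"][speaker_id]
    language_vec = tables["language"][language_id]
    # The same speaker and language vectors are appended at every phoneme position.
    return [tables["phoneme"][p] + speaker_vec + language_vec for p in phoneme_ids]


rng = random.Random(0)
# Hypothetical sizes: 60 shared phonemes, 4 speakers, 2 languages.
tables = {
    "phoneme": make_table(60, 8, rng),
    "speaker": make_table(4, 4, rng),
    "language": make_table(2, 2, rng),
}
frames = build_encoder_input([5, 17, 3], speaker_id=1, language_id=0, tables=tables)
print(len(frames), len(frames[0]))  # 3 14
```

Each of the three phoneme positions yields a vector of length 8 + 4 + 2 = 14. Under the language-specific representation mentioned in the abstract, each language would presumably index its own range of phoneme IDs instead of a shared one; the concatenation step itself would stay the same.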
