Bulletin of Abai KazNPU. Series of Physical and mathematical sciences

END-TO-END SPEECH SYNTHESIS FOR THE KAZAKH LANGUAGE

Published September 2022

Zh. Kozhirbayev
National Laboratory Astana, Nur-Sultan
Zh. Yessenbayev
National Laboratory Astana, Nur-Sultan
Abstract

Speech synthesis, also called text-to-speech (TTS), is one of the core tasks of speech processing, alongside speech recognition: it converts written text into spoken audio. Several approaches to speech synthesis exist. The first computer speech synthesis system was developed in the 20th century; early methods include articulatory, formant, and concatenative synthesis. As machine learning developed, statistical parametric speech synthesis was proposed. Since the 2010s, neural network-based speech synthesis has become increasingly popular and has substantially improved voice quality. The purpose of this work is to review statistical parametric and end-to-end methods, which can be seen as successive stages in the evolution of TTS. In addition, we experiment with an end-to-end method based on Tacotron 2 and Parallel WaveGAN. For the experiments, textual materials were collected from the works of Akhmet Baitursynuly: six of his books were selected, and from them his best-known works were chosen to form the text corpus. One professional male announcer voiced the collected texts, yielding 50 hours of audio recordings in total.
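In a Tacotron 2 plus Parallel WaveGAN pipeline, the two models communicate through a mel spectrogram: Tacotron 2 predicts mel-spectrogram frames from text (the original Tacotron 2 paper used 80 mel channels), and Parallel WaveGAN is a vocoder that converts those frames into a waveform. The minimal sketch below, in pure Python, shows how one log-mel frame is derived from a power-spectrum frame using the HTK mel formula and triangular filters. The function names and the example parameters (80 bands, 1024-point FFT, 22050 Hz sampling rate, 0 to 8000 Hz) are illustrative assumptions, not the configuration used in this work, and STFT framing is omitted.

```python
import math


def hz_to_mel(f_hz: float) -> float:
    """HTK-style mel scale: m = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_mels: int, n_fft: int, sample_rate: float,
                   f_min: float, f_max: float) -> list[list[float]]:
    """Return n_mels triangular filters, each with n_fft // 2 + 1 weights (unnormalized)."""
    n_bins = n_fft // 2 + 1
    # n_mels + 2 points equally spaced on the mel scale: filter edges and centres.
    mel_lo, mel_hi = hz_to_mel(f_min), hz_to_mel(f_max)
    mel_points = [mel_lo + (mel_hi - mel_lo) * i / (n_mels + 1)
                  for i in range(n_mels + 2)]
    hz_points = [mel_to_hz(m) for m in mel_points]
    # Frequency (Hz) of each FFT bin from 0 up to Nyquist.
    bin_freqs = [k * sample_rate / n_fft for k in range(n_bins)]

    bank = []
    for m in range(1, n_mels + 1):
        left, centre, right = hz_points[m - 1], hz_points[m], hz_points[m + 1]
        row = []
        for f in bin_freqs:
            if left <= f <= centre:      # rising slope of the triangle
                w = (f - left) / (centre - left)
            elif centre < f <= right:    # falling slope of the triangle
                w = (right - f) / (right - centre)
            else:                        # outside this filter's band
                w = 0.0
            row.append(w)
        bank.append(row)
    return bank


def power_to_log_mel(power_frame: list[float], bank: list[list[float]],
                     eps: float = 1e-10) -> list[float]:
    """Project one power-spectrum frame onto the mel filters and take the natural log."""
    return [math.log(max(sum(w * p for w, p in zip(row, power_frame)), eps))
            for row in bank]


if __name__ == "__main__":
    bank = mel_filterbank(80, 1024, 22050, 0.0, 8000.0)
    frame = [1.0] * 513  # a flat (white-noise-like) power spectrum
    print(len(bank), len(bank[0]), len(power_to_log_mel(frame, bank)))
```

Running it prints the filterbank shape (80 filters of 513 weights) and the length of one log-mel frame. Libraries such as librosa default to the Slaney variant of the mel scale with area normalization, so their filters differ slightly from this HTK-style sketch.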

Full text: PDF (Russian)

How to Cite

Kozhirbayev Zh. and Yessenbayev Zh. 2022. END-TO-END SPEECH SYNTHESIS FOR THE KAZAKH LANGUAGE. Bulletin of Abai KazNPU. Series of Physical and mathematical sciences. 79, 3 (Sep. 2022), 196–203. DOI:https://doi.org/10.51889/9340.2022.21.68.023.