INTEGRAL (END-TO-END) SPEECH SYNTHESIS FOR THE KAZAKH LANGUAGE

Zh. Kozhirbayev; Zh. Yessenbayev

doi:10.51889/9340.2022.21.68.023

Vol. 79 No. 3 (2022)

END-TO-END SPEECH SYNTHESIS FOR THE KAZAKH LANGUAGE

Published September 2022

Zh. Kozhirbayev

National Laboratory Astana, Nur-Sultan

Zh. Yessenbayev

National Laboratory Astana, Nur-Sultan

Abstract

Speech synthesis, also called text-to-speech (TTS), converts a given text into speech and is considered one of the central tasks of speech processing, alongside speech recognition. There are several approaches to speech synthesis. The first computer-based speech synthesis systems were developed in the 20th century; early methods include articulatory synthesis, formant synthesis, and concatenative synthesis. Statistical parametric speech synthesis was proposed later, as machine learning developed. Since the 2010s, neural network-based speech synthesis has gradually become more popular and has improved voice quality. The purpose of this work is to review statistical parametric and end-to-end methods, which can be viewed as successive stages in the evolution of TTS. In addition, we experiment with an end-to-end method based on Tacotron2 and Parallel WaveGAN. For the experiments, textual materials were collected from the works of Akhmet Baitursynuly: six of his books were selected, and the most common works from them were compiled into the text corpus. One professional male announcer voiced the collected text, yielding 50 hours of audio recordings in total.

Keywords

speech synthesis; formant speech synthesis; concatenative speech synthesis; statistical parametric speech synthesis; integral speech synthesis

Language

Russian

How to Cite

Kozhirbayev Zh. and Yessenbayev Zh. 2022. END-TO-END SPEECH SYNTHESIS FOR THE KAZAKH LANGUAGE. Bulletin of Abai KazNPU. Series of Physical and Mathematical Sciences. 79, 3 (Sep. 2022), 196–203. DOI: https://doi.org/10.51889/9340.2022.21.68.023.