With the development of multimedia recognition technologies, which allow large amounts of information to be extracted and analyzed from video and audio sources, the use of machine learning and deep learning to solve a wide range of problems has grown rapidly. Speech emotion recognition (or classification) is one of the most challenging topics in data science. In this work, we used an MLP-classifier-based architecture that extracts mel-frequency cepstral coefficients (MFCCs), chromagrams, and mel-scale spectrograms from audio files and feeds them to a neural network for emotion identification, using samples from the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS). A neural network model was developed to recognize four emotions (calm, anger, fear, disgust) and classifies speech emotions with an accuracy of 83.33%.
RECOGNITION OF SPEECH EMOTIONS USING MACHINE LEARNING
Published June 2022
Language: Russian
How to Cite
Yeralkhanova A., Yessenbay M., Mukhtarova A., Zhexebay D. and Kozhagulov E. 2022. RECOGNITION OF SPEECH EMOTIONS USING MACHINE LEARNING. Bulletin of Abai KazNPU. Series of Physical and Mathematical Sciences. 78, 2 (Jun. 2022), 102–108. DOI: https://doi.org/10.51889/2022-2.1728-7901.13.
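The feature-extraction and classification pipeline summarized in the abstract can be sketched in a few lines of Python. The sketch below is an illustration only: it assumes librosa for MFCC, chromagram and mel-spectrogram extraction and scikit-learn's MLPClassifier, and the file-path pattern, network size and other hyperparameters are assumptions rather than values reported in the paper.

import glob
import os
import numpy as np
import librosa
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

# RAVDESS file names encode the emotion as the third dash-separated field,
# e.g. 03-01-06-01-02-01-12.wav -> code "06" (fearful). Only the four
# emotions studied in the paper are kept.
EMOTIONS = {"02": "calm", "05": "angry", "06": "fearful", "07": "disgust"}

def extract_features(path):
    # Concatenate mean MFCCs, chromagram and mel spectrogram into one vector.
    y, sr = librosa.load(path, sr=None)
    stft = np.abs(librosa.stft(y))
    mfccs = np.mean(librosa.feature.mfcc(y=y, sr=sr, n_mfcc=40).T, axis=0)
    chroma = np.mean(librosa.feature.chroma_stft(S=stft, sr=sr).T, axis=0)
    mel = np.mean(librosa.feature.melspectrogram(y=y, sr=sr).T, axis=0)
    return np.hstack([mfccs, chroma, mel])  # 40 + 12 + 128 = 180 features

def load_dataset(pattern="ravdess/Actor_*/*.wav"):  # hypothetical data path
    features, labels = [], []
    for path in glob.glob(pattern):
        code = os.path.basename(path).split("-")[2]
        if code in EMOTIONS:
            features.append(extract_features(path))
            labels.append(EMOTIONS[code])
    return np.array(features), np.array(labels)

X, y = load_dataset()
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=9)
clf = MLPClassifier(hidden_layer_sizes=(300,), alpha=0.01, batch_size=256,
                    learning_rate="adaptive", max_iter=500)
clf.fit(X_train, y_train)
print("test accuracy:", accuracy_score(y_test, clf.predict(X_test)))

A single-hidden-layer MLP over such concatenated per-file feature averages is the general type of model the abstract describes; the authors' exact network configuration and train/test split are not stated on this page, so the numbers above should not be expected to reproduce the reported 83.33% accuracy exactly.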