Bulletin of the Abai KazNPU, the series of "Physical and Mathematical Sciences"

KEYWORD EXTRACTION FROM KAZAKH TEXT WITH MACHINE LEARNING ALGORITHMS

Published 03-2024
Khoja Ahmet Yassawi International Kazakh-Turkish University, Turkestan, Kazakhstan

G.N. Kazbekova

Head of the Department of Computer Engineering

Khoja Ahmet Yassawi International Kazakh-Turkish University, Turkestan, Kazakhstan
Abstract

Browsing the internet has become an everyday activity for computer users. Because thousands of news items are published online every day, it is difficult to retrieve and summarize the relevant documents effectively. Keyword or keyphrase extraction is therefore used to convey the main content of a web page, allowing readers to reach the information they seek easily and quickly. In this article, two machine learning algorithms, Random Forest and XGBoost (Extreme Gradient Boosting), were tested. Results were obtained on the 500N-KPCrowd dataset, which consists of English-language news content and is widely used in the literature, and compared with the results obtained on the Kazakh-language datasets. For the Kazakh dataset, the best F1 score was 0.97, the highest result reported in the literature. For the 500N-KPCrowd dataset, the best F1 score was 0.70.
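The abstract reports F1 scores without describing the evaluation protocol. As a minimal sketch, assuming the usual definition (the harmonic mean of precision and recall over predicted versus reference keywords), the metric could be computed as follows. The function name `keyword_f1` and the sample keyword lists are hypothetical and are not taken from the article or its datasets.

```python
def keyword_f1(predicted, gold):
    """Return (precision, recall, F1) for predicted vs. reference keywords.

    Assumption: exact match after lowercasing and trimming whitespace.
    The article may use a different matching rule (e.g. stemming).
    """
    pred = {k.strip().lower() for k in predicted}
    ref = {k.strip().lower() for k in gold}
    if not pred or not ref:
        return 0.0, 0.0, 0.0

    true_positives = len(pred & ref)
    precision = true_positives / len(pred)
    recall = true_positives / len(ref)
    if precision + recall == 0:
        return precision, recall, 0.0

    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example: three of four predicted keywords are correct.
gold = ["machine learning", "keyword extraction", "random forest", "kazakh language"]
predicted = ["keyword extraction", "random forest", "news", "kazakh language"]

precision, recall, f1 = keyword_f1(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Under this definition, an F1 score of 0.97 means that both precision and recall are close to 1, while 0.70 indicates a noticeably larger share of missed or spurious keywords.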

Language

Eng

How to Cite

[1]
Abibullayeva, A., Kazbekova, G. and Zhunissov, N. 2024. KEYWORD EXTRACTION FROM KAZAKH TEXT WITH MACHINE LEARNING ALGORITHMS. Bulletin of the Abai KazNPU, the series of "Physical and Mathematical Sciences". 85, 1 (Mar. 2024), 106–113. DOI:https://doi.org/10.51889/2959-5894.2024.85.1.010.