Browsing information on the internet has become a routine activity for computer users. Because thousands of news items are published online every day, it is difficult to retrieve and summarize the relevant documents effectively. Keyword or keyphrase extraction techniques are therefore used to convey the main content of a web page, allowing readers to find the information they seek quickly and easily. In this article, two machine learning algorithms, Random Forest and XGBoost (Extreme Gradient Boosting), were tested. Results were obtained on the 500N-KPCrowd dataset, which consists of English-language news content and is widely used in the literature, and compared with the results obtained on Kazakh-language datasets. For the Kazakh dataset, the best F1 score was 0.97, the highest result reported in the literature. For the 500N-KPCrowd dataset, the best F1 score was 0.70.
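The F1 scores reported above combine precision and recall of the extracted keywords against a reference set. The following is a minimal sketch of that calculation, not the authors' evaluation code: the function name `keyword_f1`, the case-insensitive exact-match comparison, and the sample keywords are all assumptions made for illustration, since the abstract does not state the matching protocol.

```python
def keyword_f1(predicted, gold):
    """Return (precision, recall, f1) for predicted vs. reference keywords.

    Assumption: keywords match only on exact, case-insensitive equality.
    The paper's actual matching rule is not given in the abstract.
    """
    pred = {k.strip().lower() for k in predicted}
    ref = {k.strip().lower() for k in gold}
    if not pred or not ref:
        return 0.0, 0.0, 0.0

    true_positives = len(pred & ref)
    precision = true_positives / len(pred)
    recall = true_positives / len(ref)
    if precision + recall == 0:
        return precision, recall, 0.0

    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example data, not taken from either dataset.
predicted_keywords = ["economy", "Astana", "budget", "weather"]
gold_keywords = ["economy", "budget", "parliament"]
precision, recall, f1 = keyword_f1(predicted_keywords, gold_keywords)
```

Here two of the four predicted keywords appear in the three-item reference set, so precision is 2/4 and recall is 2/3; a looser matching rule (for example, stemming before comparison) would generally give different values.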
KEYWORD EXTRACTION FROM KAZAKH TEXT WITH MACHINE LEARNING ALGORITHMS
Published March 2024
Abstract
Language: English
How to Cite
[1]
Abibullayeva А., Kazbekova, G. and Zhunissov, N. 2024. KEYWORD EXTRACTION FROM KAZAKH TEXT WITH MACHINE LEARNING ALGORITHMS. Bulletin of Abai KazNPU. Series of Physical and mathematical sciences. 85, 1 (Mar. 2024), 106–113. DOI:https://doi.org/10.51889/2959-5894.2024.85.1.010.