IDENTIFYING AND ANALYZING FEATURES FOR THE CLASSIFICATION OF NEWS

I.M. Ualiyeva; R.R. Mussabayev

doi:10.51889/2959-5894.2023.81.1.020

Vol. 81 No. 1 (2023)

Published March 2023

I.M. Ualiyeva

al-Farabi Kazakh National University, Almaty

R.R. Mussabayev

The Institute of Information and Computational Technologies, Almaty

Abstract

The number of documents, including online news, that require deeper understanding and analysis grows every year. Machine learning algorithms help to classify texts accurately. However, finding suitable structures and techniques for text, including feature extraction, remains difficult for researchers. This paper addresses the task of identifying and analyzing features that distinguish different genres of text. We studied the main characteristics of each genre of news text (news, articles, interviews, and blogs) to obtain more informative features. We built our dataset by collecting texts from open-access official information portals. Our analysis of the dataset indicates that features capturing structural complexity, detail, and imaginative detail in a text help to distinguish the genres it contains. In particular, we use complexity features (lexical diversity, lexical density, punctuation, average sentence length, number of personal pronouns, readability index), detail features (number of proper nouns in the text, numbers, month-related words), and imaginative features (PoS tags, quantifier words, plural nouns). Our results suggest that these features provide an effective representation for distinguishing news texts from articles, blogs/opinions, and interviews with high accuracy.
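
To make a few of the listed features concrete, the following is a minimal sketch of how some of the surface-level ones could be computed. It is not the authors' implementation: the function name `surface_features`, the word lists, and the regular expressions are illustrative assumptions, and lexical diversity is taken here as a simple type-token ratio. Features that need a part-of-speech tagger (lexical density, proper nouns, plural nouns) and the readability index are left out.

```python
import re

# Hypothetical word lists for illustration; the paper's exact lists are not given here.
PERSONAL_PRONOUNS = {
    "i", "me", "we", "us", "you", "he", "him", "she", "her", "it", "they", "them",
}
# Note: "may" is ambiguous (month vs. modal verb); a real system would need to disambiguate.
MONTHS = {
    "january", "february", "march", "april", "may", "june", "july",
    "august", "september", "october", "november", "december",
}


def surface_features(text: str) -> dict:
    """Compute a few simple complexity and detail features for one text."""
    # Naive sentence split on terminal punctuation (assumption, not a real segmenter).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Lowercased alphabetic tokens; digits are counted separately below.
    words = re.findall(r"[A-Za-z']+", text.lower())
    n_words = len(words)
    return {
        # Type-token ratio: unique words divided by total words.
        "lexical_diversity": len(set(words)) / n_words if n_words else 0.0,
        "avg_sentence_length": n_words / len(sentences) if sentences else 0.0,
        "punctuation_count": len(re.findall(r"[.,;:!?\"'()\-]", text)),
        "personal_pronouns": sum(w in PERSONAL_PRONOUNS for w in words),
        "numbers": len(re.findall(r"\d+(?:[.,]\d+)*", text)),
        "month_words": sum(w in MONTHS for w in words),
    }


if __name__ == "__main__":
    print(surface_features("In March 2023 we met him. He said 5 things!"))
```

A feature dictionary of this kind, computed per document, could then be passed to any standard classifier; which classifier the authors used is not stated in the abstract.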

Keywords

Text Categorization, Text Mining, Feature Selection, Text Classification, Online News Classification

Language

English

How to Cite

[1]

Ualiyeva, I. and Mussabayev, R. 2023. IDENTIFYING AND ANALYZING FEATURES FOR THE CLASSIFICATION OF NEWS. Bulletin of Abai KazNPU. Series of Physical and Mathematical sciences. 81, 1 (Mar. 2023), 178–185. DOI:https://doi.org/10.51889/2959-5894.2023.81.1.020.
