Voting-Based Multiple Classification Approach for Turkish News Texts

2019 INNOVATIONS IN INTELLIGENT SYSTEMS AND APPLICATIONS CONFERENCE (ASYU) (2019)

Abstract
Numerous sources on the internet now produce news every day. This growing body of content makes it difficult for users to find the information and news they are looking for, so classifying it is important for fast and efficient search and access. This study uses the Kemik dataset of Turkish news content, prepared by the Natural Language Processing Group at Yıldız Technical University. A hierarchical, voting-based approach built on machine learning methods is adopted. First, the Tf-Idf method is applied to word 1-3-grams and character 2-6-grams, which produces a 2000-dimensional feature vector. FastText is then used to obtain 300-dimensional feature vectors, and the two feature vectors are combined into 2300-dimensional feature vectors. To determine which of these vectors gives the best classification accuracy, Support Vector Machines are applied to each, and Tf-Idf, which gives the most robust accuracy, is selected as the main feature extraction method. Next, Support Vector Machines, K-Nearest Neighbors, Random Forest, Logistic Regression, and XGBoost are used to classify the news texts. The labels predicted by all classifiers are voted on for each sample, and the label with the highest share of votes is taken as the final prediction. Finally, the feature vector size is reduced using Principal Component Analysis, which makes it possible to gain processing speed without reducing performance. In both approaches, the performance achieved by voting is higher than the individual performance of each classifier. The aim of the study is to open the way to reaching the right information quickly by classifying news topics.
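The voting step described in the abstract can be illustrated with a minimal sketch of hard majority voting. This is not the authors' implementation: the function name `majority_vote`, the example labels, the per-classifier predictions, and the tie-breaking rule (the earliest-listed classifier wins a tie) are all assumptions made for illustration, since the abstract does not specify them.

```python
from collections import Counter


def majority_vote(predictions_per_classifier):
    """Combine label predictions from several classifiers by hard voting.

    predictions_per_classifier: list of equal-length label lists,
    one list per classifier, aligned by sample.
    Ties go to the label predicted by the earliest-listed classifier
    (an assumption; the abstract does not describe tie-breaking).
    """
    if not predictions_per_classifier:
        return []
    n_samples = len(predictions_per_classifier[0])
    if any(len(p) != n_samples for p in predictions_per_classifier):
        raise ValueError("all classifiers must predict the same number of samples")

    final_labels = []
    # zip(*...) regroups the predictions so each tuple holds one sample's votes.
    for sample_votes in zip(*predictions_per_classifier):
        counts = Counter(sample_votes)
        top_count = max(counts.values())
        # Pick the first label, in classifier order, that reaches the top count.
        winner = next(label for label in sample_votes if counts[label] == top_count)
        final_labels.append(winner)
    return final_labels


# Hypothetical predictions for three news texts from the five classifier
# families named in the abstract (labels are illustrative only).
svm = ["sports", "economy", "politics"]
knn = ["sports", "politics", "politics"]
random_forest = ["economy", "economy", "sports"]
logistic_regression = ["sports", "economy", "politics"]
xgboost = ["sports", "politics", "sports"]

voted = majority_vote([svm, knn, random_forest, logistic_regression, xgboost])
print(voted)
```

With an odd number of classifiers, as here, a two-way tie cannot occur, but three-way splits are still possible when there are more than two classes, which is why the sketch fixes an explicit tie-breaking rule.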
Key words
natural language processing, text classification, support vector machines, majority voting, dimension reduction