An Improvement of Shuffled Frog Leaping Algorithm with a Decision Tree for Feature Selection in Text Document Classification

Mostafa Mahmoudi, Farhad Soleimanian Gharehchopogh

Abstract

Given the growth of textual documents, document classification is crucial for reducing the complexity of information and for providing quick and easy access to it. Classification is usually carried out by extracting keywords and sentences and by matching paragraphs. The main method for finding similarities between texts uses keywords based on word frequency. Words are counted using methods such as term frequency (TF), and a specific weight is then assigned to each word. The main challenge in Text Document Classification (TDC) is choosing the features, because Feature Selection (FS) is an effective factor in enhancing classification accuracy and reducing computation time. Hence, in this paper, the Shuffled Frog-Leaping Algorithm (SFLA) is used for FS and the ID3 tree is used for document classification. A problem with SFLA is that it gets stuck in local optima; in the proposed model, a hybrid of the best and the worst positions of the frog is used to avoid local optima. The general approach of this paper is to improve SFLA by means of the ID3 tree in order to increase classification accuracy. The results obtained on the Reuters-21578, WebKB, Cade 12, and 20 Newsgroups datasets indicate that the improved model with the ID3 tree achieves higher accuracy. The results confirm the efficiency of the proposed FS method in improving TDC accuracy.

Keywords

Text Document Classification, Feature Selection, Shuffled Frog Leaping Algorithm, ID3 Tree