Document Classification, a Novel Neural-based Classifier

Author(s) Seyyed Mohammad Reza Farshchi
Pages 89-98
Volume 1
Issue 2
Date November, 2011
Keywords Weil Text Classification (TC), Documents Classification, Information Management, Data Mining

Text categorization, the assignment of natural language texts to one or more predefined categories based on their content, is an important component of many information organization and management tasks. This research proposes a novel approach to document classification that combines a competitive self-organizing neural text categorizer with a new document representation that we call string vectors. Although research on document categorization has progressed considerably, documents must still be encoded into numerical vectors. This encoding causes two main problems: huge dimensionality and sparse distribution. Although many feature selection methods have been developed to address the first problem, the reduced dimension remains large, and if the dimension is reduced excessively by a feature selection method, the robustness of document categorization is degraded. The idea of this research, as a solution to these problems, is to encode documents into string vectors and supply them as input to the novel competitive self-organizing neural text categorizer. We compare the effectiveness of five different automatic learning algorithms for text categorization in terms of learning speed, real-time classification speed, and classification accuracy. The quantitative and qualitative experimental results demonstrate that this method can significantly improve the performance of document classification.
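The contrast between the two encodings can be illustrated with a small sketch. The abstract does not define string vectors, so the code assumes one plausible reading: a fixed-length ordered list of a document's most frequent words. The function names, the toy documents, and the padding convention are hypothetical and are not taken from the paper.

```python
from collections import Counter

def tokenize(text):
    # Lowercase and strip simple punctuation; a stand-in for real preprocessing.
    words = (w.strip(".,;:!?()").lower() for w in text.split())
    return [w for w in words if w]

def numerical_vector(tokens, vocabulary):
    # Conventional encoding: one term-frequency entry per vocabulary word.
    counts = Counter(tokens)
    return [counts[w] for w in vocabulary]

def string_vector(tokens, d):
    # ASSUMPTION: a string vector is taken to be the d most frequent words
    # of the document, ordered by descending frequency (ties broken
    # alphabetically) and padded with "" when the document is short.
    counts = Counter(tokens)
    ranked = sorted(counts, key=lambda w: (-counts[w], w))
    return (ranked + [""] * d)[:d]

# Hypothetical toy corpus.
docs = [
    "Neural networks learn text categories from labeled text.",
    "Feature selection reduces the dimension of text vectors.",
]
token_lists = [tokenize(doc) for doc in docs]

# The numerical vectors have one dimension per distinct word in the corpus.
vocabulary = sorted({w for tokens in token_lists for w in tokens})
dense = [numerical_vector(tokens, vocabulary) for tokens in token_lists]
zero_entries = sum(v == 0 for row in dense for v in row)
sparsity = zero_entries / (len(dense) * len(vocabulary))

# The string vectors have a fixed, small length regardless of vocabulary size.
strings = [string_vector(tokens, 3) for tokens in token_lists]

print(len(vocabulary), round(sparsity, 2))
print(strings)
```

Under this assumption the numerical vectors grow with the corpus vocabulary and are mostly zeros, while each string vector keeps a fixed length chosen in advance. A classifier that consumes string vectors would also need a similarity measure between words, which this sketch does not cover.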
