Semantic Based Model for Text Document Clustering with Idioms
B. Drakshayani, E. V. Prasad
Pages - 1 - 13     |    Revised - 15-01-2013     |    Published - 28-02-2013
Volume - 4   Issue - 1    |    Publication Date - January / February 2013
Document Clustering, Idiom, POS Tagging, Semantic Weight, Semantic Grammar, Hierarchical Clustering Algorithm, Chameleon, Natural Language Processing
Text document clustering has become an increasingly important problem in recent years because of the tremendous amount of unstructured data available in various forms on the web, in social networks, and in other information networks. Clustering is a powerful data mining technique for organizing the large amount of information on the web. Traditional document clustering methods do not consider the semantic structure of the document. This paper addresses the task of developing an effective and efficient method that improves how the semantic structure of text documents is represented. The method performs the following steps: tagging the documents for parsing, replacing idioms with their original meaning, calculating semantic weights for the document words, and applying a semantic grammar. A similarity measure is then computed between the documents, and the documents are clustered using a hierarchical clustering algorithm. The method is evaluated on different data sets with standard performance measures, and the results show that it produces meaningful clusters.
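The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the idiom dictionary `IDIOMS` is hypothetical, raw term frequency stands in for the paper's semantic weights, the POS tagging and semantic grammar steps are omitted, and average-link agglomerative clustering over cosine similarity is assumed as the hierarchical algorithm.

```python
import math
import re
from collections import Counter

# Hypothetical idiom dictionary (assumption): the paper's idiom resource is not listed here.
IDIOMS = {
    "kick the bucket": "die",
    "piece of cake": "easy",
    "under the weather": "ill",
}


def replace_idioms(text, idioms=IDIOMS):
    """Lower-case the text and replace each known idiom with its literal meaning."""
    out = text.lower()
    for phrase, meaning in idioms.items():
        out = re.sub(r"\b" + re.escape(phrase) + r"\b", meaning, out)
    return out


def term_weights(text):
    """Stand-in for semantic weights (assumption): raw term frequencies."""
    return Counter(re.findall(r"[a-z]+", text))


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster(docs, k):
    """Average-link agglomerative clustering; returns k lists of document indices."""
    vecs = [term_weights(replace_idioms(d)) for d in docs]
    clusters = [[i] for i in range(len(docs))]

    def link(c1, c2):
        # Average pairwise similarity between the members of two clusters.
        total = sum(cosine(vecs[i], vecs[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > k:
        # Find and merge the most similar pair of clusters.
        pairs = ((a, b) for a in range(len(clusters))
                 for b in range(a + 1, len(clusters)))
        a, b = max(pairs, key=lambda p: link(clusters[p[0]], clusters[p[1]]))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


docs = [
    "The exam was a piece of cake",
    "The test was easy",
    "He is under the weather today",
    "She is ill today",
]
print(cluster(docs, 2))
```

In this toy input, replacing the idioms first makes each idiomatic sentence share a content word ("easy", "ill") with its literal counterpart, which is the effect the abstract attributes to the idiom replacement step.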
CITED BY (4)  
1 Brar, S., Mathur, D., Sharma, N., & Phagwara, P. Enhancement in Semantic based Model for Text Document Clustering.
2 Drakshayani, B., & Prasad, E. V. Hybrid Clustering Model for Text Documents with Semantic Based Document Representation.
3 Drakshayani, B., & Prasad, E. V. Metaphor based Document Representation Model for Text Document Clustering. In IEEE Workshop on Computational Intelligence: Theories, Applications and Future Directions (pp. 74-78).
4 Suneetha, S., & Rani, M. U. Status Quo of Semantic-Based Text Document Clustering: A Review.
