Semantic Based Model for Text Document Clustering with Idioms

B. Drakshayani, E. V. Prasad

Pages 1-13 | Revised: 15-01-2013 | Published: 28-02-2013

Published in International Journal of Data Engineering (IJDE)

Volume 4, Issue 1 | Publication Date: January/February 2013

KEYWORDS

Document Clustering, Idiom, POS Tagging, Semantic Weight, Semantic Grammar, Hierarchical Clustering Algorithm, Chameleon, Natural Language Processing

ABSTRACT

Text document clustering has become increasingly important in recent years because of the tremendous amount of unstructured data available in various forms on online platforms such as the web, social networks, and other information networks. Clustering is a powerful data mining technique for organizing the large volume of information on the web. Traditional document clustering methods, however, do not consider the semantic structure of documents. This paper addresses the task of developing an effective and efficient method to improve the semantic structure of text documents. The proposed method tags the documents for parsing, replaces idioms with their actual meanings, calculates semantic weights for document words, and applies semantic grammar. The similarity between documents is then computed, and the documents are clustered using a hierarchical clustering algorithm. The method is evaluated on different data sets with standard performance measures, and the results demonstrate its effectiveness in producing meaningful clusters.
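A rough sketch of the pipeline outlined in the abstract (idiom replacement, term weighting, pairwise similarity, hierarchical clustering) follows. It is not the authors' implementation and rests on several assumptions. The idiom dictionary `IDIOMS` is a tiny hypothetical stand-in. POS tagging and semantic grammar are omitted. Smoothed TF-IDF replaces the paper's semantic weights. Average-link agglomerative clustering stands in for the paper's hierarchical algorithm (the keywords mention Chameleon). All function names are illustrative.

```python
import math
import re
from collections import Counter

# Hypothetical idiom dictionary (idiom -> meaning); illustrative only.
IDIOMS = {
    "kick the bucket": "die",
    "piece of cake": "easy",
    "under the weather": "ill",
}

SAMPLE_DOCS = [
    "my grandfather may kick the bucket soon",
    "the patient may die soon without treatment",
    "this exam was a piece of cake",
    "the exam was easy for every student",
]

def replace_idioms(text, idioms=IDIOMS):
    """Lowercase text and replace each known idiom with its meaning.
    Only exact surface forms match (e.g. not "kicked the bucket")."""
    text = text.lower()
    # Longer idioms first, so a longer phrase wins over a shorter overlapping one.
    for idiom in sorted(idioms, key=len, reverse=True):
        meaning = idioms[idiom]
        text = re.sub(r"\b" + re.escape(idiom) + r"\b", lambda m: meaning, text)
    return text

def tokenize(text):
    return re.findall(r"[a-z]+", text)

def tfidf_vectors(docs):
    """Stand-in for semantic weighting: smoothed TF-IDF as sparse dicts."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in Counter(doc).items()}
        for doc in docs
    ]

def cosine(u, v):
    """Cosine similarity between two sparse weight vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def agglomerative(sim, k):
    """Average-link agglomerative clustering on a similarity matrix,
    merging the most similar pair of clusters until k clusters remain."""
    clusters = [[i] for i in range(len(sim))]
    while len(clusters) > k:
        best, pair = -1.0, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                total = sum(sim[i][j] for i in clusters[a] for j in clusters[b])
                avg = total / (len(clusters[a]) * len(clusters[b]))
                if avg > best:
                    best, pair = avg, (a, b)
        a, b = pair
        clusters[a] += clusters.pop(b)  # a < b, so popping b keeps index a valid
    return clusters

def cluster_documents(texts, k):
    docs = [tokenize(replace_idioms(t)) for t in texts]
    vecs = tfidf_vectors(docs)
    sim = [[cosine(u, v) for v in vecs] for u in vecs]
    return sorted(sorted(c) for c in agglomerative(sim, k))

if __name__ == "__main__":
    print(cluster_documents(SAMPLE_DOCS, 2))
```

With the idioms replaced, the first two sample documents share "may die soon" and the last two share "exam was easy", so they fall into separate clusters. Without the replacement step, the idiomatic sentences share fewer terms with their literal counterparts. A real system would also need to handle inflected idiom forms, which this exact-match sketch does not.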

CITED BY (4)

1	Brar, S., Mathur, D., Sharma, N., & Phagwara, P. Enhancement in Semantic based Model for Text Document Clustering.

2	Drakshayani, B., & Prasad, E. V. Hybrid Clustering Model for Text Documents with Semantic Based Document Representation.

3	Drakshayani, B., & Prasad, E. V. Metaphor based Document Representation Model for Text Document Clustering. In IEEE Workshop on Computational Intelligence: Theories, Applications and Future Directions (pp. 74-78).

4	Suneetha, S., & Rani, M. U. Status Quo of Semantic-Based Text Document Clustering: A Review.

ABSTRACTING & INDEXING

1	Google Scholar

2	CiteSeerX

3	RefSeek

4	Scribd

5	SlideShare

6	PdfSR

MANUSCRIPT AUTHORS

Mr. B. Drakshayani

Lecturer in CME, Govt. Polytechnic, Nalgonda 508001, India

draksha_m@yahoo.co.in

Dr. E. V. Prasad

Rector, JNTUK, Kakinada, India