Farthest Neighbor Approach for Finding Initial Centroids in K-Means

N. Sandhya, K. Anuradha, V. Sowmya, Ch. Vidyadhari

Pages - 1 - 13 | Revised - 10-08-2014 | Published - 15-09-2014

Published in International Journal of Data Engineering (IJDE)

Volume - 5 | Issue - 1 | Publication Date - September 2014

KEYWORDS

Text Clustering, Partitional Approach, Initial Centroids, Similarity Measures, Cluster Accuracy.

ABSTRACT

Text document clustering is gaining popularity in the knowledge discovery field as a way to navigate, browse, and organize large amounts of textual information by grouping it into a small number of meaningful clusters. Text mining is a semi-automated process of extracting knowledge from voluminous unstructured data, and clustering is a widely studied data mining problem in the text domain. Clustering is an unsupervised learning method that aims to find groups of similar objects in the data with respect to some predefined criterion. Partitioning-based clustering algorithms traditionally choose their initial centroids at random, yet both the accuracy of the resulting clusters and the efficiency of the algorithm depend on that choice. In this work we propose a variant method in which the initial centroids are chosen by using farthest neighbors. In the experiments, the k-means algorithm is run with initial centroids chosen this way. Our experimental results show that the accuracy of the clusters and the efficiency of the k-means algorithm improve compared with the traditional random choice of initial centroids.
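
The abstract does not spell out how the farthest neighbors are selected, so the following Python sketch is only one plausible reading, not the authors' procedure. It assumes a farthest-first (maximin) selection: the first centroid is the point farthest from the overall mean, and each later centroid is the point whose distance to its nearest already-chosen centroid is largest. The function names, the Euclidean distance, and the seeding rule are all assumptions made for illustration; for text data a cosine-based distance could be substituted.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Sequence[float]


def euclidean(a: Point, b: Point) -> float:
    """Euclidean distance (an assumption; the paper may use another measure)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def farthest_neighbor_centroids(points: List[Point], k: int) -> List[List[float]]:
    """Pick k initial centroids by a farthest-first rule (hypothetical helper)."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    n, dim = len(points), len(points[0])
    # Assumed seeding rule: start from the point farthest from the data mean.
    mean = [sum(p[d] for p in points) / n for d in range(dim)]
    first = max(range(n), key=lambda i: euclidean(points[i], mean))
    chosen = [first]
    # nearest[i] = distance from point i to its closest chosen centroid so far.
    nearest = [euclidean(p, points[first]) for p in points]
    while len(chosen) < k:
        nxt = max(range(n), key=lambda i: nearest[i])
        chosen.append(nxt)
        for i, p in enumerate(points):
            nearest[i] = min(nearest[i], euclidean(p, points[nxt]))
    return [list(points[i]) for i in chosen]


def kmeans(points: List[Point], k: int,
           max_iter: int = 100) -> Tuple[List[List[float]], List[int]]:
    """Plain k-means (Lloyd iterations) started from the centroids above."""
    centroids = farthest_neighbor_centroids(points, k)
    labels: Optional[List[int]] = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: euclidean(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels or []


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    final_centroids, assignment = kmeans(data, k=2)
    print(final_centroids, assignment)
```

Because the selection is deterministic, repeated runs on the same data start from the same centroids, unlike random initialization. A farthest-first rule can also pick outliers as seeds, which is worth checking on real document collections.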

ABSTRACTING & INDEXING

1	Google Scholar

2	CiteSeerX

3	refSeek

4	Scribd

5	SlideShare

6	PdfSR

MANUSCRIPT AUTHORS

Professor N. Sandhya

VNRVJIET - India

sandhyanadela@gmail.com

Professor K. Anuradha

Professor, CSE, Gokaraju Rangaraju Institute of Engineering and Technology, Hyderabad 500 090, India

Associate Professor V. Sowmya

Associate Professor, CSE, Gokaraju Rangaraju Institute of Engineering and Technology, Hyderabad 500 090, India

Associate Professor Ch. Vidyadhari

Asst. Prof., CSE, Gokaraju Rangaraju Institute of Engineering and Technology, Hyderabad 500 090, India
