Clustering Using Shared Reference Points Algorithm
Based On a Sound Data Model

Mohamed A. Abbas; Amin A. Shoukry

Call for Papers - Ongoing round of submission, notification and publication.

Home | Login or Register | Contact CSC

Home > CSC-OpenAccess Library > Manuscript Information

Full Text Available
(no registration required)

(789.91KB)

-- CSC-OpenAccess Policy

-- Creative Commons Attribution NonCommercial 4.0 International License

>> COMPLETE LIST OF JOURNALS

EXPLORE PUBLICATIONS BY COUNTRIES


	EUROPE

	MIDDLE EAST

	ASIA

	AFRICA
.............................

	United States of America

	United Kingdom

	Canada

	Australia

	Italy

	France

	Brazil

	Germany

	Malaysia

	Turkey

	China

	Taiwan

	Japan

	Saudi Arabia

	Jordan

	Egypt

	United Arab Emirates

	India

	Nigeria

Clustering Using Shared Reference Points Algorithm Based On a Sound Data Model

Mohamed A. Abbas, Amin A. Shoukry

Pages - 28 - 47 | Revised - 15-03-2012 | Published - 16-04-2012

Published in International Journal of Data Engineering (IJDE)

Volume - 3 Issue - 2 | Publication Date - April 2012 Table of Contents

MORE INFORMATION

References | Cited By (3) | Abstracting & Indexing

KEYWORDS

Shared Nearest Neighbors, Mutual Neighbors, Spatial Data, High Dimensional Data, Time Series, Cluster Validation

ABSTRACT

A novel clustering algorithm CSHARP is presented for the purpose of finding clusters of arbitrary shapes and arbitrary densities in high dimensional feature spaces. It can be considered as a variation of the Shared Nearest Neighbor algorithm (SNN), in which each sample data point votes for the points in its k-nearest neighborhood. Sets of points sharing a common mutual nearest neighbor are considered as dense regions/ blocks. These blocks are the seeds from which clusters may grow up. Therefore, CSHARP is not a point-to-point clustering algorithm. Rather, it is a block-to-block clustering technique. Much of its advantages come from these facts: Noise points and outliers correspond to blocks of small sizes, and homogeneous blocks highly overlap. This technique is not prone to merge clusters of different densities or different homogeneity. The algorithm has been applied to a variety of low and high dimensional data sets with superior results over existing techniques such as DBScan, K-means, Chameleon, Mitosis and Spectral Clustering. The quality of its results as well as its time complexity, rank it at the front of these techniques.

CITED BY (3)

1	Drakshayani, B., & Prasad, E. V. (2013). Semantic Based Model for Text Document Clustering with Idioms. Intl. J. Date Engg, 4(1), 1-13.

2	Abbas, M., & Shoukry, A. (2012, July). CMUNE: A clustering using mutual nearest neighbors algorithm. In Information Science, Signal Processing and their Applications (ISSPA), 2012 11th International Conference on (pp. 1192-1197). IEEE.

3	Kuo, C. C., & Shieh, H. L. (2012). A Semi-Supervised Learning Algorithm for Data Classification. International Journal of Pattern Recognition and Artificial Intelligence, 1551007.

ABSTRACTING & INDEXING

1	Google Scholar

2	CiteSeerX

3	refSeek

4	Scribd

5	SlideShare

6	PdfSR

REFERENCES

Andrew Rosenberg and Julia Hirschberg, 2007. V--Measure: a conditional entropy-- based external cluster evaluation measure. EMNLP '07

Bentley, J. L. (1975). Multidimensional binary search trees used for associative searching. Communications of the ACM, 18(9):509–517.

Chen, W.-Y., Song, Y., Bai, H., Lin, C.-J., and Chang, E. Y. (2011). Parallel spectral clustering in distributed systems. IEEE Trans. Pattern Anal. Mach. Intell, 33(3):568– 586.

Ding, C. H. Q. and He, X. (2004). K-nearest-neighbor consistency in data clustering: incorporating local information into global optimization. In Haddad, H., Omicini, A., Wainwright, R. L., and Liebrock, L. M., editors, SAC, pages 584–589. ACM.

Ertoz, Steinbach, and Kumar (2003). Finding clusters of different sizes, shapes, and densities in noisy, high dimensional data. In SIAM International Conference on Data Mining, volume 3.

Ester, M., Kriegel, H.-P., Sander, J., and Xu, X. (1996). A density-based algorithm for discovering clusters in large spatial databases with noise. In KDD, pages 226–231.

Guttman, A. (1984). R-trees: A dynamic index structure for spatial searching. In ACM SIGMOD. Also published in/as: UCB, Elec.Res.Lab, Res.R. No.M83-64, 1983, with Stonebraker,M.

Guyon, I., Gunn, S., Nikravesh, M., and Zadeh, L. (2006). Feature Extraction: Foundations and Applications. Springer Verlag.

Halkidi, M., Batistakis, Y., and Vazirgiannis, M. (2002). Cluster validity methods: part I. SIGMOD Record (ACM Special Interest Group on Management of Data), 31(2):40–45.

Hammouda, K. M. and Kamel, M. S. (2002). Phrase-based document similarity based on an index graph model. In ICDM, pages 203–210. IEEE Computer Society.

Hubert, L. and Arabie, P. (1985). Comparing partitions. Journal of Classification, 2:193– 218.

Jain, A. K. and Dubes, R. C. (1988). Algorithms for Clustering Data. Prentice Hall, Englewood Cliffs.

Jarvis, R. and Patrick, E. (1973). Clustering using a similarity measure based on shared near neighbors. IEEE Transactions on Computers, 22(11):1025–1034.

Karypis, G., Han, E.-H. S., and NEWS, V. K. (1999). Chameleon: Hierarchical clustering using dynamic modeling. Computer, 32(8):68–75.

Keogh, E. J., Xi, X., Wei, L., and Ratanamahatana, C. A. The ucr time series classification/clustering, homepage: (www.cs.ucr.edu/ eamonn/time_series_data/), 2006.

Lee, J.-S. and Ólafsson, S. (2011). Data clustering by minimizing disconnectivity. Inf. Sci, 181(4):732–746.

Mohamed A. Abbas Amin A. Shoukry, and Rasha F. Kashef, ICDM 2012. CSHARP: A Clustering Using Shared Reference Points Algorithm, International Conference on Data Mining, Penang, Malaysia, World Academy of Science, Engineering and Technology.

Murphy, P. M. and Aha, D. W. (1992). UCI repository of machine learning databases. Machine-readable data repository, University of California, Department of Information and Computer Science, Irvine, CA.

Nadler, B. and Galun, M. (2006). Fundamental limitations of spectral clustering. In Schölkopf, B., Platt, J. C., and Hoffman, T., editors, NIPS, pages 1017–1024. MIT Press.

Rand, W. M. (1971). Objective criteria for the evaluation of clustering methods. American Statistical Association Journal, 66(336):846–850.

Rijsbergen, C. J. V. (1979). Information Retrieval, 2nd edition. Butterworths, London.

von Luxburg, U. (2007). A tutorial on spectral clustering. Statistics and Computing, 17(4):395–416.

Xu, R. and II, D. C. W. (2005). Survey of clustering algorithms. IEEE Transactions on Neural Networks, 16(3):645–678.

Ying Zhao and George Karypis. 2001. Criterion functions for document clustering: Experiments and analysis. Technical Report TR 01.40, Department of Computer Science, University of Minnesota

Yousri, N. A., Kamel, M. S., and Ismail, M. A. (2009). A distance-relatedness dynamic model for clustering high dimensional data of arbitrary shapes and densities. Pattern Recognition, 42(7):1193–1209.

MANUSCRIPT AUTHORS

Mr. Mohamed A. Abbas

- Egypt

mohamed.alyabbas@gmail.com

Professor Amin A. Shoukry

- Egypt

CREATE AUTHOR ACCOUNT

LAUNCH YOUR SPECIAL ISSUE

View all special issues >>

PUBLICATION VIDEOS