Home   >   CSC-OpenAccess Library   >    Manuscript Information
Full Text Available

(789.91KB)
This is an Open Access publication published under CSC-OpenAccess Policy.
Clustering Using Shared Reference Points Algorithm Based On a Sound Data Model
Mohamed A. Abbas, Amin A. Shoukry
Pages - 28 - 47     |    Revised - 15-03-2012     |    Published - 16-04-2012
Volume - 3   Issue - 2    |    Publication Date - April 2012  Table of Contents
MORE INFORMATION
KEYWORDS
Shared Nearest Neighbors, Mutual Neighbors, Spatial Data, High Dimensional Data, Time Series, Cluster Validation
ABSTRACT
A novel clustering algorithm CSHARP is presented for the purpose of finding clusters of arbitrary shapes and arbitrary densities in high dimensional feature spaces. It can be considered as a variation of the Shared Nearest Neighbor algorithm (SNN), in which each sample data point votes for the points in its k-nearest neighborhood. Sets of points sharing a common mutual nearest neighbor are considered as dense regions/ blocks. These blocks are the seeds from which clusters may grow up. Therefore, CSHARP is not a point-to-point clustering algorithm. Rather, it is a block-to-block clustering technique. Much of its advantages come from these facts: Noise points and outliers correspond to blocks of small sizes, and homogeneous blocks highly overlap. This technique is not prone to merge clusters of different densities or different homogeneity. The algorithm has been applied to a variety of low and high dimensional data sets with superior results over existing techniques such as DBScan, K-means, Chameleon, Mitosis and Spectral Clustering. The quality of its results as well as its time complexity, rank it at the front of these techniques.
CITED BY (3)  
1 Drakshayani, B., & Prasad, E. V. (2013). Semantic Based Model for Text Document Clustering with Idioms. Intl. J. Date Engg, 4(1), 1-13.
2 Abbas, M., & Shoukry, A. (2012, July). CMUNE: A clustering using mutual nearest neighbors algorithm. In Information Science, Signal Processing and their Applications (ISSPA), 2012 11th International Conference on (pp. 1192-1197). IEEE.
3 Kuo, C. C., & Shieh, H. L. (2012). A Semi-Supervised Learning Algorithm for Data Classification. International Journal of Pattern Recognition and Artificial Intelligence, 1551007.
1 Google Scholar
2 CiteSeerX
3 refSeek
4 Scribd
5 SlideShare
6 PdfSR
1 Bentley, J. L. (1975). Multidimensional binary search trees used for associative searching. Communications of the ACM, 18(9):509–517.
2 Chen, W.-Y., Song, Y., Bai, H., Lin, C.-J., and Chang, E. Y. (2011). Parallel spectral clustering in distributed systems. IEEE Trans. Pattern Anal. Mach. Intell, 33(3):568– 586.
3 Ding, C. H. Q. and He, X. (2004). K-nearest-neighbor consistency in data clustering: incorporating local information into global optimization. In Haddad, H., Omicini, A., Wainwright, R. L., and Liebrock, L. M., editors, SAC, pages 584–589. ACM.
4 Ertoz, Steinbach, and Kumar (2003). Finding clusters of different sizes, shapes, and densities in noisy, high dimensional data. In SIAM International Conference on Data Mining, volume 3.
5 Ester, M., Kriegel, H.-P., Sander, J., and Xu, X. (1996). A density-based algorithm for discovering clusters in large spatial databases with noise. In KDD, pages 226–231.
6 Guttman, A. (1984). R-trees: A dynamic index structure for spatial searching. In ACM SIGMOD. Also published in/as: UCB, Elec.Res.Lab, Res.R. No.M83-64, 1983, with Stonebraker,M.
7 Guyon, I., Gunn, S., Nikravesh, M., and Zadeh, L. (2006). Feature Extraction: Foundations and Applications. Springer Verlag.
8 Halkidi, M., Batistakis, Y., and Vazirgiannis, M. (2002). Cluster validity methods: part I. SIGMOD Record (ACM Special Interest Group on Management of Data), 31(2):40–45.
9 Hammouda, K. M. and Kamel, M. S. (2002). Phrase-based document similarity based on an index graph model. In ICDM, pages 203–210. IEEE Computer Society.
10 Hubert, L. and Arabie, P. (1985). Comparing partitions. Journal of Classification, 2:193– 218.
11 Jain, A. K. and Dubes, R. C. (1988). Algorithms for Clustering Data. Prentice Hall, Englewood Cliffs.
12 Jarvis, R. and Patrick, E. (1973). Clustering using a similarity measure based on shared near neighbors. IEEE Transactions on Computers, 22(11):1025–1034.
13 Karypis, G., Han, E.-H. S., and NEWS, V. K. (1999). Chameleon: Hierarchical clustering using dynamic modeling. Computer, 32(8):68–75.
14 Keogh, E. J., Xi, X., Wei, L., and Ratanamahatana, C. A. The ucr time series classification/clustering, homepage: (www.cs.ucr.edu/ eamonn/time_series_data/), 2006.
15 Lee, J.-S. and Ólafsson, S. (2011). Data clustering by minimizing disconnectivity. Inf. Sci, 181(4):732–746.
16 Murphy, P. M. and Aha, D. W. (1992). UCI repository of machine learning databases. Machine-readable data repository, University of California, Department of Information and Computer Science, Irvine, CA.
17 Nadler, B. and Galun, M. (2006). Fundamental limitations of spectral clustering. In Schölkopf, B., Platt, J. C., and Hoffman, T., editors, NIPS, pages 1017–1024. MIT Press.
18 Rand, W. M. (1971). Objective criteria for the evaluation of clustering methods. American Statistical Association Journal, 66(336):846–850.
19 Rijsbergen, C. J. V. (1979). Information Retrieval, 2nd edition. Butterworths, London.
20 von Luxburg, U. (2007). A tutorial on spectral clustering. Statistics and Computing, 17(4):395–416.
21 Xu, R. and II, D. C. W. (2005). Survey of clustering algorithms. IEEE Transactions on Neural Networks, 16(3):645–678.
22 Yousri, N. A., Kamel, M. S., and Ismail, M. A. (2009). A distance-relatedness dynamic model for clustering high dimensional data of arbitrary shapes and densities. Pattern Recognition, 42(7):1193–1209.
23 Ying Zhao and George Karypis. 2001. Criterion functions for document clustering: Experiments and analysis. Technical Report TR 01.40, Department of Computer Science, University of Minnesota
24 Andrew Rosenberg and Julia Hirschberg, 2007. V--Measure: a conditional entropy-- based external cluster evaluation measure. EMNLP '07
25 Mohamed A. Abbas Amin A. Shoukry, and Rasha F. Kashef, ICDM 2012. CSHARP: A Clustering Using Shared Reference Points Algorithm, International Conference on Data Mining, Penang, Malaysia, World Academy of Science, Engineering and Technology.
Mr. Mohamed A. Abbas
- Egypt
mohamed.alyabbas@gmail.com
Professor Amin A. Shoukry
- Egypt