Document Topic Generation in Text Mining by using Cluster Analysis with EROCK

Rizwan Ahmad, Aasia Khanum

Pages - 176 - 182 | Revised - 30-04-2010 | Published - 10-06-2010

Published in International Journal of Computer Science and Security (IJCSS)

Volume - 4 Issue - 2 | Publication Date - May 2010

KEYWORDS

Text Mining, Cluster Analysis, Document Similarity

ABSTRACT

Clustering is a useful technique in the field of textual data mining. Cluster analysis divides objects into meaningful groups based on the similarity between them. Copious material is available from the World Wide Web (WWW) in response to any user-provided query, and it becomes tedious for the user to manually extract the information actually required from this material. This paper proposes a scheme that addresses this problem with the help of cluster analysis. In particular, the ROCK algorithm is studied with some modifications. ROCK generates better clusters than other clustering algorithms for data with categorical attributes. We present an enhanced version of ROCK, called Enhanced ROCK (EROCK), with an improved similarity measure as well as improved storage efficiency. Evaluation of the proposed algorithm on standard text documents shows improved performance.
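For readers unfamiliar with ROCK, the sketch below illustrates the link-based notion of similarity that the baseline algorithm is generally described as using: two items are neighbors when their Jaccard similarity meets a threshold, and the link count of a pair is the number of neighbors they share. This is only a minimal illustration of baseline ROCK, not the modified similarity measure of EROCK, which the abstract does not specify. The function names, the sample term sets, and the threshold value are assumptions made for the example.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two term sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rock_links(docs, theta):
    """Count common neighbors (links) for every pair of documents.

    Assumption: a document is not counted as its own neighbor.
    """
    n = len(docs)
    # neighbors[i] holds the indices of documents similar enough to docs[i]
    neighbors = [
        {j for j in range(n) if j != i and jaccard(docs[i], docs[j]) >= theta}
        for i in range(n)
    ]
    # link(i, j) = number of neighbors shared by documents i and j
    return {
        (i, j): len(neighbors[i] & neighbors[j])
        for i, j in combinations(range(n), 2)
    }


# Hypothetical documents, each reduced to a set of terms
docs = [
    {"cluster", "text", "mining"},
    {"cluster", "text", "rock"},
    {"cluster", "mining", "rock"},
    {"soccer", "goal", "league"},
    {"soccer", "goal", "match"},
    {"soccer", "league", "match"},
]

links = rock_links(docs, theta=0.5)
for pair, count in sorted(links.items()):
    if count:
        print(pair, count)
```

In this toy data, pairs within the same topic share a neighbor and so receive a nonzero link count, while pairs drawn from different topics receive none. An agglomerative procedure in the style of ROCK would then merge the clusters with the most links between them.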

CITED BY (13)

1	Hashimi, H., Hafez, A., & Mathkour, H. (2015). Selection criteria for text mining approaches. Computers in Human Behavior, 51, 729-733.

2	El-Said, A. M., Eldesoky, A., & Arafat, H. A. (2015). An Efficient Approach to Construct Object Model of Static Textual Structure with Dynamic Behavior Based on Q-learning. Journal of Information Science and Engineering, 31(4), 1267-1289.

3	Kadhim, A. I., Cheah, Y. N., & Ahamed, N. H. (2014, December). Text Document Preprocessing and Dimension Reduction Techniques for Text Document Clustering. In Artificial Intelligence with Applications in Engineering and Technology (ICAIET), 2014 4th International Conference on (pp. 69-73). IEEE.

4	Benghabrit, A., Ouhbi, B., Behja, H., & Frikh, B. (2013). Statistical and Semantic Feature Selection for Text Clustering. Journal of Intelligent Computing, 4(2), 69-79.

5	El-Said, A. M., Eldesoky, A. I., & Arafat, H. A. (2013). An efficient object oriented text analysis (OOTA) approach to construct static structure with dynamic behavior. International Journal of Information Acquisition, 9(01), 1350006.

6	Benghabrit, A., Ouhbi, B., Behja, H., & Frikh, B. (2013, June). Text clustering using statistical and semantic data. In Computer and Information Technology (WCCIT), 2013 World Congress on (pp. 1-6). IEEE.

7	Mahalle, M. S. D., & Shah, D. K. Semantic Based Approach for Document Clustering. Journal of Sci., Engg. & Tech. Mgt. Vol 4 (1), MPSTME, Mumbai. July 2012.

8	Khandare, S. S., & Malode, S. N. International Journal of Science Innovations and Discoveries An International peer.

9	Jiang, F. Deriving Topics and Opinions from Microblog.

10	Mahalle, M. S. D., & Shah, K. Document Clustering by using Semantics.

11	Tyagi, A., & Sharma, S. (2012). Implementation Of ROCK Clustering Algorithm For The Optimization Of Query Searching Time. International Journal on Computer Science and Engineering, 4(5), 809.

12	Keole, R. R., & Bamnote, G. R. (2010). Clustering Techniques in Web Content Mining. International Journal of Advanced Research in Computer Science, 1(4).

13	Keole, R. R., & Bamnote, G. R. (2010). Fuzzy Clustering in Web Content Mining. International Journal of Advanced Research in Computer Science, 1(4).

ABSTRACTING & INDEXING

1	Google Scholar

2	Academic Journals Database

3	ScientificCommons

4	Academic Index

5	CiteSeerX

6	refSeek

7	iSEEK

8	Socol@r

9	ResearchGATE

10	Libsearch

11	Bielefeld Academic Search Engine (BASE)

12	Scribd

13	WorldCat

14	SlideShare

15	PDFCAST

16	PdfSR

MANUSCRIPT AUTHORS

Mr. Rizwan Ahmad

- Pakistan

qazirizwan.ahmad@yahoo.com

Mr. Aasia Khanum

- Pakistan
