|
| Testing Various Similarity Metrics and their Permutations with Clustering Approach in Context Free Data Cleaning
|
|
|
|
Source |
International Journal of Computer Science and Security (IJCSS) |
|
|
Volume: 3 Issue: 5 |
| |
Pages: 334-447 |
|
Publication Date: November 2009 |
|
ISSN (Online): 1985-1553 |
|
|
|
|
|
Pages |
344 - 350 |
|
Author(s) |
Sohil Dineshkumar Pandya, Paresh V Virparia |
Published Date |
30-11-2009 |
|
Publisher |
CSC Journals, Kuala Lumpur, Malaysia |
|
| |
|
| |
KEYWORDS: Context free data cleaning, Clustering, Sequence similarity metrics |
|
|
| |
|
|
| This Manuscript is indexed in the following databases/websites:- |
|
| 1. Directory of Open Access Journals (DOAJ) |
| 2. OpenJ-Gate |
| 3. Scribd |
| 4. PDFCAST |
| 5. Docstoc |
| 6. Google Scholar |
| 7. CiteSeerX |
| 8. ScientificCommons |
| 9. WorldCat |
| 10. refSeek |
| 11. ResearchGATE |
| 12. Bielefeld Academic Search Engine (BASE) |
| 13. iSEEK |
| 14. Academic Journals Database |
| 15. Libsearch |
| 16. slideshare |
| |
|
| |
|
|
| Organizations can sustain growth in the knowledge era through proficient data analysis, which relies heavily on the quality of data. This paper focuses on the use of sequence similarity metrics with a clustering approach in context free data cleaning to improve data quality by reducing noise. The authors propose an algorithm that tests the suitability of a value for correcting other values of an attribute, based on the distance between them. Sequence similarity metrics such as Needleman-Wunsch, Jaro-Winkler, Chapman Ordered Name Similarity, and Smith-Waterman are used to compute the distance between two values. Experimental results show how the approach can effectively clean data without reference data. |
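
The abstract does not give the algorithm itself, so the following is only a minimal illustrative sketch of the general idea, not the authors' method. It assumes that frequent values of an attribute act as cluster centers and that a rarer value is corrected to a center when their Jaro-Winkler similarity reaches a threshold. The function names, the 0.9 threshold, and the frequency-based choice of centers are all assumptions made for illustration.

```python
from collections import Counter


def jaro(s1, s2):
    """Jaro similarity in [0, 1]; 1.0 means identical strings."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters match only if they are equal and within this window.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order // 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1, s2, scaling=0.1, max_prefix=4):
    """Jaro similarity boosted for a shared prefix of up to max_prefix chars."""
    sim = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return sim + prefix * scaling * (1.0 - sim)


def clean_values(values, threshold=0.9):
    """Hypothetical cleaning step (an assumption, not the paper's algorithm).

    Values are visited from most to least frequent. A value joins the most
    similar existing center if the similarity reaches the threshold;
    otherwise it becomes a new center and is left unchanged.
    """
    centers = []
    mapping = {}
    for value, _count in Counter(values).most_common():
        best, best_sim = None, threshold
        for center in centers:
            sim = jaro_winkler(value, center)
            if sim >= best_sim:
                best, best_sim = center, sim
        if best is None:
            centers.append(value)
            mapping[value] = value
        else:
            mapping[value] = best
    return [mapping[v] for v in values]


city_column = ["Ahmedabad", "Ahmedabad", "Ahmedabad", "Ahmedabd",
               "Ahemdabad", "Surat", "Surat", "Surt"]
cleaned = clean_values(city_column)
```

In this sketch no reference dictionary is consulted: the corrections come only from the attribute's own values, which is the sense in which the cleaning is context free. Swapping `jaro_winkler` for another metric named in the abstract, such as a Needleman-Wunsch or Smith-Waterman score normalized to the same range, would leave `clean_values` unchanged.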
| |
|
| |
|
| |
| 1 |
Hui Xiong, Gaurav Pandey, Michael Steinbach, Vipin Kumar. “Enhancing Data Analysis with Noise Removal”. IEEE Transactions on Knowledge and Data Engineering, 18(3):304-319, 2006. |
|
|
| 2 |
Lukasz Ciszak. “Applications of Clustering and Association Methods in Data Cleaning”. In Proceedings of the International Multiconference on Computer Science and Information Technology. 2008. |
|
|
| 3 |
Sohil D Pandya, Dr. Paresh V Virparia. “Data Cleaning in Knowledge Discovery in Databases: Various Approaches”. In Proceedings of the National Seminar on Current Trends in ICT, INDIA, 2009. |
|
|
| 4 |
Sohil D Pandya, Dr. Paresh V Virparia. “Clustering Approach in Context Free Data Cleaning”. National Journal on System & Information Technology, 2(1):83-90, 2009. |
|
|
| 5 |
Sohil D Pandya, Dr. Paresh V Virparia. “Application of Various Permutations of Similarity Metrics with Clustering Approach in Context Free Data Cleaning”. In Proceedings of the National Symposium on Indian IT @ CROXRoads, INDIA, 2009. |
|
|
| 6 |
W Cohen, P Ravikumar, S Fienberg. “A Comparison of String Distance Metrics for Name-Matching Tasks”. In Proceedings of the IJCAI, 2003. |
|
|
| 7 |
http://en.wikipedia.org/ |
|
|
| 8 |
http://www.dcs.shef.ac.uk/~sam/simmetric.html |
|
|
| |
|
| |
|
| |
| 1 |
S. D. Pandya and P. V. Virparia, “Context Free Data Cleaning and its Application in Mechanism for Suggestive Data Cleaning”, International Journal of Information Science, 1(1), pp. 32-35, 2011. |
|
|
| 2 |
R. Ahmad and A. Khanum, “Document Topic Generation in Text Mining by using Cluster Analysis with EROCK”, International Journal of Computer Science and Security (IJCSS), 4(2), pp. 176 – 182, 2010. |
|
|
| |
|
| |
|
| |
| 1 |
TechRepublic |
| 2 |
Academia.edu |
| 3 |
ZDNet |
| 4 |
4shared |
| 5 |
Scientific & Academic Publishing Co |
| |
|
| |
|
| |
|
| Sohil Dineshkumar Pandya : Colleagues
|
|
| Paresh V Virparia : Colleagues
|
|