A Comparative Analysis of Feature Selection Methods for Clustering DNA Sequences

B. Umamageswari, B. Karthikeyan, T. Nalini

License: Creative Commons Attribution-NonCommercial 4.0 International

Pages 120-127 | Revised 15-03-2012 | Published 16-04-2012

Published in International Journal of Computer Science and Security (IJCSS)

Volume 6, Issue 2 | Publication Date: April 2012

KEYWORDS

Evolutionary Tree, Hierarchical Clustering, Bioinformatics, Codons, mtDNA

ABSTRACT

Large-scale analysis of genome sequences is in progress around the world; a major application is establishing the evolutionary relationships among species using phylogenetic trees. Hierarchical agglomerative algorithms can generate such trees from a distance matrix representing the dissimilarity among the species. ClustalW and MUSCLE are two general-purpose programs that generate a distance matrix from input DNA or protein sequences. Their limitation is that they rely on the Smith-Waterman algorithm, which uses dynamic programming to perform pairwise alignment. This is an extremely time-consuming process, and existing systems may even fail on larger input data sets. To overcome this limitation, we use codon usage frequency as an approximation of the dissimilarity among species. The proposed technique further reduces complexity by extracting only the significant features of each species from its mtDNA sequence, using techniques such as selecting the most frequent codons, selecting the codons with the maximum range of values, or principal component analysis (PCA). We observe that the proposed system produces nearly accurate results in a significantly reduced running time.
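The abstract outlines the pipeline only at a high level. The following Python sketch is a minimal illustration under stated assumptions, not the authors' implementation. It assumes that codons are counted as non-overlapping triplets read from the first base of each sequence. It interprets "frequent codons" as the highest mean frequency across species and "maximum range" as the largest max-minus-min spread across species. It also assumes Euclidean distance as the dissimilarity and uses UPGMA, the method listed in the references, for the hierarchical agglomerative step. The function names, the number of retained features `k`, and the toy sequences are hypothetical, and NumPy is assumed to be available.

```python
from itertools import product

import numpy as np

# All 64 codons over A, C, G, T in a fixed order, so that feature index i
# refers to the same codon in every species' vector.
CODONS = ["".join(p) for p in product("ACGT", repeat=3)]
CODON_INDEX = {c: i for i, c in enumerate(CODONS)}


def codon_frequencies(seq: str) -> np.ndarray:
    """Relative codon frequencies of one sequence.

    Assumption: non-overlapping triplets read from the first base; triplets
    containing any character other than A, C, G, T are skipped.
    """
    seq = seq.upper()
    counts = np.zeros(len(CODONS))
    for i in range(0, len(seq) - 2, 3):
        idx = CODON_INDEX.get(seq[i:i + 3])
        if idx is not None:
            counts[idx] += 1
    total = counts.sum()
    return counts / total if total else counts


def select_frequent(X: np.ndarray, k: int) -> np.ndarray:
    """Keep the k codons with the highest mean frequency across species."""
    cols = np.argsort(X.mean(axis=0))[::-1][:k]
    return X[:, cols]


def select_max_range(X: np.ndarray, k: int) -> np.ndarray:
    """Keep the k codons whose frequency range (max - min) across species is largest."""
    cols = np.argsort(X.max(axis=0) - X.min(axis=0))[::-1][:k]
    return X[:, cols]


def select_pca(X: np.ndarray, k: int) -> np.ndarray:
    """Project mean-centred data onto its first k principal components."""
    centred = X - X.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return centred @ vt[:k].T


def distance_matrix(F: np.ndarray) -> np.ndarray:
    """Pairwise Euclidean distances between the rows (species) of F."""
    diff = F[:, None, :] - F[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def upgma(D: np.ndarray, names: list[str]) -> str:
    """UPGMA (average-linkage) clustering; returns a Newick topology string."""
    clusters = {i: (name, 1) for i, name in enumerate(names)}  # id -> (label, size)
    dist = {(i, j): D[i, j] for i in clusters for j in clusters if i < j}
    next_id = len(names)
    while len(clusters) > 1:
        (a, b), _ = min(dist.items(), key=lambda kv: kv[1])  # closest pair
        (label_a, size_a), (label_b, size_b) = clusters.pop(a), clusters.pop(b)
        for c in clusters:
            # Size-weighted mean of the two old distances (UPGMA update rule).
            d_ac = dist.pop((min(a, c), max(a, c)))
            d_bc = dist.pop((min(b, c), max(b, c)))
            dist[(c, next_id)] = (size_a * d_ac + size_b * d_bc) / (size_a + size_b)
        del dist[(a, b)]
        clusters[next_id] = (f"({label_a},{label_b})", size_a + size_b)
        next_id += 1
    return next(iter(clusters.values()))[0] + ";"


if __name__ == "__main__":
    # Toy sequences for illustration only, not real mtDNA.
    names = ["sp1", "sp2", "sp3", "sp4"]
    seqs = ["AAA" * 10 + "CCC" * 2, "AAA" * 9 + "CCC" * 3,
            "GGG" * 10 + "TTT" * 2, "GGG" * 9 + "TTT" * 3]
    X = np.vstack([codon_frequencies(s) for s in seqs])
    for select in (select_frequent, select_max_range, select_pca):
        print(select.__name__, upgma(distance_matrix(select(X, 2)), names))
```

UPGMA is written out by hand here to make the averaging step explicit. In practice, `scipy.cluster.hierarchy.linkage` with `method="average"` performs the same average-linkage clustering from a condensed distance matrix. Counting codons is linear in sequence length. By contrast, dynamic-programming alignment of two sequences of lengths m and n takes O(mn) time for each pair, and that cost is what the abstract seeks to avoid.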

CITED BY (1)

1	Ahmad, M., Jung, L. T., & Bhuiyan, M. A. A. (2015). On Fuzzy Semantic Similarity Measure for DNA Coding. Computers in Biology and Medicine.

ABSTRACTING & INDEXING

1	Google Scholar

2	CiteSeerX

3	Scribd

4	SlideShare

5	PdfSR

REFERENCES

About mtDNA [online]. Available: http://en.wikipedia.org/wiki/Mitochondrial_DNA

B. Umamageswari, T. Nalini, A. R. Arunachalam (2011). "Clustering DNA Sequences by Extracting Pattern Features and Using Hierarchical Clustering Algorithm", presented at the National Conference on Recent Trends in Data Mining and Distributed Systems (NCTD2S), September 2011.

Chellapilla, K. and Fogel, G. B. (1999). "Multiple Sequence Alignment Using Evolutionary Programming", Proceedings of the 1999 Congress on Evolutionary Computation, Washington, D.C., pp. 445-452.

CLUSTALW [online]. Available: http://toolkit.tuebingen.mpg.de/

Elhadi, G. F. and Abbas, M. A. (2010). "Clustering DNA Sequences by Self-Organizing Map and Similarity Functions", in Proceedings of the 7th International Conference on Informatics and Systems (INFOS), May 2010.

FASTA Format Description, NGFN-BLAST, Nationales Genomforschungsnetz [online]. Available: http://ngfnblast.gbf.de/docs/fasta.html

Smith, L. I. (2002, February 26). "A Tutorial on Principal Component Analysis" [online]. Available: www.cs.otago.ac.nz/cosc453/student_tutorials/principal_components.pdf

Needleman, S. B. and Wunsch, C. D. (1970). "A General Method Applicable to the Search for Similarities in the Amino Acid Sequence of Two Proteins", Journal of Molecular Biology, 48: 443-453.

Page, R. D. M. (1993). Component 2.0: User Guide [online]. Available: http://taxonomy.zoology.gla.ac.uk/rod/cplite/Manual.html

Smith, T. F. and Waterman, M. S. (1981). "Identification of Common Molecular Subsequences", Journal of Molecular Biology, 147: 195-197.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical Taxonomy (UPGMA: Unweighted Pair Group Method with Arithmetic Mean), W. H. Freeman and Company, San Francisco, pp. 230-234.

Source of DNA Sequences, National Center for Biotechnology Information [online]. Available: http://www.ncbi.nlm.nih.gov/mapview

MANUSCRIPT AUTHORS

Mr. B. Umamageswari (India), umamage@gmail.com

Mr. B. Karthikeyan (Singapore)

Mr. T. Nalini (India)
