Indexing for Large DNA Database sequences
Source: International Journal of Biometrics and Bioinformatics (IJBB)
Volume:  5    Issue:  4
Pages: 202 - 215
Publication Date:   September / October 2011
ISSN (Online): 1985-2347
Author(s):
Samer Wohoush - Palestinian Occupied Territori
Mahmoud Saheb - Palestinian Occupied Territori
Published Date: 05-10-2011
Publisher: CSC Journals, Kuala Lumpur, Malaysia
KEYWORDS:   Large Database, DNA Sequence, Index Structure, Sequence Transformation, Wavelet Transformation, RDMS Indexing 
 
 
This Manuscript is indexed in the following databases/websites:-
1. Directory of Open Access Journals (DOAJ)
2. Scribd
3. Google Scholar
4. Docstoc
5. Bielefeld Academic Search Engine (BASE)
6. Academic Journals Database
 
 
Bioinformatics data is voluminous because of the large number of sequences, their great lengths, and the new sequences added daily. This data must be accessed efficiently for many purposes. What distinguishes one DNA data item from another is its DNA sequence, a string of varying length over the four characters A, C, G, and T. Using a suitable representation of DNA sequences, together with a suitable index structure that holds this representation in main memory, allows the sequences to be accessed through the index, which makes processing efficient and reduces the number of disk I/O accesses. I/O operations are still needed at the end to eliminate false hits, but pruning reduces the number of candidate DNA sequences that must be checked, so the whole database need not be searched. Searching DNA sequences efficiently requires an index with a suitable size and search time. The choice of the relation fields on which the index is built has a large effect on both index size and search time. Our experiments apply the n-gram wavelet transformation with single-field and multi-field index structures in a relational DBMS environment. The results show that index size and search time must both be weighed carefully when indexing is used. Increasing the window size decreases the number of I/O references. The behavior of single-field and multi-field indexing depends strongly on the window size. With single-field indexing, increasing the window size improves search time for a special index type, whereas with multi-field indexing the search time is good and nearly the same across most index types. The storage space needed by the RDBMS index types is about the same as, or greater than, that of the actual data.
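
The filter-and-verify idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the key function `wavelet_key` (a one-level Haar-style sum and difference of nucleotide counts over the two halves of a window), the in-memory dictionary index, and all names here are assumptions made for illustration, and the query is assumed to be exactly one window long. The index prunes the database down to a few candidates, and only those candidates are compared against the stored sequences to discard false hits.

```python
from collections import defaultdict

ALPHABET = "ACGT"

def freq_vector(s):
    """Count of each nucleotide in s, in A, C, G, T order."""
    return tuple(s.count(ch) for ch in ALPHABET)

def wavelet_key(window):
    """Hypothetical compact key: Haar-style sums and differences of the
    nucleotide counts in the left and right halves of the window."""
    half = len(window) // 2
    left = freq_vector(window[:half])
    right = freq_vector(window[half:])
    total = tuple(l + r for l, r in zip(left, right))
    detail = tuple(l - r for l, r in zip(left, right))
    return total + detail

def build_index(sequences, w):
    """Map each window key to the (sequence id, offset) pairs that produce it."""
    index = defaultdict(list)
    for seq_id, seq in sequences.items():
        for off in range(len(seq) - w + 1):
            index[wavelet_key(seq[off:off + w])].append((seq_id, off))
    return index

def search(index, sequences, query):
    """Return (candidates, hits). Assumes len(query) equals the window
    size used to build the index."""
    w = len(query)
    # Pruning step: only windows with the same key can match the query.
    candidates = index.get(wavelet_key(query), [])
    # Verification step: read the actual sequence to discard false hits.
    hits = [(sid, off) for sid, off in candidates
            if sequences[sid][off:off + w] == query]
    return candidates, hits

sequences = {"s1": "ACGTACGT", "s2": "TTGACCA"}
index = build_index(sequences, 4)
candidates, hits = search(index, sequences, "GTAC")
print(candidates)  # [('s1', 2), ('s2', 1)]  ('s2', 1) is a false hit: TGAC
print(hits)        # [('s1', 2)]
```

In this toy data the window TGAC shares its key with the query GTAC, so it survives pruning and is rejected only in the verification step, which corresponds to the final I/O operations the abstract describes.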
 
 
 
 
 
 
 
 
 
 
 
 