Interdisciplinary Bio Central
 
Conference paper (Bioinformatics/Computational biology/Molecular modeling)

Global Sequence Homology Detection Using Word Conservation Probability
Jae-Seong Yang1+, Dae-Kyum Kim2+, Jinho Kim and Sanguk Kim1,2,3,*
1School of Interdisciplinary Bioscience and Bioengineering, Pohang University of Science and Technology, Hyoja-dong, Nam-gu, Pohang, Gyungbuk, Republic of Korea
2Division of Molecular and Life Science, Pohang University of Science and Technology, Hyoja-dong, Nam-gu, Pohang, Gyungbuk, Republic of Korea
3Division of IT Convergence Engineering, Pohang University of Science and Technology, Hyoja-dong, Nam-gu, Pohang, Gyungbuk, Republic of Korea
*Corresponding author
+These authors contributed equally to this work
Received: October 7, 2011
Accepted: October 17, 2011
Published: October 19, 2011
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Synopsis

Protein homology detection is an important problem in comparative genomics. Because sequence databases are growing exponentially, fast and efficient homology detection tools are urgently needed. Currently, homology is generally detected with local-alignment sequence comparison methods such as BLAST, which give a reasonable measure of sequence similarity. However, these methods are poorly suited to measuring overall sequence similarity, especially for eukaryotic genomes, whose sequences often contain many insertions and duplications. In addition, these methods do not provide an explicit model of speciation, so it is difficult to translate their similarity measures into homology assignments. Here, we present a novel method based on the Word Conservation Score (WCS) to address these limitations. Instead of counting individual amino acids, we adopted the concept of the "word" to compare sequences. WCS measures overall sequence similarity by comparing word contents, which is much faster than BLAST comparison. Furthermore, WCS can be used to measure the evolutionary distance between homologous sequences. We therefore expect sequence comparison with WCS to be useful for multiple-species comparisons of large genomes. In performance comparisons on protein structural classifications, our method showed a considerable improvement over BLAST. Our method also found larger micro-syntenic blocks, which consist of orthologs with conserved gene order. By testing on various datasets, we showed that WCS gives a faster and better overall similarity measure than BLAST.
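
The idea of comparing sequences by their word contents can be illustrated with a minimal sketch. This is not the WCS defined in the paper, whose formula is not given in this synopsis. It is a generic alignment-free comparison that assumes a "word" is an overlapping run of k residues and scores two sequences by the Jaccard index of their word sets. The function names, the default k = 3, and the Jaccard scoring are all assumptions made for illustration.

```python
def words(seq, k=3):
    """Return the set of overlapping length-k words in a sequence.

    Assumption: a "word" is a contiguous run of k residues (a k-mer).
    A sequence shorter than k yields an empty set.
    """
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def word_overlap(a, b, k=3):
    """Jaccard index of two word sets: shared words / all distinct words.

    Illustrative stand-in for a word-content similarity, not the
    paper's WCS. Returns 0.0 when neither sequence contains a word.
    """
    wa, wb = words(a, k), words(b, k)
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0
```

Because each sequence is reduced to a set of words once, comparing a pair needs only set operations and no alignment. An insertion in one sequence changes only the words that span it, and a duplicated segment adds no new words, so the score reflects overall shared content rather than a single best local match.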

Keywords: homology detection, alignment-free method, word conservation, global sequence homology, eukaryotic genome
IBC ISSN: 2005-8543