Interdisciplinary Bio Central
 
Review (Bioinformatics/Computational biology/Molecular modeling)

Survey on Nucleotide Encoding Techniques and SVM Kernel Design for Human Splice Site Prediction
A.T.M. Golam Bari¹, Mst. Rokeya Reaz¹, Ho-Jin Choi² and Byeong-Soo Jeong¹,*
¹Department of Computer Engineering, Kyung Hee University, Suwon, Korea
²Department of Computer Science, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Korea
*Corresponding author
Received: December 12, 2012
Revised: December 30, 2012
Accepted: December 31, 2012
Published: December 31, 2012
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Synopsis

Splice site prediction in a DNA sequence is a basic search problem: finding the exon/intron and intron/exon boundaries. Removing the introns and joining the exons together forms the mRNA sequence, which is the input to the translation process. Splicing is therefore a necessary step in the central dogma of molecular biology. Splice site prediction first locates all candidate sequences ending in GT or AG, and then separates the true splice sites from the false ones among those candidates. In this paper, we survey research on splice site prediction based on the support vector machine (SVM). The surveyed works differ mainly in their nucleotide encoding technique and their choice of SVM kernel. Some methods encode the DNA sequence sparsely, whereas others encode it probabilistically. The encoded sequences serve as the input to the SVM, which classifies them using its learned model. Classification accuracy depends largely on selecting a kernel suited to sequence data and on choosing its parameters well. We examine each encoding technique and group the techniques by similarity, then discuss kernels and the selection of their parameters. This survey provides a basic understanding of encoding approaches and of proper SVM kernel selection for splice site prediction.
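
To make the two encoding families concrete, the following is a minimal Python sketch, not the method of any particular surveyed paper. It scans a sequence for GT and AG candidate positions, then encodes a nucleotide window either sparsely (assuming a 4-bit one-hot code per nucleotide) or probabilistically (assuming simple per-position nucleotide frequencies estimated from training windows). All function names, the toy sequence, and the window contents are hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: helper names and encodings are assumptions,
# not taken from a specific surveyed method.

# Assumed 4-bit sparse (one-hot) code for each nucleotide.
SPARSE_CODE = {
    "A": (1, 0, 0, 0),
    "C": (0, 1, 0, 0),
    "G": (0, 0, 1, 0),
    "T": (0, 0, 0, 1),
}


def candidate_sites(seq, dinucleotide):
    """Return the start index of every occurrence of `dinucleotide` in `seq`."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == dinucleotide]


def sparse_encode(window):
    """Flatten a nucleotide window into a binary vector of length 4 * len(window)."""
    vector = []
    for base in window.upper():
        vector.extend(SPARSE_CODE[base])
    return vector


def position_frequencies(windows):
    """Estimate per-position nucleotide frequencies from equal-length windows."""
    length = len(windows[0])
    freqs = []
    for pos in range(length):
        counts = {base: 0 for base in "ACGT"}
        for window in windows:
            counts[window[pos].upper()] += 1
        freqs.append({base: counts[base] / len(windows) for base in "ACGT"})
    return freqs


def probabilistic_encode(window, freqs):
    """Replace each nucleotide by its estimated frequency at that position."""
    return [freqs[i][base] for i, base in enumerate(window.upper())]


if __name__ == "__main__":
    toy_seq = "ACGTAAGTCAGG"  # hypothetical toy sequence
    print(candidate_sites(toy_seq, "GT"))  # candidate GT positions
    print(candidate_sites(toy_seq, "AG"))  # candidate AG positions
    print(sparse_encode("GT"))
    freqs = position_frequencies(["GT", "GA"])  # hypothetical training windows
    print(probabilistic_encode("GT", freqs))
```

Either vector could then be passed to an SVM as a feature vector. The sparse code keeps every position independent and makes the vector four times longer than the window, whereas the probabilistic code yields one real value per position but depends on the training windows used to estimate the frequencies.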

Keywords: coding sequence, exon-intron boundary, intron-exon boundary, splice site, support vector machine, translation process
IBC ISSN: 2005-8543