Interdisciplinary Bio Central
 
Etc. (Bioinformatics/Computational biology/Molecular modeling)

New Design of Neural Network Input and Output Vectors in the Protein Secondary Structure Prediction
Byung-chul Lee1 and Dongsup Kim1,*
1Department of Biosystems, Korea Advanced Institute of Science and Technology (KAIST)
*Corresponding author
Published: October 31, 2006
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Synopsis

The prediction of protein secondary structure is an important bioinformatics tool and an essential component of template-based protein tertiary structure prediction. Neural networks (NNs) have been the method of choice for most prediction methods. In this work, we attempt to improve prediction accuracy by modifying the input feature vectors and the output states of the NN. The first idea is motivated by the observation that a protein's structural information, especially when combined with evolutionary information, significantly improves the accuracy of the predicted tertiary structure. We derive "potential" parameters for protein secondary structure by a procedure similar to the one used to derive the directional information tables of the GOR method. These potential parameters are combined with the evolutionary information to construct the feature vectors used for training the NN. In addition, we design new NN output states: the network has 27 output states instead of the conventional three (helix, sheet, and coil). Each output state represents one of the 27 possible combinations of the three secondary structure types, ranging from helix-helix-helix (HHH) to coil-coil-coil (CCC), and encodes the secondary structure states of three consecutive positions along the sequence. Predictions are made by combining the outputs for neighboring windows. With this approach, our new method achieves an average three-state prediction accuracy (Q3) of over 79% and a segment overlap (SOV) score of over 76% when tested on the validation set. Moreover, the Q3 and SOV scores on newly published proteins deposited in LiveBench and SCOP are over 78.32% and 75.71%, respectively, outperforming the state-of-the-art PSI-PRED.
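
To make the 27-state output design more concrete, the following Python sketch shows one plausible way to encode three consecutive secondary structure states as a single class and to combine the outputs of neighboring windows back into a three-state prediction. It is an illustration only, not the authors' implementation. The names `TRIPLETS`, `encode_triplets`, `combine_neighbors`, and `q3` are hypothetical, the use of the letter E for sheet and the coil padding at the chain ends are assumptions, and summing the overlapping scores is just one simple combination rule; the synopsis does not specify the rule actually used.

```python
from itertools import product

# Three-state alphabet: H = helix, E = sheet (assumed letter), C = coil.
STATES = "HEC"

# All 27 ordered triplets, from 'HHH' (index 0) to 'CCC' (index 26).
TRIPLETS = ["".join(t) for t in product(STATES, repeat=3)]
TRIPLET_INDEX = {t: i for i, t in enumerate(TRIPLETS)}


def encode_triplets(ss, pad="C"):
    """Return, for each residue, the class index of the triplet
    (previous, current, next). Chain ends are padded with coil,
    which is an assumption made for this sketch."""
    padded = pad + ss + pad
    return [TRIPLET_INDEX[padded[i:i + 3]] for i in range(len(ss))]


def combine_neighbors(outputs):
    """Combine 27-state outputs into a three-state string.

    outputs[i] holds 27 scores for the triplet centred on residue i.
    Residue j collects evidence from the windows centred on j-1, j,
    and j+1; here the evidence is simply summed (an assumed rule)."""
    n = len(outputs)
    scores = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i, window in enumerate(outputs):
        for t_idx, triplet in enumerate(TRIPLETS):
            for offset, state in zip((-1, 0, 1), triplet):
                j = i + offset
                if 0 <= j < n:
                    scores[j][STATES.index(state)] += window[t_idx]
    # Pick the highest-scoring state for every residue.
    return "".join(
        STATES[max(range(3), key=row.__getitem__)] for row in scores
    )


def q3(predicted, observed):
    """Percentage of residues whose three-state label is correct."""
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


# Usage with idealised (one-hot) network outputs for a toy sequence.
observed = "CHHHHEEC"
labels = encode_triplets(observed)
one_hot = [[1.0 if k == c else 0.0 for k in range(27)] for c in labels]
predicted = combine_neighbors(one_hot)
accuracy = q3(predicted, observed)
```

With idealised one-hot outputs, every residue receives consistent evidence from all overlapping windows, so the toy sequence is recovered exactly. With real network outputs the overlapping windows may disagree, and the combination step then acts as a form of local smoothing.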

Keywords: Neural Network, PSI-PRED, Protein Secondary Structure Prediction, Machine Learning
IBC ISSN: 2005-8543