Interdisciplinary Bio Central
Etc. (Bioinformatics/Computational biology/Molecular modeling)

Retrieving Protein Domain Encoding DNA Sequences Automatically Through Database Cross-referencing
Yoon-sup Choi1, Jae-Seong Yang1, Sung Ho Ryu1,2 and Sanguk Kim1,2,3,*
1School of Interdisciplinary Bioscience and Bioengineering,
2Department of Molecular and Life Science,
3Biological Research Information Center, Pohang University of Science and Technology, Pohang, Kyungbuk, 790-784, Republic of Korea
*Corresponding author
Published: May 31, 2006
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License, which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Recent proteomic studies of protein domains require high-throughput, systematic approaches. Because most experiments on protein domains, the modules of protein-protein interactions, require gene cloning, the first experimental step is to retrieve the DNA sequences of the domain-encoding regions from databases. For large-scale proteomic research, however, manually extracting a large number of domain sequences from several interlinked databases is laborious. We present a new method that retrieves the DNA sequences of domain-encoding regions through automatic database cross-referencing. To extract protein domain-encoding regions, it traverses several interconnected databases with a validation process. We applied this method to retrieve all EGF domain-encoding DNA sequences of Homo sapiens. The algorithm was implemented with the Python library PAMIE, which enables automatic cross-referencing across distinct databases.
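
The core coordinate step behind this kind of retrieval can be illustrated with a minimal sketch. It is not the authors' implementation and does not use PAMIE or any database access. It assumes that the coding sequence (CDS) and the protein sequence have already been fetched, that domain boundaries are given as 1-based inclusive residue positions, and that the standard genetic code applies. The function names `translate` and `extract_domain_dna` and the toy sequences are hypothetical, chosen only for illustration. The validation shown here (translating the CDS and comparing it with the protein record) is one plausible form of cross-database check, not necessarily the one used in the paper.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA coding sequence; unknown codons become 'X'."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length is not a multiple of 3")
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna), 3)
    )


def extract_domain_dna(cds, protein, start, end):
    """Return the CDS slice encoding residues start..end (1-based, inclusive).

    Hypothetical helper: validates that the CDS really encodes the protein
    before mapping residue coordinates to nucleotide coordinates.
    """
    if not 1 <= start <= end <= len(protein):
        raise ValueError("domain boundaries fall outside the protein")
    # Validation: the CDS (minus a trailing stop codon) must encode the protein.
    if translate(cds).rstrip("*") != protein.upper():
        raise ValueError("CDS does not encode the given protein")
    # Residue i occupies nucleotides 3*(i-1) .. 3*i - 1 (0-based).
    return cds[3 * (start - 1):3 * end]


# Toy example (made-up sequences): residues 3-5 of "MKCPEG" are "CPE".
toy_cds = "ATGAAATGTCCGGAAGGTTAA"
print(extract_domain_dna(toy_cds, "MKCPEG", 3, 5))  # prints TGTCCGGAA
```

Rejecting a record when the translated CDS disagrees with the protein sequence keeps mismatched cross-references (for example, a different isoform) from silently producing a wrong cloning target.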

IBC   ISSN: 2005-8543