Synthesizing Aligned Random Pattern Digraphs from protein sequence patterns

Authors :
Annie En-Shiun Lee
Andrew K. C. Wong
Source :
BIBM Workshops
Publication Year :
2011
Publisher :
IEEE, 2011.

Abstract

An essential step in protein function analysis is to discover patterns that represent functional regions in a set of protein family sequences. However, the same functional region of a protein family may vary from one sequence to another as a result of biological substitutions, deletions, and insertions. Thus, a sequence pattern representing this functional region seldom repeats at exactly the same position with the same amino acid residues. To capture these variable associations, we developed a pattern synthesis process. First, we use an effective sequence pattern discovery algorithm to discover high-order patterns as input. Next, we group and align similar discovered patterns into Aligned Random Pattern Clusters (ARPCs). During the clustering process, each ARPC is transformed into a probabilistic structural pattern called the Aligned Random Pattern Digraph (ARPD). The advantages of our synthesis process are: 1) the synthesized patterns are not confined to a fixed protein region, since the ARPCs capture similar patterns by their variable sites; 2) the ARPDs retain both horizontal pattern associations and vertical site variations; and 3) the search space for synthesizing input patterns is smaller than that for aligning input sequences. Without relying on prior knowledge, our method successfully discovers two functional regions of the Cytochrome Complex protein family: the proximal and distal binding segments, which bind the iron molecule of the heme ligand from each side of the plane.
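
The idea of turning a cluster of aligned patterns into a digraph can be illustrated with a toy sketch. This is not the authors' algorithm: the abstract does not give the ARPD construction, so the sketch assumes the patterns are already aligned to equal length with "-" as a gap symbol, and the function names (build_digraph, transition_probabilities) and the example strings are hypothetical. Node counts stand in for vertical site variations, and edge counts between adjacent columns stand in for horizontal associations.

```python
from collections import Counter

GAP = "-"  # assumed gap symbol; not specified in the abstract


def build_digraph(aligned_patterns):
    """Toy digraph from equal-length aligned patterns (hypothetical helper)."""
    width = len(aligned_patterns[0])
    if any(len(p) != width for p in aligned_patterns):
        raise ValueError("aligned patterns must share one length")
    nodes = Counter()  # (column, residue) -> count: variation within a site
    edges = Counter()  # ((col, a), (col + 1, b)) -> count: adjacent-site association
    for p in aligned_patterns:
        for col, residue in enumerate(p):
            nodes[(col, residue)] += 1
            if col + 1 < width:
                edges[((col, residue), (col + 1, p[col + 1]))] += 1
    return nodes, edges


def transition_probabilities(nodes, edges):
    """Normalize each edge count by the count of its source node."""
    return {edge: count / nodes[edge[0]] for edge, count in edges.items()}


# Made-up aligned patterns, for illustration only.
patterns = ["CAACH", "CSACH", "CA" + GAP + "CH"]
nodes, edges = build_digraph(patterns)
probs = transition_probabilities(nodes, edges)

for (src, dst), prob in sorted(probs.items()):
    print(f"{src} -> {dst}: {prob:.2f}")
```

In this sketch the outgoing probabilities of every node except those in the last column sum to 1, so the structure can be read as a simple first-order model over the aligned sites. Grouping and aligning the patterns in the first place, which the abstract describes as part of the synthesis process, is left out.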

Details

Database :
OpenAIRE
Journal :
2011 IEEE International Conference on Bioinformatics and Biomedicine Workshops (BIBMW)
Accession number :
edsair.doi...........1092da18eae465c6a1e380a0bc8ac102
Full Text :
https://doi.org/10.1109/bibmw.2011.6112372