1. In search of the boundary between repetitive and non-repetitive protein sequences
- Author
Francois Richard, Andrey V. Kajava, Centre de recherche en Biologie Cellulaire (CRBM), Université Montpellier 2 - Sciences et Techniques (UM2)-Centre National de la Recherche Scientifique (CNRS)-Université de Montpellier (UM)-Université Montpellier 1 (UM1), Centre de recherches de biochimie macromoléculaire (CRBM), Université Montpellier 1 (UM1)-Université Montpellier 2 - Sciences et Techniques (UM2)-IFR122-Centre National de la Recherche Scientifique (CNRS), Institut de Biologie Computationnelle (IBC), Institut National de la Recherche Agronomique (INRA)-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Montpellier (UM)-Centre National de la Recherche Scientifique (CNRS), National Research University of Information Technologies, Mechanics and Optics [St. Petersburg] (ITMO), Centre de recherche en Biologie cellulaire de Montpellier (CRBM), Université de Montpellier (UM)-Centre National de la Recherche Scientifique (CNRS), and Université de Montpellier (UM)-Institut National de la Recherche Agronomique (INRA)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)
- Subjects
Models, Molecular ,Proteomics ,Repetitive Sequences, Amino Acid ,Protein Folding ,endocrine system ,animal structures ,Protein Conformation ,Boundary (topology) ,[SDV.BC]Life Sciences [q-bio]/Cellular Biology ,Computational biology ,Biology ,Biochemistry ,Protein stability ,Protein structure ,Tandem repeat ,Tandem Repeat Sequence ,Animals ,Humans ,ComputingMilieux_MISCELLANEOUS ,Genetics ,Protein Stability ,[SDV.BBM.BP]Life Sciences [q-bio]/Biochemistry, Molecular Biology/Biophysics ,Tandem Repeat Sequences ,Aperiodic graph ,Algorithms ,Biomarkers ,hormones, hormone substitutes, and hormone antagonists
- Abstract
Tandem repeats (TRs) are frequently imperfect, containing a number of mutations accumulated during evolution. One of the main problems is to distinguish sequences that contain highly imperfect TRs from aperiodic sequences. The majority of proteins with TRs in their sequences have repetitive arrangements in their 3D structures. Therefore, the 3D structures of proteins can be used as a benchmarking criterion for TR detection in sequences. Different TR detection tools use their own scoring procedures to determine the boundary between repetitive and non-repetitive protein sequences. Here we describe these scoring functions and benchmark them using known structural TRs. Our survey shows that none of the existing scoring procedures achieves an appropriate separation between genuine structural TRs and non-TR regions. This suggests that, to obtain a collection of structurally and functionally meaningful TRs from large-scale analyses of proteomes, the TR scoring metrics need to be improved.
- Published
- 2015