Authors: Postic, Guillaume; Janel, Nathalie; Moroy, Gautier
Affiliation: Unité de Biologie Fonctionnelle et Adaptative (BFA, UMR_8251 / U1133), Institut National de la Santé et de la Recherche Médicale (INSERM)-Centre National de la Recherche Scientifique (CNRS)-Université de Paris (UP) / Université Paris Cité (UPCité)
Funding: ANR-19-CE18-0023, PIF21 (AAPG2019), "Evaluation of the use of PIF (pre-implantation factor) as a prenatal treatment in a mouse model of trisomy 21" (2019)
Highlights
• We compare ten structural representations, either atomistic or coarse-grained.
• Accordingly, ten distance-dependent statistical potentials of mean force (PMFs) were built.
• The Cβ-only and Cα + Cβ representations provide the best speed–accuracy trade-off.
• Including glycines through their Cα atoms in a Cβ-only representation yields higher accuracy.
• We generalize the conclusions to the total information gain (TIG) scoring function.

Abstract
The recent breakthrough in the field of protein structure prediction shows the relevance of using knowledge-based scoring functions in combination with a low-resolution 3D representation of protein macromolecules. The choice of not using all atoms is barely supported by any data in the literature and is mostly motivated by empirical and practical reasons, such as the computational cost of assessing the numerous folds of the protein conformational space. Here, we present a comprehensive study, carried out on a large and balanced benchmark of predicted protein structures, to determine how different types of structural representations rank in accuracy and in calculation speed, and which ones offer the best compromise between these two criteria. We tested ten representations, including low-resolution, high-resolution, and coarse-grained approaches. We also investigated whether the findings generalize to formalisms other than the widely used “potential of mean force” (PMF) method. We observed that representing protein structures by their β carbons, combined or not with Cα, provides the best speed–accuracy trade-off when using a “total information gain” (TIG) scoring function. For statistical PMFs, using MARTINI backbone and side-chain beads is the best option. Finally, we demonstrated the necessity of training the reference state on all atom types and of including the Cα atoms of glycine residues in a Cβ-based representation.
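
To make the notion of a distance-dependent statistical PMF concrete, the following is a minimal sketch of the standard inverse-Boltzmann form, E(r) = -kT ln(p_obs(r) / p_ref(r)), applied to binned pairwise distances. It is not the authors' implementation: the function names, the 1 Å bin width, the 15 Å cutoff, and the pseudocount are all assumptions chosen for illustration, and the reference counts are simply supplied by the caller rather than trained on all atom types as described in the abstract. The helper `representative_atom` illustrates the Cβ-based representation in which glycine, which has no Cβ, is represented by its Cα.

```python
import math
from collections import Counter

def representative_atom(residue_name):
    """Cβ-only representation, falling back to Cα for glycine (hypothetical helper)."""
    return "CA" if residue_name == "GLY" else "CB"

def distance_bin(d, bin_width=1.0, max_dist=15.0):
    """Return the bin index for distance d, or None outside [0, max_dist).

    bin_width and max_dist are illustrative assumptions, not values from the paper.
    """
    if d < 0 or d >= max_dist:
        return None
    return int(d // bin_width)

def build_pmf(observed, reference, n_bins, kT=1.0, pseudocount=1.0):
    """Per-bin inverse-Boltzmann score: E(r) = -kT * ln(p_obs(r) / p_ref(r)).

    observed and reference map bin index -> pair count. The pseudocount
    (an assumption of this sketch) avoids log(0) for empty bins.
    """
    obs_total = sum(observed.get(b, 0) for b in range(n_bins)) + pseudocount * n_bins
    ref_total = sum(reference.get(b, 0) for b in range(n_bins)) + pseudocount * n_bins
    pmf = []
    for b in range(n_bins):
        p_obs = (observed.get(b, 0) + pseudocount) / obs_total
        p_ref = (reference.get(b, 0) + pseudocount) / ref_total
        pmf.append(-kT * math.log(p_obs / p_ref))
    return pmf

# Toy counts: bin 1 is over-represented relative to the reference state.
observed = Counter({0: 1, 1: 8, 2: 1})
reference = Counter({0: 3, 1: 4, 2: 3})
pmf = build_pmf(observed, reference, n_bins=3)
```

In this toy example, the over-represented bin receives a negative (favourable) score and the under-represented bins receive positive ones. A structure would then be scored by summing the per-bin values over all of its atom pairs, with one such table per pair of atom or bead types.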