7 results on '"Bei, Zhendong"'
Search Results
2. Inter-Residue Distance Prediction From Duet Deep Learning Models.
- Author
-
Zhang, Huiling, Huang, Ying, Bei, Zhendong, Ju, Zhen, Meng, Jintao, Hao, Min, Zhang, Jingjing, Zhang, Haiping, and Xi, Wenhui
- Subjects
DEEP learning ,PROTEIN engineering ,PROTEIN structure ,SEQUENCE alignment ,PROTEIN-protein interactions ,FORECASTING - Abstract
Residue distance prediction from the sequence is critical for many biological applications such as protein structure reconstruction, protein–protein interaction prediction, and protein design. However, prediction of fine-grained distances between residues with long sequence separations still remains challenging. In this study, we propose DuetDis, a method based on duet feature sets and deep residual network with squeeze-and-excitation (SE), for protein inter-residue distance prediction. DuetDis embraces the ability to learn and fuse features directly or indirectly extracted from the whole-genome/metagenomic databases and, therefore, minimize the information loss through ensembling models trained on different feature sets. We evaluate DuetDis and 11 widely used peer methods on a large-scale test set (610 proteins chains). The experimental results suggest that 1) prediction results from different feature sets show obvious differences; 2) ensembling different feature sets can improve the prediction performance; 3) high-quality multiple sequence alignment (MSA) used for both training and testing can greatly improve the prediction performance; and 4) DuetDis is more accurate than peer methods for the overall prediction, more reliable in terms of model prediction score, and more robust against shallow multiple sequence alignment (MSA). [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
3. OSC: An Online Self-Configuring Big Data Framework for Optimization of QoS.
- Author
-
Bei, Zhendong, Kim, Nam Sung, HWang, Kai, and Yu, Zhibin
- Subjects
- *
GENETIC algorithms , *BUILDING performance - Abstract
Big-data frameworks such as MapReduce/Hadoop or Spark have many performance-critical configuration parameters which may interact with each other in a complex way. Their optimal values for an application on a given cluster are affected by not only the application itself but also its input data. This makes offline auto-configuration approaches hard to be used in practice because the input data of an application may change at each run. To address this issue, we propose an Online Self-Configuring (OSC) approach that automatically determines the optimal parameter values for a given application. OSC synergistically integrates three key techniques. First, OSC leverages ensemble learning to build a precise performance model for a given application. Second, it quantifies the importance of the parameters and interaction intensity between them to accelerate the genetic algorithm for searching optimal configuration parameters. Third, OSC supports an incremental modeling approach to achieve low overhead of the models for online needs. These techniques allow OSC to effectively learn the characteristics of an application and optimize its performance by automatically adjusting the configurations at runtime. Our implementation of OSC atop MapReduce/Hadoop 2.6 improves performance by 60 percent on average and up to 120 percent compared with the state-of-the-art approach. Lastly, the performance benefit of an application running on OSC generally increases along with its input data size. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
4. Evaluation of residue-residue contact prediction methods: From retrospective to prospective.
- Author
-
Zhang, Huiling, Bei, Zhendong, Xi, Wenhui, Hao, Min, Ju, Zhen, Saravanan, Konda Mani, Zhang, Haiping, Guo, Ning, and Wei, Yanjie
- Subjects
- *
AMINO acid sequence , *DEEP learning , *NUCLEAR magnetic resonance , *PROTEIN structure , *TERTIARY structure , *STRUCTURAL bioinformatics - Abstract
Sequence-based residue contact prediction plays a crucial role in protein structure reconstruction. In recent years, the combination of evolutionary coupling analysis (ECA) and deep learning (DL) techniques has made tremendous progress for residue contact prediction, thus a comprehensive assessment of current methods based on a large-scale benchmark data set is very needed. In this study, we evaluate 18 contact predictors on 610 non-redundant proteins and 32 CASP13 targets according to a wide range of perspectives. The results show that different methods have different application scenarios: (1) DL methods based on multi-categories of inputs and large training sets are the best choices for low-contact-density proteins such as the intrinsically disordered ones and proteins with shallow multi-sequence alignments (MSAs). (2) With at least 5L (L is sequence length) effective sequences in the MSA, all the methods show the best performance, and methods that rely only on MSA as input can reach comparable achievements as methods that adopt multi-source inputs. (3) For top L/5 and L/2 predictions, DL methods can predict more hydrophobic interactions while ECA methods predict more salt bridges and disulfide bonds. (4) ECA methods can detect more secondary structure interactions, while DL methods can accurately excavate more contact patterns and prune isolated false positives. In general, multi-input DL methods with large training sets dominate current approaches with the best overall performance. Despite the great success of current DL methods must be stated the fact that there is still much room left for further improvement: (1) With shallow MSAs, the performance will be greatly affected. (2) Current methods show lower precisions for inter-domain compared with intra-domain contact predictions, as well as very high imbalances in precisions for intra-domains. (3) Strong prediction similarities between DL methods indicating more feature types and diversified models need to be developed. (4) The runtime of most methods can be further optimized. Author summary: The amino acid sequence of a protein ultimately determines its tertiary structure, and the tertiary structure determines its function(s) and plays a key role in understanding biological processes and disease pathogenesis. Protein tertiary structure can be determined using experimental techniques such as cryo-electron microscopy, nuclear magnetic resonance and X-ray crystallography, which are very expensive and time-consuming. As an alternative, researchers are trying to use in silico methods to predict the 3D structures. Residue contact-assisted protein folding paves an avenue for sequence-based protein structure prediction and therefore has become one of the most challenging and promising problems in structural bioinformatics. Over the past years, contact prediction has undergone continuous evolution in techniques. Through a retrospective analysis of traditional machine learning /evolutionary coupling analysis methods/ consensus machine learning methods and a multi-perspective study on recently developed deep learning methods, we explore the most advanced contact predictors, pursue application scenarios for different methods, and seek prospective directions for further improvement. We anticipate that our study will serve as a practical and useful guide for the development of future approaches to contact prediction. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
5. Datasize-Aware High Dimensional Configurations Auto-Tuning of In-Memory Cluster Computing.
- Author
-
Yu, Zhibin, Bei, Zhendong, and Qian, Xuehai
- Published
- 2018
- Full Text
- View/download PDF
6. RFHOC: A Random-Forest Approach to Auto-Tuning Hadoop's Configuration.
- Author
-
Bei, Zhendong, Yu, Zhibin, Zhang, Huiling, Xiong, Wen, Xu, Chengzhong, Eeckhout, Lieven, and Feng, Shengzhong
- Subjects
- *
RANDOM forest algorithms , *ELECTRONIC data processing , *GENETIC algorithms , *COMPUTER programming , *PARAMETERS (Statistics) - Abstract
Hadoop is a widely-used implementation framework of the MapReduce programming model for large-scale data processing. Hadoop performance however is significantly affected by the settings of the Hadoop configuration parameters. Unfortunately, manually tuning these parameters is very time-consuming, if at all practical. This paper proposes an approach, called RFHOC, to automatically tune the Hadoop configuration parameters for optimized performance for a given application running on a given cluster. RFHOC constructs two ensembles of performance models using a random-forest approach for the map and reduce stage respectively. Leveraging these models, RFHOC employs a genetic algorithm to automatically search the Hadoop configuration space. The evaluation of RFHOC using five typical Hadoop programs, each with five different input data sets, shows that it achieves a performance speedup by a factor of 2.11 $\times$
- Published
- 2016
- Full Text
- View/download PDF
7. COMSAT: Residue contact prediction of transmembrane proteins based on support vector machines and mixed integer linear programming.
- Author
-
Zhang, Huiling, Huang, Qingsheng, Bei, Zhendong, Wei, Yanjie, and Floudas, Christodoulos A.
- Abstract
ABSTRACT In this article, we present COMSAT, a hybrid framework for residue contact prediction of transmembrane (TM) proteins, integrating a support vector machine (SVM) method and a mixed integer linear programming (MILP) method. COMSAT consists of two modules: COMSAT_SVM which is trained mainly on position-specific scoring matrix features, and COMSAT_MILP which is an ab initio method based on optimization models. Contacts predicted by the SVM model are ranked by SVM confidence scores, and a threshold is trained to improve the reliability of the predicted contacts. For TM proteins with no contacts above the threshold, COMSAT_MILP is used. The proposed hybrid contact prediction scheme was tested on two independent TM protein sets based on the contact definition of 14 Å between Cα-Cα atoms. First, using a rigorous leave-one-protein-out cross validation on the training set of 90 TM proteins, an accuracy of 66.8%, a coverage of 12.3%, a specificity of 99.3% and a Matthews' correlation coefficient (MCC) of 0.184 were obtained for residue pairs that are at least six amino acids apart. Second, when tested on a test set of 87 TM proteins, the proposed method showed a prediction accuracy of 64.5%, a coverage of 5.3%, a specificity of 99.4% and a MCC of 0.106. COMSAT shows satisfactory results when compared with 12 other state-of-the-art predictors, and is more robust in terms of prediction accuracy as the length and complexity of TM protein increase. COMSAT is freely accessible at . Proteins 2016; 84:332-348. © 2016 Wiley Periodicals, Inc. [ABSTRACT FROM AUTHOR]
- Published
- 2016
- Full Text
- View/download PDF
Catalog
Discovery Service for Jio Institute Digital Library
For full access to our library's resources, please sign in.