Sequence-based protein-protein interaction prediction using greedy layer-wise training of deep neural networks.
- Author
- Hanggara, Faruq Sandi; Anam, Khairul; Wiyono, Retno Utami Agung; Darmayanti, Rizki Fitria; Setiawan, Felix Arie; and Rohman, Abdur
- Subjects
- Protein-protein interactions; Forecasting; Amino acid sequence; Discrete cosine transforms; Automatic speech recognition
- Abstract
Jamu is a traditional herbal medicine that was commonly used before the advent of modern medicine. Herbal formulas are generally obtained empirically and passed down from generation to generation. However, herbal healing is also shaped by factors such as myths and local customs, which leads to different herbal ingredients being used to treat the same disease. The result is a collection of overlapping herbal recipes with no supporting evidence of their validity. Protein-protein interaction (PPI) is a biological process that drugs influence during healing, so PPIs induced by consuming herbs can serve as evidence of the effectiveness of herbal medicine. PPI analysis is therefore needed to study how proteins interact with one another. Experimental (wet-lab) PPI analysis cannot be scaled to large datasets and covers only part of the protein interaction network, so a computational approach is needed. Previous studies have shown that PPIs can be predicted using protein sequence information alone, and the advantage of this approach is that it is more universally applicable. Features that can be derived from protein sequences include the Discrete Cosine Transform, Multi-scale Local Descriptor, Autocovariance, and Conjoint Triad. Sequence-based prediction has been studied with various machine learning approaches, such as Support Vector Machines, Random Forests, and Probabilistic Neural Networks. A deep learning approach has also been applied using a stacked autoencoder, which attempts to learn a hidden representation of protein sequences. More broadly, deep learning has been shown to handle raw, complex, large-scale data and to learn useful, abstract features for perceptual problems such as image and speech recognition.
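Conjoint Triad, one of the sequence encodings listed above, groups amino acids into classes and counts consecutive class triples. The sketch below assumes the widely used seven-class grouping by dipole and side-chain volume, a 343-dimensional vector per protein scaled by its largest count, and concatenation of the two proteins into a 686-dimensional pair vector. The paper may normalize or combine the vectors differently, and the function names are hypothetical.

```python
from itertools import product

# Assumed seven-class grouping of the 20 standard amino acids
# (by dipole and side-chain volume); not taken from the paper itself.
CLASSES = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
AA_TO_CLASS = {aa: i for i, group in enumerate(CLASSES) for aa in group}

# All 7**3 = 343 ordered class triads, mapped to a feature index
# (index = a*49 + b*7 + c for classes a, b, c).
TRIAD_INDEX = {t: i for i, t in enumerate(product(range(7), repeat=3))}


def conjoint_triad(sequence: str) -> list[float]:
    """Return a 343-dim conjoint triad vector for one protein sequence.

    Counts each class triad over a sliding window of three residues, then
    scales the counts to [0, 1] by dividing by the largest count.
    Non-standard residues are dropped before windowing (an assumption).
    """
    classes = [AA_TO_CLASS[aa] for aa in sequence.upper() if aa in AA_TO_CLASS]
    counts = [0] * len(TRIAD_INDEX)
    for i in range(len(classes) - 2):
        counts[TRIAD_INDEX[tuple(classes[i:i + 3])]] += 1
    peak = max(counts)
    return [c / peak for c in counts] if peak else [0.0] * len(counts)


def pair_features(seq_a: str, seq_b: str) -> list[float]:
    """Concatenate both proteins' vectors into one 686-dim pair feature."""
    return conjoint_triad(seq_a) + conjoint_triad(seq_b)
```

For example, `conjoint_triad("AAAAC")` has 1.0 at index 0 (class triad 0, 0, 0, seen twice) and 0.5 at index 6 (class triad 0, 0, 6, seen once).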
This study proposes deep neural networks trained layer-wise with either a stacked autoencoder (SAE) or a stacked randomized autoencoder (multilayer extreme learning machine, ML-ELM). Conjoint triad features are used as input. We compare the two methods, which differ in how the layers of the deep neural network are constructed. We evaluated the models with k-fold cross-validation, the standard procedure for testing most predictive models. With 5-fold cross-validation and three hidden layers, the average validation accuracy was 0.89 ± 0.02 for SAE and 0.51 ± 0.003 for ML-ELM. [ABSTRACT FROM AUTHOR]
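The comparison above rests on greedy layer-wise training evaluated by 5-fold cross-validation with three hidden layers. The sketch below illustrates only the randomized variant, assuming an ELM-autoencoder construction in the style of ML-ELM: each layer's random hidden projection is solved in closed form, and the resulting output weights become the encoder for the next layer. The function names, layer sizes, tanh activation, scaled Gaussian random weights, ridge term, and the least-squares ±1 output layer are illustrative assumptions, not the authors' configuration, and any accuracy it produces is not comparable to the reported figures.

```python
import numpy as np


def elm_autoencoder_layer(X, n_hidden, rng, reg=1e-3):
    """Fit one randomized (ELM) autoencoder layer in closed form (assumed design).

    Random weights map X to a hidden layer H; the weights beta that
    reconstruct X from H are solved by ridge-regularized least squares.
    beta transposed is returned as this layer's encoder weights.
    """
    n_features = X.shape[1]
    W = rng.standard_normal((n_features, n_hidden)) / np.sqrt(n_features)
    b = rng.standard_normal(n_hidden)
    H = np.tanh(X @ W + b)  # random hidden projection
    # beta = (H^T H + reg*I)^-1 H^T X, shape (n_hidden, n_features)
    beta = np.linalg.solve(H.T @ H + reg * np.eye(n_hidden), H.T @ X)
    return beta.T  # encoder weights, shape (n_features, n_hidden)


def greedy_stack(X, layer_sizes, seed=0):
    """Greedy layer-wise training: each layer is fit on the previous layer's output."""
    rng = np.random.default_rng(seed)
    weights, H = [], X
    for size in layer_sizes:
        W = elm_autoencoder_layer(H, size, rng)
        weights.append(W)
        H = np.tanh(H @ W)  # representation passed to the next layer
    return weights, H


def fit_output(H, y, reg=1e-3):
    """Closed-form ridge regression from top-layer features to +1/-1 labels."""
    T = np.where(y == 1, 1.0, -1.0)
    return np.linalg.solve(H.T @ H + reg * np.eye(H.shape[1]), H.T @ T)


def kfold_indices(n, k=5, seed=0):
    """Yield (train, validation) index arrays for shuffled k-fold cross-validation."""
    folds = np.array_split(np.random.default_rng(seed).permutation(n), k)
    for i in range(k):
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, folds[i]


def cross_validate(X, y, layer_sizes=(64, 32, 16), k=5, seed=0):
    """Return mean and standard deviation of validation accuracy over k folds."""
    accs = []
    for train, val in kfold_indices(len(X), k, seed):
        weights, H_train = greedy_stack(X[train], layer_sizes, seed)
        beta = fit_output(H_train, y[train])
        H_val = X[val]
        for W in weights:  # encode validation data with the trained stack
            H_val = np.tanh(H_val @ W)
        pred = np.where(H_val @ beta >= 0, 1, 0)
        accs.append(np.mean(pred == y[val]))
    return float(np.mean(accs)), float(np.std(accs))
```

A gradient-trained SAE would replace the closed-form solve in `elm_autoencoder_layer` with backpropagation on each autoencoder, typically followed by supervised fine-tuning of the whole stack; that difference in how layers are built is what the study compares.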
- Published
- 2020