658 results on '"Zhu-Hong You"'
Search Results
2. BEROLECMI: a novel prediction method to infer circRNA-miRNA interaction from the role definition of molecular attributes and biological networks
- Author
- Xin-Fei Wang, Chang-Qing Yu, Zhu-Hong You, Yan Wang, Lan Huang, Yan Qiao, Lei Wang, and Zheng-Wei Li
- Subjects
- Competing endogenous RNA, circRNA–miRNA interaction, Association prediction, Network embedding, Biomarker discovery, Computer applications to medicine. Medical informatics, R858-859.7, Biology (General), QH301-705.5
- Abstract
- Circular RNA (circRNA)–microRNA (miRNA) interaction (CMI) is an important model for the regulation of biological processes by non-coding RNA (ncRNA), and it provides a new perspective for the study of complex human diseases. However, existing CMI prediction models mainly rely on the nearest-neighbor structure in the biological network and ignore the molecular network topology, which limits their prediction performance. In this paper, we propose a new CMI prediction method, BEROLECMI, which uses molecular sequence attributes, molecular self-similarity, and biological network topology to define a specific role feature representation for molecules and infer new CMIs. BEROLECMI makes up for the lack of network topology in CMI prediction models and achieves the highest prediction performance on three commonly used data sets. In the case study, 14 of the 15 pairs of unknown CMIs were correctly predicted.
- Published
- 2024
- Full Text
- View/download PDF
3. MHESMMR: a multilevel model for predicting the regulation of miRNAs expression by small molecules
- Author
- Yong-Jian Guan, Chang-Qing Yu, Li-Ping Li, Zhu-Hong You, Meng-meng Wei, Xin-Fei Wang, Chen Yang, and Lu-Xiang Guo
- Subjects
- LINE, microRNA, Small molecule, Generally attributed multiplex heterogeneous network embedding, Machine learning, Computer applications to medicine. Medical informatics, R858-859.7, Biology (General), QH301-705.5
- Abstract
- Based on their expression in pathological processes, miRNAs can be classified as oncogenes or tumor suppressors. Predicting the regulatory relationships between miRNAs and small molecules (SMs) has become a vital goal for miRNA-targeted therapy. However, traditional biological approaches are laborious and expensive, so there is an urgent need for a computational model. In this study, we propose a computational model to predict whether the regulatory relationship between a miRNA and an SM is up-regulation or down-regulation. Specifically, we first use the Large-scale Information Network Embedding (LINE) algorithm to construct node features from the self-similarity networks, then use the General Attributed Multiplex Heterogeneous Network Embedding (GATNE) algorithm to extract topological information from the attribute network, and finally use the Light Gradient Boosting Machine (LightGBM) algorithm to predict the regulatory relationship between miRNAs and SMs. In the fivefold cross-validation experiment, the average accuracies of the proposed model on the SM2miR dataset reached 79.59% and 80.37% for up-regulation pairs and down-regulation pairs, respectively. In addition, we compared our model with another published model. Moreover, in the case study for 5-FU, 7 of 10 candidate miRNAs are confirmed by related literature. Therefore, we believe that our model can promote research on miRNA-targeted therapy.
- Published
- 2024
- Full Text
- View/download PDF
4. scPML: pathway-based multi-view learning for cell type annotation from single-cell RNA-seq data
- Author
- Zhi-Hua Du, Wei-Lin Hu, Jian-Qiang Li, Xuequn Shang, Zhu-Hong You, Zhuang-zhuang Chen, and Yu-An Huang
- Subjects
- Biology (General), QH301-705.5
- Abstract
- Recent developments in single-cell technology have enabled the exploration of cellular heterogeneity at an unprecedented level, providing invaluable insights into various fields, including medicine and disease research. Cell type annotation is an essential step in single-cell omics research. The mainstream approach is to apply supervised learning to well-annotated single-cell data in order to annotate cell types in new single-cell data. However, existing methods lack good generalization and robustness in cell annotation tasks, partly because of difficulties in dealing with technical differences between datasets and partly because they do not consider the heterogeneous associations of genes at the level of regulatory mechanisms. Here, we propose the scPML model, which uses various gene signaling pathway data to partition the genetic features of cells, thus characterizing different interaction maps between cells. Extensive experiments demonstrate that scPML performs better in cell type annotation and in the detection of unknown cell types across different species, platforms, and tissues.
- Published
- 2023
- Full Text
- View/download PDF
5. GraphCPIs: A novel graph-based computational model for potential compound-protein interactions
- Author
- Zhan-Heng Chen, Bo-Wei Zhao, Jian-Qiang Li, Zhen-Hao Guo, and Zhu-Hong You
- Subjects
- MT: Bioinformatics, graph representation, graph convolutional network, computational methods, network embedding, XGBoost, Therapeutics. Pharmacology, RM1-950
- Abstract
- Identifying proteins that interact with drug compounds is recognized as an important part of the drug discovery process. Despite extensive efforts invested in predicting compound-protein interactions (CPIs), existing traditional methods still face several challenges. Computer-aided methods can identify high-quality CPI candidates instantaneously. In this research, a novel model named GraphCPIs is proposed to improve CPI prediction accuracy. First, we establish the adjacency matrix of entities connected to both drugs and proteins from the collected dataset. Then, the feature representation of nodes is obtained by using the graph convolutional network and the GraRep embedding model. Finally, an extreme gradient boosting (XGBoost) classifier is used to identify potential CPIs based on the two kinds of stacked features. The results demonstrate that GraphCPIs achieves the best performance: its average predictive accuracy reaches 90.09%, its average area under the receiver operating characteristic curve is 0.9572, and its average area under the precision-recall curve is 0.9621. Moreover, comparative experiments reveal that our method surpasses state-of-the-art approaches in accuracy and other indicators under the same experimental environment. We believe that the GraphCPIs model will provide valuable insight for discovering novel candidate drug-related proteins.
- Published
- 2023
- Full Text
- View/download PDF
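The GraphCPIs abstract above reports an average area under the receiver operating characteristic curve. As a rough illustration of what that metric measures, and not the authors' evaluation code, the sketch below computes AUC as the fraction of (positive, negative) pairs in which the positive example receives the higher score. The function name `roc_auc` and the toy labels and scores are hypothetical and are not taken from the paper.

```python
def roc_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    This is the pairwise (Mann-Whitney) view of AUC, written for clarity
    rather than speed: it is quadratic in the number of examples.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = known interaction, 0 = non-interaction.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.4f}")
```

In a cross-validation setting, a value like this would be computed once per fold and then averaged.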
6. GKLOMLI: a link prediction model for inferring miRNA–lncRNA interactions by using Gaussian kernel-based method on network profile and linear optimization algorithm
- Author
- Leon Wong, Lei Wang, Zhu-Hong You, Chang-An Yuan, Yu-An Huang, and Mei-Yuan Cao
- Subjects
- Computational biology, miRNA–lncRNA interaction, Link prediction, Competing endogenous RNA (ceRNA), Gaussian kernel, Computer applications to medicine. Medical informatics, R858-859.7, Biology (General), QH301-705.5
- Abstract
- Background: The limited knowledge of miRNA–lncRNA interactions is considered an obstacle to revealing the regulatory mechanism. Accumulating evidence on human diseases indicates that the modulation of gene expression is closely related to the interactions between miRNAs and lncRNAs. However, validating such interactions via crosslinking-immunoprecipitation and high-throughput sequencing (CLIP-seq) experiments inevitably costs considerable money and time and yields unsatisfactory results. Therefore, more and more computational prediction tools have been developed to offer reliable candidates for a better design of further bio-experiments. Methods: In this work, we propose a novel link prediction model based on a Gaussian kernel-based method and a linear optimization algorithm for inferring miRNA–lncRNA interactions (GKLOMLI). Given an observed miRNA–lncRNA interaction network, the Gaussian kernel-based method was employed to output two similarity matrices, one for miRNAs and one for lncRNAs. Based on the integrated matrix combining the similarity matrices and the observed interaction network, a linear optimization-based link prediction model was trained for inferring miRNA–lncRNA interactions. Results: To evaluate the performance of the proposed method, k-fold cross-validation (CV) and leave-one-out CV were implemented, in which each CV experiment was carried out 100 times on a randomly generated training set. The high areas under the curve (AUCs) of 0.8623 ± 0.0027 (2-fold CV), 0.9053 ± 0.0017 (5-fold CV), 0.9151 ± 0.0013 (10-fold CV), and 0.9236 (LOO-CV) illustrate the precision and reliability of the proposed method. Conclusion: GKLOMLI, with its high performance, is anticipated to be used to reveal underlying interactions between miRNAs and their target lncRNAs and to decipher the potential mechanisms of complex diseases.
- Published
- 2023
- Full Text
- View/download PDF
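The GKLOMLI abstract above describes deriving two similarity matrices from an observed interaction network with a Gaussian kernel. A minimal sketch of one common way to do this, the Gaussian interaction profile kernel, is given below. The abstract does not state the exact formulation, so the bandwidth normalization (dividing by the mean squared norm of the profiles), the function name `gaussian_profile_similarity`, and the toy matrix are all assumptions for illustration, not the authors' implementation.

```python
import math


def gaussian_profile_similarity(profiles, bandwidth=1.0):
    """Pairwise Gaussian kernel similarity between interaction profiles.

    `profiles` is a list of equal-length 0/1 rows, one per entity.
    Assumed form: sim(i, j) = exp(-gamma * ||p_i - p_j||^2), with
    gamma = bandwidth / mean(||p_k||^2). The normalization is a common
    convention and is an assumption here, not taken from the abstract.
    """
    norms = [sum(x * x for x in row) for row in profiles]
    mean_norm = sum(norms) / len(norms)
    if mean_norm == 0:
        raise ValueError("all interaction profiles are empty")
    gamma = bandwidth / mean_norm
    n = len(profiles)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            dist_sq = sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j]))
            sim[i][j] = sim[j][i] = math.exp(-gamma * dist_sq)
    return sim


# Hypothetical toy network: rows are miRNAs, columns are lncRNAs.
interactions = [
    [1, 0, 1],
    [1, 0, 1],
    [0, 1, 0],
]
mirna_sim = gaussian_profile_similarity(interactions)
# Transposing the matrix yields lncRNA profiles over miRNAs.
lncrna_sim = gaussian_profile_similarity([list(col) for col in zip(*interactions)])
```

Entities with identical interaction profiles get similarity 1, and similarity decays as the profiles diverge.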
7. DeepMPF: deep learning framework for predicting drug–target interactions based on multi-modal representation with meta-path semantic analysis
- Author
- Zhong-Hao Ren, Zhu-Hong You, Quan Zou, Chang-Qing Yu, Yan-Fang Ma, Yong-Jian Guan, Hai-Ru You, Xin-Fei Wang, and Jie Pan
- Subjects
- Drug–protein interactions, Multi-modal, Meta-path, Sequence analysis, Joint learning, Natural language processing, Medicine
- Abstract
- Background: Drug-target interaction (DTI) prediction has become a crucial prerequisite in drug design and drug discovery. However, traditional biological experiments are time-consuming and expensive, because abundant complex interactions are present in the large genomic and chemical spaces. To alleviate this problem, many computational methods have been developed to complement biological experiments and narrow the search space to a preferred candidate domain. However, most previous approaches cannot fully consider the semantic information of association behavior based on several schemas to represent the complex structure of heterogeneous biological networks. Additionally, DTI prediction based on a single modality cannot satisfy the demand for prediction accuracy. Methods: We propose a multi-modal representation framework, ‘DeepMPF’, based on meta-path semantic analysis, which effectively uses heterogeneous information to predict DTI. Specifically, we first construct protein–drug–disease heterogeneous networks composed of three entities. Then the feature information is obtained under three views: sequence modality, heterogeneous structure modality, and similarity modality. We propose six representative meta-path schemas to preserve the high-order nonlinear structure and capture hidden structural information of the heterogeneous network. Finally, DeepMPF generates highly representative comprehensive feature descriptors and calculates the probability of interaction through joint learning. Results: To evaluate the predictive performance of DeepMPF, comparison experiments are conducted on four gold-standard datasets. Our method obtains competitive performance on all datasets. We also explore the influence of different feature embedding dimensions, learning strategies, and classification methods. Notably, the drug repositioning experiments on COVID-19 and HIV demonstrate that DeepMPF can be applied to real-world problems and can help drug discovery. Further analysis with molecular docking experiments enhances the credibility of the drug candidates predicted by DeepMPF. Conclusions: All the results demonstrate the effective predictive capability of DeepMPF for drug-target interactions. It can be used as a tool to prescreen the most promising drug candidates for a protein. The web server of the DeepMPF predictor is freely available at http://120.77.11.78/DeepMPF/, which can help relevant researchers in further study.
- Published
- 2023
- Full Text
- View/download PDF
8. A weighted non-negative matrix factorization approach to predict potential associations between drug and disease
- Author
- Mei-Neng Wang, Xue-Jun Xie, Zhu-Hong You, De-Wu Ding, and Leon Wong
- Subjects
- Drug-disease association, Weighted nearest neighbor, Graph regularization, Non-negative matrix factorization, Medicine
- Abstract
- Background: Associations of drugs with diseases provide important information for expediting drug development. The number of known drug-disease associations is still insufficient, and inferring associations through traditional in vitro experiments is time-consuming and costly. Therefore, more accurate and reliable computational methods urgently need to be developed to predict potential associations of drugs with diseases. Methods: In this study, we present a model called weighted graph regularized collaborative non-negative matrix factorization for drug-disease association prediction (WNMFDDA). More specifically, we first calculated the drug similarity and disease similarity based on the chemical structures of drugs and the medical description information of diseases, respectively. Then, to extend the model to new drugs and diseases, weighted K nearest neighbor was used as a preprocessing step to reconstruct the interaction score profiles of drugs with diseases. Finally, a graph regularized non-negative matrix factorization model was used to identify potential associations between drugs and diseases. Results: Under ten-fold cross-validation, WNMFDDA achieved AUC values of 0.939 and 0.952 on Fdataset and Cdataset, respectively, outperforming other competing prediction methods. Moreover, case studies for several drugs and diseases were carried out to further verify the predictive performance of WNMFDDA. As a result, 13 (Doxorubicin), 13 (Amiodarone), 12 (Obesity) and 12 (Asthma) of the top 15 corresponding candidate diseases or drugs were confirmed by existing databases. Conclusions: The experimental results demonstrate that WNMFDDA is a very effective method for drug-disease association prediction. We believe that WNMFDDA is helpful for relevant biomedical researchers in follow-up studies.
- Published
- 2022
- Full Text
- View/download PDF
9. Robust and accurate prediction of self-interacting proteins from protein sequence information by exploiting weighted sparse representation based classifier
- Author
- Yang Li, Xue-Gang Hu, Zhu-Hong You, Li-Ping Li, Pei-Pei Li, Yan-Bin Wang, and Yu-An Huang
- Subjects
- Self-interacting proteins, Protein sequence, Gray level co-occurrence matrix, Sparse representation, Computer applications to medicine. Medical informatics, R858-859.7, Biology (General), QH301-705.5
- Abstract
- Background: Self-interacting proteins (SIPs), in which two or more copies of a protein expressed by one gene can interact with each other, play a central role in the regulation of most living cells and cellular functions. Although high-throughput experimental techniques can provide abundant SIP data, experimental identification of SIPs still has several shortcomings: it is time-consuming, costly, inefficient, and inherently prone to high false-positive rates. Therefore, it is increasingly important to develop efficient and accurate automatic approaches that supplement experimental methods and assist and accelerate the prediction of SIPs from protein sequence information. Results: In this paper, we present a novel framework, termed GLCM-WSRC (gray level co-occurrence matrix-weighted sparse representation based classification), for predicting SIPs automatically based on protein evolutionary information from protein primary sequences. More specifically, we first convert the protein sequence into a Position Specific Scoring Matrix (PSSM) containing protein sequence evolutionary information, using the Position Specific Iterated BLAST (PSI-BLAST) tool. Second, using an efficient feature extraction approach, GLCM, we extract salient and invariant feature vectors from the PSSM, and then apply a pre-processing operation, the adaptive synthetic (ADASYN) technique, to balance the SIPs dataset and generate new feature vectors for classification. Finally, we employ an efficient and reliable WSRC model to identify SIPs according to the known information of self-interacting and non-interacting proteins. Conclusions: Extensive experimental results show that the proposed approach exhibits high prediction performance, with 98.10% accuracy on the yeast dataset and 91.51% accuracy on the human dataset, which suggests that the proposed model could be a useful tool for large-scale self-interacting protein prediction and other bioinformatics detection tasks in the future.
- Published
- 2022
- Full Text
- View/download PDF
10. KS-CMI: A circRNA-miRNA interaction prediction method based on the signed graph neural network and denoising autoencoder
- Author
- Xin-Fei Wang, Chang-Qing Yu, Zhu-Hong You, Yan Qiao, Zheng-Wei Li, Wen-Zhun Huang, Ji-Ren Zhou, and Hai-Yan Jin
- Subjects
- Gene network, Neural networks, Science
- Abstract
- Summary: Circular RNA (circRNA) plays an important role in the diagnosis, treatment, and prognosis of human diseases. The discovery of potential circRNA-miRNA interactions (CMI) is of guiding significance for subsequent biological experiments. Limited by the small amount of experimentally supported data and high randomness, existing models have difficulty accomplishing the CMI prediction task on real cases. In this paper, we propose KS-CMI, a novel method for effectively accomplishing CMI prediction in real cases. KS-CMI enriches the ‘behavior relationships’ of molecules by constructing circRNA-miRNA-cancer (CMCI) networks and extracts the behavior relationship attributes of molecules based on balance theory. Next, the denoising autoencoder (DAE) is used to enhance the feature representation of molecules. Finally, the CatBoost classifier is used for prediction. KS-CMI achieved the most reliable prediction results in real cases and achieved competitive performance on all datasets in CMI prediction.
- Published
- 2023
- Full Text
- View/download PDF
11. Knowledge graph embedding for profiling the interaction between transcription factors and their target genes.
- Author
- Yang-Han Wu, Yu-An Huang, Jian-Qiang Li, Zhu-Hong You, Peng-Wei Hu, Lun Hu, Victor C M Leung, and Zhi-Hua Du
- Subjects
- Biology (General), QH301-705.5
- Abstract
- Interactions between transcription factors and target genes form the main part of the human gene regulation network and remain complicating factors in biological research. Specifically, for nearly half of the interactions recorded in established databases, the interaction types are yet to be confirmed. Although several computational methods exist to predict gene interactions and their types, there is still no method available to predict them solely from topology information. To this end, we propose a graph-based prediction model called KGE-TGI, trained in a multi-task learning manner on a knowledge graph that we constructed specifically for this problem. The KGE-TGI model relies on topology information rather than being driven by gene expression data. In this paper, we formulate the task of predicting the interaction types of transcription factors and target genes as a multi-label classification problem for link types on a heterogeneous graph, coupled with solving another, inherently related link prediction problem. We constructed a ground truth dataset as a benchmark and evaluated the proposed method on it. In the 5-fold cross experiments, the proposed method achieved average AUC values of 0.9654 and 0.9339 in the tasks of link prediction and link type classification, respectively. In addition, the results of a series of comparison experiments show that the introduction of knowledge information significantly benefits prediction and that our methodology achieves state-of-the-art performance on this problem.
- Published
- 2023
- Full Text
- View/download PDF
12. Multi-view heterogeneous molecular network representation learning for protein–protein interaction prediction
- Author
- Xiao-Rui Su, Lun Hu, Zhu-Hong You, Peng-Wei Hu, and Bo-Wei Zhao
- Subjects
- Protein–protein interaction, Protein sequence, LINE, Network representation learning, Heterogeneous molecular network, Computer applications to medicine. Medical informatics, R858-859.7, Biology (General), QH301-705.5
- Abstract
- Background: Protein–protein interaction (PPI) plays an important role in regulating cells and signals. Despite the ongoing efforts of the bioassay group, persistently incomplete data limit our ability to understand the molecular roots of human disease. Therefore, it is urgent to develop a computational method to predict PPIs from the perspective of the molecular system. Methods: In this paper, a highly efficient computational model, MTV-PPI, is proposed for PPI prediction based on a heterogeneous molecular network by learning inter-view protein sequences and intra-view interactions between molecules simultaneously. On the one hand, the inter-view feature is extracted from the protein sequence by the k-mer method. On the other hand, we use a popular embedding method, LINE, to encode the heterogeneous molecular network and obtain the intra-view feature. Thus, the protein representation used in MTV-PPI is constructed by aggregating its inter-view feature and intra-view feature. Finally, a random forest is used to predict potential PPIs. Results: To demonstrate the effectiveness of MTV-PPI, we conduct extensive experiments on a collected heterogeneous molecular network, achieving an accuracy of 86.55%, sensitivity of 82.49%, precision of 89.79%, AUC of 0.9301 and AUPR of 0.9308. Further comparison experiments are performed with various protein representations and classifiers to show the effectiveness of MTV-PPI in predicting PPIs based on a complex network. Conclusion: The experimental results illustrate that MTV-PPI is a promising tool for PPI prediction, which may provide a new perspective for future interaction prediction research based on heterogeneous molecular networks.
- Published
- 2022
- Full Text
- View/download PDF
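The MTV-PPI abstract above says that one feature view is extracted from the protein sequence by the k-mer method. A minimal sketch of k-mer frequency features is given below. The abstract does not give the alphabet, the value of k, or the normalization, so the 20-letter amino acid alphabet, `k=2`, the frequency normalization, the function name `kmer_features`, and the example sequence are all assumptions for illustration, not the authors' settings.

```python
from itertools import product

# Assumed alphabet: the 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def kmer_features(sequence, k=2, alphabet=AMINO_ACIDS):
    """Return normalized counts of every possible k-mer over `alphabet`.

    The vector has len(alphabet) ** k entries in lexicographic order of
    the alphabet. Overlapping windows are counted; windows containing a
    character outside the alphabet are skipped.
    """
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    total = 0
    for start in range(len(sequence) - k + 1):
        window = sequence[start:start + k]
        if window in index:
            counts[index[window]] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    return [c / total for c in counts]


# Hypothetical short sequence, used only to show the output shape.
features = kmer_features("MKVLAAGIVKV")
print(len(features))  # 20 ** 2 entries for k=2
```

A fixed-length vector like this can be concatenated with a network embedding of the same protein before being passed to a classifier.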
13. GBDR: a Bayesian model for precise prediction of pathogenic microorganisms using 16S rRNA gene sequences
- Author
- Yu-An Huang, Zhi-An Huang, Jian-Qiang Li, Zhu-Hong You, Lei Wang, Hai-Cheng Yi, and Chang-Qing Yu
- Subjects
- Pathogenic microorganisms, Computational prediction model, 16S rRNA sequence analysis, Microbe-disease association network, Bayesian ranking, Biotechnology, TP248.13-248.65, Genetics, QH426-470
- Abstract
- Background: Recent evidence suggests that human microorganisms participate in important biological activities in the human body. The dysfunction of host-microbiota interactions could lead to complex human disorders. Knowledge of host-microbiota interactions can provide valuable insights into the pathological mechanisms of diseases. However, it is time-consuming and costly to identify disorder-specific microbes from the biological “haystack” merely by routine wet-lab experiments. With the developments in next-generation sequencing and omics-based trials, it is imperative to develop computational models for predicting microbe-disease associations on a large scale. Results: Based on the known microbe-disease associations derived from the Human Microbe-Disease Association Database (HMDAD), the proposed model shows reliable performance, with areas under the ROC curve (AUC) of 0.9456 and 0.8866 in leave-one-out cross-validation and five-fold cross-validation, respectively. In case studies of colorectal carcinoma, 80% of the top 20 predicted microbes have been experimentally confirmed in published literature. Conclusion: Based on the assumption that functionally similar microbes tend to share similar interaction patterns with human diseases, we propose a group-based computational model of Bayesian disease-oriented ranking to prioritize the microbes most likely to be associated with various human diseases. Based on the sequence information of genes, two computational approaches (BLAST+ and MEGA 7) are used to measure microbe-microbe similarity from different perspectives. Disease-disease similarity is calculated by capturing the hierarchy information from the Medical Subject Headings (MeSH) data. The experimental results illustrate the accuracy and effectiveness of the proposed model. This work is expected to facilitate the characterization and identification of promising microbial biomarkers.
- Published
- 2022
- Full Text
- View/download PDF
14. A learning-based method to predict LncRNA-disease associations by combining CNN and ELM
- Author
- Zhen-Hao Guo, Zhan-Heng Chen, Zhu-Hong You, Yan-Bin Wang, Hai-Cheng Yi, and Mei-Neng Wang
- Subjects
- CNN, ELM, lncRNA, Disease, Association prediction, Computer applications to medicine. Medical informatics, R858-859.7, Biology (General), QH301-705.5
- Abstract
- Background: lncRNAs play a critical role in numerous biological processes and life activities, especially diseases. Because traditional wet-lab experiments for identifying undiscovered lncRNA-disease associations are limited by time consumption and labor cost, it is imperative to construct reliable and efficient computational models as a complement in practice. Deep learning technologies have been shown to make impressive contributions in many areas, but their feasibility in bioinformatics has not been adequately verified. Results: In this paper, a machine learning-based model called LDACE is proposed to predict potential lncRNA-disease associations by combining Extreme Learning Machine (ELM) and Convolutional Neural Network (CNN). Specifically, the representation vectors are constructed by integrating multiple types of biological information, including functional similarity and semantic similarity. Then, CNN is applied to mine both local and global features. Finally, ELM is chosen to carry out the prediction task and detect potential lncRNA-disease associations. The proposed method achieved an area under the receiver operating characteristic curve of 0.9086 in leave-one-out cross-validation and 0.8994 in fivefold cross-validation. In addition, two kinds of case studies based on lung cancer and endometrial cancer indicate the robustness and efficiency of LDACE even in a real environment. Conclusions: The results demonstrate that the proposed model is expected to be an auxiliary tool to guide and assist biomedical research, and the close integration of deep learning and biological big data will provide the life sciences with novel insights.
- Published
- 2022
- Full Text
- View/download PDF
15. LPIH2V: LncRNA-protein interactions prediction using HIN2Vec based on heterogeneous networks model
- Author
- Meng-Meng Wei, Chang-Qing Yu, Li-Ping Li, Zhu-Hong You, Zhong-Hao Ren, Yong-Jian Guan, Xin-Fei Wang, and Yue-Chao Li
- Subjects
- lncRNA-protein interaction, heterogeneous information network, network embedding, HIN2Vec, behavioral features, Genetics, QH426-470
- Abstract
- LncRNA-protein interaction plays an important role in the development and treatment of many human diseases. Because experimental approaches to determining lncRNA-protein interactions are expensive and time-consuming, and few computational methods exist, it is urgent to develop efficient and accurate methods to predict lncRNA-protein interactions. In this work, a model for heterogeneous network embedding based on meta-paths, namely LPIH2V, is proposed. The heterogeneous network is composed of lncRNA similarity networks, protein similarity networks, and known lncRNA-protein interaction networks. The behavioral features are extracted from the heterogeneous network using the HIN2Vec network embedding method. The results show that LPIH2V obtains an AUC of 0.97 and an ACC of 0.95 in the 5-fold cross-validation test. The model showed superiority and good generalization ability. Compared to other models, LPIH2V not only extracts attribute characteristics by similarity, but also acquires behavior properties by wandering along meta-paths in heterogeneous networks. LPIH2V would be beneficial in forecasting interactions between lncRNAs and proteins.
- Published
- 2023
- Full Text
- View/download PDF
16. Learning from low-rank multimodal representations for predicting disease-drug associations
- Author
- Pengwei Hu, Yu-an Huang, Jing Mei, Henry Leung, Zhan-heng Chen, Ze-min Kuang, Zhu-hong You, and Lun Hu
- Subjects
- Disease-drug associations prediction, Low-rank tensors, Multimodal fusion, Computer applications to medicine. Medical informatics, R858-859.7
- Abstract
- Background: Disease-drug associations provide essential information for drug discovery and disease treatment. Many disease-drug associations remain unobserved or unknown, and trials to confirm these associations are time-consuming and expensive. To better understand and explore these valuable associations, it would be useful to develop computational methods for predicting unobserved disease-drug associations. With the advent of various datasets describing diseases and drugs, it has become more feasible to build a model describing the potential correlation between diseases and drugs. Results: In this work, we propose a new prediction method, called LMFDA, which works in several stages. First, it studies the drug chemical structure, disease MeSH descriptors, disease-related phenotypic terms, and drug-drug interactions. On this basis, similarity networks from different sources are constructed to enrich the representation of drugs and diseases. Based on the fused disease similarity network and drug similarity network, LMFDA calculates the association score of each pair of diseases and drugs in the database. This method achieves good performance on Fdataset and Cdataset, with AUROCs of 91.6% and 92.1%, respectively, higher than many existing computational models. Conclusions: The novelty of LMFDA lies in the introduction of multimodal fusion using low-rank tensors to fuse multiple similarity networks, combined with matrix completion to predict potential associations. We have demonstrated that LMFDA displays excellent network integration ability for accurate disease-drug association inference and achieves substantial improvement over the advanced approach. Overall, experimental results on two real-world network datasets demonstrate that LMFDA delivers excellent detection performance. The results also suggest that perfecting similarity networks with as much domain knowledge as possible is a promising direction for drug repositioning.
- Published
- 2021
- Full Text
- View/download PDF
17. Efficient framework for predicting MiRNA-disease associations based on improved hybrid collaborative filtering
- Author
- Ru Nie, Zhengwei Li, Zhu-hong You, Wenzheng Bao, and Jiashu Li
- Subjects
- miRNA-disease association prediction, Hybrid collaborative filtering, Heterogeneous data, Singular value decomposition, Computer applications to medicine. Medical informatics, R858-859.7
- Abstract
- Background: Accumulating studies indicate that microRNAs (miRNAs) play vital roles in the development and progression of many complex human diseases. However, traditional biochemical experimental methods for identifying disease-related miRNAs cost a large amount of time, manpower, material, and financial resources. Methods: In this study, we developed a framework named hybrid collaborative filtering for miRNA-disease association prediction (HCFMDA) by integrating heterogeneous data, e.g., miRNA functional similarity, disease semantic similarity, known miRNA-disease association networks, and Gaussian kernel similarity of miRNAs and diseases. To capture the intrinsic interaction patterns embedded in the sparse association matrix, we prioritized the predictive score by fusing three types of information: similar disease associations, similar miRNA associations, and similar disease-miRNA associations. Meanwhile, singular value decomposition was adopted to reduce the impact of noise and accelerate prediction. Results: We then validated HCFMDA with leave-one-out cross-validation (LOOCV) and two types of case studies. In the LOOCV, we achieved an AUC (area under the curve) of 0.8379. To evaluate the performance of HCFMDA on real diseases, we further implemented the first type of case validation on three important human diseases: Colon Neoplasms, Esophageal Neoplasms, and Prostate Neoplasms. As a result, 44, 46, and 44 of the top 50 predicted disease-related miRNAs were confirmed by experimental evidence. Moreover, the second type of case validation, on Breast Neoplasms, indicates that HCFMDA could also be applied to predict potential miRNAs for diseases without any known associated miRNA. Conclusions: The satisfactory prediction performance demonstrates that our model could serve as a reliable tool to guide subsequent research on identifying candidate miRNAs associated with human diseases.
- Published
- 2021
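The abstract of entry 17 mentions a Gaussian kernel similarity of miRNAs and diseases but does not give its formula. The following is a minimal sketch of the commonly used Gaussian interaction profile kernel, assuming a 0/1 association matrix; the function name, the `gamma_prime` parameter and the toy matrix are illustrative assumptions, not taken from the HCFMDA paper.

```python
import math

def gaussian_profile_kernel(assoc, gamma_prime=1.0):
    """Gaussian interaction profile kernel between the rows of a 0/1 matrix.

    assoc[i][j] == 1 means row entity i (e.g. a miRNA) has a known
    association with column entity j (e.g. a disease).
    """
    n = len(assoc)
    # Bandwidth is normalised by the mean squared norm of the row profiles.
    mean_sq_norm = sum(sum(v * v for v in row) for row in assoc) / n
    if mean_sq_norm == 0:
        raise ValueError("association matrix has no known associations")
    gamma = gamma_prime / mean_sq_norm
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(assoc[i], assoc[j]))
            kernel[i][j] = math.exp(-gamma * sq_dist)
    return kernel

# Toy matrix (hypothetical): rows 0 and 1 share the same association profile.
toy = [
    [1, 0, 1],
    [1, 0, 1],
    [0, 1, 0],
]
K = gaussian_profile_kernel(toy)
```

Rows with identical profiles get similarity 1, and the similarity decays as the profiles diverge. Applying the same function to the transposed matrix would give the column-side (e.g. disease) similarity.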
18. Robust and accurate prediction of protein–protein interactions by exploiting evolutionary information
- Author
-
Yang Li, Zheng Wang, Li-Ping Li, Zhu-Hong You, Wen-Zhun Huang, Xin-Ke Zhan, and Yan-Bin Wang
- Subjects
Medicine ,Science - Abstract
Abstract Various biochemical functions of organisms are performed by protein–protein interactions (PPIs). Therefore, recognition of protein–protein interactions is very important for understanding most life activities, such as DNA replication and transcription, protein synthesis and secretion, signal transduction and metabolism. Although high-throughput technology makes it possible to generate large-scale PPI data, it is costly in both time and labor and carries a risk of a high false positive rate. In order to formulate a more ingenious solution, the biology community is looking for computational methods to quickly and efficiently discover massive protein interaction data. In this paper, we propose a computational method for predicting PPIs based on a fresh idea of combining orthogonal locality preserving projections (OLPP) and rotation forest (RoF) models, using protein sequence information. Specifically, the protein sequence is first converted into position-specific scoring matrices (PSSMs) containing protein evolutionary information by using the Position-Specific Iterated Basic Local Alignment Search Tool (PSI-BLAST). Then we characterize a protein as a fixed-length feature vector by applying OLPP to PSSMs. Finally, we train an RoF classifier to distinguish interacting from non-interacting protein pairs. The proposed method yielded significantly better results than existing methods, with 90.07% and 96.09% prediction accuracy on the Yeast and Human datasets, respectively. Our experiments show that the proposed method can serve as a useful tool to accelerate the process of solving key problems in proteomics.
- Published
- 2021
19. KGDCMI: A New Approach for Predicting circRNA–miRNA Interactions From Multi-Source Information Extraction and Deep Learning
- Author
-
Xin-Fei Wang, Chang-Qing Yu, Li-Ping Li, Zhu-Hong You, Wen-Zhun Huang, Yue-Chao Li, Zhong-Hao Ren, and Yong-Jian Guan
- Subjects
circRNA–miRNA interaction ,circRNA ,deep neural network ,graph embedding ,K-mer ,Genetics ,QH426-470 - Abstract
Emerging evidence has revealed that circular RNA (circRNA) is widely distributed in mammalian cells and functions as a microRNA (miRNA) sponge involved in transcriptional and posttranscriptional regulation of gene expression. Recognizing the circRNA–miRNA interaction provides a new perspective for the detection and treatment of human complex diseases. Compared with the traditional biological experimental methods used to predict the association of molecules, which are limited to small scale and are time-consuming and laborious, computational models can provide a basis for biological experiments at low cost. Considering that few computational models have been proposed, it is necessary to develop an effective computational method to predict the circRNA–miRNA interaction. This study thus proposed a novel computational method, named KGDCMI, to predict the interactions between circRNA and miRNA based on multi-source information extraction and fusion. KGDCMI obtains RNA attribute information from sequence and similarity, and captures the behavior information in RNA associations through a graph-embedding algorithm. Then, the obtained feature vector is further reduced by principal component analysis and sent to the deep neural network for information fusion and prediction. KGDCMI achieves an area under the curve (AUC) of 89.30% and an area under the precision–recall curve (AUPR) of 87.67%. On the same dataset, these values are 2.37% and 3.08% higher, respectively, than those of the only existing model. We also conducted three groups of comparative experiments to obtain the best classification strategy, feature extraction parameters, and dimensions. In addition, in the case study, 7 of the top 10 interaction pairs were confirmed in PubMed. These results suggest that KGDCMI is a feasible and useful method to predict the circRNA–miRNA interaction and can act as a reliable candidate for related RNA biological experiments.
- Published
- 2022
20. BNEMDI: A Novel MicroRNA–Drug Interaction Prediction Model Based on Multi-Source Information With a Large-Scale Biological Network
- Author
-
Yong-Jian Guan, Chang-Qing Yu, Li-Ping Li, Zhu-Hong You, Zhong-Hao Ren, Jie Pan, and Yue-Chao Li
- Subjects
miRNA–drug interaction ,BiNE ,k-mer ,MACCS fingerprint ,deep neural network ,Genetics ,QH426-470 - Abstract
As a novel target in pharmacy, microRNA (miRNA) can regulate gene expression under specific disease conditions to produce specific proteins. To date, many researchers have leveraged miRNA to reveal drug efficacy and pathogenesis at the molecular level. Conventional wet-lab experiments are well known to be time-consuming, labor-intensive, and costly. Thus, there is an urgent need to develop a novel computational model to facilitate the identification of miRNA–drug interactions (MDIs). In this work, we propose a novel bipartite network embedding-based method called BNEMDI to predict MDIs. First, the Bipartite Network Embedding (BiNE) algorithm is employed to learn the topological features from the network. Then, the inherent attributes of drugs and miRNAs are expressed as attribute features by MACCS fingerprints and k-mers. Finally, we feed these features into a deep neural network (DNN) for training the prediction model. To validate the prediction ability of the BNEMDI model, we apply it to five different benchmark datasets under five-fold cross-validation, and the proposed model obtained AUC values of 0.9568, 0.9420, 0.8489, 0.8774, and 0.9005 on the ncDR, RNAInter, SM2miR1, SM2miR2, and SM2miR MDI datasets, respectively. To further verify the prediction performance of the BNEMDI model, we compare it with some existing powerful methods. We also compare the BiNE algorithm with several different network embedding methods. Furthermore, we carry out a case study on a common drug named 5-fluorouracil. Among the top 50 miRNAs predicted by the proposed model, 38 were verified by the experimental literature. The comprehensive experimental results demonstrate that our method is effective and robust for predicting MDIs. In future work, we hope that the BNEMDI model can be a reliable supplementary method for the development of pharmacology and miRNA therapeutics.
- Published
- 2022
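Several entries in this list (for example entries 19 and 20) describe RNA sequences by k-mers. As a rough illustration of that idea, here is a minimal sketch that turns an RNA sequence into a normalised k-mer frequency vector; the function name, the default `k`, the T-to-U conversion and the example sequence are assumptions for illustration and are not taken from any of the listed papers.

```python
from itertools import product

def kmer_frequencies(sequence, k=3, alphabet="ACGU"):
    """Return normalised k-mer counts in a fixed (lexicographic) order."""
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    # Treat DNA-style input as RNA so that T and U are counted together.
    seq = sequence.upper().replace("T", "U")
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows with ambiguous symbols such as N
            counts[window] += 1
    total = sum(counts.values())
    return [counts[m] / total if total else 0.0 for m in kmers]

# Hypothetical example sequence; 2-mers over ACGU give a 16-dimensional vector.
features = kmer_frequencies("UGAGGUAGUAGGUUGUAUAGUU", k=2)
```

The vector length is fixed at `len(alphabet) ** k` regardless of sequence length, which is what makes k-mer counts convenient as classifier input.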
21. In silico drug repositioning using deep learning and comprehensive similarity measures
- Author
-
Hai-Cheng Yi, Zhu-Hong You, Lei Wang, Xiao-Rui Su, Xi Zhou, and Tong-Hai Jiang
- Subjects
Drug repositioning ,Drug–disease interaction ,Gated recurrent units ,Gaussian interaction profile kernel ,Machine learning ,Computer applications to medicine. Medical informatics ,R858-859.7 ,Biology (General) ,QH301-705.5 - Abstract
Abstract Background Drug repositioning, meaning finding new uses for existing drugs, can accelerate new drug research and development. Various computational methods have been presented to predict novel drug–disease associations for drug repositioning based on similarity measures among drugs and diseases. However, there are some known associations between drugs and diseases that previous studies did not utilize. Methods In this work, we develop a deep gated recurrent units model to predict potential drug–disease interactions using comprehensive similarity measures and the Gaussian interaction profile kernel. More specifically, the similarity measure is used to extract discriminative features for drugs based on their chemical fingerprints. Meanwhile, the Gaussian interaction profile kernel is employed to obtain efficient features of diseases based on known disease-disease associations. Then, a deep gated recurrent units model is developed to predict potential drug–disease interactions. Results The performance of the proposed model is evaluated on two benchmark datasets under tenfold cross-validation. To further verify the predictive ability, case studies for predicting new potential indications of drugs were carried out. Conclusion The experimental results show that the proposed model is a useful tool for predicting new indications for drugs or new treatments for diseases, and can accelerate drug repositioning and related drug research and discovery.
- Published
- 2021
22. A structural deep network embedding model for predicting associations between miRNA and disease based on molecular association network
- Author
-
Hao-Yuan Li, Hai-Yan Chen, Lei Wang, Shen-Jian Song, Zhu-Hong You, Xin Yan, and Jin-Qian Yu
- Subjects
Medicine ,Science - Abstract
Abstract Previous studies indicated that miRNA plays an important role in human biological processes, especially in the field of diseases. However, constrained by biotechnology, only a small part of the miRNA-disease associations has been verified by biological experiments. This has prompted more and more researchers to develop efficient and high-precision computational methods for predicting potential miRNA-disease associations. Based on the assumption that molecules are related to each other in human physiological processes, we developed a novel structural deep network embedding model (SDNE-MDA) for predicting miRNA-disease associations using a molecular association network. Specifically, the SDNE-MDA model first integrates miRNA attribute information by the Chaos Game Representation (CGR) algorithm and disease attribute information by disease semantic similarity. Secondly, we extract features by structural deep network embedding from the heterogeneous molecular association network. Then, a comprehensive feature descriptor is constructed by combining attribute information and behavior information. Finally, a Convolutional Neural Network (CNN) is adopted to train and classify these feature descriptors. In the five-fold cross-validation experiment, SDNE-MDA achieved an AUC of 0.9447 with a prediction accuracy of 87.38% on the HMDD v3.0 dataset. To further verify the performance of SDNE-MDA, we compared it with different feature extraction models and classifier models. Moreover, case studies on three important human diseases, namely Breast Neoplasms, Kidney Neoplasms and Lymphoma, were carried out with the proposed model. As a result, 47, 46 and 46 out of the top 50 predicted disease-related miRNAs, respectively, were confirmed by independent databases. These results suggest that SDNE-MDA would be a reliable computational tool for predicting potential miRNA-disease associations.
- Published
- 2021
23. Prediction of lncRNA-disease associations via an embedding learning HOPE in heterogeneous information networks
- Author
-
Ji-Ren Zhou, Zhu-Hong You, Li Cheng, and Bo-Ya Ji
- Subjects
lncRNA-disease associations ,deep learning ,heterogeneous information networks ,rotation forest ,Therapeutics. Pharmacology ,RM1-950 - Abstract
Uncovering additional long non-coding RNA (lncRNA)-disease associations has become increasingly important for developing treatments for complex human diseases. Identification of lncRNA biomarkers and lncRNA-disease associations is central to diagnosis and treatment. However, traditional experimental methods are expensive and time-consuming. Enormous amounts of data present in public biological databases are available for computational methods used to predict lncRNA-disease associations. In this study, we propose a novel computational method to predict lncRNA-disease associations. More specifically, a heterogeneous network is first constructed by integrating the associations among microRNA (miRNA), lncRNA, protein, drug, and disease. Second, high-order proximity preserved embedding (HOPE) is used to embed the nodes of the network. Finally, the rotation forest classifier is adopted to train the prediction model. In the 5-fold cross-validation experiment, our method achieved an area under the curve (AUC) of 0.8328 ± 0.0236. We compared it with four other classifiers, and the proposed method remarkably outperformed them. In addition, we conducted three case studies on three cancers with high death rates. The results show that 9 (lung cancer, gastric cancer, and hepatocellular carcinomas) out of the top 15 predicted disease-related lncRNAs were confirmed by our method. In conclusion, our method could predict unknown lncRNA-disease associations effectively.
- Published
- 2021
24. NEMPD: a network embedding-based method for predicting miRNA-disease associations by preserving behavior and attribute information
- Author
-
Bo-Ya Ji, Zhu-Hong You, Zhan-Heng Chen, Leon Wong, and Hai-Cheng Yi
- Subjects
miRNA-disease associations ,Heterogeneous network ,GraRep ,Random Forest ,Computer applications to medicine. Medical informatics ,R858-859.7 ,Biology (General) ,QH301-705.5 - Abstract
Abstract Background As an important non-coding RNA, microRNA (miRNA) plays a significant role in a series of life processes and is closely associated with a variety of human diseases. Hence, identification of potential miRNA-disease associations can make great contributions to the research and treatment of human diseases. However, to our knowledge, many existing computational methods only utilize a single type of known association information between miRNAs and diseases to predict their potential associations, without focusing on their interactions or associations with other types of molecules. Results In this paper, we propose a network embedding-based method for predicting miRNA-disease associations by preserving behavior and attribute information. Firstly, a heterogeneous network is constructed by integrating known associations among miRNA, protein and disease, and the network representation method Learning Graph Representations with Global Structural Information (GraRep) is implemented to learn the behavior information of miRNAs and diseases in the network. Then, the behavior information of miRNAs and diseases is combined with their attribute information to represent miRNA-disease association pairs. Finally, the prediction model is established based on the Random Forest algorithm. Under five-fold cross-validation, the proposed NEMPD model obtained an average prediction accuracy of 85.41% with 80.96% sensitivity and an AUC of 91.58%. Furthermore, the performance of NEMPD is also validated by case studies. Among the top 50 predicted disease-related miRNAs, 48 (breast neoplasms), 47 (colon neoplasms), and 47 (lung neoplasms) were confirmed by two other databases. Conclusions The proposed NEMPD model performs well in predicting the potential associations between miRNAs and diseases, and has great potential in the field of miRNA-disease association prediction in the future.
- Published
- 2020
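Most abstracts in this list report accuracy, sensitivity and AUC without saying how they are computed. The following sketch shows the standard definitions on a toy set of labels and scores; the function names, the 0.5 threshold and the toy data are assumptions for illustration and do not reproduce any result from the listed papers.

```python
def accuracy_and_sensitivity(labels, scores, threshold=0.5):
    """Accuracy and sensitivity (recall on positives) at a score threshold."""
    tp = tn = fp = fn = 0
    for y, s in zip(labels, scores):
        predicted = 1 if s >= threshold else 0
        if y == 1 and predicted == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted == 1:
            fp += 1
        else:
            tn += 1
    accuracy = (tp + tn) / len(labels)
    sensitivity = tp / (tp + fn)
    return accuracy, sensitivity

def auc(labels, scores):
    """AUC as the probability that a random positive outranks a random
    negative, with ties counted as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy data (hypothetical): three known associations and three negatives.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
acc, sens = accuracy_and_sensitivity(labels, scores)
area = auc(labels, scores)
```

Accuracy and sensitivity depend on the chosen threshold, whereas AUC summarises ranking quality over all thresholds, which is why the abstracts tend to report both.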
25. Prediction of drug-target interactions from multi-molecular network based on LINE network representation method
- Author
-
Bo-Ya Ji, Zhu-Hong You, Han-Jing Jiang, Zhen-Hao Guo, and Kai Zheng
- Subjects
Drug-target interactions ,Heterogeneous information network ,LINE ,Random forest ,Medicine - Abstract
Abstract Background The prediction of potential drug-target interactions (DTIs) not only provides a better comprehension of biological processes but also is critical for identifying new drugs. However, because traditional experiments are expensive and highly time-consuming, only a small fraction of the interactions between drugs and targets in the database have been verified experimentally. Therefore, it is meaningful and important to develop new computational methods with good performance for DTI prediction. At present, many existing computational methods only utilize a single type of interaction between drugs and proteins without paying attention to the associations and influences with other types of molecules. Methods In this work, we developed a novel network embedding-based heterogeneous information integration model to predict potential drug-target interactions. Firstly, a heterogeneous multi-molecular information network is built by combining the known associations among protein, drug, lncRNA, disease, and miRNA. Secondly, the Large-scale Information Network Embedding (LINE) model is used to learn behavior information (associations with other nodes) of drugs and proteins in the network. Hence, the known drug-protein interaction pairs can be represented as a combination of their attribute information (e.g. protein sequence information and drug molecular fingerprints) and behavior information. Thirdly, the Random Forest classifier is used for training and prediction. Results Under five-fold cross-validation, our method obtained 85.83% prediction accuracy with 80.47% sensitivity and an AUC of 92.33%. Moreover, in the case studies of three common drugs, 8 (Caffeine), 7 (Clozapine) and 6 (Pioglitazone) of the top 10 candidate targets, respectively, were verified to be associated with the corresponding drugs. Conclusions In short, these results indicate that our method can be a powerful tool for predicting potential drug-target interactions and finding unknown targets for certain drugs or unknown drugs for certain targets.
- Published
- 2020
26. MIPDH: A Novel Computational Model for Predicting microRNA–mRNA Interactions by DeepWalk on a Heterogeneous Network
- Author
-
Leon Wong, Zhu-Hong You, Zhen-Hao Guo, Hai-Cheng Yi, Zhan-Heng Chen, and Mei-Yuan Cao
- Subjects
Chemistry ,QD1-999 - Published
- 2020
27. A deep learning-based method for drug-target interaction prediction based on long short-term memory neural network
- Author
-
Yan-Bin Wang, Zhu-Hong You, Shan Yang, Hai-Cheng Yi, Zhan-Heng Chen, and Kai Zheng
- Subjects
Drug-target ,Deep learning ,Legendre moment ,Long short-term memory ,Computer applications to medicine. Medical informatics ,R858-859.7 - Abstract
Abstract Background The key to modern drug discovery is to find, identify and prepare drug molecular targets. However, due to limitations of throughput, precision and cost, traditional experimental methods are difficult to apply widely to infer these potential Drug-Target Interactions (DTIs). Therefore, it is urgent to develop effective computational methods to validate the interactions between drugs and targets. Methods We developed a deep learning-based model for DTI prediction. Protein evolutionary features are extracted via the Position Specific Scoring Matrix (PSSM) and Legendre Moment (LM) and combined with drug molecular substructure fingerprints to form feature vectors of drug-target pairs. Then we utilized Sparse Principal Component Analysis (SPCA) to compress the features of drugs and proteins into a uniform vector space. Lastly, a deep long short-term memory (DeepLSTM) network was constructed to carry out prediction. Results A significant improvement in DTI prediction performance can be observed in the experimental results, with AUC of 0.9951, 0.9705, 0.9951, and 0.9206, respectively, on four important classes of drug-target datasets. Further experiments preliminarily show that the proposed characterization scheme has a great advantage in feature expression and recognition. We have also shown that the proposed method can work well with small datasets. Conclusion The results demonstrate that the proposed approach has a great advantage over state-of-the-art drug-target predictors. To the best of our knowledge, this study is the first to test the potential of a deep learning method with memory and Turing completeness in DTI prediction.
- Published
- 2020
28. RPI-SE: a stacking ensemble learning framework for ncRNA-protein interactions prediction using sequence information
- Author
-
Hai-Cheng Yi, Zhu-Hong You, Mei-Neng Wang, Zhen-Hao Guo, Yan-Bin Wang, and Ji-Ren Zhou
- Subjects
Sequence analysis ,RNA-protein interaction ,ncRNA ,Ensemble learning ,Position weight matrix ,Legendre moments ,Computer applications to medicine. Medical informatics ,R858-859.7 ,Biology (General) ,QH301-705.5 - Abstract
Abstract Background The interactions between non-coding RNAs (ncRNAs) and proteins play an essential role in many biological processes. Several high-throughput experimental methods have been applied to detect ncRNA-protein interactions. However, these methods are time-consuming and expensive. Accurate and efficient computational methods can assist and accelerate the study of ncRNA-protein interactions. Results In this work, we develop a stacking ensemble computational framework, RPI-SE, for effectively predicting ncRNA-protein interactions. More specifically, to fully exploit protein and RNA sequence features, the Position Weight Matrix combined with Legendre Moments is applied to obtain protein evolutionary information. Meanwhile, a k-mer sparse matrix is employed to extract efficient features of ncRNA sequences. Finally, an ensemble learning framework integrating different types of base classifiers is developed to predict ncRNA-protein interactions using these discriminative features. The accuracy and robustness of RPI-SE were evaluated on three benchmark data sets under five-fold cross-validation and compared with other state-of-the-art methods. Conclusions The results demonstrate that RPI-SE handles the ncRNA-protein interaction prediction task with high accuracy and robustness. It is anticipated that this work can provide a computational prediction tool to advance biomedical research related to ncRNA-protein interactions.
- Published
- 2020
29. Learning distributed representations of RNA and protein sequences and its application for predicting lncRNA-protein interactions
- Author
-
Hai-Cheng Yi, Zhu-Hong You, Li Cheng, Xi Zhou, Tong-Hai Jiang, Xiao Li, and Yan-Bin Wang
- Subjects
Distribution representation ,Natural language processing ,Word2vec ,RNA-protein interaction ,Biotechnology ,TP248.13-248.65 - Abstract
Long noncoding RNAs (lncRNAs) are ubiquitous in organisms and play a crucial role in a variety of biological processes and complex diseases. Emerging evidence suggests that lncRNAs interact with corresponding proteins to perform their regulatory functions. Therefore, identifying interacting lncRNA-protein pairs is the first step in understanding the function and mechanism of lncRNA. Since it is time-consuming and expensive to determine lncRNA-protein interactions by high-throughput experiments, more robust and accurate computational methods need to be developed. In this study, we developed a new method based on sequence distributed representation learning for potential lncRNA-Protein Interactions Prediction, named LPI-Pred, which is inspired by the similarity between natural language and biological sequences. More specifically, lncRNA and protein sequences were divided into k-mer segments, which can be regarded as “words” in natural language processing. Then, we trained the RNA2vec and Pro2vec models using word2vec and human genome-wide lncRNA and protein sequences to mine distributed representations of RNA and protein. Next, the dimension of the complex features is reduced by feature selection based on the Gini information impurity measure. Finally, these discriminative features are used to train a Random Forest classifier to predict lncRNA-protein interactions. Five-fold cross-validation was adopted to evaluate the performance of LPI-Pred on three benchmark datasets, including RPI369, RPI488 and RPI2241. The results demonstrate that LPI-Pred can be a useful tool to provide reliable guidance for biological research.
- Published
- 2020
30. iMDA-BN: Identification of miRNA-disease associations based on the biological network and graph embedding algorithm
- Author
-
Kai Zheng, Zhu-Hong You, Lei Wang, and Zhen-Hao Guo
- Subjects
miRNA ,Disease ,Heterogenous information ,The biological network ,Graph embedding algorithm ,Biotechnology ,TP248.13-248.65 - Abstract
Benefiting from advances in high-throughput experimental techniques, important regulatory roles of miRNAs, lncRNAs, and proteins, as well as biological property information, are gradually being complemented. As the key data support to promote biomedical research, domain knowledge such as intermolecular relationships that are increasingly revealed by molecular genome-wide analysis is often used to guide the discovery of potential associations. However, methods that perform network representation learning from the perspective of the global biological network are scarce. Existing methods cover a very limited range of molecular association types and are therefore not suitable for more comprehensive analysis of molecular network representation information. In this study, we propose a computational model based on the biological network for predicting potential associations between miRNAs and diseases, called iMDA-BN. iMDA-BN has three significant advantages: I) It uses a new method to describe disease and miRNA characteristics, which analyzes node representation information for diseases and miRNAs from the perspective of biological networks. II) It can predict unproven associations even if miRNAs and diseases do not appear in the biological network. III) It accurately describes miRNA characteristics from biological properties based on high-throughput sequence information. The iMDA-BN predictor achieves an AUC of 0.9145 and an accuracy of 84.49% on the miRNA-disease association baseline dataset, and it can also achieve an AUC of 0.8765 and an accuracy of 80.96% when predicting unknown diseases and miRNAs in the biological network. Compared to existing miRNA-disease association prediction methods, iMDA-BN has higher accuracy and the advantage of predicting unknown associations. In addition, 45, 49, and 49 of the top 50 miRNA-disease associations with the highest predicted scores were confirmed in the case studies, respectively.
- Published
- 2020
31. GNMFLMI: Graph Regularized Nonnegative Matrix Factorization for Predicting LncRNA-MiRNA Interactions
- Author
-
Mei-Neng Wang, Zhu-Hong You, Li-Ping Li, Leon Wong, Zhan-Heng Chen, and Cheng-Zhi Gan
- Subjects
Graph regularization ,lncRNA-miRNA interaction ,lncRNA-miRNA similarity ,nonnegative matrix factorization ,Electrical engineering. Electronics. Nuclear engineering ,TK1-9971 - Abstract
Long non-coding RNAs (lncRNAs) and microRNAs (miRNAs) are involved in various biological processes. Emerging evidence suggests that the interactions between lncRNAs and miRNAs play an important role in the regulation of genes and the development of many diseases. Due to the limited scale of known lncRNA-miRNA interactions, and the high time and labor costs of identifying them by biological experiments, more accurate and efficient computational approaches for predicting lncRNA-miRNA interactions urgently need to be developed. In this work, we proposed a novel computational model, GNMFLMI, to predict lncRNA-miRNA interactions using graph regularized nonnegative matrix factorization. More specifically, the similarities of both lncRNAs and miRNAs are calculated based on known interaction information and their sequence information. Then, the affinity graphs for lncRNAs and miRNAs are constructed using the $p$-nearest neighbors, respectively. Finally, a graph regularized nonnegative matrix factorization model is developed to accurately infer potential interactions between lncRNAs and miRNAs. To assess the performance of GNMFLMI, five-fold cross-validation experiments are carried out. The AUC values achieved by GNMFLMI on two datasets are 0.9769 and 0.8894, respectively, which outperform the compared methods. In the case studies for lncRNA nonhsat159254.1 and miRNA hsa-mir-544a, 20 and 16 of the top-20 associations predicted by GNMFLMI are confirmed, respectively. Rigorous experimental results demonstrate that GNMFLMI can effectively predict novel lncRNA-miRNA interactions, which can provide guidance for relevant biomedical research. The source code of GNMFLMI is freely available at https://github.com/haichengyi/GNMFLMI.
- Published
- 2020
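The abstract of entry 31 builds affinity graphs from the $p$-nearest neighbors of each lncRNA and miRNA. The sketch below shows one simple way to do that from a similarity matrix: an unweighted, symmetrised graph. The abstract does not state the exact edge weighting used by GNMFLMI, so the function name, the 0/1 weights and the toy similarity matrix are assumptions.

```python
def p_nearest_neighbor_graph(similarity, p):
    """Symmetric 0/1 affinity matrix built from each node's p most similar
    other nodes (self-similarity is ignored)."""
    n = len(similarity)
    graph = [[0] * n for _ in range(n)]
    for i in range(n):
        others = [j for j in range(n) if j != i]
        # Stable sort: ties keep the lower index first.
        others.sort(key=lambda j: similarity[i][j], reverse=True)
        for j in others[:p]:
            graph[i][j] = 1
            graph[j][i] = 1  # keep the graph undirected
    return graph

# Toy similarity matrix (hypothetical): nodes 0-1 and 2-3 form two tight pairs.
sim = [
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.3, 0.1],
    [0.1, 0.3, 1.0, 0.8],
    [0.2, 0.1, 0.8, 1.0],
]
affinity = p_nearest_neighbor_graph(sim, p=1)
```

Keeping only the strongest neighbors makes the graph sparse, so a graph regularisation term built from it pulls together only entities that are clearly similar.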
32. Prediction of Drug-Target Interactions by Ensemble Learning Method From Protein Sequence and Drug Fingerprint
- Author
-
Xinke Zhan, Zhu-Hong You, Jinfan Cai, Liping Li, Changqing Yu, Jie Pan, and Jiangkun Kong
- Subjects
Drug-target interaction ,local optimal oriented pattern ,position specific scoring matrix ,rotation forest ,Electrical engineering. Electronics. Nuclear engineering ,TK1-9971 - Abstract
Predicting drug-target interactions (DTIs) is of great importance for screening new drug candidates and understanding biological processes. However, identifying drug-target interactions through traditional experiments is still costly, laborious and complicated. Thus, there is a great need for developing reliable computational methods to effectively predict DTIs. In this study, we report a novel computational method combining local optimal oriented pattern (LOOP), Position Specific Scoring Matrix (PSSM) and Rotation Forest (RF) for predicting DTIs. Specifically, the target protein sequence is first transformed into a PSSM, in which the evolutionary information of the protein is retained. Then, LOOP is used to extract feature vectors from the PSSM, and the substructure information of the drug molecule is represented as fingerprint features. Finally, the RF classifier is adopted to infer potential drug-target interactions. On four benchmark datasets, namely enzyme, ion channel, G protein-coupled receptors (GPCRs), and nuclear receptor, we achieved average prediction accuracies of 89.09%, 87.53%, 82.05%, and 73.33%, respectively. To further evaluate the proposed method, we compare its prediction performance with the state-of-the-art support vector machine (SVM) and K-Nearest Neighbor (KNN). The comprehensive experimental results illustrate that the proposed method is reliable and efficient for predicting DTIs. It is anticipated that the proposed method can become a useful tool for predicting potential DTIs on a large scale.
- Published
- 2020
33. DWPPI: A Deep Learning Approach for Predicting Protein–Protein Interactions in Plants Based on Multi-Source Information With a Large-Scale Biological Network
- Author
-
Jie Pan, Zhu-Hong You, Li-Ping Li, Wen-Zhun Huang, Jian-Xin Guo, Chang-Qing Yu, Li-Ping Wang, and Zheng-Yang Zhao
- Subjects
plant ,protein-protein interaction ,network embedding ,multi-source information ,deep neural networks ,Biotechnology ,TP248.13-248.65 - Abstract
The prediction of protein–protein interactions (PPIs) in plants is vital for probing cell function. Although multiple high-throughput approaches in the biological domain have been developed to identify PPIs, with the increasing complexity of PPI networks, these methods have become laborious and time-consuming. Thus, it is essential to develop an effective and feasible computational method for the prediction of PPIs in plants. In this study, we present a network embedding-based method, called DWPPI, for predicting the interactions between different plant proteins based on multi-source information combined with deep neural networks (DNN). The DWPPI model fuses the protein natural language sequence information (attribute information) and protein behavior information to represent plant proteins as feature vectors and finally sends these features to a deep learning–based classifier for prediction. To validate the prediction performance of DWPPI, we evaluated it on three model plant datasets: Arabidopsis thaliana (A. thaliana), maize (Zea mays), and rice (Oryza sativa). The experimental results with fivefold cross-validation demonstrated that DWPPI obtains AUC (area under the ROC curve) values of 0.9548, 0.9867, and 0.9213, respectively. To further verify the predictive capacity of DWPPI, we compared it with several state-of-the-art machine learning classifiers. Moreover, case studies were performed with the AC149810.2_FGP003 protein. As a result, 14 of the top 20 PPI pairs identified by DWPPI with the highest scores were confirmed by the literature. These results suggest that the DWPPI model can act as a promising tool for related plant molecular biology.
- Published
- 2022
- Full Text
- View/download PDF
34. SAWRPI: A Stacking Ensemble Framework With Adaptive Weight for Predicting ncRNA-Protein Interactions Using Sequence Information
- Author
-
Zhong-Hao Ren, Chang-Qing Yu, Li-Ping Li, Zhu-Hong You, Yong-Jian Guan, Yue-Chao Li, and Jie Pan
- Subjects
ncRNA-protein interactions ,ncRNA ,ensemble learning ,sequence analysis ,natural language processing ,Genetics ,QH426-470 - Abstract
Non-coding RNAs (ncRNAs) play essential roles in biological processes such as gene regulation. One critical way in which ncRNAs carry out biological functions is through interactions with RNA-binding proteins (RBPs). Identifying the proteins involved in ncRNA-protein interactions helps clarify ncRNA function. Many high-throughput experiments have been applied to recognize these interactions. Because these approaches are time- and labor-consuming, a great number of computational methods have been developed to advance ncRNA-protein interaction research. However, these methods may not be applicable to all RNAs and proteins, particularly new ones. Additionally, most of them do not handle long sequences well. In this work, a computational method, SAWRPI, is proposed to predict ncRNA-protein interactions from sequence information. More specifically, the raw features of proteins and ncRNAs are first extracted through the k-mer sparse matrix with SVD reduction and through learning nucleic acid symbols by natural language processing with a local fusion strategy, respectively. Then, to ease classification, the Hilbert transformation is used to map the raw feature data to a new feature space. Finally, a stacking ensemble strategy is adopted to learn high-level abstraction features automatically and generate the final prediction results. To confirm robustness and stability, three different datasets containing two kinds of interactions are used. In comparison with state-of-the-art methods and with other classification or feature extraction strategies, SAWRPI achieved high performance on the three datasets, which contain two kinds of lncRNA-protein interactions. Based on our findings, SAWRPI is a trustworthy, robust, yet simple method that can serve as a beneficial supplement for the task of predicting ncRNA-protein interactions.
- Published
- 2022
- Full Text
- View/download PDF
35. Sequence-Based Prediction of Plant Protein-Protein Interactions by Combining Discrete Sine Transformation With Rotation Forest
- Author
-
Jie Pan, Li-Ping Li, Chang-Qing Yu, Zhu-Hong You, Yong-Jian Guan, and Zhong-Hao Ren
- Subjects
Evolution ,QH359-425 - Abstract
Protein-protein interactions (PPIs) in plants are essential for understanding the regulation of biological processes. Although high-throughput technologies have been widely used to identify PPIs, they are usually laborious, expensive, and suffer from high false-positive rates. Therefore, it is imperative to develop novel computational approaches as a supplementary tool to detect PPIs in plants. In this work, we present a method, namely DST-RoF, to identify PPIs in plants by combining an ensemble learning classifier, Rotation Forest (RoF), with discrete sine transformation (DST). Specifically, each plant protein sequence is first converted into a Position-Specific Scoring Matrix (PSSM). Then, the discrete sine transformation is employed to extract effective features that capture the evolutionary information of proteins. Finally, these optimal features are fed into the RoF classifier for training and prediction. When evaluated on the plant datasets Arabidopsis, Rice, and Maize, DST-RoF yielded high prediction accuracies of 82.95%, 88.82%, and 93.70%, respectively. To further evaluate the prediction ability of our approach, we compared it with 4 state-of-the-art classifiers and 3 different feature extraction methods. Comprehensive experimental results indicated that our method is feasible and robust for predicting potential interacting plant protein pairs.
- Published
- 2021
- Full Text
- View/download PDF
36. MFIDMA: A Multiple Information Integration Model for the Prediction of Drug–miRNA Associations
- Author
-
Yong-Jian Guan, Chang-Qing Yu, Yan Qiao, Li-Ping Li, Zhu-Hong You, Zhong-Hao Ren, Yue-Chao Li, and Jie Pan
- Subjects
drug–miRNA association ,SDNE ,Word2vec ,SMILES ,deep neural network ,convolution neural network ,Biology (General) ,QH301-705.5 - Abstract
Abnormal microRNA (miRNA) functions play significant roles in various pathological processes. Thus, predicting drug–miRNA associations (DMA) may hold great promise for identifying the potential targets of drugs. However, discovering the associations between drugs and miRNAs through wet experiments is time-consuming and laborious. Therefore, it is significant to develop computational prediction methods to improve the efficiency of identifying DMA on a large scale. In this paper, a multiple features integration model (MFIDMA) is proposed to predict drug–miRNA association. Specifically, we first formulated known DMA as a bipartite graph and utilized structural deep network embedding (SDNE) to learn the topological features from the graph. Second, the Word2vec algorithm was utilized to construct the attribute features of the miRNAs and drugs. Third, two kinds of features were entered into the convolution neural network (CNN) and deep neural network (DNN) to integrate features and predict potential target miRNAs for the drugs. To evaluate the MFIDMA model, it was implemented on three different datasets under a five-fold cross-validation and achieved average AUCs of 0.9407, 0.9444 and 0.8919. In addition, the MFIDMA model showed reliable results in the case studies of Verapamil and hsa-let-7c-5p, confirming that the proposed model can also predict DMA in real-world situations. The model was effective in analyzing the neighbors and topological features of the drug–miRNA network by SDNE. The experimental results indicated that the MFIDMA is an accurate and robust model for predicting potential DMA, which is significant for miRNA therapeutics research and drug discovery.
- Published
- 2022
- Full Text
- View/download PDF
37. Prediction of Protein–Protein Interactions in Arabidopsis, Maize, and Rice by Combining Deep Neural Network With Discrete Hilbert Transform
- Author
-
Jie Pan, Li-Ping Li, Zhu-Hong You, Chang-Qing Yu, Zhong-Hao Ren, and Yong-Jian Guan
- Subjects
deep neural networks ,discrete hilbert transform ,plant ,protein–protein interactions ,position-specific scoring matrix ,Genetics ,QH426-470 - Abstract
Protein–protein interactions (PPIs) in plants play an essential role in the regulation of biological processes. However, traditional experimental methods are expensive, time-consuming, and need sophisticated technical equipment. These drawbacks motivated the development of novel computational approaches to predict PPIs in plants. In this article, a new deep learning framework, which combines the discrete Hilbert transform (DHT) with deep neural networks (DNN), is presented to predict PPIs in plants. To be more specific, plant protein sequences were first transformed into a position-specific scoring matrix (PSSM). Then, DHT was employed to capture features from the PSSM. To improve the prediction accuracy, we used the singular value decomposition algorithm to decrease noise and reduce the dimensions of the feature descriptors. Finally, these feature vectors were fed into the DNN for training and prediction. When applying our method to three plant PPI datasets, Arabidopsis thaliana, maize, and rice, we achieved good predictive performance with average area under the receiver operating characteristic curve values of 0.8369, 0.9466, and 0.9440, respectively. To fully verify the predictive ability of our method, we compared it with different feature descriptors and machine learning classifiers. Moreover, to further demonstrate the generality of our approach, we also tested it on yeast and human PPI datasets. Experimental results indicated that our method is an efficient and promising computational model for predicting potential interacting plant protein pairs.
- Published
- 2021
- Full Text
- View/download PDF
38. A Novel Network-Based Algorithm for Predicting Protein-Protein Interactions Using Gene Ontology
- Author
-
Lun Hu, Xiaojuan Wang, Yu-An Huang, Pengwei Hu, and Zhu-Hong You
- Subjects
protein-protein interaction ,prediction ,network topology ,gene ontology ,modularity ,Microbiology ,QR1-502 - Abstract
Proteins are one of the most significant components in living organisms, and their main role in cells is to undertake various physiological functions by interacting with each other. Thus, the prediction of protein-protein interactions (PPIs) is crucial for understanding the molecular basis of biological processes, such as chronic infections. Given that laboratory-based experiments are normally time-consuming and labor-intensive, computational prediction algorithms have become popular. However, few of them simultaneously consider both the structural information of PPI networks and the biological information of proteins to improve accuracy. To do so, we assume that the prior information of functional modules is known in advance and then simulate the generative process of a PPI network associated with the biological information of proteins, i.e., Gene Ontology, by using an established Bayesian model. In order to indicate to what extent two proteins are likely to interact with each other, we propose a novel scoring function that combines the membership distributions of proteins with network paths. Experimental results show that our algorithm has promising performance in terms of several independent metrics when compared with state-of-the-art prediction algorithms, and also reveal that considering modularity in PPI networks provides an alternative, yet much more flexible, way to accurately predict PPIs.
- Published
- 2021
- Full Text
- View/download PDF
39. Identification of self-interacting proteins by integrating random projection classifier and finite impulse response filter
- Author
-
Zhan-Heng Chen, Zhu-Hong You, Li-Ping Li, Yan-Bin Wang, Yu Qiu, and Peng-Wei Hu
- Subjects
Self-interacting proteins ,PSSM ,Random projection ,Finite impulse response filter ,Biotechnology ,TP248.13-248.65 ,Genetics ,QH426-470 - Abstract
Abstract Background Identification of protein-protein interactions (PPIs) is crucial for understanding biological processes and investigating the cellular functions of genes. Self-interacting proteins (SIPs) are those in which more than two identical proteins can interact with each other, and they are a specific type of PPI. More and more researchers are paying attention to SIP detection, and several prediction models have been proposed, but some problems remain. Hence, there is an urgent need to explore an efficient computational model for SIP prediction. Results In this study, we developed an effective model to predict SIPs, called RP-FIRF, which merges the Random Projection (RP) classifier and Finite Impulse Response Filter (FIRF) together. More specifically, each protein sequence was first transformed into a Position Specific Scoring Matrix (PSSM) by exploiting Position Specific Iterated BLAST (PSI-BLAST). Then, to effectively extract discriminative SIP features and improve the performance of SIP prediction, the FIRF method was applied to the PSSM. The RP classifier was then used to execute the classification and predict novel SIPs. We evaluated the performance of the proposed RP-FIRF model and compared it with the state-of-the-art support vector machine (SVM) on human and yeast datasets, respectively. The proposed model achieves high average accuracies of 97.89% and 97.35% using five-fold cross-validation. To further evaluate the performance of the proposed method, we also compared it with six other existing methods; the experimental results demonstrated that the capacity of our model surpasses that of the previous approaches. Conclusion Experimental results show that self-interacting proteins are accurately predicted by the proposed model on the human and yeast datasets, which shows that the proposed model can predict SIPs effectively. Thus, the RP-FIRF model is an automatic decision support method which should provide useful insights into the recognition of SIPs.
- Published
- 2019
- Full Text
- View/download PDF
40. Predicting drug−disease associations via sigmoid kernel-based convolutional neural networks
- Author
-
Han-Jing Jiang, Zhu-Hong You, and Yu-An Huang
- Subjects
Sigmoid kernel ,Convolutional Neural Networks ,Random forest ,Medicine - Abstract
Abstract Background In the process of drug development, computational drug repositioning is effective and resource-saving because of its important role in identifying new drug–disease associations. Recent years have witnessed great progress in the field of data mining with the advent of deep learning. An increasing number of deep learning-based techniques have been proposed to develop computational tools in bioinformatics. Methods Along this promising direction, we here propose a computational drug repositioning method combining the techniques of Sigmoid Kernel and Convolutional Neural Network (SKCNN), which is able to learn new features that effectively represent drug–disease associations via its hidden layers. Specifically, we first construct the similarity metric of drugs using drug sigmoid similarity and drug structural similarity, and that of diseases using disease sigmoid similarity and disease semantic similarity. Based on the combined similarities of drugs and diseases, we then use SKCNN to learn hidden representations for each drug-disease pair, whose labels are finally predicted by a classifier based on random forest. Results A series of experiments were carried out for performance evaluation, and their results show that the proposed SKCNN improves the prediction accuracy compared with other state-of-the-art approaches. Case studies of two selected diseases were also conducted, which demonstrate the strong performance of our method in the actual discovery of potential drug indications. Conclusion The aim of this study was to establish an effective predictive model for finding new drug–disease associations. These experimental results show that SKCNN can effectively predict the association between drugs and diseases.
- Published
- 2019
- Full Text
- View/download PDF
41. ACP-DL: A Deep Learning Long Short-Term Memory Model to Predict Anticancer Peptides Using High-Efficiency Feature Representation
- Author
-
Hai-Cheng Yi, Zhu-Hong You, Xi Zhou, Li Cheng, Xiao Li, Tong-Hai Jiang, and Zhan-Heng Chen
- Subjects
Therapeutics. Pharmacology ,RM1-950 - Abstract
Cancer is a well-known killer of human beings, which has led to countless deaths and misery. Anticancer peptides open a promising perspective for cancer treatment, and they have various attractive advantages. Conventional wet experiments are expensive and inefficient for finding and identifying novel anticancer peptides. There is an urgent need to develop a novel computational method to predict novel anticancer peptides. In this study, we propose a deep learning long short-term memory (LSTM) neural network model, ACP-DL, to effectively predict novel anticancer peptides. More specifically, to fully exploit peptide sequence information, we developed an efficient feature representation approach by integrating the binary profile feature and the k-mer sparse matrix of the reduced amino acid alphabet. Then we implemented a deep LSTM model to automatically learn how to distinguish anticancer peptides from non-anticancer peptides. To our knowledge, this is the first time that the deep LSTM model has been applied to predict anticancer peptides. Cross-validation experiments demonstrated that the proposed ACP-DL remarkably outperformed other comparison methods, with high accuracy and satisfactory specificity on benchmark datasets. In addition, we also contributed two new anticancer peptide benchmark datasets, ACP740 and ACP240, in this work. The source code and datasets are available at https://github.com/haichengyi/ACP-DL. Keywords: anticancer peptides, long short-term memory, deep learning, binary profile feature, k-mer sparse matrix
- Published
- 2019
- Full Text
- View/download PDF
42. A Learning-Based Method for LncRNA-Disease Association Identification Combing Similarity Information and Rotation Forest
- Author
-
Zhen-Hao Guo, Zhu-Hong You, Yan-Bin Wang, Hai-Cheng Yi, and Zhan-Heng Chen
- Subjects
Science - Abstract
Summary: Long non-coding RNAs (lncRNAs) play critical roles in the occurrence and development of various diseases. The determination of lncRNA-disease associations would thus help provide new insights into the pathogenesis of disease, diagnosis, and gene treatments. Considering that potential human lncRNA-disease associations are difficult to detect from the vast amount of biological data with traditional experimental approaches, developing computational methods could be of significant value. In this paper, we propose a novel computational method named LDASR to identify associations between lncRNAs and diseases by analyzing known lncRNA-disease associations. First, the feature vectors of the lncRNA-disease pairs were obtained by integrating lncRNA Gaussian interaction profile kernel similarity, disease semantic similarity, and Gaussian interaction profile kernel similarity. Second, an autoencoder neural network was employed to reduce the feature dimension and obtain the optimal feature subspace from the original feature set. Finally, Rotation Forest was used to carry out the prediction of lncRNA-disease associations. The proposed method achieves excellent performance, with 0.9502 AUC in leave-one-out cross-validation (LOOCV) and 0.9428 AUC in 5-fold cross-validation, which significantly outperforms previous methods. Moreover, two kinds of case studies on identifying lncRNAs associated with colorectal cancer and glioma further prove the capability of LDASR in identifying novel lncRNA-disease associations. The promising experimental results show that LDASR can be an excellent addition to biomedical research in the future. Subject Areas: Bioinformatics, Biocomputational Method, Computational Bioinformatics
- Published
- 2019
- Full Text
- View/download PDF
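The Gaussian interaction profile (GIP) kernel similarity named in the abstract of entry 42 can be illustrated with a minimal sketch. It assumes the commonly used formulation in which each row of a binary association matrix is an entity's interaction profile and the bandwidth is normalized by the mean squared profile norm. The function name `gip_kernel` and the parameter `gamma_prime` are hypothetical and are not taken from the LDASR source code.

```python
import math


def gip_kernel(adjacency, gamma_prime=1.0):
    """Gaussian interaction profile kernel between the rows of a 0/1 matrix.

    Illustrative only: rows are interaction profiles (e.g. one lncRNA per row,
    one disease per column). gamma_prime is an assumed bandwidth setting.
    """
    n = len(adjacency)
    # Squared Euclidean norm of every profile.
    sq_norms = [sum(v * v for v in row) for row in adjacency]
    mean_sq = sum(sq_norms) / n
    if mean_sq == 0:
        raise ValueError("the association matrix contains no known associations")
    # Normalize the bandwidth by the average profile norm.
    gamma = gamma_prime / mean_sq
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            dist_sq = sum((a - b) ** 2 for a, b in zip(adjacency[i], adjacency[j]))
            kernel[i][j] = math.exp(-gamma * dist_sq)
    return kernel


# Toy association matrix: 3 entities (rows) against 3 partners (columns).
toy = [
    [1, 0, 1],
    [1, 0, 1],
    [0, 1, 0],
]
similarity = gip_kernel(toy)
```

Identical profiles receive a similarity of 1, and the value decays toward 0 as two profiles share fewer known associations. Applying the same function to the transposed matrix would give the similarity for the other entity type.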
43. MLMDA: a machine learning approach to predict and validate MicroRNA–disease associations by integrating of heterogenous information sources
- Author
-
Kai Zheng, Zhu-Hong You, Lei Wang, Yong Zhou, Li-Ping Li, and Zheng-Wei Li
- Subjects
microRNA ,Disease ,Association prediction ,Auto-encoder neural network ,Random forest ,Medicine - Abstract
Abstract Background Emerging evidence shows that microRNA (miRNA) plays an important role in many human complex diseases. However, because traditional in vitro experiments are inherently time-consuming and expensive, more and more attention has been paid to the development of efficient and feasible computational methods to predict the potential associations between miRNAs and diseases. Methods In this work, we present a machine learning-based model called MLMDA for predicting the association of miRNAs and diseases. More specifically, we first use the k-mer sparse matrix to extract miRNA sequence information, and combine it with miRNA functional similarity, disease semantic similarity and Gaussian interaction profile kernel similarity information. Then, more representative features are extracted from them through a deep auto-encoder neural network (AE). Finally, the random forest classifier is used to effectively predict potential miRNA–disease associations. Results The experimental results show that the MLMDA model achieves promising performance under fivefold cross-validation with an AUC value of 0.9172, which is higher than that of the methods using different classifiers or different feature combination methods mentioned in this paper. In addition, to further evaluate the prediction performance of the MLMDA model, case studies are carried out with three human complex diseases: Lymphoma, Lung Neoplasm, and Esophageal Neoplasms. As a result, 39, 37 and 36 out of the top 40 predicted miRNAs are confirmed by other miRNA–disease association databases. Conclusions These experimental results suggest that the MLMDA model could serve as a useful tool guiding the future experimental validation of promising miRNA biomarker candidates. The source code and datasets explored in this work are available at http://220.171.34.3:81/.
- Published
- 2019
- Full Text
- View/download PDF
44. In Silico Prediction of Small Molecule-miRNA Associations Based on the HeteSim Algorithm
- Author
-
Jia Qu, Xing Chen, Ya-Zhou Sun, Yan Zhao, Shu-Bin Cai, Zhong Ming, Zhu-Hong You, and Jian-Qiang Li
- Subjects
Therapeutics. Pharmacology ,RM1-950 - Abstract
Targeting microRNAs (miRNAs) with drug small molecules (SMs) is a new treatment method for many human complex diseases. Unsurprisingly, identification of potential miRNA-SM associations is helpful for pharmaceutical engineering and disease therapy in the field of medical research. In this paper, we developed a novel computational model of HeteSim-based inference for SM-miRNA Association prediction (HSSMMA) by implementing a path-based measurement method of HeteSim on a heterogeneous network combined with known miRNA-SM associations, integrated miRNA similarity, and integrated SM similarity. Through considering paths from an SM to a miRNA in the heterogeneous network, the model can capture the semantics information under each path and predict potential miRNA-SM associations based on all the considered paths. We performed global, miRNA-fixed local and SM-fixed local leave one out cross validation (LOOCV) as well as 5-fold cross validation based on the dataset of known miRNA-SM associations to evaluate the prediction performance of our approach. The results showed that HSSMMA gained the corresponding areas under the receiver operating characteristic (ROC) curve (AUCs) of 0.9913, 0.9902, 0.7989, and 0.9910 ± 0.0004 based on dataset 1 and AUCs of 0.7401, 0.8466, 0.6149, and 0.7451 ± 0.0054 based on dataset 2, respectively. In case studies, 2 of the top 10 and 13 of the top 50 predicted potential miRNA-SM associations were confirmed by published literature. We further implemented case studies to test whether HSSMMA was effective for new SMs without any known related miRNAs. The results from cross validation and case studies showed that HSSMMA could be a useful prediction tool for the identification of potential miRNA-SM associations. Keywords: microRNA, small molecule, association prediction, HeteSim algorithm, heterogeneous network
- Published
- 2019
- Full Text
- View/download PDF
45. Improved Prediction of Protein-Protein Interactions Using Descriptors Derived From PSSM via Gray Level Co-Occurrence Matrix
- Author
-
Hui-Juan Zhu, Zhu-Hong You, Wei-Lei Shi, Shou-Kun Xu, Tong-Hai Jiang, and Li-Hua Zhuang
- Subjects
Protein-protein interactions ,rotation forest ,position-specific scoring matrix ,gray level co-occurrence matrix ,Electrical engineering. Electronics. Nuclear engineering ,TK1-9971 - Abstract
A better exploration of biological processes, means, and functions demands trusted information about protein-protein interactions (PPIs). High-throughput technologies have produced a large amount of PPI data for various species; however, they are resource-expensive and often suffer from high error rates. To compensate for the limitations of the traditional methods, this paper proposes a sequence-based computational method to predict whether two proteins interact. The proposed method divides the novel PPI prediction process into three stages: first, the position-specific scoring matrices (PSSMs) are produced by incorporating the evolutionary information; second, the 352-dimensional feature vector is constructed for each protein pair; third, effective parameters for the ensemble learning algorithm rotation forest (RF) are selected. In the proposed model, the evolutionary features are extracted from the PSSM for each protein without considering any protein annotations. In addition, because the RF algorithm constructs more accurate and diverse classifiers to avoid coincident errors, a sample misclassified by one classifier can be corrected by another classifier. The proposed method is evaluated in terms of accuracy, precision, sensitivity, and so on using the Yeast, Human, and H. pylori datasets, and its performance is found to be superior to that of the competing methods. Specifically, the average accuracies achieved by the proposed method are 97.06% (Yeast), 98.95% (Human), and 89.69% (H. pylori), which improves the accuracy of PPI prediction by 0.54% to 3.89% (Yeast), 1.29% to 3.85% (Human), and 0.22% to 4.85% (H. pylori). The experimental results show that the proposed method is an effective alternative approach for predicting novel PPIs.
- Published
- 2019
- Full Text
- View/download PDF
46. CGMDA: An Approach to Predict and Validate MicroRNA-Disease Associations by Utilizing Chaos Game Representation and LightGBM
- Author
-
Kai Zheng, Lei Wang, and Zhu-Hong You
- Subjects
miRNAs ,chaos game representation ,disease ,heterogenous information ,LightGBM ,Electrical engineering. Electronics. Nuclear engineering ,TK1-9971 - Abstract
Recent studies have shown that microRNAs (miRNAs) play an important role in complex human diseases. Identifying potential miRNA-disease associations is useful for understanding pathogenesis. However, only a few methods have been proposed so far to predict miRNA-disease associations based on sequence information, and these methods can only quantify nonlinear sequence relationships without taking linear sequence information into account. In this work, we designed a computational method for predicting miRNA-disease associations based on chaos game representation, called CGMDA, to overcome these problems. CGMDA combines association information with miRNA sequence information, miRNA functional information and disease semantic information to improve prediction accuracy. In particular, we use chaos game representation (CGR) technology for the first time to transform miRNA sequence information into image information and extract its features. In the cross-validation experiment, CGMDA achieved a mean area under the receiver operating characteristic curve (AUC) of 0.9099 on the HMDD v3.0 data set. To better evaluate the performance of CGMDA, we compared it to different classifiers and related prediction methods. In addition, CGMDA is applied to three human complex diseases. The results showed that of the top 40 disease-related miRNAs predicted, 39 (Breast Neoplasm), 39 (Lymphoma) and 38 (Colon Neoplasm) were validated by experiments in case studies. These experimental results show that CGMDA is a reliable tool and has potential application prospects in assisting early diagnosis and treatment of prognosis.
- Published
- 2019
- Full Text
- View/download PDF
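The chaos game representation (CGR) step described in the abstract of entry 46, which turns a miRNA sequence into an image, can be sketched as follows. This is a minimal illustration of the general CGR technique and not the CGMDA implementation: the corner assignment, the grid size, and the function names `cgr_points` and `cgr_grid` are all assumptions made for the example.

```python
# Corner assignment is one common convention, assumed here for illustration.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "U": (1.0, 0.0)}


def cgr_points(seq):
    """Return the chaos game trajectory of an RNA sequence in the unit square."""
    x, y = 0.5, 0.5  # start at the centre of the square
    points = []
    for base in seq.upper().replace("T", "U"):
        if base not in CORNERS:
            continue  # skip ambiguous symbols such as N
        cx, cy = CORNERS[base]
        # Move halfway from the current point toward the base's corner.
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points


def cgr_grid(seq, size=8):
    """Bin the trajectory into a size x size count matrix (the CGR 'image')."""
    grid = [[0] * size for _ in range(size)]
    for x, y in cgr_points(seq):
        col = min(int(x * size), size - 1)
        row = min(int(y * size), size - 1)
        grid[row][col] += 1
    return grid


# Arbitrary example sequence, used only to exercise the functions.
example = "UGAGGUAGUAGGUUGUAUAGUU"
image = cgr_grid(example, size=8)
```

The resulting count matrix can be treated as a small grayscale image, from which features could then be extracted for a downstream classifier.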
47. SGCNCMI: A New Model Combining Multi-Modal Information to Predict circRNA-Related miRNAs, Diseases and Genes
- Author
-
Chang-Qing Yu, Xin-Fei Wang, Li-Ping Li, Zhu-Hong You, Wen-Zhun Huang, Yue-Chao Li, Zhong-Hao Ren, and Yong-Jian Guan
- Subjects
circRNA–miRNA interaction ,circRNA–cancer ,graph convolution network ,miRNA ,k-mer ,Biology (General) ,QH301-705.5 - Abstract
Computational prediction of miRNAs, diseases, and genes associated with circRNAs has important implications for circRNA research and provides a reference for wet experiments to save costs and time. In this study, SGCNCMI, a computational model combining multimodal information and graph convolutional neural networks, combines node similarity to form node information and then predicts associated nodes using a GCN with a distributive contribution mechanism. The model can be used not only to predict circRNA–miRNA interactions at the molecular level but also to predict circRNA–cancer and circRNA–gene associations. The AUCs of circRNA–miRNA, circRNA–disease, and circRNA–gene associations in the five-fold cross-validation experiment of SGCNCMI are 89.42%, 84.18%, and 82.44%, respectively. SGCNCMI is one of the few models in this field and achieved the best results. In addition, in our case study, six of the top ten relationship pairs with the highest prediction scores were verified in PubMed.
- Published
- 2022
- Full Text
- View/download PDF
48. DANE-MDA: Predicting microRNA-disease associations via deep attributed network embedding
- Author
-
Bo-Ya Ji, Zhu-Hong You, Yi Wang, Zheng-Wei Li, and Leon Wong
- Subjects
Computational bioinformatics ,Systems biology ,Cancer ,Science - Abstract
Summary: Predicting microRNA-disease associations by using computational methods is conducive to improving the efficiency of costly and laborious traditional bio-experiments. In this study, we propose a computational machine learning-based method (DANE-MDA) that preserves integrated structure and attribute features via deep attributed network embedding to predict potential miRNA-disease associations. Specifically, the integrated features are extracted by using a deep stacked auto-encoder on the diverse orders of matrices containing structure and attribute information and are then used to train a random forest classifier. Under 5-fold cross-validation experiments, DANE-MDA yielded average accuracy, sensitivity, and AUC of 85.59%, 84.23%, and 0.9264 on the HMDD v3.0 dataset, and 83.21%, 80.39%, and 0.9113 on the HMDD v2.0 dataset, respectively. Additionally, case studies on breast, colon, and lung neoplasms show that 47, 47, and 46 of the top 50 predicted miRNAs can be retrieved in the other database.
- Published
- 2021
- Full Text
- View/download PDF
49. MGRL: Predicting Drug-Disease Associations Based on Multi-Graph Representation Learning
- Author
-
Bo-Wei Zhao, Zhu-Hong You, Leon Wong, Ping Zhang, Hao-Yuan Li, and Lei Wang
- Subjects
drug ,disease ,drug repositioning ,multi-graph representation learning ,graph embedding ,Genetics ,QH426-470 - Abstract
Drug repositioning is an application-based solution based on mining existing drugs to find new targets, quickly discovering new drug-disease associations, and reducing the risk of drug discovery in traditional medicine and biology. Therefore, it is of great significance to design a computational model with high efficiency and accuracy. In this paper, we propose a novel computational method MGRL to predict drug-disease associations based on multi-graph representation learning. More specifically, MGRL first uses the graph convolution network to learn the graph representation of drugs and diseases from their self-attributes. Then, the graph embedding algorithm is used to represent the relationships between drugs and diseases. Finally, the two kinds of graph representation learning features were put into the random forest classifier for training. To the best of our knowledge, this is the first work to construct a multi-graph to extract the characteristics of drugs and diseases to predict drug-disease associations. The experiments show that the MGRL can achieve a higher AUC of 0.8506 based on five-fold cross-validation, which is significantly better than other existing methods. Case study results show the reliability of the proposed method, which is of great significance for practical applications.
- Published
- 2021
- Full Text
- View/download PDF
50. An Efficient Computational Model for Large-Scale Prediction of Protein–Protein Interactions Based on Accurate and Scalable Graph Embedding
- Author
-
Xiao-Rui Su, Zhu-Hong You, Lun Hu, Yu-An Huang, Yi Wang, and Hai-Cheng Yi
- Subjects
large-scale ,protein-protein interaction ,GraphZoom ,weighted graph ,graph embedding ,Genetics ,QH426-470 - Abstract
Protein–protein interaction (PPI) is the basis of the molecular mechanisms of living cells. Although traditional experiments are able to detect PPIs accurately, they are often costly and time-consuming. As a result, computational methods have been used to predict PPIs to avoid these problems. The graph, as an important and pervasive data carrier, is considered the most suitable structure for representing biomedical entities and relationships. Although graph embedding is the most popular approach for graph representation learning, it usually suffers from high computational and space cost, especially on large-scale graphs. Therefore, developing a framework that can accelerate graph embedding and improve the accuracy of embedding results is important for large-scale PPI prediction. In this paper, we propose a multi-level model, LPPI, to improve both the quality and speed of large-scale PPI prediction. Firstly, basic protein information is collected as its attributes, including positional gene sets, motif gene sets, and immunological signatures. Secondly, we construct a weighted graph by using protein attributes to calculate node similarity. Then GraphZoom is used to accelerate the embedding process by reducing the size of the weighted graph. Next, graph embedding methods are used to learn graph topology features from the reconstructed graph. Finally, the linear Logistic Regression (LR) model is used to predict the probability of interaction between two proteins. LPPI achieved high accuracies of 0.99997 and 0.9979 on the PPI network dataset and the GraphSAGE-PPI dataset, respectively. Our further results show that LPPI is promising for large-scale PPI prediction in both accuracy and efficiency, which is beneficial to the detection of other large-scale biomedical molecular interactions.
- Published
- 2021
- Full Text
- View/download PDF