6 results on '"Shuai Zeng"'
Search Results
2. MusiteDeep: a deep-learning framework for general and kinase-specific phosphorylation site prediction.
- Author
- Duolin Wang, Shuai Zeng, Chunhui Xu, Wangren Qiu, Yanchun Liang, Trupti Joshi, and Dong Xu
- Subjects
- *PHOSPHORYLATION, *KINASES, *PROTEINS, *ORGANIC compounds, *APOENZYMES
- Abstract
Motivation: Computational methods for phosphorylation site prediction play important roles in protein function studies and experimental design. Most existing methods are based on feature extraction, which may result in incomplete or biased features. Deep learning as the cutting-edge machine learning method has the ability to automatically discover complex representations of phosphorylation patterns from the raw sequences, and hence it provides a powerful tool for improvement of phosphorylation site prediction. Results: We present MusiteDeep, the first deep-learning framework for predicting general and kinase-specific phosphorylation sites. MusiteDeep takes raw sequence data as input and uses convolutional neural networks with a novel two-dimensional attention mechanism. It achieves over a 50% relative improvement in the area under the precision-recall curve in general phosphorylation site prediction and obtains competitive results in kinase-specific prediction compared to other well-known tools on the benchmark data. [ABSTRACT FROM AUTHOR]
- Published
- 2017
3. EFIN: predicting the functional impact of nonsynonymous single nucleotide polymorphisms in human genome.
- Author
- Shuai Zeng, Jing Yang, Brian Hon-Yin Chung, Yu Lung Lau, and Wanling Yang
- Subjects
- *SINGLE nucleotide polymorphisms, *HUMAN genome, *AMINO acids, *BIOINFORMATICS, *GENETIC code, *TAXONOMY
- Abstract
Background: Predicting the functional impact of amino acid substitutions (AAS) caused by nonsynonymous single nucleotide polymorphisms (nsSNPs) is becoming increasingly important as more and more novel variants are being discovered. Bioinformatics analysis is essential to predict potentially causal or contributing AAS to human diseases for further analysis, as for each genome, thousands of rare or private AAS exist and only a very small number of which are related to an underlying disease. Existing algorithms in this field still have high false prediction rate and novel development is needed to take full advantage of vast amount of genomic data. Results: Here we report a novel algorithm that features two innovative changes: (1) making better use of sequence conservation information by grouping the homologous protein sequences into six blocks according to evolutionary distances to human and evaluating sequence conservation in each block independently, and (2) including as many such homologous sequences as possible in analyses. Random forests are used to evaluate sequence conservation in each block and to predict potential impact of an AAS on protein function. Testing of this algorithm on a comprehensive dataset showed significant improvement on prediction accuracy upon currently widely-used programs. The algorithm and a web-based application tool implementing it, EFIN (Evaluation of Functional Impact of Nonsynonymous SNPs), were made freely available (http://paed.hku.hk/efin/) to the public. Conclusions: Grouping homologous sequences into different blocks according to the evolutionary distance of the species to human and evaluating sequence conservation in each group independently significantly improved prediction accuracy. This approach may help us better understand the roles of genetic variants in human disease and health. [ABSTRACT FROM AUTHOR]
- Published
- 2014
4. A Distributed Flow Rate Control Algorithm for Networked Agent System with Multiple Coding Rates to Optimize Multimedia Data Transmission.
- Author
- Shuai Zeng, Lemin Li, and Dan Liao
- Subjects
- *ALGORITHMS, *MULTIMEDIA communications, *DATA transmission systems, *WIRELESS communications, *MOBILE communication systems, *BANDWIDTHS, *MATHEMATICAL optimization
- Abstract
With the development of wireless technologies, mobile communication applies more and more extensively in the various walks of life. The social network of both fixed and mobile users can be seen as networked agent system. At present, kinds of devices and access network technology are widely used. Different users in this networked agent system may need different coding rates multimedia data due to their heterogeneous demand. This paper proposes a distributed flow rate control algorithm to optimize multimedia data transmission of the networked agent system with the coexisting various coding rates. In this proposed algorithm, transmission path and upload bandwidth of different coding rate data between source node, fixed and mobile nodes are appropriately arranged and controlled. On the one hand, this algorithm can provide user nodes with differentiated coding rate data and corresponding flow rate. On the other hand, it makes the different coding rate data and user nodes networked, which realizes the sharing of upload bandwidth of user nodes which require different coding rate data. The study conducts mathematical modeling on the proposed algorithm and compares the system that adopts the proposed algorithm with the existing system based on the simulation experiment and mathematical analysis. The results show that the system that adopts the proposed algorithm achieves higher upload bandwidth utilization of user nodes and lower upload bandwidth consumption of source node. [ABSTRACT FROM AUTHOR]
- Published
- 2013
5. PGen: large-scale genomic variations analysis workflow and browser in SoyKB.
- Author
- Yang Liu, Saad M. Khan, Juexin Wang, Mats Rynge, Yuanxun Zhang, Shuai Zeng, Shiyuan Chen, Joao V. Maldonado dos Santos, Babu Valliyodan, Prasad P. Calyam, Nirav Merchant, Henry T. Nguyen, Dong Xu, and Trupti Joshi
- Subjects
- *NEXT generation networks, *HIGH performance computing, *SINGLE nucleotide polymorphisms, *DNA copy number variations, *KNOT insertion & deletion algorithms
- Abstract
Background: With the advances in next-generation sequencing (NGS) technology and significant reductions in sequencing costs, it is now possible to sequence large collections of germplasm in crops for detecting genome-scale genetic variations and to apply the knowledge towards improvements in traits. To efficiently facilitate large-scale NGS resequencing data analysis of genomic variations, we have developed "PGen", an integrated and optimized workflow using the Extreme Science and Engineering Discovery Environment (XSEDE) high-performance computing (HPC) virtual system, iPlant cloud data storage resources and Pegasus workflow management system (Pegasus-WMS). The workflow allows users to identify single nucleotide polymorphisms (SNPs) and insertion-deletions (indels), perform SNP annotations and conduct copy number variation analyses on multiple resequencing datasets in a user-friendly and seamless way. Results: We have developed both a Linux version in GitHub (https://github.com/pegasus-isi/PGen-GenomicVariations-Workflow) and a web-based implementation of the PGen workflow integrated within the Soybean Knowledge Base (SoyKB) (http://soykb.org/Pegasus/index.php). Using PGen, we identified 10,218,140 single-nucleotide polymorphisms (SNPs) and 1,398,982 indels from analysis of 106 soybean lines sequenced at 15X coverage. 297,245 non-synonymous SNPs and 3330 copy number variation (CNV) regions were identified from this analysis. SNPs identified using PGen from additional soybean resequencing projects adding to 500+ soybean germplasm lines in total have been integrated. These SNPs are being utilized for trait improvement using genotype to phenotype prediction approaches developed in-house. In order to browse and access NGS data easily, we have also developed an NGS resequencing data browser (http://soykb.org/NGS_Resequence/NGS_index.php) within SoyKB to provide easy access to SNP and downstream analysis results for soybean researchers.
Conclusion: PGen workflow has been optimized for the most efficient analysis of soybean data using thorough testing and validation. This research serves as an example of best practices for development of genomics data analysis workflows by integrating remote HPC resources and efficient data management with ease of use for biological users. PGen workflow can also be easily customized for analysis of data in other species. [ABSTRACT FROM AUTHOR]
- Published
- 2016
6. MUFOLD-DB: a processed protein structure database for protein structure prediction and analysis.
- Author
-
Zhiquan He, Chao Zhang, Yang Xu, Shuai Zeng, Jingfen Zhang, and Dong Xu
- Subjects
- *
PROTEIN structure , *PROTEIN analysis , *COMPUTATIONAL complexity , *ELECTRON density , *DATA management - Abstract
Background: Protein structure data in Protein Data Bank (PDB) are widely used in studies of protein function and evolution and in protein structure prediction. However, there are two main barriers in large-scale usage of PDB data: 1) PDB data are highly redundant in terms of sequence and structure similarity; and 2) many PDB files have issues due to inconsistency of data and standards as well as missing residues, so that automated retrieval and analysis are often difficult. Description: To address these issues, we have created MUFOLD-DB (http://mufold.org/mufolddb.php), a web-based database, to collect and process the weekly PDB files thereby providing users with non-redundant, cleaned and partially-predicted structure data. For each of the non-redundant sequences, we annotate the SCOP domain classification and predict structures of missing regions by loop modelling. In addition, evolutional information, secondary structure, disorder region, and processed three-dimensional structure are computed and visualized to help users better understand the protein. Conclusions: MUFOLD-DB integrates processed PDB sequence and structure data and multiple computational results, provides a friendly interface for users to retrieve, browse and download these data, and offers several useful functionalities to facilitate users' data operation. [ABSTRACT FROM AUTHOR]
- Published
- 2014
Catalog
Discovery Service for Jio Institute Digital Library