Author: "Xia, Kelin" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Xia, Kelin"' showing total 355 results

1. TopoQA: a topological deep learning-based approach for protein complex structure interface quality assessment

Author: Han, Bingqing, Zhang, Yipeng, Li, Longlong, Gong, Xinqi, and Xia, Kelin
Subjects: Quantitative Biology - Biomolecules
Abstract: Even with the significant advances of AlphaFold-Multimer (AF-Multimer) and AlphaFold3 (AF3) in protein complex structure prediction, their accuracy is still not comparable with monomer structure prediction. Efficient quality assessment (QA) or estimation of model accuracy (EMA) models that can evaluate the quality of predicted protein complexes without knowing their native structures are of key importance for protein structure generation and model selection. In this paper, we leverage persistent homology (PH) to capture the atomic-level topological information around residues and design a topological deep learning-based QA method, TopoQA, to assess the accuracy of protein complex interfaces. We integrate PH from topological data analysis into graph neural networks (GNNs) to characterize complex higher-order structures that GNNs might overlook, enhancing the learning of the relationship between the topological structure of complex interfaces and quality scores. Our TopoQA model is extensively validated on the two most widely used datasets, DBM55-AF2 and HAF2, along with our newly constructed ABAG-AF3 dataset to facilitate comparisons with AF3. For all three datasets, TopoQA outperforms AF-Multimer-based AF2Rank and shows an advantage over AF3 in nearly half of the targets. In particular, on the DBM55-AF2 dataset, TopoQA obtains a ranking loss 73.6% lower than that of AF-Multimer-based AF2Rank. Further, beyond AF-Multimer and AF3, we have also extensively compared TopoQA with nearly all state-of-the-art models (as far as we know) and found that it achieves the highest Top 10 Hit-rate on the DBM55-AF2 dataset and the lowest ranking loss on the HAF2 dataset. Ablation experiments show that our topological features significantly improve the model performance. At the same time, our method also provides a new paradigm for protein structure representation learning.
Published: 2024

2. KA-GNN: Kolmogorov-Arnold Graph Neural Networks for Molecular Property Prediction

Author: Li, Longlong, Zhang, Yipeng, Wang, Guanghui, and Xia, Kelin
Subjects: Computer Science - Machine Learning, Quantitative Biology - Quantitative Methods
Abstract: Molecular property prediction is a crucial task in the process of Artificial Intelligence-Driven Drug Discovery (AIDD). The challenge of developing models that surpass traditional non-neural network methods continues to be a vibrant area of research. This paper presents a novel graph neural network model, the Kolmogorov-Arnold Network (KAN)-based Graph Neural Network (KA-GNN), which incorporates Fourier series and is specifically designed for molecular property prediction. This model maintains the high interpretability characteristic of KAN methods while being extremely efficient in computational resource usage, making it an ideal choice for deployment in resource-constrained environments. Tested and validated on seven public datasets, KA-GNN has shown significant improvements in property predictions over the existing state-of-the-art (SOTA) benchmarks.
Published: 2024

3. Molecular topological deep learning for polymer property prediction

Author: Shen, Cong, Zhang, Yipeng, Han, Fei, and Xia, Kelin
Subjects: Condensed Matter - Materials Science, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Accurate and efficient prediction of polymer properties is of key importance for polymer design. Traditional experimental tools and density functional theory (DFT)-based simulations for polymer property evaluation are both expensive and time-consuming. Recently, a large number of graph-based molecular models have emerged and demonstrated huge potential in molecular data analysis. Despite this great progress, these models tend to ignore the high-order and multiscale information within the data. In this paper, we develop molecular topological deep learning (Mol-TDL) for polymer property analysis. Our Mol-TDL incorporates both high-order interactions and multiscale properties into a topological deep learning architecture. The key idea is to represent polymer molecules as a series of simplicial complexes at different scales and build up simplicial neural networks accordingly. The aggregated information from different scales provides a more accurate prediction of polymer molecular properties.
Published: 2024

4. Quotient complex (QC)-based machine learning for 2D perovskite design

Author: Hu, Chuan-Shen, Mayengbam, Rishikanta, Xia, Kelin, and Sum, Tze Chien
Subjects: Computer Science - Computational Engineering, Finance, and Science, Mathematics - Algebraic Topology
Abstract: With remarkable stability and exceptional optoelectronic properties, two-dimensional (2D) halide layered perovskites hold immense promise for revolutionizing photovoltaic technology. Presently, inadequate representations have substantially impeded the design and discovery of 2D perovskites. In this context, we introduce a novel computational topology framework termed the quotient complex (QC), which serves as the foundation for the material representation. Our QC-based features are seamlessly integrated with learning models for the advancement of 2D perovskite design. At the heart of this framework lies the quotient complex descriptors (QCDs), representing a quotient variation of simplicial complexes derived from materials unit cell and periodic boundary conditions. Differing from prior material representations, this approach encodes higher-order interactions and periodicity information simultaneously. Based on the well-established New Materials for Solar Energetics (NMSE) databank, our QC-based machine learning models exhibit superior performance against all existing counterparts. This underscores the paramount role of periodicity information in predicting material functionality, while also showcasing the remarkable efficiency of the QC-based model in characterizing materials structural attributes.
Published: 2024

5. Topology-enhanced machine learning model (Top-ML) for anticancer peptide prediction

Author: Tan, Joshua Zhi En, Wee, JunJie, Gong, Xue, and Xia, Kelin
Subjects: Quantitative Biology - Quantitative Methods, Computer Science - Machine Learning, Mathematics - General Topology, Quantitative Biology - Biomolecules
Abstract: Recently, therapeutic peptides have demonstrated great promise for cancer treatment. To explore powerful anticancer peptides, artificial intelligence (AI)-based approaches have been developed to systematically screen potential candidates. However, the lack of efficient featurization of peptides has become a bottleneck for these machine-learning models. In this paper, we propose a topology-enhanced machine learning model (Top-ML) for anticancer peptide prediction. Our Top-ML employs peptide topological features derived from its sequence "connection" information characterized by vector and spectral descriptors. Our Top-ML model has been validated on two widely used AntiCP 2.0 benchmark datasets and has achieved state-of-the-art performance. Our results highlight the potential of leveraging novel topology-based featurization to accelerate the identification of anticancer peptides.
Published: 2024

6. Integration of persistent Laplacian and pre-trained transformer for protein solubility changes upon mutation

Author: Wee, JunJie, Chen, Jiahui, Xia, Kelin, and Wei, Guo-Wei
Subjects: Quantitative Biology - Biomolecules, Mathematics - Algebraic Topology
Abstract: Protein mutations can significantly influence protein solubility, which results in altered protein functions and leads to various diseases. Despite tremendous effort, machine learning prediction of protein solubility changes upon mutation remains a challenging task, as indicated by the poor scores of normalized Correct Prediction Ratio (CPR). Part of the challenge stems from the fact that there are no three-dimensional (3D) structures for the wild-type and mutant proteins. This work integrates persistent Laplacians and a pre-trained Transformer for the task. The Transformer, pre-trained with hundreds of millions of protein sequences, embeds wild-type and mutant sequences, while persistent Laplacians track the topological invariant change and homotopic shape evolution induced by mutations in 3D protein structures, which are rendered from AlphaFold2. The resulting machine learning model was trained on an extensive data set labeled with three solubility types. Our model outperforms all existing predictive methods and improves the state-of-the-art by up to 15%.
Published: 2023

7. Graph Neural Networks with a Distribution of Parametrized Graphs

Author: Lee, See Hian, Ji, Feng, Xia, Kelin, and Tay, Wee Peng
Subjects: Computer Science - Machine Learning
Abstract: Traditionally, graph neural networks have been trained using a single observed graph. However, the observed graph represents only one possible realization. In many applications, the graph may encounter uncertainties, such as having erroneous or missing edges, as well as edge weights that provide little informative value. To address these challenges and capture additional information previously absent in the observed graph, we introduce latent variables to parameterize and generate multiple graphs. We obtain the maximum likelihood estimate of the network parameters in an Expectation-Maximization (EM) framework based on the multiple graphs. Specifically, we iteratively determine the distribution of the graphs using a Markov Chain Monte Carlo (MCMC) method, incorporating the principles of PAC-Bayesian theory. Numerical experiments demonstrate improvements in performance against baseline models on node classification for heterogeneous graphs and graph regression on chemistry datasets.
Published: 2023

8. Curvature-enhanced Graph Convolutional Network for Biomolecular Interaction Prediction

Author: Shen, Cong, Ding, Pingjian, Wee, Junjie, Bi, Jialin, Luo, Jiawei, and Xia, Kelin
Subjects: Quantitative Biology - Quantitative Methods, Computer Science - Artificial Intelligence, Computer Science - Machine Learning, Quantitative Biology - Biomolecules
Abstract: Geometric deep learning has demonstrated great potential in non-Euclidean data analysis. The incorporation of geometric insights into learning architecture is vital to its success. Here we propose, for the first time, a curvature-enhanced graph convolutional network (CGCN) for biomolecular interaction prediction. Our CGCN employs Ollivier-Ricci curvature (ORC) to characterize network local structures and to enhance the learning capability of GCNs. More specifically, ORCs are evaluated based on the local topology from node neighborhoods, and further used as weights for the feature aggregation in the message-passing procedure. Our CGCN model is extensively validated on fourteen real-world biomolecular interaction networks and a series of simulated data. It has been found that our CGCN can achieve state-of-the-art results. It outperforms all existing models, as far as we know, on thirteen of the fourteen real-world datasets and ranks second on the remaining one. The results from the simulated data show that our CGCN model is superior to the traditional GCN models regardless of the positive-to-negative curvature ratios, network densities, and network sizes (when larger than 500).
Published: 2023

9. Torsion Graph Neural Networks

Author: Shen, Cong, Liu, Xiang, Luo, Jiawei, and Xia, Kelin
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence
Abstract: Geometric deep learning (GDL) models have demonstrated great potential for the analysis of non-Euclidean data. They are developed to incorporate the geometric and topological information of non-Euclidean data into end-to-end deep learning architectures. Motivated by the recent success of discrete Ricci curvature in graph neural networks (GNNs), we propose TorGNN, an analytic Torsion enhanced Graph Neural Network model. The essential idea is to characterize graph local structures with an analytic torsion based weight formula. Mathematically, analytic torsion is a topological invariant that can distinguish spaces which are homotopy equivalent but not homeomorphic. In our TorGNN, for each edge, a corresponding local simplicial complex is identified, then the analytic torsion (for this local simplicial complex) is calculated, and further used as a weight (for this edge) in the message-passing process. Our TorGNN model is validated on link prediction tasks from sixteen different types of networks and node classification tasks from three types of networks. It has been found that our TorGNN can achieve superior performance on both tasks, and outperform various state-of-the-art models. This demonstrates that analytic torsion is a highly efficient topological invariant in the characterization of graph structures and can significantly boost the performance of GNNs.
Published: 2023

10. Molecular geometric deep learning

Author: Shen, Cong, Luo, Jiawei, and Xia, Kelin
Subjects: Physics - Computational Physics, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Geometric deep learning (GDL) has demonstrated huge power and enormous potential in molecular data analysis. However, a great challenge still remains for highly efficient molecular representations. Currently, covalent-bond-based molecular graphs are the de facto standard for representing molecular topology at the atomic level. Here we demonstrate, for the first time, that molecular graphs constructed only from non-covalent bonds can achieve similar or even better results than covalent-bond-based models in molecular property prediction. This demonstrates the great potential of novel molecular representations beyond the de facto standard of covalent-bond-based molecular graphs. Based on the finding, we propose molecular geometric deep learning (Mol-GDL). The essential idea is to incorporate a more general molecular representation into GDL models. In our Mol-GDL, molecular topology is modeled as a series of molecular graphs, each focusing on a different scale of atomic interactions. In this way, both covalent interactions and non-covalent interactions are incorporated into the molecular representation on an equal footing. We systematically test Mol-GDL on fourteen commonly-used benchmark datasets. The results show that our Mol-GDL can achieve a better performance than state-of-the-art (SOTA) methods. Source code and data are available at https://github.com/CS-BIO/Mol-GDL.
Published: 2023

11. Geometric data analysis-based machine learning for two-dimensional perovskite design

Author: Hu, Chuan-Shen, Mayengbam, Rishikanta, Wu, Min-Chun, Xia, Kelin, and Sum, Tze Chien
Published: 2024
Full Text: View/download PDF

12. Persistent Dirac for molecular representation

Author: Wee, JunJie, Bianconi, Ginestra, and Xia, Kelin
Subjects: Quantitative Biology - Biomolecules
Abstract: Molecular representations are of fundamental importance for the modeling and analysis of molecular systems. Representation models and, in general, approaches based on topological data analysis (TDA) have demonstrated great success in various steps of drug design and materials discovery. Here we develop a mathematically rigorous computational framework for molecular representation based on the persistent Dirac operator. The properties of the spectrum of the discrete weighted and unweighted Dirac matrices are systematically discussed and used to demonstrate the geometric and topological properties of both non-homology and homology eigenvectors of real molecular structures. This allows us to assess the influence of weighting schemes on the information encoded in the Dirac eigenspectrum. A series of physical persistent attributes, which characterize the spectrum of the Dirac matrices across a filtration, are proposed and used as efficient molecular fingerprints. Finally, our persistent Dirac-based model is used for clustering molecular configurations from nine types of organic-inorganic halide perovskites. We found that our model can cluster the structures very well, demonstrating the representation and featurization power of the current approach.
Comment: 22 pages, 7 figures
Published: 2023

13. Curvature-enhanced graph convolutional network for biomolecular interaction prediction

Author: Shen, Cong, Ding, Pingjian, Wee, Junjie, Bi, Jialin, Luo, Jiawei, and Xia, Kelin
Published: 2024
Full Text: View/download PDF

14. Integration of persistent Laplacian and pre-trained transformer for protein solubility changes upon mutation

Author: Wee, JunJie, Chen, Jiahui, Xia, Kelin, and Wei, Guo-Wei
Published: 2024
Full Text: View/download PDF

15. A Unified Topological Approach to Data Science

Author: Grbić, Jelena, Wu, Jie, Xia, Kelin, and Wei, Guo-Wei
Subjects: Mathematics - Algebraic Topology, 55N31 (Primary) 62R40, 68T09, 05C65, 55U10 (Secondary)
Abstract: We establish a new theory which gives a unified topological approach to data science, by being applicable both to point cloud data and to graph data, including networks beyond pairwise interactions. We generalize simplicial complexes and hypergraphs to super-hypergraphs and establish super-hypergraph homology as an extension of simplicial homology. Driven by applications, we also introduce super-persistent homology.
Comment: 46 pages, 8 figures
Published: 2021

16. Ollivier persistent Ricci curvature (OPRC) based molecular representation for drug design

Author: Wee, JunJie and Xia, Kelin
Subjects: Quantitative Biology - Biomolecules
Abstract: Efficient molecular featurization is one of the major issues for machine learning models in drug design. Here we propose, for the first time, persistent Ricci curvature (PRC), in particular Ollivier persistent Ricci curvature (OPRC), for molecular featurization and feature engineering. The filtration process proposed in persistent homology is employed to generate a series of nested molecular graphs. The persistence and variation of Ollivier Ricci curvatures on these nested graphs are defined as Ollivier persistent Ricci curvature. Moreover, persistent attributes, which are statistical and combinatorial properties of OPRCs during the filtration process, are used as molecular descriptors, and further combined with machine learning models, in particular, gradient boosting tree (GBT). Our OPRC-GBT model is used in the prediction of protein-ligand binding affinity, which is one of the key steps in drug design. Based on the three most commonly used datasets from the well-established protein-ligand binding databank, i.e., PDBbind, we intensively test our model and compare it with existing models. It has been found that our model is better than all machine learning models with traditional molecular descriptors.
Comment: 38 pages, 9 figures
Published: 2020

17. Persistent spectral based machine learning (PerSpect ML) for drug design

Author: Meng, Zhenyu and Xia, Kelin
Subjects: Quantitative Biology - Quantitative Methods, Computer Science - Machine Learning, Mathematics - Algebraic Topology
Abstract: In this paper, we propose persistent spectral based machine learning (PerSpect ML) models for drug design. Persistent spectral models, including persistent spectral graph, persistent spectral simplicial complex and persistent spectral hypergraph, are proposed based on spectral graph theory, spectral simplicial complex theory and spectral hypergraph theory, respectively. Different from all previous spectral models, a filtration process, as proposed in persistent homology, is introduced to generate multiscale spectral models. More specifically, from the filtration process, a series of nested topological representations, i.e., graphs, simplicial complexes, and hypergraphs, can be systematically generated and their spectral information can be obtained. Persistent spectral variables are defined as functions of spectral variables over the filtration value. Mathematically, persistent multiplicity (of zero eigenvalues) is exactly the persistent Betti number (or Betti curve). We consider 11 persistent spectral variables and use them as the features for machine learning models in protein-ligand binding affinity prediction. We systematically test our models on the three most commonly used databases, including PDBbind-2007, PDBbind-2013 and PDBbind-2016. Our results, for all these databases, are better than all existing models, as far as we know. This demonstrates the great power of our PerSpect ML in molecular data analysis and drug design.
Comment: 17 pages; 8 figures; 3 tables
Published: 2020
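
The statement in the abstract above that the multiplicity of zero eigenvalues tracks the Betti number can be illustrated in its simplest case. The following is a minimal sketch, not the PerSpect ML implementation: it covers only graphs (dimension 0), uses a plain distance-cutoff filtration, and the function names `laplacian_zero_multiplicity` and `betti0_curve`, the tolerance, and the toy point set are all assumptions made for illustration.

```python
import numpy as np

def laplacian_zero_multiplicity(points, radius, tol=1e-8):
    """Count zero eigenvalues of the graph Laplacian at one filtration value.

    Hypothetical helper: two points are joined by an edge when their
    Euclidean distance is at most `radius`.
    """
    pts = np.asarray(points, dtype=float)
    # Pairwise Euclidean distances between all points.
    dists = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    # Adjacency matrix of the cutoff graph, without self-loops.
    adjacency = ((dists <= radius) & ~np.eye(len(pts), dtype=bool)).astype(float)
    # Combinatorial graph Laplacian L = D - A.
    laplacian = np.diag(adjacency.sum(axis=1)) - adjacency
    eigenvalues = np.linalg.eigvalsh(laplacian)
    # The number of (numerically) zero eigenvalues equals the number of
    # connected components, i.e. the zeroth Betti number of the graph.
    return int(np.sum(np.abs(eigenvalues) < tol))

def betti0_curve(points, radii):
    """Zero-eigenvalue multiplicity as a function of the filtration value."""
    return [laplacian_zero_multiplicity(points, r) for r in radii]

# Toy point cloud: three nearby points and one far away.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
curve = betti0_curve(points, [0.5, 1.0, 1.5, 8.0])
print(curve)
```

As the cutoff grows, components merge and the count of zero eigenvalues falls; the nonzero eigenvalues, which this sketch discards, are where additional spectral variables would come from.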

18. Weighted persistent homology for osmolyte molecular aggregation and hydrogen-bonding network analysis

Author: Anand, D Vijay, Xia, Kelin, and Mu, Yuguang
Subjects: Quantitative Biology - Quantitative Methods, Quantitative Biology - Biomolecules
Abstract: It has long been observed that trimethylamine N-oxide (TMAO) and urea demonstrate dramatically different properties in a protein folding process. Even with the enormous theoretical and experimental research work on the two osmolytes, various aspects of their underlying mechanisms still remain largely elusive. In this paper, we propose to use weighted persistent homology to systematically study osmolyte molecular aggregation and hydrogen-bonding networks from a local topological perspective. We consider two weighted models, i.e., localized persistent homology (LPH) and interactive persistent homology (IPH). From the localized persistent homology models, we have found that TMAO and urea have very different local topology. TMAO shows local network structures. As the concentration increases, the circle elements in these networks show a clear increase in their total numbers and a decrease in their relative sizes. In contrast, urea shows two types of local topological patterns, i.e., local clusters at around 6 Å and a few global circle elements at around 12 Å. From the interactive persistent homology models, it has been found that our persistent radial distribution function (PRDF) from the global-scale IPH has the same physical properties as the traditional radial distribution function (RDF). Moreover, PRDFs from the local-scale IPH can also be generated and used to characterize the local interaction information. Other than the clear difference in the first peak value of PRDFs at filtration size 4 Å, TMAO and urea also show very different behaviors in the second peak region, from filtration size 5 Å to 10 Å.
Comment: 19 pages, 9 figures
Published: 2019

19. Persistent homology analysis of osmolyte molecular aggregation and their hydrogen-bonding networks

Author: Xia, Kelin, Anand, D Vijay, Saxena, Shikhar, and Mu, Yuguang
Subjects: Quantitative Biology - Quantitative Methods, Quantitative Biology - Biomolecules
Abstract: Two types of osmolytes, i.e., trimethylamine N-oxide (TMAO) and urea, demonstrate dramatically different properties in a protein folding process. Even with the great progress in revealing the potential underlying mechanism of these two osmolyte systems, many problems still remain unsolved. In this paper, we propose to use persistent homology, a newly invented topological method, to systematically study osmolyte molecular aggregation and hydrogen-bonding networks from a global topological perspective. It has been found, for the first time, that TMAO and urea show two extremely different topological behaviors, i.e., extensive network and local cluster. In general, TMAO forms highly consistent large loop or circle structures at high concentrations. In contrast, urea is more tightly aggregated locally. Moreover, the resulting hydrogen-bonding networks also demonstrate distinguishable features. As the concentration increases, TMAO hydrogen-bonding networks vary greatly in their total number of loop structures, and large-sized loop structures consistently increase. In contrast, urea hydrogen-bonding networks remain relatively stable, with a slight reduction in the total loop number. Moreover, persistent entropy (PE) is, for the first time, used to characterize the topological information of the aggregation and hydrogen-bonding networks. The average PE systematically increases with the concentration for both TMAO and urea, and decreases in their hydrogen-bonding networks. But their PE variances have totally different behaviors. Finally, topological features of the hydrogen-bonding networks are found to be highly consistent with those from the ion aggregation systems, indicating that our topological invariants can characterize intrinsic features of the "structure making" and "structure breaking" systems.
Comment: 19 pages; 9 figures; 1 table
Published: 2019
Full Text: View/download PDF
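
The abstract above uses persistent entropy (PE) without defining it. As a minimal sketch, assuming the commonly used definition (the Shannon entropy of normalized bar lengths, PE = -sum_i p_i ln p_i with p_i = l_i / sum_j l_j) rather than the paper's exact formulation, it can be computed from a barcode as follows. The function name `persistent_entropy`, the (birth, death) pair format, and the example barcodes are illustrative assumptions; infinite bars are assumed to have been removed or truncated beforehand.

```python
import math

def persistent_entropy(barcode):
    """Shannon entropy of the normalized bar lengths of a barcode.

    `barcode` is a list of (birth, death) pairs with finite values.
    Bars of zero length are ignored.
    """
    lengths = [death - birth for birth, death in barcode if death > birth]
    total = sum(lengths)
    if total == 0:
        # Empty barcode or only zero-length bars: no information.
        return 0.0
    probabilities = [length / total for length in lengths]
    return -sum(p * math.log(p) for p in probabilities)

# Four bars of equal length give the maximum entropy for four bars.
uniform_pe = persistent_entropy([(0.0, 1.0), (0.0, 1.0), (2.0, 3.0), (1.0, 2.0)])
# One dominant bar plus short ones gives a lower value.
skewed_pe = persistent_entropy([(0.0, 10.0), (0.0, 0.1), (0.0, 0.1), (0.0, 0.1)])
print(uniform_pe, skewed_pe)
```

Under this definition, a barcode whose bars are all of similar length has high entropy, while one dominated by a few long bars has low entropy.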

20. Weighted persistent homology for biomolecular data analysis

Author: Meng, Zhenyu, Anand, D Vijay, Lu, Yunpeng, Wu, Jie, and Xia, Kelin
Subjects: Quantitative Biology - Biomolecules
Abstract: In this paper, we systematically review weighted persistent homology (WPH) models and their applications in biomolecular data analysis. Essentially, the weight value, which reflects physical, chemical and biological properties, can be assigned to vertices (atom centers), edges (bonds), or higher order simplexes (clusters of atoms), depending on the biomolecular structure, function, and dynamics properties. Further, we propose the first localized weighted persistent homology (LWPH). Inspired by the great success of element specific persistent homology (ESPH), we do not treat biomolecules as an inseparable system as all previous weighted models do; instead, we decompose them into a series of local domains, which may overlap with each other. The general persistent homology or weighted persistent homology analysis is then applied to each of these local domains. In this way, functional properties that are embedded in local structures can be revealed. Our model has been applied to the systematic study of DNA structures. It has been found that our LWPH based features can be used to successfully discriminate the A-, B-, and Z-types of DNA. More importantly, our LWPH based PCA model can identify two configurational states of DNA structure in ion liquid environment, which can be revealed only by the complicated helical coordinate system. The close consistency with the helical-coordinate model demonstrates that our model captures local structure variations so well that it is comparable with geometric models. Moreover, geometric measurements are usually defined in very local regions. For instance, the helical-coordinate system is limited to one or two base pairs. However, our LWPH can quantitatively characterize structure information in local regions or domains with arbitrary sizes and shapes, where traditional geometrical measurements fail.
Comment: 27 pages; 18 figures
Published: 2019

21. Discrete Morse Theory for Weighted Simplicial Complexes

Author: Wu, Chengyuan, Ren, Shiquan, Wu, Jie, and Xia, Kelin
Subjects: Mathematics - Algebraic Topology
Abstract: In this paper, we study Forman's discrete Morse theory in the context of weighted homology. We develop weighted versions of classical theorems in discrete Morse theory. A key difference in the weighted case is that simplicial collapses do not necessarily preserve weighted homology. We work out some sufficient conditions for collapses to preserve weighted homology, as well as study the effect of elementary removals on weighted homology. An application to sequence analysis is included, where we study the weighted ordered complexes of sequences.
Comment: 19 pages, to appear in Topology and its Applications
Published: 2019
Full Text: View/download PDF

22. Persistent-Homology-based Machine Learning and its Applications -- A Survey

Author: Pun, Chi Seng, Xia, Kelin, and Lee, Si Xian
Subjects: Mathematics - Algebraic Topology
Abstract: A suitable feature representation that can both preserve the intrinsic information of the data and reduce data complexity and dimensionality is key to the performance of machine learning models. Deeply rooted in algebraic topology, persistent homology (PH) provides a delicate balance between data simplification and intrinsic structure characterization, and has been applied successfully to various areas. However, the combination of PH and machine learning has been hindered greatly by three challenges, namely topological representation of data, PH-based distance measurements or metrics, and PH-based feature representation. With the development of topological data analysis, progress has been made on all three problems, but it is widely scattered across the literature. In this paper, we provide a systematic review of PH and PH-based supervised and unsupervised models from a computational perspective. Our emphasis is on the recent development of mathematical models and tools, including PH software and PH-based functions, feature representations, kernels, and similarity models. Essentially, this paper can work as a roadmap for the practical application of PH-based machine learning tools. Further, we consider different topological feature representations in different machine learning models, and investigate their impact on protein secondary structure classification.
Comment: 42 pages; 6 figures; 9 tables
Published: 2018

23. Persistent-homology-based machine learning: a survey and a comparative study

Author: Pun, Chi Seng, Lee, Si Xian, and Xia, Kelin
Published: 2022
Full Text: View/download PDF

24. Weighted Fundamental Group

Author: Wu, Chengyuan, Ren, Shiquan, Wu, Jie, and Xia, Kelin
Subjects: Mathematics - Algebraic Topology
Abstract: In this paper, we develop and study the theory of weighted fundamental groups of weighted simplicial complexes. When all weights are 1, the weighted fundamental group reduces to the usual fundamental group as a special case. We also study weighted versions of classical theorems like van Kampen's theorem. In addition, we also investigate the abelianization, lower central series and applications of weighted fundamental groups.
Comment: 20 pages
Published: 2018

25. Weighted (Co)homology and Weighted Laplacian

Author: Wu, Chengyuan, Ren, Shiquan, Wu, Jie, and Xia, Kelin
Subjects: Mathematics - Algebraic Topology
Abstract: In this paper, we generalize the combinatorial Laplace operator of Horak and Jost by introducing the $\phi$-weighted coboundary operator induced by a weight function $\phi$. Our weight function $\phi$ is a generalization of Dawson's weighted boundary map. We show that our above-mentioned generalizations include new cases that are not covered by previous literature. Our definition of weighted Laplacian for weighted simplicial complexes is also applicable to weighted/unweighted graphs and digraphs.
Comment: 24 pages
Published: 2018

26. Persistent homology analysis of ion aggregation and hydrogen-bonding network

Author: Xia, Kelin
Subjects: Quantitative Biology - Quantitative Methods, Quantitative Biology - Biomolecules
Abstract: Despite the great advancement of experimental tools and theoretical models, a quantitative characterization of the microscopic structures of ion aggregates and their associated water hydrogen-bonding networks still remains a challenging problem. In this paper, a newly invented mathematical method called persistent homology is introduced, for the first time, to quantitatively analyze the intrinsic topological properties of ion aggregation systems and hydrogen-bonding networks. The two most distinguishable properties of persistent homology analysis of assembly systems are as follows. First, it does not require a predefined bond length to construct the ion or hydrogen network. Persistent homology results are determined by the morphological structure of the data only. Second, it can directly measure the size of circles or holes in ion aggregates and hydrogen-bonding networks. To validate our model, we consider two well-studied systems, i.e., NaCl and KSCN solutions, generated from molecular dynamics simulations. They are believed to represent two morphological types of aggregation, i.e., local clusters and extended ion network. It has been found that the two aggregation types have distinguishable topological features and can be characterized by our topological model very well. For hydrogen-bonding networks, KSCN systems demonstrate much more dramatic variations in their local circle structures as the concentration increases. A consistent increase of large-sized local circle structures is observed and the sizes of these circles become more and more diverse. In contrast, NaCl systems show no obvious increase of large-sized circles. Instead, a consistent decline of the average size of circle structures is observed and the sizes of these circles become more and more uniform as the concentration increases.
Comment: 21 pages, 11 figures, 2 tables
Published: 2018
Full Text: View/download PDF

27. Topological feature engineering for machine learning based halide perovskite materials design

Author: Anand, D. Vijay, Xu, Qiang, Wee, JunJie, Xia, Kelin, and Sum, Tze Chien
Published: 2022
Full Text: View/download PDF

28. Hodge theory-based biomolecular data analysis

Author: Wei, Ronald Koh Joon, Wee, Junjie, Laurent, Valerie Evangelin, and Xia, Kelin
Published: 2022
Full Text: View/download PDF

29. Multiscale virtual particle based elastic network model (MVP-ENM) for biomolecular normal mode analysis

Author: Xia, Kelin
Subjects: Quantitative Biology - Biomolecules
Abstract: In this paper, a multiscale virtual particle based elastic network model (MVP-ENM) is proposed for biomolecular normal mode analysis. The multiscale virtual particle model is proposed for the discretization of biomolecular density data in different scales. Essentially, the model works as the coarse-graining of the biomolecular structure, so that a delicate balance between biomolecular geometric representation and computational cost can be achieved. To form "connections" between these multiscale virtual particles, a new harmonic potential function, which considers the influence from both mass distributions and distance relations, is adopted between any two virtual particles. Unlike the previous ENMs that use a constant spring constant, a particle-dependent spring parameter is used in MVP-ENM. Two independent models, i.e., multiscale virtual particle based Gaussian network model (MVP-GNM) and multiscale virtual particle based anisotropic network model (MVP-ANM), are proposed. Even with a rather coarse grid and a low resolution, the MVP-GNM is able to predict the Debye-Waller factors (B-factors) with considerably good accuracy. Similar properties have also been observed in MVP-ANM. More importantly, in B-factor predictions, the mismatch between the predicted results and experimental ones is predominantly from higher fluctuation regions. Further, it is found that MVP-ANM can deliver very consistent low-frequency eigenmodes in various scales. This demonstrates the great potential of MVP-ANM in the deformation analysis of low resolution data. With the multiscale rigidity function, the MVP-ENM can be applied to biomolecular data represented in density distribution and atomic coordinates. Further, the great advantage of my MVP-ENM model in computational cost has been demonstrated by using two poliovirus structures. Finally, the paper ends with a conclusion.
Comment: 15 figures; 25 pages
Published: 2017
Full Text: View/download PDF

30. A quantitative structure comparison with persistent similarity

Author: Xia, Kelin
Subjects: Quantitative Biology - Quantitative Methods, Quantitative Biology - Biomolecules
Abstract: Biomolecular structure comparison not only reveals evolutionary relationships, but also sheds light on biological functional properties. However, traditional definitions of structure or sequence similarity always involve superposition or alignment and are computationally inefficient. In this paper, I propose a new method called persistent similarity, which is based on a newly-invented method in algebraic topology, known as persistent homology. Different from all previous topological methods, persistent homology is able to embed a geometric measurement into topological invariants, thus providing a bridge between geometry and topology. Further, with the proposed persistent Betti function (PBF), topological information derived from the persistent homology analysis can be uniquely represented by a series of continuous one-dimensional (1D) functions. In this way, any complicated biomolecular structure can be reduced to several simple 1D PBFs for comparison. Persistent similarity is then defined as the quotient of the sizes of the intersection areas and union areas between two corresponding PBFs. If structures have no significant topological properties, a pseudo-barcode is introduced to ensure a better comparison. Moreover, a multiscale biomolecular representation is introduced through the multiscale rigidity function. It naturally induces a multiscale persistent similarity. The multiscale persistent similarity enables an objective-oriented comparison. Stated differently, it facilitates the comparison of structures in any particular scale of interest. Finally, the proposed method is validated by four different cases. It is found that the persistent similarity can be used to describe the intrinsic similarities and differences between the structures very well.
Comment: 20 pages, 13 pictures
Published: 2017

31. Sequence-based Multiscale Model (SeqMM) for High-throughput chromosome conformation capture (Hi-C) data analysis

Author: Xia, Kelin
Subjects: Quantitative Biology - Quantitative Methods
Abstract: In this paper, I introduce a Sequence-based Multiscale Model (SeqMM) for biomolecular data analysis. In combination with the spectral graph method, I reveal the essential difference between the global scale models and local scale ones in structure clustering, i.e., different optimization on Euclidean (or spatial) distances and sequential (or genomic) distances. More specifically, clusters from global scale models optimize Euclidean distance relations. Local scale models, on the other hand, result in clusters that optimize the genomic distance relations. For biomolecular data, Euclidean distances and sequential distances are two independent variables, which can never be optimized simultaneously in data clustering. However, the sequence scale in my SeqMM can work as a tuning parameter that balances these two variables and delivers different clusterings based on my purposes. Further, my SeqMM is used to explore the hierarchical structures of chromosomes. I find that in global scale, the Fiedler vector from my SeqMM bears a great similarity with the principal vector from principal component analysis, and can be used to study genomic compartments. In TAD analysis, I find that TADs evaluated from different scales are not consistent and vary a lot. Particularly when the sequence scale is small, the calculated TAD boundaries are dramatically different. Even for regions with high contact frequencies, TAD regions show no obvious consistency. However, when the scale value increases further, although TADs are still quite different, TAD boundaries in these high contact frequency regions become more and more consistent. Finally, I find that for a fixed local scale, my method can deliver very robust TAD boundaries in different cluster numbers.
Comment: 22 pages, 13 figures
Published: 2017
Full Text: View/download PDF

32. Multiscale persistent functions for biomolecular structure characterization

Author: Xia, Kelin, Li, Zhiming, and Mu, Lin
Subjects: Quantitative Biology - Biomolecules
Abstract: In this paper, we introduce multiscale persistent functions for biomolecular structure characterization. The essential idea is to combine our multiscale rigidity functions with persistent homology analysis, so as to construct a series of multiscale persistent functions, particularly multiscale persistent entropies, for structure characterization. To clarify the fundamental idea of our method, the multiscale persistent entropy model is discussed in great detail. Mathematically, unlike the previous persistent entropy or topological entropy, a special resolution parameter is incorporated into our model. Various scales can be achieved by tuning its value. Physically, our multiscale persistent entropy can be used in conformation entropy evaluation. More specifically, it is found that our method incorporates in it a natural classification scheme. This is achieved through a density filtration of a multiscale rigidity function built from bond and/or dihedral angle distributions. To further validate our model, a systematic comparison with the traditional entropy evaluation model is done. It is found that our model is able to preserve the intrinsic topological features of biomolecular data much better than traditional approaches, particularly for resolutions in the intermediate range. Moreover, our method can be successfully used in protein classification. For a test database with around nine hundred proteins, a clear separation between all-alpha and all-beta proteins can be achieved, using only the dihedral and pseudo-bond angle information. Finally, a special protein structure index (PSI) is proposed, for the first time, to describe the "regularity" of protein structures. Essentially, PSI can be used to describe the "regularity" information in any system.
Comment: 10 figures and 1 table
Published: 2016

33. A review of geometric, topological and graph theory apparatuses for the modeling and analysis of biomolecular data

Author: Xia, Kelin and Wei, Guo-Wei
Subjects: Quantitative Biology - Biomolecules, Mathematics - Algebraic Topology
Abstract: Geometric, topological and graph theory modeling and analysis of biomolecules are of essential importance in the conceptualization of molecular structure, function, dynamics, and transport. On the one hand, geometric modeling provides molecular surface and structural representation, and offers the basis for molecular visualization, which is crucial for the understanding of molecular structure and interactions. On the other hand, it bridges the gap between molecular structural data and theoretical/mathematical models. Topological analysis and modeling give rise to atomic critical points and connectivity, and shed light on the intrinsic topological invariants such as independent components (atoms), rings (pockets) and cavities. Graph theory analyzes biomolecular interactions and reveals biomolecular structure-function relationship. In this paper, we review certain geometric, topological and graph theory apparatuses for biomolecular data modeling and analysis. These apparatuses are categorized into discrete and continuous ones. For discrete approaches, graph theory, Gaussian network model, anisotropic network model, normal mode analysis, quasi-harmonic analysis, flexibility and rigidity index, molecular nonlinear dynamics, spectral graph theory, and persistent homology are discussed. For continuous mathematical tools, we present discrete to continuum mapping, high dimensional persistent homology, biomolecular geometric modeling, differential geometry theory of surfaces, curvature evaluation, variational derivation of minimal molecular surfaces, atoms in molecule theory and quantum chemical topology. Four new approaches, including analytical minimal molecular surface, Hessian matrix eigenvalue map, curvature map and virtual particle model, are introduced for the first time to bridge the gaps in biomolecular modeling and analysis.
Comment: 76 pages, 33 figures
Published: 2016

34. Neighborhood Complex Based Machine Learning (NCML) Models for Drug Design

Author: Liu, Xiang, Xia, Kelin, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Reyes, Mauricio, editor, Henriques Abreu, Pedro, editor, Cardoso, Jaime, editor, Hajij, Mustafa, editor, Zamzmi, Ghada, editor, Rahul, Paul, editor, and Thakur, Lokendra, editor
Published: 2021
Full Text: View/download PDF

35. Flexibility and rigidity index for chromosome packing, flexibility and dynamics analysis

Author: Peng, Jiajie, Yang, Jinjin, Anand, D. Vijay, Shang, Xuequn, and Xia, Kelin
Published: 2022
Full Text: View/download PDF

36. Generalized flexibility-rigidity index

Author: Nguyen, Duc Duy, Xia, Kelin, and Wei, Guo-Wei
Subjects: Quantitative Biology - Biomolecules
Abstract: Flexibility-rigidity index (FRI) has been developed as a robust, accurate and efficient method for macromolecular thermal fluctuation analysis and B-factor prediction. The performance of FRI depends on its formulations of rigidity index and flexibility index. In this work, we introduce alternative rigidity and flexibility formulations. The structure of the classic Gaussian surface is utilized to construct a new type of rigidity index, which leads to a new class of rigidity densities with the classic Gaussian surface as a special case. Additionally, we introduce a new type of flexibility index based on the domain indicator property of normalized rigidity density. These generalized FRI (gFRI) methods have been extensively validated by the B-factor predictions of 364 proteins. Significantly outperforming the classic Gaussian network model (GNM), gFRI is a new generation of methodologies for accurate, robust and efficient analysis of protein flexibility and fluctuation. Finally, gFRI based molecular surface generation and flexibility visualization are demonstrated.
Comment: 12 pages, 4 figures
Published: 2016
Full Text: View/download PDF

37. Mathematical-based microbiome analytics for clinical translation

Author: Narayana, Jayanth Kumar, Mac Aogáin, Micheál, Goh, Wilson Wen Bin, Xia, Kelin, Tsaneva-Atanasova, Krasimira, and Chotirmall, Sanjay H.
Published: 2021
Full Text: View/download PDF

38. Flexibility-Rigidity Index for Protein-Nucleic Acid Flexibility and Fluctuation Analysis

Author: Opron, Kristopher, Xia, Kelin, Burton, Zachary F., and Wei, Guo-Wei
Subjects: Quantitative Biology - Biomolecules
Abstract: Protein-nucleic acid complexes are important for many cellular processes, including the most essential functions such as transcription and translation. For many protein-nucleic acid complexes, flexibility of both macromolecules has been shown to be critical for specificity and/or function. Flexibility-rigidity index (FRI) has been proposed as an accurate and efficient approach for protein flexibility analysis. In this work, we introduce FRI for the flexibility analysis of protein-nucleic acid complexes. We demonstrate that a multiscale strategy, which incorporates multiple kernels to capture various length scales in biomolecular collective motions, is able to significantly improve the state of the art in the flexibility analysis of protein-nucleic acid complexes. We take advantage of the high accuracy and ${\cal O}(N)$ computational complexity of our multiscale FRI method to investigate the flexibility of large ribosomal subunits, which is difficult to analyze by alternative approaches. An anisotropic FRI approach, which involves localized Hessian matrices, is utilized to study the translocation dynamics in an RNA polymerase.
Comment: 19 pages, 5 figures
Published: 2015

39. Multiscale Gaussian network model (mGNM) and multiscale anisotropic network model (mANM)

Author: Xia, Kelin, Opron, Kristopher, and Wei, Guo-Wei
Subjects: Quantitative Biology - Biomolecules
Abstract: Gaussian network model (GNM) and anisotropic network model (ANM) are some of the most popular methods for the study of protein flexibility and related functions. In this work, we propose generalized GNM (gGNM) and ANM methods and show that the GNM Kirchhoff matrix can be built from the ideal low-pass filter, which is a special case of a wide class of correlation functions underpinning the linear scaling flexibility-rigidity index (FRI) method. Based on the mathematical structure of correlation functions, we propose a unified framework to construct generalized Kirchhoff matrices whose matrix inverse leads to gGNMs, whereas the direct inverse of its diagonal elements gives rise to the FRI method. With this connection, we further introduce two multiscale elastic network models, namely, multiscale GNM (mGNM) and multiscale ANM (mANM), which are able to incorporate different scales into the generalized Kirchhoff matrices or generalized Hessian matrices. We validate our new multiscale methods with extensive numerical experiments. We illustrate that gGNMs outperform the original GNM method in the B-factor prediction of a set of 364 proteins. We demonstrate that for a given correlation function, FRI and gGNM methods provide essentially identical B-factor predictions when the scale value in the correlation function is sufficiently large. More importantly, we reveal intrinsic multiscale behavior in protein structures. The proposed mGNM and mANM are able to capture this multiscale behavior and thus give rise to a significant improvement of more than 11% in B-factor predictions over the original GNM and ANM methods. We further demonstrate the benefit of our mGNM in the B-factor predictions on many proteins that fail the original GNM method. We show that the present mGNM can also be used to analyze protein domain separations. Finally, we showcase the ability of our mANM for the simulation of protein collective motions.
Comment: 21 pages, 16 figures
Published: 2015
Full Text: View/download PDF

40. A topological approach for protein classification

Author: Cang, Zixuan, Mu, Lin, Wu, Kedi, Opron, Kristopher, Xia, Kelin, and Wei, Guo-Wei
Subjects: Quantitative Biology - Biomolecules
Abstract: A protein's function and dynamics are closely related to its sequence and structure. However, prediction of protein function and dynamics from its sequence and structure is still a fundamental challenge in molecular biology. Protein classification, which is typically done through measuring the similarity between proteins based on protein sequence or physical information, serves as a crucial step toward the understanding of protein function and dynamics. Persistent homology is a new branch of algebraic topology that has found its success in the topological data analysis in a variety of disciplines, including molecular biology. The present work explores the potential of using persistent homology as an independent tool for protein classification. To this end, we propose a molecular topological fingerprint based support vector machine (MTF-SVM) classifier. Specifically, we construct machine learning feature vectors solely from protein topological fingerprints, which are topological invariants generated during the filtration process. To validate the present MTF-SVM approach, we consider four types of problems. First, we study protein-drug binding by using the M2 channel protein of influenza A virus. We achieve 96% accuracy in discriminating drug bound and unbound M2 channels. Additionally, we examine the use of MTF-SVM for the classification of hemoglobin molecules in their relaxed and taut forms and obtain about 80% accuracy. The identification of all alpha, all beta, and alpha-beta protein domains is carried out in our next study using 900 proteins. We have found an 85% success in this identification. Finally, we apply the present technique to 55 classification tasks of protein superfamilies over 1357 samples. An average accuracy of 82% is attained. The present study establishes computational topology as an independent and effective alternative for protein classification.
Published: 2015

41. Finite Volume Formulation of the MIB Method for Elliptic Interface Problems

Author: Cao, Yin, Wang, Bao, Xia, Kelin, and Wei, Guowei
Subjects: Mathematics - Numerical Analysis, 65
Abstract: The matched interface and boundary (MIB) method has a proven ability for delivering second order accuracy in handling elliptic interface problems with arbitrarily complex interface geometries. However, its collocation formulation requires relatively high solution regularity. Finite volume method (FVM) has its merit in dealing with conservation law problems and its integral formulation works well with relatively low solution regularity. We propose an MIB-FVM to take advantage of both MIB and FVM for solving elliptic interface problems. We construct the proposed method on Cartesian meshes with vertex-centered control volumes. A large number of numerical experiments are designed to validate the present method in both two dimensional (2D) and three dimensional (3D) domains. It is found that the proposed MIB-FVM achieves second order convergence for elliptic interface problems with complex interface geometries in both $L_{\infty}$ and $L_2$ norms.
Comment: 26 pages, 17 figures
Published: 2015

42. Capturing protein multiscale thermal fluctuations

Author: Opron, Kristopher, Xia, Kelin, and Wei, Guo-Wei
Subjects: Quantitative Biology - Biomolecules
Abstract: Existing elastic network models are typically parametrized at a given cutoff distance and often fail to properly predict the thermal fluctuation of many macromolecules that involve multiple characteristic length scales. We introduce a multiscale flexibility-rigidity index (mFRI) method to resolve this problem. The proposed mFRI utilizes two or three correlation kernels parametrized at different length scales to capture protein interactions at corresponding scales. It is about 20% more accurate than the Gaussian network model (GNM) in the B-factor prediction of a set of 364 proteins. Additionally, the present method is able to deliver accurate predictions for multiscale macromolecules that fail GNM. Finally, for a protein of $N$ residues, mFRI is of linear scaling ($O(N)$) in computational complexity, in contrast to the order of $O(N^3)$ for GNM.
Comment: 16 pages, 8 figures
Published: 2015

43. Multiresolution topological simplification

Author: Xia, Kelin, Zhao, Zhixiong, and Wei, Guo-Wei
Subjects: Quantitative Biology - Biomolecules
Abstract: Persistent homology has been devised as a promising tool for the topological simplification of complex data. However, it is computationally intractable for large data sets. In this work, we introduce multiresolution persistent homology for tackling large data sets. Our basic idea is to match the resolution with the scale of interest so as to create a topological microscopy for the underlying data. We utilize flexibility-rigidity index (FRI) to access the topological connectivity of the data set and define a rigidity density for the filtration analysis. By appropriately tuning the resolution, we are able to focus the topological lens on a desirable scale. The proposed multiresolution topological analysis is validated by a hexagonal fractal image which has three distinct scales. We further demonstrate the proposed method for extracting topological fingerprints from DNA and RNA molecules. In particular, the topological persistence of a virus capsid with 240 protein monomers is successfully analyzed which would otherwise be inaccessible to the normal point cloud method and unreliable by using coarse-grained multiscale persistent homology. The proposed method has also been successfully applied to the protein domain classification, which is the first time that persistent homology is used for practical protein domain analysis, to our knowledge. The proposed multiresolution topological method has potential applications in arbitrary data sets, such as social networks, biological networks and graphs.
Comment: 22 pages and 14 figures
Published: 2015

44. Atomic Scale Design and Three-Dimensional Simulation of Ionic Diffusive Nanofluidic Channels

Author: Park, Jin Kyoung, Xia, Kelin, and Wei, Guo-Wei
Subjects: Quantitative Biology - Quantitative Methods, Condensed Matter - Soft Condensed Matter, Physics - Chemical Physics
Abstract: Recent advances in nanotechnology have led to rapid advances in nanofluidics, which has been established as a reliable means for a wide variety of applications, including molecular separation, detection, crystallization and biosynthesis. Although atomic and molecular level consideration is a key ingredient in experimental design and fabrication of nanofluidic systems, atomic and molecular modeling of nanofluidics is rare and most simulations at nanoscale are restricted to one or two dimensions in the literature, to the best of our knowledge. The present work introduces atomic scale design and three-dimensional (3D) simulation of ionic diffusive nanofluidic systems. We propose a variational multiscale framework to represent the nanochannel in discrete atomic and/or molecular detail while describing the ionic solution as a continuum. Apart from the major electrostatic and entropic effects, the non-electrostatic interactions between the channel and solution, and among solvent molecules, are accounted for in our modeling. We derive generalized Poisson-Nernst-Planck (PNP) equations for nanofluidic systems. Mathematical algorithms, such as Dirichlet to Neumann mapping and the matched interface and boundary (MIB) methods, are developed to rigorously solve the aforementioned equations to second-order accuracy in 3D realistic settings. Three ionic diffusive nanofluidic systems, including a negatively charged nanochannel, a bipolar nanochannel and a double-well nanochannel, are designed to investigate the impact of atomic charges on channel current, density distribution and electrostatic potential. Numerical findings, such as gating, ion depletion and inversion, are in good agreement with those from experimental measurements and numerical simulations in the literature.
Comment: 20 figures. arXiv admin note: text overlap with arXiv:1412.0176 by other authors
Published: 2015

45. Correlation function based Gaussian network models

Author: Xia, Kelin, Opron, Kristopher, and Wei, Guo-Wei
Subjects: Quantitative Biology - Biomolecules, Quantitative Biology - Quantitative Methods
Abstract: Gaussian network model (GNM) is one of the most accurate and efficient methods for biomolecular flexibility analysis. However, the systematic generalization of the GNM has been elusive. We show that the GNM Kirchhoff matrix can be built from the ideal low-pass filter, which is a special case of a wide class of correlation functions underpinning the linear scaling flexibility-rigidity index (FRI) method. Based on the mathematical structure of correlation functions, we propose a unified framework to construct generalized Kirchhoff matrices whose matrix inverse leads to correlation function based GNMs, whereas, the direct inverse of the diagonal elements gives rise to FRI method. We illustrate that correlation function based GNMs outperform the original GNM in the B-factor prediction of a set of 364 proteins. We demonstrate that for any given correlation function, FRI and GNM methods provide essentially identical B-factor predictions when the scale value in the correlation function is sufficiently large.
Comment: 4 figures
Published: 2015

46. Multidimensional persistence in biomolecular data

Author: Xia, Kelin and Wei, Guo-Wei
Subjects: Quantitative Biology - Biomolecules
Abstract: Persistent homology has emerged as a popular technique for the topological simplification of big data, including biomolecular data. Multidimensional persistence bears considerable promise to bridge the gap between geometry and topology. However, its practical and robust construction has been a challenge. We introduce two families of multidimensional persistence, namely pseudo-multidimensional persistence and multiscale multidimensional persistence. The former is generated via the repeated applications of persistent homology filtration to high dimensional data, such as results from molecular dynamics or partial differential equations. The latter is constructed via isotropic and anisotropic scales that create new simplicial complexes and associated topological spaces. The utility, robustness and efficiency of the proposed topological methods are demonstrated via protein folding, protein flexibility analysis, the topological denoising of cryo-electron microscopy data, and the scale dependence of nano particles. Topological transition between partially folded and unfolded proteins has been observed in multidimensional persistence. The separation between noise topological signatures and molecular topological fingerprints is achieved by the Laplace-Beltrami flow. The multiscale multidimensional persistent homology reveals relative local features in Betti-0 invariants and the relatively global characteristics of Betti-1 and Betti-2 invariants.
Comment: 32 pages and 13 figures
Published: 2014

47. Second order Method for Solving 3D Elasticity Equations with Complex and Sharp Interfaces

Author: Wang, Bao, Xia, Kelin, and Wei, Guowei
Subjects: Mathematics - Numerical Analysis
Abstract: Elastic materials are ubiquitous in nature and indispensable components in man-made devices and equipment. When a device or piece of equipment involves composite or multiple elastic materials, elasticity interface problems come into play. The solution of three dimensional (3D) elasticity interface problems is significantly more difficult than that of elliptic counterparts due to the coupled vector components and cross derivatives in the governing elasticity equation. This work introduces the matched interface and boundary (MIB) method for solving 3D elasticity interface problems. The proposed MIB method utilizes fictitious values on irregular grid points near the material interface to replace function values in the discretization so that the elasticity equation can be discretized using the standard finite difference schemes as if there were no material interface. The interface jump conditions are rigorously enforced on the intersecting points between the interface and the mesh lines. Such an enforcement determines the fictitious values. A number of new techniques are developed to construct efficient MIB schemes for dealing with cross derivatives in coupled governing equations. The proposed method is extensively validated over both weak and strong discontinuity of the solution, both piecewise constant and position-dependent material parameters, both smooth and nonsmooth interface geometries, and both small and large contrasts in the Poisson's ratio and shear modulus across the interface. Numerical experiments indicate that the present MIB method is of second order convergence in both $L_\infty$ and $L_2$ error norms.
Comment: 40 pages, 23 pages
Published: 2014
Full Text: View/download PDF

48. Matched Interface and Boundary Method for Elasticity Interface Problems

Author: Wang, Bao, Xia, Kelin, and Wei, Guo-Wei
Subjects: Mathematics - Numerical Analysis
Abstract: Elasticity theory is an important component of continuum mechanics and has had widespread applications in science and engineering. Material interfaces are ubiquitous in nature and man-made devices, and often give rise to discontinuous coefficients in the governing elasticity equations. In this work, the matched interface and boundary (MIB) method is developed to address elasticity interface problems. Linear elasticity theory for both isotropic homogeneous and inhomogeneous media is employed. In our approach, Lamé's parameters can have jumps across the interface and are allowed to be position dependent in modeling isotropic inhomogeneous material. Both strong discontinuity, i.e., discontinuous solution, and weak discontinuity, namely, discontinuous derivatives of the solution, are considered in the present study. In the proposed method, fictitious values are utilized so that the standard central finite difference schemes can be employed regardless of the interface. Interface jump conditions are enforced on the interface, which, in turn, accurately determine the fictitious values. We design new MIB schemes to account for complex interface geometries. In particular, the cross derivatives in the elasticity equations are difficult to handle for complex interface geometries. We propose secondary fictitious values and construct geometry based interpolation schemes to overcome this difficulty. Numerous analytical examples are used to validate the accuracy, convergence and robustness of the present MIB method for elasticity interface problems with both small and large curvatures, strong and weak discontinuities, and constant and variable coefficients. Numerical tests indicate second order accuracy in both $L_\infty$ and $L_2$ norms.
Comment: 27 pages, 11 figures
Published: 2014

49. Fast and Anisotropic Flexibility-Rigidity Index

Author: Opron, Kristopher, Xia, Kelin, and Wei, Guo-Wei
Subjects: Quantitative Biology - Biomolecules
Abstract: The flexibility-rigidity index (FRI) is a newly proposed method for the construction of atomic rigidity functions. The FRI method analyzes protein rigidity and flexibility and is capable of predicting protein B-factors without resorting to matrix diagonalization. A fundamental assumption used in the FRI is that protein structures are uniquely determined by various internal and external interactions, while the protein functions, such as stability and flexibility, are solely determined by the structure. As such, one can predict protein flexibility without resorting to the protein interaction Hamiltonian. Consequently, bypassing the matrix diagonalization, the original FRI has a computational complexity of O(N^2). This work introduces a fast FRI (fFRI) algorithm for the flexibility analysis of large macromolecules. The proposed fFRI further reduces the computational complexity to O(N). Additionally, we propose anisotropic FRI (aFRI) algorithms for the analysis of protein collective dynamics. The aFRI algorithms admit adaptive Hessian matrices, from a completely global 3N*3N matrix to completely local 3*3 matrices. However, these local 3*3 matrices still incorporate substantial non-local correlation. Furthermore, we compare the accuracy and efficiency of FRI with some established approaches to flexibility analysis, namely, normal mode analysis (NMA) and Gaussian network model (GNM). The accuracy of the FRI method is tested. The FRI, particularly the fFRI, is orders of magnitude more efficient and about 10% more accurate overall than some of the most popular methods in the field. The proposed fFRI is able to predict B-factors for alpha-carbons of the HIV virus capsid (313,236 residues) in less than 30 seconds on a single processor using only one core. Finally, we demonstrate the application of FRI and aFRI to protein domain analysis. Comment: 10 figures and 50 references
Published: 2014
Full Text: View/download PDF
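
Note on record 49: the abstract does not state the rigidity function itself, so the following is only a minimal sketch of the O(N^2) idea it describes, namely that a rigidity value can be accumulated per atom from pairwise distances with no matrix diagonalization. The exponential kernel, the parameter names `eta` and `kappa`, their default values, and the toy coordinates are all assumptions made for illustration; this is not the authors' implementation, and it does not reproduce the O(N) fFRI algorithm.

```python
import math


def rigidity_index(coords, eta=3.0, kappa=1.0):
    """Accumulate a distance-decaying kernel over all atom pairs (direct O(N^2) sum).

    The kernel exp(-(r/eta)**kappa) and its parameters are assumed here;
    the abstract does not specify them.
    """
    n = len(coords)
    mu = [0.0] * n
    for i in range(n):
        for j in range(i + 1, n):
            r = math.dist(coords[i], coords[j])
            w = math.exp(-((r / eta) ** kappa))
            # The pair contributes equally to the rigidity of both atoms.
            mu[i] += w
            mu[j] += w
    return mu


def flexibility_index(mu):
    """Take flexibility as the reciprocal of rigidity (requires at least two atoms)."""
    return [1.0 / m for m in mu]


# Toy input: three closely spaced atoms and one distant atom (hypothetical data).
coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
mu = rigidity_index(coords)
flex = flexibility_index(mu)
```

In this toy input the distant atom has few close neighbours, so it receives the lowest rigidity and the highest flexibility. Under these assumptions, predicting B-factors would still require a separate fit of the flexibility values to experimental data.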

50. Persistent homology analysis of protein structure, flexibility and folding

Author: Xia, Kelin and Wei, Guo-Wei
Subjects: Quantitative Biology - Biomolecules
Abstract: Proteins are the most important biomolecules for living organisms. The understanding of protein structure, function, dynamics and transport is one of the most challenging tasks in biological science. In the present work, persistent homology is, for the first time, introduced for extracting molecular topological fingerprints (MTFs) based on the persistence of molecular topological invariants. MTFs are utilized for protein characterization, identification and classification. The method of slicing is proposed to track the geometric origin of protein topological invariants. Both all-atom and coarse-grained representations of MTFs are constructed. A new cutoff-like filtration is proposed to shed light on the optimal cutoff distance in elastic network models. Based on the correlation between protein compactness, rigidity and connectivity, we propose an accumulated bar length generated from persistent topological invariants for the quantitative modeling of protein flexibility. To this end, a correlation-matrix-based filtration is developed. This approach gives rise to an accurate prediction of the optimal characteristic distance used in protein B-factor analysis. Finally, MTFs are employed to characterize protein topological evolution during protein folding and quantitatively predict the protein folding stability. An excellent consistency between our persistent homology prediction and molecular dynamics simulation is found. This work reveals the topology-function relationship of proteins. Comment: 22 figures, 82 references
Published: 2014
Full Text: View/download PDF
