9 results on '"Abdurrahman Elbasir"'
Search Results
2. A deep learning approach reveals unexplored landscape of viral expression in cancer
- Author
-
Abdurrahman Elbasir, Ying Ye, Daniel E. Schäffer, Xue Hao, Jayamanna Wickramasinghe, Konstantinos Tsingas, Paul M. Lieberman, Qi Long, Quaid Morris, Rugang Zhang, Alejandro A. Schäffer, and Noam Auslander
- Subjects
Science
- Abstract
Here, Elbasir et al. develop viRNAtrap, a deep learning approach for detection of viruses from tumor RNA sequencing data, which they showcase on an RNA dataset of different cancer types, revealing tumor expression of divergent viruses that had not been previously implicated in cancer.
- Published
- 2023
3. Prediction and mechanistic analysis of drug-induced liver injury (DILI) based on chemical structure
- Author
-
Anika Liu, Moritz Walter, Peter Wright, Aleksandra Bartosik, Daniela Dolciami, Abdurrahman Elbasir, Hongbin Yang, and Andreas Bender
- Subjects
Drug-induced liver injury (DILI); Mechanistic models; Structural alerts; Protein target; Biology (General); QH301-705.5
- Abstract
Background: Drug-induced liver injury (DILI) is a major safety concern characterized by a complex and diverse pathogenesis. In order to identify DILI early in drug development, a better understanding of the injury and models with better predictivity are urgently needed. One approach in this regard are in silico models which aim at predicting the risk of DILI based on the compound structure. However, these models do not yet show sufficient predictive performance or interpretability to be useful for decision making by themselves, the former partially stemming from the underlying problem of labeling the in vivo DILI risk of compounds in a meaningful way for generating machine learning models. Results: As part of the Critical Assessment of Massive Data Analysis (CAMDA) “CMap Drug Safety Challenge” 2019 (http://camda2019.bioinf.jku.at), chemical structure-based models were generated using the binarized DILIrank annotations. Support Vector Machine (SVM) and Random Forest (RF) classifiers showed comparable performance to previously published models with a mean balanced accuracy over models generated using 5-fold LOCO-CV inside a 10-fold training scheme of 0.759 ± 0.027 when predicting an external test set. In the models which used predicted protein targets as compound descriptors, we identified the most information-rich proteins which agreed with the mechanisms of action and toxicity of nonsteroidal anti-inflammatory drugs (NSAIDs), one of the most important drug classes causing DILI, stress response via TP53 and biotransformation. In addition, we identified multiple proteins involved in xenobiotic metabolism which could be novel DILI-related off-targets, such as CLK1 and DYRK2. Moreover, we derived potential structural alerts for DILI with high precision, including furan and hydrazine derivatives; however, all derived alerts were present in approved drugs and were over specific indicating the need to consider quantitative variables such as dose.
Conclusion: Using chemical structure-based descriptors such as structural fingerprints and predicted protein targets, DILI prediction models were built with a predictive performance comparable to previous literature. In addition, we derived insights on proteins and pathways statistically (and potentially causally) linked to DILI from these models and inferred new structural alerts related to this adverse endpoint.
- Published
- 2021
4. Characterizing The Landscape Of Viral Expression In Cancer By Deep Learning
- Author
-
Abdurrahman Elbasir, Ying Ye, Daniel Schäffer, Xue Hao, Jayamanna Wickramasinghe, Paul Lieberman, Quaid Morris, Rugang Zhang, Alejandro Schäffer, and Noam Auslander
- Abstract
About 15% of human cancer cases are attributed to viral infections. To date, virus expression in tumor tissues has been mostly studied by aligning tumor RNA sequencing reads to databases of known viruses. To allow identification of divergent viruses and rapid characterization of the tumor virome, we developed viRNAtrap, an alignment-free pipeline to identify viral reads and assemble viral contigs. We apply viRNAtrap, which is based on a deep learning model trained to discriminate viral RNAseq reads, to 14 cancer types from The Cancer Genome Atlas (TCGA). We find that expression of exogenous cancer viruses is associated with better overall survival. In contrast, expression of human endogenous viruses is associated with worse overall survival. Using viRNAtrap, we uncover expression of unexpected and divergent viruses that have not previously been implicated in cancer. The viRNAtrap pipeline provides a way forward to study viral infections associated with different clinical conditions.
- Published
- 2022
5. Computational Methods Summarizing Mutational Patterns in Cancer: Promise and Limitations for Clinical Applications
- Author
-
Andrew Patterson, Abdurrahman Elbasir, Bin Tian, and Noam Auslander
- Subjects
Cancer Research; Oncology
- Abstract
Since the rise of next-generation sequencing technologies, the catalogue of mutations in cancer has been continuously expanding. To address the complexity of the cancer-genomic landscape and extract meaningful insights, numerous computational approaches have been developed over the last two decades. In this review, we survey the current leading computational methods to derive intricate mutational patterns in the context of clinical relevance. We begin with mutation signatures, explaining first how mutation signatures were developed and then examining the utility of studies using mutation signatures to correlate environmental effects on the cancer genome. Next, we examine current clinical research that employs mutation signatures and discuss the potential use cases and challenges of mutation signatures in clinical decision-making. We then examine computational studies developing tools to investigate complex patterns of mutations beyond the context of mutational signatures. We survey methods to identify cancer-driver genes, from single-driver studies to pathway and network analyses. In addition, we review methods inferring complex combinations of mutations for clinical tasks and using mutations integrated with multi-omics data to better predict cancer phenotypes. We examine the use of these tools for either discovery or prediction, including prediction of tumor origin, treatment outcomes, prognosis, and cancer typing. We further discuss the main limitations preventing widespread clinical integration of computational tools for the diagnosis and treatment of cancer. We end by proposing solutions to address these challenges using recent advances in machine learning.
- Published
- 2023
6. BCrystal: an interpretable sequence-based protein crystallization predictor
- Author
-
Prasanna R. Kolatkar, Raghvendra Mall, Halima Bensmail, Khalid Kunji, Abdurrahman Elbasir, Gwo-Yu Chuang, Reda Rawi, and Zeyaul Islam
- Subjects
Statistics and Probability; Correlation coefficient; 0206 medical engineering; 02 engineering and technology; Computational biology; Crystallography, X-Ray; Biochemistry; law.invention; 03 medical and health sciences; Protein structure; law; Crystallization; Molecular Biology; 030304 developmental biology; Mathematics; Sequence (medicine); Supplementary data; 0303 health sciences; Computational Biology; Proteins; Solvent accessibility; Original Papers; Computer Science Applications; Computational Mathematics; Computational Theory and Mathematics; Gradient boosting; Protein crystallization; Software; 020602 bioinformatics
- Abstract
Motivation: X-ray crystallography has facilitated the majority of protein structures determined to date. Sequence-based predictors that can accurately estimate protein crystallization propensities would be highly beneficial to overcome the high expenditure, large attrition rate, and to reduce the trial-and-error settings required for crystallization. Results: In this study, we present a novel model, BCrystal, which uses an optimized gradient boosting machine (XGBoost) on sequence, structural and physio-chemical features extracted from the proteins of interest. BCrystal also provides explanations, highlighting the most important features for the predicted crystallization propensity of an individual protein using the SHAP algorithm. On three independent test sets, BCrystal outperforms state-of-the-art sequence-based methods by more than 12.5% in accuracy, 18% in recall and 0.253 in Matthews correlation coefficient, with an average accuracy of 93.7%, recall of 96.63% and Matthews correlation coefficient of 0.868. For relative solvent accessibility of exposed residues, we observed higher values to associate positively with protein crystallizability, whereas the number of disordered regions, fraction of coils and tripeptide stretches that contain multiple histidines associate negatively with crystallizability. The higher accuracy of BCrystal enables it to accurately screen for sequence variants with enhanced crystallizability. Availability and implementation: Our BCrystal webserver is at https://machinelearning-protein.qcri.org/ and source code is available at https://github.com/raghvendra5688/BCrystal. Supplementary information: Supplementary data are available at Bioinformatics online.
- Published
- 2019
7. A Modelling Framework for Embedding-based Predictions for Compound-Viral Protein Activity
- Author
-
Sanjay Chawla, Ehsan Ullah, Raghvendra Mall, Hossam Almeer, Abdurrahman Elbasir, Prasanna R. Kolatkar, and Zeyaul Islam
- Subjects
Statistics and Probability; Original Paper; AcademicSubjects/SCI01060; Viral protein; Computer science; Rank (computer programming); Computational biology; medicine.disease_cause; Ligand (biochemistry); Biochemistry; Computer Science Applications; Computational Mathematics; Computational Theory and Mathematics; Docking (molecular); medicine; Protein activity; Representation (mathematics); Molecular Biology
- Abstract
Motivation: A global effort is underway to identify compounds for the treatment of COVID-19. Since de novo compound design is an extremely long, time-consuming and expensive process, efforts are underway to discover existing compounds that can be repurposed for COVID-19 and new viral diseases. We propose a machine learning representation framework that uses deep learning induced vector embeddings of compounds and viral proteins as features to predict compound-viral protein activity. The prediction model in turn uses a consensus framework to rank approved compounds against viral proteins of interest. Results: Our consensus framework achieves a high mean Pearson correlation of 0.916, mean R² of 0.840 and a low mean squared error of 0.313 for the task of compound-viral protein activity prediction on an independent test set. As a use case, we identify a ranked list of 47 compounds common to three main proteins of the SARS-CoV-2 virus (PL-PRO, 3CL-PRO and Spike protein) as potential targets, including 21 antivirals, 15 anticancer, 5 antibiotics and 6 other investigational human compounds. We perform additional molecular docking simulations to demonstrate that the majority of these compounds have low binding energies and thus high binding affinity, with the potential to be effective against the SARS-CoV-2 virus. Availability and implementation: All the source code and data are available at https://github.com/raghvendra5688/Drug-Repurposing and https://dx.doi.org/10.17632/8rrwnbcgmx.3. We also implemented a web-server at https://machinelearning-protein.qcri.org/index.html. Supplementary information: Supplementary data are available at Bioinformatics online.
- Published
- 2021
8. Prediction and mechanistic analysis of drug-induced liver injury (DILI) based on chemical structure
- Author
-
Hongbin Yang, Anika Liu, Moritz Walter, Aleksandra Maria Bartosik, Peter Wright, Daniela Dolciami, Andreas Bender, Abdurrahman Elbasir, Liu, Anika [0000-0002-8561-4700], and Apollo - University of Cambridge Repository
- Subjects
Drug; FOS: Computer and information sciences; Bioinformatics; Protein target; media_common.quotation_subject; In silico; Immunology; Computational biology; Biology; Models, Biological; General Biochemistry, Genetics and Molecular Biology; Machine Learning; 03 medical and health sciences; 0302 clinical medicine; Humans; Computer Simulation; Proceedings of the Critical Assessment of Massive Data Analysis (CAMDA) Satellite Meeting to ISMB 2019; Structural alerts; lcsh:QH301-705.5; Ecology, Evolution, Behavior and Systematics; 030304 developmental biology; media_common; Interpretability; Drug-induced liver injury (DILI); 0303 health sciences; Mechanistic models; Applied Mathematics; Research; Random forest; Support vector machine; Drug development; lcsh:Biology (General); 030220 oncology & carcinogenesis; Modeling and Simulation; Test set; Chemical and Drug Induced Liver Injury; General Agricultural and Biological Sciences; Predictive modelling
- Abstract
Funder: GlaxoSmithKline (GB). Background: Drug-induced liver injury (DILI) is a major safety concern characterized by a complex and diverse pathogenesis. In order to identify DILI early in drug development, a better understanding of the injury and models with better predictivity are urgently needed. One approach in this regard are in silico models which aim at predicting the risk of DILI based on the compound structure. However, these models do not yet show sufficient predictive performance or interpretability to be useful for decision making by themselves, the former partially stemming from the underlying problem of labeling the in vivo DILI risk of compounds in a meaningful way for generating machine learning models. Results: As part of the Critical Assessment of Massive Data Analysis (CAMDA) “CMap Drug Safety Challenge” 2019 (http://camda2019.bioinf.jku.at), chemical structure-based models were generated using the binarized DILIrank annotations. Support Vector Machine (SVM) and Random Forest (RF) classifiers showed comparable performance to previously published models with a mean balanced accuracy over models generated using 5-fold LOCO-CV inside a 10-fold training scheme of 0.759 ± 0.027 when predicting an external test set. In the models which used predicted protein targets as compound descriptors, we identified the most information-rich proteins which agreed with the mechanisms of action and toxicity of nonsteroidal anti-inflammatory drugs (NSAIDs), one of the most important drug classes causing DILI, stress response via TP53 and biotransformation. In addition, we identified multiple proteins involved in xenobiotic metabolism which could be novel DILI-related off-targets, such as CLK1 and DYRK2.
Moreover, we derived potential structural alerts for DILI with high precision, including furan and hydrazine derivatives; however, all derived alerts were present in approved drugs and were over specific indicating the need to consider quantitative variables such as dose. Conclusion: Using chemical structure-based descriptors such as structural fingerprints and predicted protein targets, DILI prediction models were built with a predictive performance comparable to previous literature. In addition, we derived insights on proteins and pathways statistically (and potentially causally) linked to DILI from these models and inferred new structural alerts related to this adverse endpoint.
- Published
- 2021
9. DeepCrystal: A Deep Learning Framework for Sequence-based Protein Crystallization Prediction
- Author
-
Halima Bensmail, Balasubramanian Moovarkumudalvan, Khalid Kunji, Raghvendra Mall, Abdurrahman Elbasir, and Prasanna R. Kolatkar
- Subjects
Statistics and Probability; 0301 basic medicine; Source code; Computer science; media_common.quotation_subject; Feature vector; Machine learning; computer.software_genre; Biochemistry; Convolutional neural network; law.invention; 03 medical and health sciences; Deep Learning; Protein structure; law; Amino Acid Sequence; Crystallization; Molecular Biology; 030304 developmental biology; media_common; 0303 health sciences; Sequence; Series (mathematics); business.industry; Deep learning; 030302 biochemistry & molecular biology; Computational Biology; Proteins; Pattern recognition; Computer Science Applications; Computational Mathematics; 030104 developmental biology; Computational Theory and Mathematics; Artificial intelligence; business; Protein crystallization; computer
- Abstract
Motivation: Protein structure determination has primarily been performed using X-ray crystallography. To overcome the high cost, high attrition rate and series of trial-and-error settings, many in-silico methods have been developed to predict crystallization propensities of proteins based on their sequences. However, the majority of these methods build their predictors by extracting features from protein sequences, which is computationally expensive and can explode the feature space. We propose DeepCrystal, a deep learning framework for sequence-based protein crystallization prediction. It uses deep learning to identify proteins which can produce diffraction-quality crystals without the need to manually engineer additional biochemical and structural features from sequence. Our model is based on convolutional neural networks, which can exploit frequently occurring k-mers and sets of k-mers from the protein sequences to distinguish proteins that will result in diffraction-quality crystals from those that will not. Results: Our model surpasses previous sequence-based protein crystallization predictors in terms of recall, F-score, accuracy and Matthews correlation coefficient (MCC) on three independent test sets. DeepCrystal achieves an average improvement of 1.4% and 12.1% in recall when compared to its closest competitors, Crysalis II and Crysf, respectively. In addition, DeepCrystal attains an average improvement of 2.1% and 6.0% for F-score, 1.9% and 3.9% for accuracy, and 3.8% and 7.0% for MCC with respect to Crysalis II and Crysf, respectively, on independent test sets. Availability and implementation: The standalone source code and models are available at https://github.com/elbasir/DeepCrystal and a web-server is also available at https://deeplearning-protein.qcri.org. Supplementary information: Supplementary data are available at Bioinformatics online.
- Published
- 2018