Author: "Leihong Wu" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Leihong Wu"' showing total 120 results

Start Over Author "Leihong Wu"

120 results on '"Leihong Wu"'

1. Enhancing Bias Assessment for Complex Term Groups in Language Embedding Models: Quantitative Comparison of Methods

Author: Magnus Gray, Mariofanna Milanova, and Leihong Wu
Subjects: Computer applications to medicine. Medical informatics, R858-859.7
Abstract: Abstract BackgroundArtificial intelligence (AI) is rapidly being adopted to build products and aid in the decision-making process across industries. However, AI systems have been shown to exhibit and even amplify biases, causing a growing concern among people worldwide. Thus, investigating methods of measuring and mitigating bias within these AI-powered tools is necessary. ObjectiveIn natural language processing applications, the word embedding association test (WEAT) is a popular method of measuring bias in input embeddings, a common area of measure bias in AI. However, certain limitations of the WEAT have been identified (ie, their nonrobust measure of bias and their reliance on predefined and limited groups of words or sentences), which may lead to inadequate measurements and evaluations of bias. Thus, this study takes a new approach at modifying this popular measure of bias, with a focus on making it more robust and applicable in other domains. MethodsIn this study, we introduce the SD-WEAT, which is a modified version of the WEAT that uses the SD of multiple permutations of the WEATs to calculate bias in input embeddings. With the SD-WEAT, we evaluated the biases and stability of several language embedding models, including Global Vectors for Word Representation (GloVe), Word2Vec, and bidirectional encoder representations from transformers (BERT). ResultsThis method produces results comparable to those of the WEAT, with strong correlations between the methods’ bias scores or effect sizes (rPr ConclusionsThus, the SD-WEAT shows promise for robustly measuring bias in the input embeddings fed to AI language models.
Published: 2024
Full Text: View/download PDF

2. Assessing the performance of large language models in literature screening for pharmacovigilance: a comparative study

Author: Dan Li, Leihong Wu, Mingfeng Zhang, Svitlana Shpyleva, Ying-Chi Lin, Ho-Yin Huang, Ting Li, and Joshua Xu
Subjects: pharmacovigilance, large language models, LLMs, literature based discovery, artificial intelligence, Therapeutics. Pharmacology, RM1-950
Abstract: Pharmacovigilance plays a crucial role in ensuring the safety of pharmaceutical products. It involves the systematic monitoring of adverse events and the detection of potential safety concerns related to drugs. Manual literature screening for pharmacovigilance related articles is a labor-intensive and time-consuming task, requiring streamlined solutions to cope with the continuous growth of literature. The primary objective of this study is to assess the performance of Large Language Models (LLMs) in automating literature screening for pharmacovigilance, aiming to enhance the process by identifying relevant articles more effectively. This study represents a novel application of LLMs including OpenAI’s GPT-3.5, GPT-4, and Anthropic’s Claude2, in the field of pharmacovigilance, evaluating their ability to categorize medical publications as relevant or irrelevant for safety signal reviews. Our analysis encompassed N-shot learning, chain-of-thought reasoning, and evaluating metrics, with a focus on factors impacting accuracy. The findings highlight the promising potential of LLMs in literature screening, achieving a reproducibility of 93%, sensitivity of 97%, and specificity of 67% showcasing notable strengths in terms of reproducibility and sensitivity, although with moderate specificity. Notably, performance improved when models were provided examples consisting of abstracts, labels, and corresponding reasoning explanations. Moreover, our exploration identified several potential contributing factors influencing prediction outcomes. These factors encompassed the choice of key words and prompts, the balance of the examples, and variations in reasoning explanations. By configuring advanced LLMs for efficient screening of extensive literature databases, this study underscores the transformative potential of these models in drug safety monitoring. Furthermore, these insights gained from this study can inform the development of automated systems for pharmacovigilance, contributing to the ongoing efforts to ensure the safety and efficacy of pharmacovigilance products.
Published: 2024
Full Text: View/download PDF

3. Accurate species identification of food-contaminating beetles with quality-improved elytral images and deep learning

Author: Halil Bisgin, Tanmay Bera, Leihong Wu, Hongjian Ding, Neslihan Bisgin, Zhichao Liu, Monica Pava-Ripoll, Amy Barnes, James F. Campbell, Himansi Vyas, Cesare Furlanello, Weida Tong, and Joshua Xu
Subjects: food-contaminating beetle, species identification, deep learning, convolutional neural networks, machine learning, food safety, Electronic computers. Computer science, QA75.5-76.95
Abstract: Food samples are routinely screened for food-contaminating beetles (i.e., pantry beetles) due to their adverse impact on the economy, environment, public health and safety. If found, their remains are subsequently analyzed to identify the species responsible for the contamination; each species poses different levels of risk, requiring different regulatory and management steps. At present, this identification is done through manual microscopic examination since each species of beetle has a unique pattern on its elytra (hardened forewing). Our study sought to automate the pattern recognition process through machine learning. Such automation will enable more efficient identification of pantry beetle species and could potentially be scaled up and implemented across various analysis centers in a consistent manner. In our earlier studies, we demonstrated that automated species identification of pantry beetles is feasible through elytral pattern recognition. Due to poor image quality, however, we failed to achieve prediction accuracies of more than 80%. Subsequently, we modified the traditional imaging technique, allowing us to acquire high-quality elytral images. In this study, we explored whether high-quality elytral images can truly achieve near-perfect prediction accuracies for 27 different species of pantry beetles. To test this hypothesis, we developed a convolutional neural network (CNN) model and compared performance between two different image sets for various pantry beetles. Our study indicates improved image quality indeed leads to better prediction accuracy; however, it was not the only requirement for achieving good accuracy. Also required are many high-quality images, especially for species with a high number of variations in their elytral patterns. The current study provided a direction toward achieving our ultimate goal of automated species identification through elytral pattern recognition.
Published: 2022
Full Text: View/download PDF

4. A verified genomic reference sample for assessing performance of cancer panels detecting small variants of low allele frequency

Author: Wendell Jones, Binsheng Gong, Natalia Novoradovskaya, Dan Li, Rebecca Kusko, Todd A. Richmond, Donald J. Johann, Halil Bisgin, Sayed Mohammad Ebrahim Sahraeian, Pierre R. Bushel, Mehdi Pirooznia, Katherine Wilkins, Marco Chierici, Wenjun Bao, Lee Scott Basehore, Anne Bergstrom Lucas, Daniel Burgess, Daniel J. Butler, Simon Cawley, Chia-Jung Chang, Guangchun Chen, Tao Chen, Yun-Ching Chen, Daniel J. Craig, Angela del Pozo, Jonathan Foox, Margherita Francescatto, Yutao Fu, Cesare Furlanello, Kristina Giorda, Kira P. Grist, Meijian Guan, Yingyi Hao, Scott Happe, Gunjan Hariani, Nathan Haseley, Jeff Jasper, Giuseppe Jurman, David Philip Kreil, Paweł Łabaj, Kevin Lai, Jianying Li, Quan-Zhen Li, Yulong Li, Zhiguang Li, Zhichao Liu, Mario Solís López, Kelci Miclaus, Raymond Miller, Vinay K. Mittal, Marghoob Mohiyuddin, Carlos Pabón-Peña, Barbara L. Parsons, Fujun Qiu, Andreas Scherer, Tieliu Shi, Suzy Stiegelmeyer, Chen Suo, Nikola Tom, Dong Wang, Zhining Wen, Leihong Wu, Wenzhong Xiao, Chang Xu, Ying Yu, Jiyang Zhang, Yifan Zhang, Zhihong Zhang, Yuanting Zheng, Christopher E. Mason, James C. Willey, Weida Tong, Leming Shi, and Joshua Xu
Subjects: Biology (General), QH301-705.5, Genetics, QH426-470
Abstract: Abstract Background Oncopanel genomic testing, which identifies important somatic variants, is increasingly common in medical practice and especially in clinical trials. Currently, there is a paucity of reliable genomic reference samples having a suitably large number of pre-identified variants for properly assessing oncopanel assay analytical quality and performance. The FDA-led Sequencing and Quality Control Phase 2 (SEQC2) consortium analyze ten diverse cancer cell lines individually and their pool, termed Sample A, to develop a reference sample with suitably large numbers of coding positions with known (variant) positives and negatives for properly evaluating oncopanel analytical performance. Results In reference Sample A, we identify more than 40,000 variants down to 1% allele frequency with more than 25,000 variants having less than 20% allele frequency with 1653 variants in COSMIC-related genes. This is 5–100× more than existing commercially available samples. We also identify an unprecedented number of negative positions in coding regions, allowing statistical rigor in assessing limit-of-detection, sensitivity, and precision. Over 300 loci are randomly selected and independently verified via droplet digital PCR with 100% concordance. Agilent normal reference Sample B can be admixed with Sample A to create new samples with a similar number of known variants at much lower allele frequency than what exists in Sample A natively, including known variants having allele frequency of 0.02%, a range suitable for assessing liquid biopsy panels. Conclusion These new reference samples and their admixtures provide superior capability for performing oncopanel quality control, analytical accuracy, and validation for small to large oncopanels and liquid biopsy assays.
Published: 2021
Full Text: View/download PDF

5. Cross-oncopanel study reveals high sensitivity and accuracy with overall analytical performance depending on genomic regions

Author: Binsheng Gong, Dan Li, Rebecca Kusko, Natalia Novoradovskaya, Yifan Zhang, Shangzi Wang, Carlos Pabón-Peña, Zhihong Zhang, Kevin Lai, Wanshi Cai, Jennifer S. LoCoco, Eric Lader, Todd A. Richmond, Vinay K. Mittal, Liang-Chun Liu, Donald J. Johann, James C. Willey, Pierre R. Bushel, Ying Yu, Chang Xu, Guangchun Chen, Daniel Burgess, Simon Cawley, Kristina Giorda, Nathan Haseley, Fujun Qiu, Katherine Wilkins, Hanane Arib, Claire Attwooll, Kevin Babson, Longlong Bao, Wenjun Bao, Anne Bergstrom Lucas, Hunter Best, Ambica Bhandari, Halil Bisgin, James Blackburn, Thomas M. Blomquist, Lisa Boardman, Blake Burgher, Daniel J. Butler, Chia-Jung Chang, Alka Chaubey, Tao Chen, Marco Chierici, Christopher R. Chin, Devin Close, Jeffrey Conroy, Jessica Cooley Coleman, Daniel J. Craig, Erin Crawford, Angela del Pozo, Ira W. Deveson, Daniel Duncan, Agda Karina Eterovic, Xiaohui Fan, Jonathan Foox, Cesare Furlanello, Abhisek Ghosal, Sean Glenn, Meijian Guan, Christine Haag, Xinyi Hang, Scott Happe, Brittany Hennigan, Jennifer Hipp, Huixiao Hong, Kyle Horvath, Jianhong Hu, Li-Yuan Hung, Mirna Jarosz, Jennifer Kerkhof, Benjamin Kipp, David Philip Kreil, Paweł Łabaj, Pablo Lapunzina, Peng Li, Quan-Zhen Li, Weihua Li, Zhiguang Li, Yu Liang, Shaoqing Liu, Zhichao Liu, Charles Ma, Narasimha Marella, Rubén Martín-Arenas, Dalila B. Megherbi, Qingchang Meng, Piotr A. Mieczkowski, Tom Morrison, Donna Muzny, Baitang Ning, Barbara L. Parsons, Cloud P. Paweletz, Mehdi Pirooznia, Wubin Qu, Amelia Raymond, Paul Rindler, Rebecca Ringler, Bekim Sadikovic, Andreas Scherer, Egbert Schulze, Robert Sebra, Rita Shaknovich, Qiang Shi, Tieliu Shi, Juan Carlos Silla-Castro, Melissa Smith, Mario Solís López, Ping Song, Daniel Stetson, Maya Strahl, Alan Stuart, Julianna Supplee, Philippe Szankasi, Haowen Tan, Lin-ya Tang, Yonghui Tao, Shraddha Thakkar, Danielle Thierry-Mieg, Jean Thierry-Mieg, Venkat J. Thodima, David Thomas, Boris Tichý, Nikola Tom, Elena Vallespin Garcia, Suman Verma, Kimbley Walker, Charles Wang, Junwen Wang, Yexun Wang, Zhining Wen, Valtteri Wirta, Leihong Wu, Chunlin Xiao, Wenzhong Xiao, Shibei Xu, Mary Yang, Jianming Ying, Shun H. Yip, Guangliang Zhang, Sa Zhang, Meiru Zhao, Yuanting Zheng, Xiaoyan Zhou, Christopher E. Mason, Timothy Mercer, Weida Tong, Leming Shi, Wendell Jones, and Joshua Xu
Subjects: Oncopanel sequencing, Target enrichment, Molecular diagnostics, Reproducibility, Analytical performance, Precision medicine, Biology (General), QH301-705.5, Genetics, QH426-470
Abstract: Abstract Background Targeted sequencing using oncopanels requires comprehensive assessments of accuracy and detection sensitivity to ensure analytical validity. By employing reference materials characterized by the U.S. Food and Drug Administration-led SEquence Quality Control project phase2 (SEQC2) effort, we perform a cross-platform multi-lab evaluation of eight Pan-Cancer panels to assess best practices for oncopanel sequencing. Results All panels demonstrate high sensitivity across targeted high-confidence coding regions and variant types for the variants previously verified to have variant allele frequency (VAF) in the 5–20% range. Sensitivity is reduced by utilizing VAF thresholds due to inherent variability in VAF measurements. Enforcing a VAF threshold for reporting has a positive impact on reducing false positive calls. Importantly, the false positive rate is found to be significantly higher outside the high-confidence coding regions, resulting in lower reproducibility. Thus, region restriction and VAF thresholds lead to low relative technical variability in estimating promising biomarkers and tumor mutational burden. Conclusion This comprehensive study provides actionable guidelines for oncopanel sequencing and clear evidence that supports a simplified approach to assess the analytical performance of oncopanels. It will facilitate the rapid implementation, validation, and quality control of oncopanels in clinical use.
Published: 2021
Full Text: View/download PDF

6. Optimized imaging methods for species-level identification of food-contaminating beetles

Author: Tanmay Bera, Leihong Wu, Hongjian Ding, Howard Semey, Amy Barnes, Zhichao Liu, Himansu Vyas, Weida Tong, and Joshua Xu
Subjects: Medicine, Science
Abstract: Abstract Identifying the exact species of pantry beetle responsible for food contamination, is imperative in assessing the risks associated with contamination scenarios. Each beetle species is known to have unique patterns on their hardened forewings (known as elytra) through which they can be identified. Currently, this is done through manual microanalysis of the insect or their fragments in contaminated food samples. We envision that the use of automated pattern analysis would expedite and scale up the identification process. However, such automation would require images to be captured in a consistent manner, thereby enabling the creation of large repositories of high-quality images. Presently, there is no standard imaging technique for capturing images of beetle elytra, which consequently means, there is no standard method of beetle species identification through elytral pattern analysis. This deficiency inspired us to optimize and standardize imaging methods, especially for food-contaminating beetles. For this endeavor, we chose multiple species of beetles belonging to different families or genera that have near-identical elytral patterns, and thus are difficult to identify correctly at the species level. Our optimized imaging method provides enhanced images such that the elytral patterns between individual species could easily be distinguished from each other, through visual observation. We believe such standardization is critical in developing automated species identification of pantry beetles and/or other insects. This eventually may lead to improved taxonomical classification, allowing for better management of food contamination and ecological conservation.
Published: 2021
Full Text: View/download PDF

7. DLI-IT: a deep learning approach to drug label identification through image and text embedding

Author: Xiangwen Liu, Joe Meehan, Weida Tong, Leihong Wu, Xiaowei Xu, and Joshua Xu
Subjects: Deep learning, Pharmaceutical packaging, Neural network, Drug labeling, Opioid drug, Semantic similarity, Computer applications to medicine. Medical informatics, R858-859.7
Abstract: Abstract Background Drug label, or packaging insert play a significant role in all the operations from production through drug distribution channels to the end consumer. Image of the label also called Display Panel or label could be used to identify illegal, illicit, unapproved and potentially dangerous drugs. Due to the time-consuming process and high labor cost of investigation, an artificial intelligence-based deep learning model is necessary for fast and accurate identification of the drugs. Methods In addition to image-based identification technology, we take advantages of rich text information on the pharmaceutical package insert of drug label images. In this study, we developed the Drug Label Identification through Image and Text embedding model (DLI-IT) to model text-based patterns of historical data for detection of suspicious drugs. In DLI-IT, we first trained a Connectionist Text Proposal Network (CTPN) to crop the raw image into sub-images based on the text. The texts from the cropped sub-images are recognized independently through the Tesseract OCR Engine and combined as one document for each raw image. Finally, we applied universal sentence embedding to transform these documents into vectors and find the most similar reference images to the test image through the cosine similarity. Results We trained the DLI-IT model on 1749 opioid and 2365 non-opioid drug label images. The model was then tested on 300 external opioid drug label images, the result demonstrated our model achieves up-to 88% of the precision in drug label identification, which outperforms previous image-based or text-based identification method by up-to 35% improvement. Conclusion To conclude, by combining Image and Text embedding analysis under deep learning framework, our DLI-IT approach achieved a competitive performance in advancing drug label identification.
Published: 2020
Full Text: View/download PDF

8. BERT-Based Natural Language Processing of Drug Labeling Documents: A Case Study for Classifying Drug-Induced Liver Injury Risk

Author: Yue Wu, Zhichao Liu, Leihong Wu, Minjun Chen, and Weida Tong
Subjects: regulatory science, drug labeling, natural language processing, BERT, drug induced liver injury, United States Food and Drug Administration, Electronic computers. Computer science, QA75.5-76.95
Abstract: Background & Aims: The United States Food and Drug Administration (FDA) regulates a broad range of consumer products, which account for about 25% of the United States market. The FDA regulatory activities often involve producing and reading of a large number of documents, which is time consuming and labor intensive. To support regulatory science at FDA, we evaluated artificial intelligence (AI)-based natural language processing (NLP) of regulatory documents for text classification and compared deep learning-based models with a conventional keywords-based model.Methods: FDA drug labeling documents were used as a representative regulatory data source to classify drug-induced liver injury (DILI) risk by employing the state-of-the-art language model BERT. The resulting NLP-DILI classification model was statistically validated with both internal and external validation procedures and applied to the labeling data from the European Medicines Agency (EMA) for cross-agency application.Results: The NLP-DILI model developed using FDA labeling documents and evaluated by cross-validations in this study showed remarkable performance in DILI classification with a recall of 1 and a precision of 0.78. When cross-agency data were used to validate the model, the performance remained comparable, demonstrating that the model was portable across agencies. Results also suggested that the model was able to capture the semantic meanings of sentences in drug labeling.Conclusion: Deep learning-based NLP models performed well in DILI classification of drug labeling documents and learned the meanings of complex text in drug labeling. This proof-of-concept work demonstrated that using AI technologies to assist regulatory activities is a promising approach to modernize and advance regulatory science.
Published: 2021
Full Text: View/download PDF

9. Technical advance in targeted NGS analysis enables identification of lung cancer risk-associated low frequency TP53, PIK3CA, and BRAF mutations in airway epithelial cells

Author: Daniel J. Craig, Thomas Morrison, Sadik A. Khuder, Erin L. Crawford, Leihong Wu, Joshua Xu, Thomas M. Blomquist, and James C. Willey
Subjects: Biomarker, Low-frequency variant detection, Next generation sequencing, Lung Cancer, Neoplasms. Tumors. Oncology. Including cancer and carcinogens, RC254-282
Abstract: Abstract Background Standardized Nucleic Acid Quantification for SEQuencing (SNAQ-SEQ) is a novel method that utilizes synthetic DNA internal standards spiked into each sample prior to next generation sequencing (NGS) library preparation. This method was applied to analysis of normal appearing airway epithelial cells (AEC) obtained by bronchoscopy in an effort to define a somatic mutation field effect associated with lung cancer risk. There is a need for biomarkers that reliably detect those at highest lung cancer risk, thereby enabling more effective screening by annual low dose CT. The purpose of this study was to test the hypothesis that lung cancer risk is characterized by increased prevalence of low variant allele frequency (VAF) somatic mutations in lung cancer driver genes in AEC. Methods Synthetic DNA internal standards (IS) were prepared for 11 lung cancer driver genes and mixed with each AEC genomic (g) DNA specimen prior to competitive multiplex PCR amplicon NGS library preparation. A custom Perl script was developed to separate IS reads and respective specimen gDNA reads from each target into separate files for parallel variant frequency analysis. This approach identified nucleotide-specific sequencing error and enabled reliable detection of specimen mutations with VAF as low as 5 × 10− 4 (0.05%). This method was applied in a retrospective case-control study of AEC specimens collected by bronchoscopic brush biopsy from the normal airways of 19 subjects, including eleven lung cancer cases and eight non-cancer controls, and the association of lung cancer risk with AEC driver gene mutations was tested. Results TP53 mutations with 0.05–1.0% VAF were more prevalent (p
Published: 2019
Full Text: View/download PDF

10. Novel reference genes in colorectal cancer identify a distinct subset of high stage tumors and their associated histologically normal colonic tissues

Author: Lai Xu, Helen Luo, Rong Wang, Wells W. Wu, Je-Nie Phue, Rong-Fong Shen, Hartmut Juhl, Leihong Wu, Wei-lun Alterovitz, Vahan Simonyan, Lorraine Pelosof, and Amy S. Rosenberg
Subjects: Colorectal reference genes, High stage tumors, And molecular abnormalities in tumor adjacent tissues, Internal medicine, RC31-1245, Genetics, QH426-470
Abstract: Abstract Background Reference genes are often interchangeably called housekeeping genes due to 1) the essential cellular functions their proteins provide and 2) their constitutive expression across a range of normal and pathophysiological conditions. However, given the proliferative drive of malignant cells, many reference genes such as beta-actin (ACTB) and glyceraldehyde-3-phosphate-dehydrogenase (GAPDH) which play critical roles in cell membrane organization and glycolysis, may be dysregulated in tumors versus their corresponding normal controls Methods Because Next Generation Sequencing (NGS) technology has several advantages over hybridization-based technologies, such as independent detection and quantitation of transcription levels, greater sensitivity, and increased dynamic range, we evaluated colorectal cancers (CRC) and their histologically normal tissue counterparts by NGS to evaluate the expression of 21 “classical” reference genes used as normalization standards for PCR based methods. Seventy-nine paired tissue samples of CRC and their patient matched healthy colonic tissues were subjected to NGS analysis of their mRNAs. Results We affirmed that 17 out of 21 classical reference genes had upregulated expression in tumors compared to normal colonic epithelial tissue and dramatically so in some cases. Indeed, tumors were distinguished from normal controls in both unsupervised hierarchical clustering analyses (HCA) and principal component analyses (PCA). We then identified 42 novel potential reference genes with minimal coefficients of variation (CV) across 79 CRC tumor pairs. Though largely consistently expressed across tumors and normal control tissues, a subset of high stage tumors (HSTs) as well as some normal tissue samples (HSNs) located adjacent to these HSTs demonstrated dysregulated expression, thus identifying a subset of tumors with a potentially distinct and aggressive biological profile. Conclusion While classical CRC reference genes were found to be differentially expressed between tumors and normal controls, novel reference genes, identified via NGS, were more consistently expressed across malignant and normal colonic tissues. Nonetheless, a subset of HST had profound dysregulation of such genes as did many of the histologically normal tissues adjacent to such HSTs, indicating that the HSTs so distinguished may have unique biological properties and that their histologically normal tissues likely harbor a small population of microscopically undetected but metabolically active tumors.
Published: 2019
Full Text: View/download PDF

11. HetEnc: a deep learning predictive model for multi-type biological dataset

Author: Leihong Wu, Xiangwen Liu, and Joshua Xu
Subjects: Biotechnology, TP248.13-248.65, Genetics, QH426-470
Abstract: Abstract Background Researchers today are generating unprecedented amounts of biological data. One trend in current biological research is integrated analysis with multi-platform data. Effective integration of multi-platform data into the solution of a single or multi-task classification problem; however, is critical and challenging. In this study, we proposed HetEnc, a novel deep learning-based approach, for information domain separation. Results HetEnc includes both an unsupervised feature representation module and a supervised neural network module to handle multi-platform gene expression datasets. It first constructs three different encoding networks to represent the original gene expression data using high-level abstracted features. A six-layer fully-connected feed-forward neural network is then trained using these abstracted features for each targeted endpoint. We applied HetEnc to the SEQC neuroblastoma dataset to demonstrate that it outperforms other machine learning approaches. Although we used multi-platform data in feature abstraction and model training, HetEnc does not need multi-platform data for prediction, enabling a broader application of the trained model by reducing the cost of gene expression profiling for new samples to a single platform. Thus, HetEnc provides a new solution to integrated gene expression analysis, accelerating modern biological research.
Published: 2019
Full Text: View/download PDF

12. Study of serious adverse drug reactions using FDA-approved drug labeling and MedDRA

Author: Leihong Wu, Taylor Ingle, Zhichao Liu, Anna Zhao-Wong, Stephen Harris, Shraddha Thakkar, Guangxu Zhou, Junshuang Yang, Joshua Xu, Darshan Mehta, Weigong Ge, Weida Tong, and Hong Fang
Subjects: Adverse drug reactions, Data mining, MedDRA, Drug labeling, Boxed Warning, Structured product labeling, Computer applications to medicine. Medical informatics, R858-859.7, Biology (General), QH301-705.5
Abstract: Abstract Background Adverse Drug Reactions (ADRs) are of great public health concern. FDA-approved drug labeling summarizes ADRs of a drug product mainly in three sections, i.e., Boxed Warning (BW), Warnings and Precautions (WP), and Adverse Reactions (AR), where the severity of ADRs are intended to decrease in the order of BW > WP > AR. Several reported studies have extracted ADRs from labeling documents, but most, if not all, did not discriminate the severity of the ADRs by the different labeling sections. Such a practice could overstate or underestimate the impact of certain ADRs to the public health. In this study, we applied the Medical Dictionary for Regulatory Activities (MedDRA) to drug labeling and systematically analyzed and compared the ADRs from the three labeling sections with a specific emphasis on analyzing serious ADRs presented in BW, which is of most drug safety concern. Results This study investigated New Drug Application (NDA) labeling documents for 1164 single-ingredient drugs using Oracle Text search to extract MedDRA terms. We found that only a small portion of MedDRA Preferred Terms (PTs), 3819 out of 21,920 or 17.42%, were observed in a whole set of documents. In detail, 466/3819 (12.0%) PTs were in BW, 2023/3819 (53.0%) were in WP, and 2961/3819 (77.5%) were in AR sections. We also found a higher overlap of top 20 occurring BW PTs with WP sections compared to AR sections. Within the MedDRA System Organ Class levels, serious ADRs (sADRs) from BW were prevalent in Nervous System disorders and Vascular disorders. A Hierarchical Cluster Analysis (HCA) revealed that drugs within the same therapeutic category shared the same ADR patterns in BW (e.g., nervous system drug class is highly associated with drug abuse terms such as dependence, substance abuse, and respiratory depression). Conclusions This study demonstrated that combining MedDRA standard terminologies with data mining techniques facilitated computer-aided ADR analysis of drug labeling. We also highlighted the importance of labeling sections that differ in seriousness and application in drug safety. Using sADRs primarily related to BW sections, we illustrated a prototype approach for computer-aided ADR monitoring and studies which can be applied to other public health documents.
Published: 2019
Full Text: View/download PDF

13. Bioactivity Signatures of Drugs vs. Environmental Chemicals Revealed by Tox21 High-Throughput Screening Assays

Author: Deborah K. Ngan, Lin Ye, Leihong Wu, Menghang Xia, Anna Rossoshek, Anton Simeonov, and Ruili Huang
Subjects: in vitro assay, quantitative high throughput screening, Tox21, drug, environmental chemical, Information technology, T58.5-58.64
Abstract: Humans are exposed to tens of thousands of chemicals over the course of a lifetime, yet there remains inadequate data on the potential harmful effects of these substances on human health. Using quantitative high-throughput screening (qHTS), we can test these compounds for potential toxicity in a more efficient and cost-effective way compared to traditional animal studies. Tox21 has developed a library of ~10,000 chemicals (Tox21 10K) comprising one-third approved and investigational drugs and two-thirds environmental chemicals. In this study, the Tox21 10K was screened in a qHTS format against a panel of 70 cell-based assays with 213 readouts covering a broad range of biological pathways. Activity profiles were compared with chemical structure to assess their ability to differentiate drugs from environmental chemicals, and structure was found to be a better predictor of which chemicals are likely to be drugs. Drugs and environmental chemicals were further analyzed for diversity in structure and biological response space and showed distinguishable, but not distinct, responses in the Tox21 assays. Inclusion of other targets and pathways to further diversify the biological response space covered by these assays is likely required to better evaluate the safety profile of drugs and environmental chemicals to prioritize for in-depth toxicological studies.
Published: 2019
Full Text: View/download PDF

14. Enhanced QSAR Model Performance by Integrating Structural and Gene Expression Information

Author: Xiaohui Fan, Li Xing, Wei Liu, Leihong Wu, and Qian Chen
Subjects: quantitative structure-activity relationships (QSAR), SAR paradox, molecular modeling, gene expression, integrative analysis, Organic chemistry, QD241-441
Abstract: Despite decades of intensive research and a number of demonstrable successes, quantitative structure-activity relationship (QSAR) models still fail to yield predictions with reasonable accuracy in some circumstances, especially when the QSAR paradox occurs. In this study, to avoid the QSAR paradox, we proposed a novel integrated approach to improve the model performance through using both structural and biological information from compounds. As a proof-of-concept, the integrated models were built on a toxicological dataset to predict non-genotoxic carcinogenicity of compounds, using not only the conventional molecular descriptors but also expression profiles of significant genes selected from microarray data. For test set data, our results demonstrated that the prediction accuracy of QSAR model was dramatically increased from 0.57 to 0.67 with incorporation of expression data of just one selected signature gene. Our successful integration of biological information into classic QSAR model provided a new insight and methodology for building predictive models especially when QSAR paradox occurred.
Published: 2013
Full Text: View/download PDF

15. SD-WEAT: Towards Robustly Measuring Bias in Input Embeddings.

Author: Magnus Gray and Leihong Wu
Published: 2024
Full Text: View/download PDF

16. A network pharmacology study of Chinese medicine QiShenYiQi to reveal its underlying multi-compound, multi-target, multi-pathway mode of action.

Author: Xiang Li, Leihong Wu, Wei Liu, Yecheng Jin, Qian Chen, Linli Wang, Xiaohui Fan, Zheng Li, and Yiyu Cheng
Subjects: Medicine, Science
Abstract: Chinese medicine is a complex system guided by traditional Chinese medicine (TCM) theories, which has proven to be especially effective in treating chronic and complex diseases. However, the underlying modes of action (MOA) are not always systematically investigated. Herein, a systematic study was designed to elucidate the multi-compound, multi-target and multi-pathway MOA of a Chinese medicine, QiShenYiQi (QSYQ), on myocardial infarction. QSYQ is composed of Astragalus membranaceus (Huangqi), Salvia miltiorrhiza (Danshen), Panax notoginseng (Sanqi), and Dalbergia odorifera (Jiangxiang). Male Sprague Dawley rat model of myocardial infarction were administered QSYQ intragastrically for 7 days while the control group was not treated. The differentially expressed genes (DEGs) were identified from myocardial infarction rat model treated with QSYQ, followed by constructing a cardiovascular disease (CVD)-related multilevel compound-target-pathway network connecting main compounds to those DEGs supported by literature evidences and the pathways that are functionally enriched in ArrayTrack. 55 potential targets of QSYQ were identified, of which 14 were confirmed in CVD-related literatures with experimental supporting evidences. Furthermore, three sesquiterpene components of QSYQ, Trans-nerolidol, (3S,6S,7R)-3,7,11-trimethyl-3,6-epoxy-1,10-dodecadien-7-ol and (3S,6R,7R)-3,7,11-trimethyl-3,6-epoxy-1,10-dodecadien-7-ol from Dalbergia odorifera T. Chen, were validated experimentally in this study. Their anti-inflammatory effects and potential targets including extracellular signal-regulated kinase-1/2, peroxisome proliferator-activated receptor-gamma and heme oxygenase-1 were identified. Finally, through a three-level compound-target-pathway network with experimental analysis, our study depicts a complex MOA of QSYQ on myocardial infarction.
Published: 2014
Full Text: View/download PDF

17. Determination of minimum training sample size for microarray-based cancer outcome prediction-an empirical assessment.

Author: Li Shao, Xiaohui Fan, Ningtao Cheng, Leihong Wu, and Yiyu Cheng
Subjects: Medicine, Science
Abstract: The promise of microarray technology in providing prediction classifiers for cancer outcome estimation has been confirmed by a number of demonstrable successes. However, the reliability of prediction results relies heavily on the accuracy of statistical parameters involved in classifiers. It cannot be reliably estimated with only a small number of training samples. Therefore, it is of vital importance to determine the minimum number of training samples and to ensure the clinical value of microarrays in cancer outcome prediction. We evaluated the impact of training sample size on model performance extensively based on 3 large-scale cancer microarray datasets provided by the second phase of MicroArray Quality Control project (MAQC-II). An SSNR-based (scale of signal-to-noise ratio) protocol was proposed in this study for minimum training sample size determination. External validation results based on another 3 cancer datasets confirmed that the SSNR-based approach could not only determine the minimum number of training samples efficiently, but also provide a valuable strategy for estimating the underlying performance of classifiers in advance. Once translated into clinical routine applications, the SSNR-based protocol would provide great convenience in microarray-based cancer outcome prediction in improving classifier reliability.
Published: 2013
Full Text: View/download PDF

18. Challenges, Solutions, and Quality Metrics of Personal Genome Assembly in Advancing Precision Medicine

Author: Wenming Xiao, Leihong Wu, Gokhan Yavas, Vahan Simonyan, Baitang Ning, and Huixiao Hong
Subjects: genome, sequencing, assembly, personal genome, quality metrics, Pharmacy and materia medica, RS1-441
Abstract: Even though each of us shares more than 99% of the DNA sequences in our genome, there are millions of sequence codes or structure in small regions that differ between individuals, giving us different characteristics of appearance or responsiveness to medical treatments. Currently, genetic variants in diseased tissues, such as tumors, are uncovered by exploring the differences between the reference genome and the sequences detected in the diseased tissue. However, the public reference genome was derived with the DNA from multiple individuals. As a result of this, the reference genome is incomplete and may misrepresent the sequence variants of the general population. The more reliable solution is to compare sequences of diseased tissue with its own genome sequence derived from tissue in a normal state. As the price to sequence the human genome has dropped dramatically to around $1000, it shows a promising future of documenting the personal genome for every individual. However, de novo assembly of individual genomes at an affordable cost is still challenging. Thus, till now, only a few human genomes have been fully assembled. In this review, we introduce the history of human genome sequencing and the evolution of sequencing platforms, from Sanger sequencing to emerging “third generation sequencing” technologies. We present the currently available de novo assembly and post-assembly software packages for human genome assembly and their requirements for computational infrastructures. We recommend that a combined hybrid assembly with long and short reads would be a promising way to generate good quality human genome assemblies and specify parameters for the quality assessment of assembly outcomes. We provide a perspective view of the benefit of using personal genomes as references and suggestions for obtaining a quality personal genome. Finally, we discuss the usage of the personal genome in aiding vaccine design and development, monitoring host immune-response, tailoring drug therapy and detecting tumors. We believe the precision medicine would largely benefit from bioinformatics solutions, particularly for personal genome assembly.
Published: 2016
Full Text: View/download PDF

19. Comprehensive Assessments of RNA-seq by the SEQC Consortium: FDA-Led Efforts Advance Precision Medicine

Author: Joshua Xu, Binsheng Gong, Leihong Wu, Shraddha Thakkar, Huixiao Hong, and Weida Tong
Subjects: genomics, RNA-seq, reproducibility, big data, next generation sequencing, Pharmacy and materia medica, RS1-441
Abstract: Studies on gene expression in response to therapy have led to the discovery of pharmacogenomics biomarkers and advances in precision medicine. Whole transcriptome sequencing (RNA-seq) is an emerging tool for profiling gene expression and has received wide adoption in the biomedical research community. However, its value in regulatory decision making requires rigorous assessment and consensus between various stakeholders, including the research community, regulatory agencies, and industry. The FDA-led SEquencing Quality Control (SEQC) consortium has made considerable progress in this direction, and is the subject of this review. Specifically, three RNA-seq platforms (Illumina HiSeq, Life Technologies SOLiD, and Roche 454) were extensively evaluated at multiple sites to assess cross-site and cross-platform reproducibility. The results demonstrated that relative gene expression measurements were consistently comparable across labs and platforms, but not so for the measurement of absolute expression levels. As part of the quality evaluation several studies were included to evaluate the utility of RNA-seq in clinical settings and safety assessment. The neuroblastoma study profiled tumor samples from 498 pediatric neuroblastoma patients by both microarray and RNA-seq. RNA-seq offers more utilities than microarray in determining the transcriptomic characteristics of cancer. However, RNA-seq and microarray-based models were comparable in clinical endpoint prediction, even when including additional features unique to RNA-seq beyond gene expression. The toxicogenomics study compared microarray and RNA-seq profiles of the liver samples from rats exposed to 27 different chemicals representing multiple toxicity modes of action. Cross-platform concordance was dependent on chemical treatment and transcript abundance. Though both RNA-seq and microarray are suitable for developing gene expression based predictive models with comparable prediction performance, RNA-seq offers advantages over microarray in profiling genes with low expression. The rat BodyMap study provided a comprehensive rat transcriptomic body map by performing RNA-Seq on 320 samples from 11 organs in either sex of juvenile, adolescent, adult and aged Fischer 344 rats. Lastly, the transferability study demonstrated that signature genes of predictive models are reciprocally transferable between microarray and RNA-seq data for model development using a comprehensive approach with two large clinical data sets. This result suggests continued usefulness of legacy microarray data in the coming RNA-seq era. In conclusion, the SEQC project enhances our understanding of RNA-seq and provides valuable guidelines for RNA-seq based clinical application and safety evaluation to advance precision medicine.
Published: 2016
Full Text: View/download PDF

20. Shifting from population-wide to personalized cancer prognosis with microarrays.

Author: Li Shao, Xiaohui Fan, Ningtao Cheng, Leihong Wu, Haoshu Xiong, Hong Fang, Don Ding, Leming Shi, Yiyu Cheng, and Weida Tong
Subjects: Medicine, Science
Abstract: The era of personalized medicine for cancer therapeutics has taken an important step forward in making accurate prognoses for individual patients with the adoption of high-throughput microarray technology. However, microarray technology in cancer diagnosis or prognosis has been primarily used for the statistical evaluation of patient populations, and thus excludes inter-individual variability and patient-specific predictions. Here we propose a metric called clinical confidence that serves as a measure of prognostic reliability to facilitate the shift from population-wide to personalized cancer prognosis using microarray-based predictive models. The performance of sample-based models predicted with different clinical confidences was evaluated and compared systematically using three large clinical datasets studying the following cancers: breast cancer, multiple myeloma, and neuroblastoma. Survival curves for patients, with different confidences, were also delineated. The results show that the clinical confidence metric separates patients with different prediction accuracies and survival times. Samples with high clinical confidence were likely to have accurate prognoses from predictive models. Moreover, patients with high clinical confidence would be expected to live for a notably longer or shorter time if their prognosis was good or grim based on the models, respectively. We conclude that clinical confidence could serve as a beneficial metric for personalized cancer prognosis prediction utilizing microarrays. Ascribing a confidence level to prognosis with the clinical confidence metric provides the clinician an objective, personalized basis for decisions, such as choosing the severity of the treatment.
Published: 2012
Full Text: View/download PDF

21. Does applicability domain exist in microarray-based genomic research?

Author: Li Shao, Leihong Wu, Hong Fang, Weida Tong, and Xiaohui Fan
Subjects: Medicine, Science
Abstract: Constructing an accurate predictive model for clinical decision-making on the basis of a relatively small number of tumor samples with high-dimensional microarray data remains a very challenging problem. The validity of such models has been seriously questioned due to their failure in clinical validation using independent samples. Besides the statistical issues such as selection bias, some studies further implied the probable reason was improper sample selection that did not resemble the genomic space defined by the training population. Assuming that predictions would be more reliable for interpolation than extrapolation, we set to investigate the impact of applicability domain (AD) on model performance in microarray-based genomic research by evaluating and comparing model performance for samples with different extrapolation degrees. We found that the issue of applicability domain may not exist in microarray-based genomic research for clinical applications. Therefore, it is not practicable to improve model validity based on applicability domain.
Published: 2010
Full Text: View/download PDF

22. Integrating Drug's Mode of Action into Quantitative Structure-Activity Relationships for Improved Prediction of Drug-Induced Liver Injury.

Author: Leihong Wu, Zhichao Liu, Scott M. Auerbach, Ruili Huang, Minjun Chen, Kristin McEuen, Joshua Xu, Hong Fang, and Weida Tong
Published: 2017
Full Text: View/download PDF

23. A deep learning model to recognize food contaminating beetle species based on elytra fragments.

Author: Leihong Wu, Zhichao Liu, Tanmay Bera, Hongjian Ding, Darryl Langley, Amy Barnes, Cesare Furlanello, Valerio Maggio, Weida Tong, and Joshua Xu
Published: 2019
Full Text: View/download PDF

24. Computational Modeling for the Prediction of Hepatotoxicity Caused by Drugs and Chemicals

Author: Minjun Chen, Jie Liu, Tsung-Jen Liao, Kristin Ashby, Yue Wu, Leihong Wu, Weida Tong, and Huixiao Hong
Published: 2023
Full Text: View/download PDF

25. Analysis of Polyethylene-Related Revisions After Total Ankle Replacements Reported in US Food and Drug Administration Medical Device Adverse Event Database

Author: Hongying Jiang, Leihong Wu, Per-Henrik Randsborg, Jennifer Houck, Limin Sun, Marissa Marine, Megan Chow, Joseph Peluso, and Raquel Peat
Subjects: Orthopedics and Sports Medicine, Surgery
Abstract: Background: There are 2 general types of total ankle replacement (TAR) designs with respect to the polyethylene insert, mobile-bearing (MB) and fixed-bearing (FB) TARs. The aim of this study is to compare polyethylene-related adverse events (AEs), particularly revisions, reported for MB TARs and FB TARs using the US Food and Drug Administration’s (FDA’s) Manufacturer and User Facility Device Experience (MAUDE) database. Methods: A text mining method was applied to the medical device reporting (MDR) in the MAUDE database from 1991 to 2020, followed by manual reviews to identify, characterize, and describe all polyethylene-related AEs, including revisions, reported for MB and FB TARs. Results: We found 1841 MDRs for MB (STAR Ankle only) and 1273 MDRs for 40+ FB TARs approved/cleared by the FDA. For the MB design, 33% (606/1841) of the AEs reported related to the polyethylene component, compared to 24% (291/1273) of the AEs reported for FB designs. Polyethylene fractures were reported in 11.3% (208/1841) for the MB designs compared to 0.2% (2/1273) for the FB designs. Half of the polyethylene-related revisions occurred within an average of 4.1 years after implantation for the MB design compared within an average of 5.2 years for FB designs. Conclusion: Analysis of this database revealed a higher proportion of reported polyethylene fractures and greater need for earlier revisions for polyethylene-related issues with use of the primary MB design in the database as compared with FB TAR designs. Further study of device-related complications with more recent designs for both MB and FB ankle replacement components are needed to improve the outcomes of total ankle replacement. Level of Evidence: Level III, retrospective comparative study.
Published: 2022

26. Toward best practice in cancer mutation detection with whole-genome and whole-exome sequencing

Author: Maryellen de Mars, Cu Nguyen, Tiffany Hung, Eric Peters, Charles Lu, Meijian Guan, Bao Tran, Maurizio Polano, Bin Zhu, Samir Lababidi, Wendell D. Jones, Chunlin Xiao, Andreas Scherer, K J. Langenbach, Zhipan Li, Luyao Ren, Weida Tong, Erich Jaeger, Rebecca Kusko, Zivana Tezak, Ying Yu, Ulrika Liljedahl, Louis M. Staudt, Huixiao Hong, Jing Wang, Yuanting Zheng, Ali Moshrefi, Cristobal Juan Vera, Chris Miller, Rasika Kalamegham, Arati Raziuddin, Howard Jacob, Roberta Maestro, Bindu Swapna Madala, Petr Vojta, Jessica Nordlund, Li Tai Fang, Jiri Drabek, Xuelu Liu, Corey Miles, Gary P. Schroth, Fayaz Seifuddin, Tim R. Mercer, Chunhua Yan, Leihong Wu, Sulev Kõks, Roderick V. Jensen, Jennifer A Hipp, Yun-Ching Chen, Malcolm Moos, Yongmei Zhao, Baitang Ning, Aparna Natarajan, Brian N. Papas, Xin Chen, Ashley Walton, Stephen T. Sherry, Christopher E. Mason, Liz Kerrigan, Ogan D Abaan, Wanqiu Chen, Kenneth Idler, Jingya Wang, Tsai-wei Shen, James C. Willey, Ene Reimann, Justin B. Lack, Virginie Petitjean, Jyoti Shetty, Daoud Meerzaman, Charles Wang, Jian-Liang Li, Tiffany Truong, Keyur Talsania, Mehdi Pirooznia, Marc Sultan, Urvashi Mehra, Wenming Xiao, Zhong Chen, Ana Granat, Leming Shi, Margaret C. Cam, Qing-Rong Chen, Eric F. Donaldson, Wolfgang Resch, Ben Ernest, Yuliya Kriga, Gokhan Yavas, Thomas M. Blomquist, and Parthav Jailwala
Subjects: Computer science, Sequence analysis, Biomedical Engineering, Bioengineering, Computational biology, Applied Microbiology and Biotechnology, Genome, Article, Cell Line, Cell Line, Tumor, Neoplasms, Exome Sequencing, medicine, Humans, Mutation detection, Exome sequencing, Protocol (science), Reproducibility, Whole Genome Sequencing, High-Throughput Nucleotide Sequencing, Reproducibility of Results, Cancer, Sequence Analysis, DNA, medicine.disease, Benchmarking, Mutation, Mutation (genetic algorithm), Molecular Medicine, Biotechnology
Abstract: Clinical applications of precision oncology require accurate tests that can distinguish true cancer-specific mutations from errors introduced at each step of next-generation sequencing (NGS). To date, no bulk sequencing study has addressed the effects of cross-site reproducibility, nor the biological, technical and computational factors that influence variant identification. Here we report a systematic interrogation of somatic mutations in paired tumor–normal cell lines to identify factors affecting detection reproducibility and accuracy at six different centers. Using whole-genome sequencing (WGS) and whole-exome sequencing (WES), we evaluated the reproducibility of different sample types with varying input amount and tumor purity, and multiple library construction protocols, followed by processing with nine bioinformatics pipelines. We found that read coverage and callers affected both WGS and WES reproducibility, but WES performance was influenced by insert fragment size, genomic copy content and the global imbalance score (GIV; G > T/C > A). Finally, taking into account library preparation protocol, tumor content, read coverage and bioinformatics processes concomitantly, we recommend actionable practices to improve the reproducibility and accuracy of NGS experiments for cancer mutation detection.
Published: 2021
Full Text: View/download PDF

27. Complete Genome Sequence of Campylobacter coli Strain P4581, a Hybrid Carrying Campylobacter jejuni Genomic Content, Isolated from Rhesus Monkey, Macaca mulatta

Author: Sung Guk Kim, Christine V. Summage-West, Lillie M. Sims, Leihong Wu, JaeHyun Kim, Seongwon Nho, and Steven L. Foley
Subjects: Immunology and Microbiology (miscellaneous), Genetics, Molecular Biology
Abstract: Campylobacter coli is a leading bacterial cause of human gastroenteritis. We reported the circularized 1.8-Mbp complete genome of MLST type 1055 C. coli strain P4581 isolated from a rhesus monkey, Macaca mulatta , hybridizing Illumina short- and Nanopore long-reads.
Published: 2022

28. NeuroCORD: A Language Model to Facilitate COVID-19-Associated Neurological Disorder Studies

Author: Leihong Wu, Syed Ali, Heather Ali, Tyrone Brock, Joshua Xu, and Weida Tong
Subjects: Artificial Intelligence, text mining, language model, BERT model, neurological disorders, COVID-19, information retrieval, machine learning, Health, Toxicology and Mutagenesis, Public Health, Environmental and Occupational Health, Humans, Nervous System Diseases, Language
Abstract: COVID-19 can lead to multiple severe outcomes including neurological and psychological impacts. However, it is challenging to manually scan hundreds of thousands of COVID-19 articles on a regular basis. To update our knowledge, provide sound science to the public, and communicate effectively, it is critical to have an efficient means of following the most current published data. In this study, we developed a language model to search abstracts using the most advanced artificial intelligence (AI) to accurately retrieve articles on COVID-19-associated neurological disorders. We applied this NeuroCORD model to the largest benchmark dataset of COVID-19, CORD-19. We found that the model developed on the training set yielded 94% prediction accuracy on the test set. This result was subsequently verified by two experts in the field. In addition, when applied to 96,000 non-labeled articles that were published after 2020, the NeuroCORD model accurately identified approximately 3% of them to be relevant for the study of COVID-19-associated neurological disorders, while only 0.5% were retrieved using conventional keyword searching. In conclusion, NeuroCORD provides an opportunity to profile neurological disorders resulting from COVID-19 in a rapid and efficient fashion, and its general framework could be used to study other COVID-19-related emerging health issues.
Published: 2022
Full Text: View/download PDF

29. Evaluating the analytical validity of circulating tumor DNA sequencing assays for precision oncology

Author: Agda Karina Eterovic, Rebecca Ringler, Binsheng Gong, Jiashi Wang, Christopher E. Mason, Garima Kushwaha, Abhisek Ghosal, James C. Willey, Zhihong Zhang, Wenzhong Xiao, Natalia Novoradovskaya, Joshua Xu, Mary Yang, Zhiguang Li, Leihong Wu, Philippe Szankasi, Paul Rindler, Suman Verma, Simon Cawley, Daniel Stetson, Yilin Zhang, Fergal Casey, Nathan Haseley, Fujun Qiu, Paula Proszek, Mirna Jarosz, Ira W. Deveson, Kevin Lai, Donald J. Johann, Rebecca Kusko, Haowen Tan, Tim R. Mercer, Chia Jung Chang, Kristina Giorda, Tieliu Shi, Blake Burgher, Kyle Horvath, Leming Shi, Pierre R. Bushel, Guangchun Chen, Melissa Smith, Igor Stevanovski, Devin W. Close, Liang Feng, Lin-ya Tang, Liang-Chun Liu, Todd Richmond, Wendell D. Jones, Daniel Duncan, Andreas Scherer, Jorge Dinis, Chang Xu, Rita Shaknovich, Tom Morrison, Ping Song, Nikola Tom, Weida Tong, Quan Zhen Li, Lihyun Sun, Charles Ma, Dalila B. Megherbi, Robert Sebra, Sean Glenn, Narasimha Marella, Ambica Bhandari, Jennifer S. LoCoco, Zhichao Liu, Jeoffrey Schageman, Mehdi Pirooznia, Scott Happe, Hanane Arib, Maya Strahl, Dan Li, Hunter Best, Venkat J. Thodima, Amelia Raymond, Jeffrey Conroy, Mike Hubank, Piotr A. Mieczkowski, Carlos Pabón-Peña, James Blackburn, Li-Yuan Hung, Sa Zhang, Bindu Swapna Madala, Guangliang Zhang, and Jonathan Choi
Subjects: Validation study, Biomedical Engineering, Bioengineering, Computational biology, Biology, Medical Oncology, Applied Microbiology and Biotechnology, Article, Circulating Tumor DNA, 03 medical and health sciences, 0302 clinical medicine, Synthetic DNA, Limit of Detection, Neoplasms, Proficiency testing, False positive paradox, Humans, Precision Medicine, 030304 developmental biology, 0303 health sciences, Reproducibility, High-Throughput Nucleotide Sequencing, Reproducibility of Results, Variant allele, Sequence Analysis, DNA, Precision oncology, Circulating tumor DNA, Practice Guidelines as Topic, Molecular Medicine, 030217 neurology & neurosurgery, Biotechnology
Abstract: Circulating tumor DNA (ctDNA) sequencing is being rapidly adopted in precision oncology, but the accuracy, sensitivity, and reproducibility of ctDNA assays is poorly understood. Here we report the findings of a multi-site, cross-platform evaluation of the analytical performance of five industry-leading ctDNA assays. We evaluated each stage of the ctDNA sequencing workflow with simulations, synthetic DNA spike-in experiments, and proficiency testing on standardized cell line–derived reference samples. Above 0.5% variant allele frequency, ctDNA mutations were detected with high sensitivity, precision and reproducibility by all five assays, whereas below this limit detection became unreliable and varied widely between assays, especially when input material was limited. Missed mutations (false-negatives) were more common than erroneous candidates (false-positives), indicating that the reliable sampling of rare ctDNA fragments is the key challenge for ctDNA assays. This comprehensive evaluation of the analytical performance of ctDNA assays serves to inform best-practice guidelines and provides a resource for precision oncology., Editorial summary: Reliable detection of mutations below 0.5% variant allele frequency remains a key challenge for circulating tumor DNA sequencing assays.
Published: 2021

30. An Image Analysis Environment for Species Identification of Food Contaminating Beetles.

Author: Daniel Martin, Hongjian Ding, Leihong Wu, Howard Semey, Amy Barnes, Darryl Langley, Su Inn Park, Zhichao Liu, Weida Tong, and Joshua Xu
Published: 2016
Full Text: View/download PDF

31. Relating Anatomical Therapeutic Indications by the Ensemble Similarity of Drug Sets.

Author: Leihong Wu, Ni Ai, Yufeng Liu, Yi Wang, and Xiaohui Fan
Published: 2013
Full Text: View/download PDF

32. Transcriptome analysis reveals lung-specific miRNAs associated with impaired mucociliary clearance induced by cigarette smoke in an in vitro human airway tissue model

Author: Xuefei Cao, Rui Xiong, Qiangen Wu, Hans Rosenfeldt, Levan Muskhelishvili, Ying Chen, Matthew Bryant, Sheila M Healy, Tao Chen, Yue Wu, and Leihong Wu
Subjects: 0301 basic medicine, Mucociliary clearance, Health, Toxicology and Mutagenesis, Down-Regulation, Bronchi, 010501 environmental sciences, Toxicology, 01 natural sciences, Cigarette Smoking, Andrology, Transcriptome, Pulmonary Disease, Chronic Obstructive, 03 medical and health sciences, Downregulation and upregulation, Smoke, Ciliogenesis, Tobacco, Humans, Medicine, Cilia, Respiratory system, Lung, Cells, Cultured, 0105 earth and related environmental sciences, COPD, business.industry, Gene Expression Profiling, Cilium, Smoking, Epithelial Cells, Tobacco Products, General Medicine, medicine.disease, MicroRNAs, 030104 developmental biology, medicine.anatomical_structure, Mucociliary Clearance, business
Abstract: Exposure to cigarette smoke (CS) is strongly associated with impaired mucociliary clearance (MCC), which has been implicated in the pathogenesis of CS-induced respiratory diseases, such as chronic obstructive pulmonary diseases (COPD). In this study, we aimed to identify microRNAs (miRNAs) that are associated with impaired MCC caused by CS in an in vitro human air-liquid-interface (ALI) airway tissue model. ALI cultures were exposed to CS (diluted with 0.5 L/min, 1.0 L/min, and 4.0 L/min of clean air) from smoking five 3R4F University of Kentucky reference cigarettes under the International Organization for Standardization (ISO) machine smoking regimen, every other day for 1 week (a total of 3 days, 40 min/day). Transcriptome analyses of ALI cultures exposed to the high concentration of CS identified 5090 differentially expressed genes and 551 differentially expressed miRNAs after the third exposure. Genes involved in ciliary function and ciliogenesis were significantly perturbed by repeated CS exposures, leading to changes in cilia beating frequency and ciliary protein expression. In particular, a time-dependent decrease in the expression of miR-449a, a conserved miRNA highly enriched in ciliated airway epithelia and implicated in motile ciliogenesis, was observed in CS-exposed cultures. Similar alterations in miR-449a have been reported in smokers with COPD. Network analysis further indicates that downregulation of miR-449a by CS may derepress cell-cycle proteins, which, in turn, interferes with ciliogenesis. Investigating the effects of CS on transcriptome profile in human ALI cultures may provide not only mechanistic insights, but potential early biomarkers for CS exposure and harm.
Published: 2021
Full Text: View/download PDF

33. Consensus Ranking Approach to Understanding the Underlying Mechanism With QSAR.

Author: Li Shao, Leihong Wu, Xiaohui Fan, and Yiyu Cheng
Published: 2010
Full Text: View/download PDF

34. Coordinated Regulation of UGT2B15 Expression by Long Noncoding RNA LINC00574 and hsa-miR-129-5p in HepaRG Cells

Author: Lin Xu, Yanjie Zhao, Jing Chen, Lei Guo, Si Chen, Leihong Wu, Dianke Yu, Baitang Ning, Yuan Jin, William H. Tolleson, Weida Tong, Xubing Wang, Bridgett Knox, Jiao Luo, and Dongying Li
Subjects: Pharmacology, Messenger RNA, Small interfering RNA, Chemistry, Regulator, Pharmaceutical Science, RNA-binding protein, Hep G2 Cells, 030226 pharmacology & pharmacy, Gene Expression Regulation, Enzymologic, Long non-coding RNA, Acetaminophen, Cell biology, MicroRNAs, 03 medical and health sciences, 0302 clinical medicine, 030220 oncology & carcinogenesis, microRNA, medicine, Hepatic stellate cell, Humans, RNA, Long Noncoding, Glucuronosyltransferase, Erratum, medicine.drug
Abstract: Recent studies have shown that microRNAs and long noncoding RNAs (lncRNAs) regulate the expression of drug metabolizing enzymes (DMEs) in human hepatic cells and that a set of DMEs, including UDP glucuronosyltransferase (UGT) 2B15, is down-regulated dramatically in liver cells by toxic acetaminophen (APAP) concentrations. In this study we analyzed mRNA, microRNA, and lncRNA expression profiles in APAP-treated HepaRG cells to explore noncoding RNA-dependent regulation of DME expression. The expression of UGT2B15 and lncRNA LINC00574 was decreased in APAP-treated HepaRG cells. UGT2B15 levels were diminished by LINC00574 suppression using antisense oligonucleotides or small interfering RNA. Furthermore, we found that hsa-miR-129-5p suppressed LINC00574 and decreased UGT2B15 expression via LINC00574 in HepaRG cells. In conclusion, our results indicate that LINC00574 acts as an important regulator of UGT2B15 expression in human hepatic cells, providing experimental evidence and new clues to understand the role of cross-talk between noncoding RNAs. SIGNIFICANCE STATEMENT: We showed a molecular network that displays the cross-talk and consequences among mRNA, micro RNA, long noncoding RNA, and proteins in acetaminophen (APAP)-treated HepaRG cells. APAP treatment increased the level of hsa-miR-129-5p and decreased that of LINC00574, ultimately decreasing the production of UDP glucuronosyltransferase (UGT) 2B15. The proposed regulatory network suppresses UGT2B15 expression through interaction of hsa-miR-129-5p and LINC00574, which may be achieved potentially by recruiting RNA binding proteins.
Published: 2020
Full Text: View/download PDF

35. Drug-induced liver injury severity and toxicity (DILIst): binary classification of 1279 drugs by human hepatotoxicity

Author: Weida Tong, Shraddha Thakkar, Zhichao Liu, Ting Li, Ruth Roberts, and Leihong Wu
Subjects: 0301 basic medicine, Drug, media_common.quotation_subject, Bioinformatics, Severity of Illness Index, Food and drug administration, 03 medical and health sciences, 0302 clinical medicine, Drug Discovery, Severity of illness, medicine, Humans, Distribution (pharmacology), media_common, Pharmacology, Liver injury, Alternative methods, business.industry, medicine.disease, 030104 developmental biology, Pharmaceutical Preparations, Drug development, 030220 oncology & carcinogenesis, Toxicity, Chemical and Drug Induced Liver Injury, business
Abstract: Drug-induced liver injury (DILI) is of significant concern to drug development and regulatory review because of the limited success with existing preclinical models. For developing alternative methods, a large drug list is needed with known DILI severity and toxicity. We augmented the DILIrank data set [annotated using US Food and Drug Administration (FDA) drug labeling)] with four literature datasets (N >350 drugs) to generate the largest drug list with DILI classification, called DILIst (DILI severity and toxicity). DILIst comprises 1279 drugs, of which 768 were DILI positives (increase of 65% from DILIrank), whereas 511 were DILI negatives (increase of 65%). The investigation of DILI positive-negative distribution across various therapeutic categories revealed the most and least frequent DILI categories. Thus, we consider DILIst to be an invaluable resource for the community to improve DILI research.
Published: 2020
Full Text: View/download PDF

36. Development of benchmark datasets for text mining and sentiment analysis to accelerate regulatory literature review

Author: Leihong, Wu, Si, Chen, Lei, Guo, Svitlana, Shpyleva, Kelly, Harris, Tariq, Fahmi, Timothy, Flanigan, Weida, Tong, Joshua, Xu, and Zhen, Ren
Subjects: General Medicine, Toxicology
Abstract: In the field of regulatory science, reviewing literature is an essential and important step, which most of the time is conducted by manually reading hundreds of articles. Although this process is highly time-consuming and labor-intensive, most output of this process is not well transformed into machine-readable format. The limited availability of data has largely constrained the artificial intelligence (AI) system development to facilitate this literature reviewing in the regulatory process. In the past decade, AI has revolutionized the area of text mining as many deep learning approaches have been developed to search, annotate, and classify relevant documents. After the great advancement of AI algorithms, a lack of high-quality data instead of the algorithms has recently become the bottleneck of AI system development. Herein, we constructed two large benchmark datasets, Chlorine Efficacy dataset (CHE) and Chlorine Safety dataset (CHS), under a regulatory scenario that sought to assess the antiseptic efficacy and toxicity of chlorine. For each dataset, ∼10,000 scientific articles were initially collected, manually reviewed, and their relevance to the review task were labeled. To ensure high data quality, each paper was labeled by a consensus among multiple experienced reviewers. The overall relevance rate was 27.21% (2,663 of 9,788) for CHE and 7.50% (761 of 10,153) for CHS, respectively. Furthermore, the relevant articles were categorized into five subgroups based on the focus of their content. Next, we developed an attention-based classification language model using these two datasets. The proposed classification model yielded 0.857 and 0.908 of Area Under the Curve (AUC) for CHE and CHS dataset, respectively. This performance was significantly better than permutation test (p 10E-9), demonstrating that the labeling processes were valid. To conclude, our datasets can be used as benchmark to develop AI systems, which can further facilitate the literature review process in regulatory science.
Published: 2023
Full Text: View/download PDF

37. Fabry’s disease and stroke: Effectiveness of enzyme replacement therapy (ERT) in stroke prevention, a review with meta-analysis

Author: Sen Sheng, Nidhi Kapoor, Sanjeeva Onteddu, Aliza T. Brown, Rohan Sharma, Saritha Ranabothu, Krishna Nalleballe, and Leihong Wu
Subjects: Male, congenital, hereditary, and neonatal diseases and abnormalities, medicine.medical_specialty, Disease, 03 medical and health sciences, 0302 clinical medicine, Physiology (medical), Internal medicine, medicine, Humans, Enzyme Replacement Therapy, Stroke, business.industry, nutritional and metabolic diseases, General Medicine, Enzyme replacement therapy, Middle Aged, medicine.disease, Fabry's disease, Confidence interval, Treatment Outcome, Neurology, Ischemic Attack, Transient, alpha-Galactosidase, 030220 oncology & carcinogenesis, Meta-analysis, Stroke prevention, Fabry Disease, Female, Surgery, Neurology (clinical), business, 030217 neurology & neurosurgery, Cohort study
Abstract: Back ground and objective Fabry’s disease, is the most prevalent lysosomal storage disorder and is notorious for its early multi-organ involvement leading to complications, including ischemic strokes and transient ischemic attacks. Since 2001, enzyme replacement therapy (ERT) has become the mainstay treatment for Fabry’s patients but the indications are not clearly defined. We did a meta-analysis of the available data to review the benefit of ERT for stroke prevention in Fabry’s patients. Methods A literature search was performed from National Center for Biotechnology information (NCBI)/PubMed database without restriction of years for systematic review purposes. A systematic review of clinical cohort studies and trials was performed with pooled analysis of proportions. The pooled proportions and the confidence intervals (CI) for stroke recurrence ratio were calculated for both ERT treatment group and native treatment groups. Result A total of 7 cohort studies and 2 RCTs involving 7513 participants (1471 on ERT vs 6042 on native treatment) met inclusion criteria. The pooled proportions analysis showed that the stroke recurrence ratio in the ERT treatment group was 8.2% [95% CI 0.038, 0.126] and in native-treatment group was 16% [95% CI; 0.102, 0.217]. Effect differences favored ERT treatment group over native treatment group (p = 0.03). Conclusion Our meta-analysis based on the currently available data showed that ERT for Fabry’s disease has beneficial effect on stroke prevention. Female carriers and atypically affected males could be started on ERT as soon as diagnosis is made. Further studies are warranted to support the role of ERT in stroke prevention.
Published: 2019
Full Text: View/download PDF

38. Study of serious adverse drug reactions using FDA-approved drug labeling and MedDRA

Author: Darshan Mehta, Taylor Ingle, Guangxu Zhou, Zhichao Liu, Junshuang Yang, Anna Zhao-Wong, Joshua Xu, Shraddha Thakkar, Stephen C. Harris, Weida Tong, Hong Fang, Weigong Ge, and Leihong Wu
Subjects: Drug, medicine.medical_specialty, Standard terminology, Drug-Related Side Effects and Adverse Reactions, media_common.quotation_subject, MedDRA, Boxed warning, Adverse drug reactions, lcsh:Computer applications to medicine. Medical informatics, Biochemistry, Approved drug, 03 medical and health sciences, 0302 clinical medicine, Structural Biology, Internal medicine, medicine, Adverse Drug Reaction Reporting Systems, Humans, Drug reaction, Drug safety, Molecular Biology, Data mining, lcsh:QH301-705.5, 030304 developmental biology, New drug application, media_common, 0303 health sciences, business.industry, Boxed Warning, Applied Mathematics, Research, Drug labeling, medicine.disease, Computer Science Applications, Substance abuse, Drug class, lcsh:Biology (General), 030220 oncology & carcinogenesis, lcsh:R858-859.7, business, Structured product labeling
Abstract: Background Adverse Drug Reactions (ADRs) are of great public health concern. FDA-approved drug labeling summarizes ADRs of a drug product mainly in three sections, i.e., Boxed Warning (BW), Warnings and Precautions (WP), and Adverse Reactions (AR), where the severity of ADRs are intended to decrease in the order of BW > WP > AR. Several reported studies have extracted ADRs from labeling documents, but most, if not all, did not discriminate the severity of the ADRs by the different labeling sections. Such a practice could overstate or underestimate the impact of certain ADRs to the public health. In this study, we applied the Medical Dictionary for Regulatory Activities (MedDRA) to drug labeling and systematically analyzed and compared the ADRs from the three labeling sections with a specific emphasis on analyzing serious ADRs presented in BW, which is of most drug safety concern. Results This study investigated New Drug Application (NDA) labeling documents for 1164 single-ingredient drugs using Oracle Text search to extract MedDRA terms. We found that only a small portion of MedDRA Preferred Terms (PTs), 3819 out of 21,920 or 17.42%, were observed in a whole set of documents. In detail, 466/3819 (12.0%) PTs were in BW, 2023/3819 (53.0%) were in WP, and 2961/3819 (77.5%) were in AR sections. We also found a higher overlap of top 20 occurring BW PTs with WP sections compared to AR sections. Within the MedDRA System Organ Class levels, serious ADRs (sADRs) from BW were prevalent in Nervous System disorders and Vascular disorders. A Hierarchical Cluster Analysis (HCA) revealed that drugs within the same therapeutic category shared the same ADR patterns in BW (e.g., nervous system drug class is highly associated with drug abuse terms such as dependence, substance abuse, and respiratory depression). Conclusions This study demonstrated that combining MedDRA standard terminologies with data mining techniques facilitated computer-aided ADR analysis of drug labeling. We also highlighted the importance of labeling sections that differ in seriousness and application in drug safety. Using sADRs primarily related to BW sections, we illustrated a prototype approach for computer-aided ADR monitoring and studies which can be applied to other public health documents. Electronic supplementary material The online version of this article (10.1186/s12859-019-2628-5) contains supplementary material, which is available to authorized users.
Published: 2019
Full Text: View/download PDF

39. A verified genomic reference sample for assessing performance of cancer panels detecting small variants of low allele frequency

Author: Margherita Francescatto, Fujun Qiu, Jonathan Foox, Cesare Furlanello, Halil Bisgin, Daniel J. Craig, Chia Jung Chang, Kristina Giorda, Tao Chen, Sayed Mohammad Ebrahim Sahraeian, Yulong Li, Simon Cawley, Ying Yu, Zhihong Zhang, Yun-Ching Chen, Zhiguang Li, Dan Li, Vinay K. Mittal, Raymond Miller, Wendell D. Jones, Jianying Li, Marghoob Mohiyuddin, Zhining Wen, Rebecca Kusko, Gunjan Hariani, Yuanting Zheng, James C. Willey, Chen Suo, Todd Richmond, Wenzhong Xiao, Lee Scott Basehore, David P. Kreil, Dong Wang, Yutao Fu, Nikola Tom, Yifan Zhang, Zhichao Liu, Andreas Scherer, Carlos Pabón-Peña, Kira P. Grist, Meijian Guan, Giuseppe Jurman, Leihong Wu, Chang Xu, Katherine Wilkins, Jiyang Zhang, Anne Bergstrom Lucas, Barbara L. Parsons, Mehdi Pirooznia, Daniel Butler, Paweł P. Łabaj, Scott Happe, Marco Chierici, K. Miclaus, Suzy M. Stiegelmeyer, Daniel Burgess, Nathan Haseley, Kevin Lai, Weida Tong, Quan Zhen Li, Pierre R. Bushel, Donald J. Johann, Angela del Pozo, Yingyi Hao, Binsheng Gong, Guangchun Chen, Christopher E. Mason, Natalia Novoradovskaya, Joshua Xu, Tieliu Shi, Mario Solís López, Wenjun Bao, Leming Shi, J. Jasper, and Institute for Molecular Medicine Finland
Subjects: DNA Copy Number Variations, QH301-705.5, DATABASE, Concordance, 3122 Cancers, EXOME, Sample (statistics), Computational biology, Biology, QH426-470, Workflow, 03 medical and health sciences, Genetic Heterogeneity, 0302 clinical medicine, Gene Frequency, QUALITY-CONTROL, Cell Line, Tumor, Neoplasms, COPY NUMBER VARIATIONS, INDEL DETECTION, Genetics, Biomarkers, Tumor, Humans, Digital polymerase chain reaction, Copy-number variation, Genetic Testing, Liquid biopsy, Biology (General), Allele frequency, Exome, Alleles, 030304 developmental biology, 0303 health sciences, business.industry, Research, 1184 Genetics, developmental biology, physiology, Genetic Variation, Genomics, SOMATIC MUTATIONS, FRAMEWORK, 3. Good health, READ ALIGNMENT, 030220 oncology & carcinogenesis, DISCOVERY, Personalized medicine, ACCURATE, 3111 Biomedicine, business
Abstract: Background Oncopanel genomic testing, which identifies important somatic variants, is increasingly common in medical practice and especially in clinical trials. Currently, there is a paucity of reliable genomic reference samples having a suitably large number of pre-identified variants for properly assessing oncopanel assay analytical quality and performance. The FDA-led Sequencing and Quality Control Phase 2 (SEQC2) consortium analyze ten diverse cancer cell lines individually and their pool, termed Sample A, to develop a reference sample with suitably large numbers of coding positions with known (variant) positives and negatives for properly evaluating oncopanel analytical performance. Results In reference Sample A, we identify more than 40,000 variants down to 1% allele frequency with more than 25,000 variants having less than 20% allele frequency with 1653 variants in COSMIC-related genes. This is 5–100× more than existing commercially available samples. We also identify an unprecedented number of negative positions in coding regions, allowing statistical rigor in assessing limit-of-detection, sensitivity, and precision. Over 300 loci are randomly selected and independently verified via droplet digital PCR with 100% concordance. Agilent normal reference Sample B can be admixed with Sample A to create new samples with a similar number of known variants at much lower allele frequency than what exists in Sample A natively, including known variants having allele frequency of 0.02%, a range suitable for assessing liquid biopsy panels. Conclusion These new reference samples and their admixtures provide superior capability for performing oncopanel quality control, analytical accuracy, and validation for small to large oncopanels and liquid biopsy assays.
Published: 2021

40. Optimized imaging methods for species-level identification of food-contaminating beetles

Author: Zhichao Liu, Himansu J. Vyas, Joshua Xu, Howard G. Semey, Weida Tong, Tanmay Bera, Hongjian Ding, Leihong Wu, and Amy Barnes
Subjects: 0301 basic medicine, Science, media_common.quotation_subject, Pattern analysis, Food Contamination, 02 engineering and technology, Insect, Biology, Article, Pattern Recognition, Automated, 03 medical and health sciences, Imaging, Three-Dimensional, Species Specificity, Species level, Animals, Species identification, media_common, Multidisciplinary, Ecology, Biological techniques, Automated species identification, 021001 nanoscience & nanotechnology, Multiple species, Coleoptera, 030104 developmental biology, Evolutionary biology, Medicine, Identification (biology), 0210 nano-technology, Contaminated food
Abstract: Identifying the exact species of pantry beetle responsible for food contamination, is imperative in assessing the risks associated with contamination scenarios. Each beetle species is known to have unique patterns on their hardened forewings (known as elytra) through which they can be identified. Currently, this is done through manual microanalysis of the insect or their fragments in contaminated food samples. We envision that the use of automated pattern analysis would expedite and scale up the identification process. However, such automation would require images to be captured in a consistent manner, thereby enabling the creation of large repositories of high-quality images. Presently, there is no standard imaging technique for capturing images of beetle elytra, which consequently means, there is no standard method of beetle species identification through elytral pattern analysis. This deficiency inspired us to optimize and standardize imaging methods, especially for food-contaminating beetles. For this endeavor, we chose multiple species of beetles belonging to different families or genera that have near-identical elytral patterns, and thus are difficult to identify correctly at the species level. Our optimized imaging method provides enhanced images such that the elytral patterns between individual species could easily be distinguished from each other, through visual observation. We believe such standardization is critical in developing automated species identification of pantry beetles and/or other insects. This eventually may lead to improved taxonomical classification, allowing for better management of food contamination and ecological conservation.
Published: 2021
Full Text: View/download PDF

41. T2D@ZJU: a knowledgebase integrating heterogeneous connections associated with type 2 diabetes mellitus.

Author: Zhenzhong Yang, Jihong Yang, Wei Liu, Leihong Wu, Li Xing, Yi Wang, Xiaohui Fan, and Yiyu Cheng
Published: 2013
Full Text: View/download PDF

42. CHD@ZJU: a knowledgebase providing network-based research platform on coronary heart disease.

Author: Leihong Wu, Xiang Li, Jihong Yang, Yufeng Liu, Xiaohui Fan, and Yiyu Cheng
Published: 2013
Full Text: View/download PDF

43. Trade-off Predictivity and Explainability for Machine-Learning Powered Predictive Toxicology: An in-Depth Investigation with Tox21 Data Sets

Author: Weida Tong, Leihong Wu, Ruili Huang, Joshua Xu, Igor V. Tetko, and Zhonghua Xia
Subjects: Models, Molecular, Databases, Factual, Computer science, media_common.quotation_subject, Quantitative Structure-Activity Relationship, 010501 environmental sciences, Toxicology, Machine learning, computer.software_genre, 01 natural sciences, Article, Machine Learning, 03 medical and health sciences, Linear regression, Feature (machine learning), Range (statistics), Humans, 030304 developmental biology, 0105 earth and related environmental sciences, media_common, 0303 health sciences, Variables, business.industry, Deep learning, General Medicine, Random forest, Data set, Support vector machine, Pharmaceutical Preparations, Artificial intelligence, Chemical and Drug Induced Liver Injury, business, computer
Abstract: Selecting a model in predictive toxicology often involves a trade-off between prediction performance and explainability: should we sacrifice the model performance to gain explainability, or vice versa? Here we present a comprehensive study to assess algorithm and feature influences on model performance in chemical toxicity research. We conducted over 5000 models for a Tox21 bioassay dataset of 65 assays and ~7600 compounds. Seven molecular representations as features and twelve modeling approaches varying in complexity and explainability were employed to systematically investigate the impact of various factors on model performance and explainability. We demonstrated that endpoints dictated a model’s performance, regardless of the chosen modeling approach including deep learning and chemical features. Overall, more complex models such as (LS-)SVM and Random Forest performed marginally better than simpler models such as linear regression and KNN in the presented Tox21 data analysis. Since a simpler model with acceptable performance often also is easy to interpret for the Tox21 dataset, it clearly was the preferred choice due to its better explainability. Given that each dataset had its own error structure both for dependent and independent variables, we strongly recommend that it is important to conduct a systematic study with a broad range of model complexity and feature explainability to identify model balancing its predictivity and explainability.
Published: 2021

44. Advancing Quality-Control for NGS Measurement of Actionable Mutations in Circulating Tumor DNA

Author: Thomas Morrison, Tim R. Mercer, Nathan Haseley, Leihong Wu, Jennifer S. LoCoco, Rebecca Kusko, Quan Zhen Li, Wendell D. Jones, Daniel J. Craig, Natalia Novoradovskaya, Joshua Xu, Todd Richmond, Ira W. Deveson, Thomas M. Blomquist, Erin L. Crawford, Aminah Wali, James C. Willey, Brad Austermiller, Donald J. Johann, and Guangchun Chen
Subjects: Mutation, Current practice, Computer science, Circulating tumor DNA, medicine, Variant allele, Computational biology, Liquid biopsy, medicine.disease_cause, Dna testing, Precision medicine, DNA sequencing
Abstract: SUMMARYThe primary objective of the FDA-led Sequencing and Quality Control Phase 2 (SEQC2) project is to develop standard analysis protocols and quality control metrics for use in DNA testing to enhance scientific research and precision medicine. This study reports a targeted next generation sequencing (NGS) method that enables more accurate detection of actionable mutations in circulating tumor DNA (ctDNA) clinical specimens. This advancement was enabled by designing a synthetic internal standard spike-in for each actionable mutation target, suitable for use in NGS following hybrid-capture enrichment and unique molecular index (UMI) or non-UMI library preparation. When mixed with contrived ctDNA reference samples, internal standards enabled calculation of technical error rate, limit of blank, and limit of detection for each variant at each nucleotide position, in each sample. True positive mutations with variant allele fraction too low for detection by current practice were detected with this method, thereby increasing sensitivity.
Published: 2021
Full Text: View/download PDF

45. Systematic Identification of Molecular Targets and Pathways Related to Human Organ Level Toxicity

Author: Ruili Huang, Tuan Xu, Leihong Wu, Menghang Xia, and Anton Simeonov
Subjects: 0303 health sciences, In vitro toxicology, Neurotoxicity, Developmental toxicity, General Medicine, Computational biology, 010501 environmental sciences, Biology, Toxicology, medicine.disease, 01 natural sciences, Article, Nephrotoxicity, Biological pathway, Machine Learning, 03 medical and health sciences, Adverse Outcome Pathway, Toxicity, Toxicity Tests, medicine, Humans, Environmental Pollutants, Reproductive toxicity, 030304 developmental biology, 0105 earth and related environmental sciences
Abstract: The mechanisms leading to organ level toxicities are poorly understood. In this study, we applied an integrated approach to deduce the molecular targets and biological pathways involved in chemically induced toxicity for eight common human organ level toxicity end points (carcinogenicity, cardiotoxicity, developmental toxicity, hepatotoxicity, nephrotoxicity, neurotoxicity, reproductive toxicity, and skin toxicity). Integrated analysis of in vitro assay data, molecular targets and pathway annotations from the literature, and toxicity-molecular target associations derived from text mining, combined with machine learning techniques, were used to generate molecular targets for each of the organ level toxicity end points. A total of 1516 toxicity-related genes were identified and subsequently analyzed for biological pathway coverage, resulting in 206 significant pathways (p-value
Published: 2020

46. Comparing SVM and ANN based Machine Learning Methods for Species Identification of Food Contaminating Beetles

Author: Weida Tong, Darryl A. Langley, Himansu J. Vyas, Joshua Xu, Amy Barnes, Howard G. Semey, Hongjian Ding, Zhichao Liu, Leihong Wu, Halil Bisgin, Monica Pava-Ripoll, and Tanmay Bera
Subjects: 0301 basic medicine, Food Safety, Support Vector Machine, Computer science, lcsh:Medicine, Context (language use), Food Contamination, Machine learning, computer.software_genre, Article, Machine Learning, 03 medical and health sciences, Artificial Intelligence, Species identification, Animals, lcsh:Science, Multidisciplinary, Artificial neural network, business.industry, lcsh:R, Food safety, Support vector machine, Coleoptera, Identification (information), 030104 developmental biology, Pattern recognition (psychology), lcsh:Q, Artificial intelligence, Neural Networks, Computer, business, computer, Algorithms
Abstract: Insect pests, such as pantry beetles, are often associated with food contaminations and public health risks. Machine learning has the potential to provide a more accurate and efficient solution in detecting their presence in food products, which is currently done manually. In our previous research, we demonstrated such feasibility where Artificial Neural Network (ANN) based pattern recognition techniques could be implemented for species identification in the context of food safety. In this study, we present a Support Vector Machine (SVM) model which improved the average accuracy up to 85%. Contrary to this, the ANN method yielded ~80% accuracy after extensive parameter optimization. Both methods showed excellent genus level identification, but SVM showed slightly better accuracy for most species. Highly accurate species level identification remains a challenge, especially in distinguishing between species from the same genus which may require improvements in both imaging and machine learning techniques. In summary, our work does illustrate a new SVM based technique and provides a good comparison with the ANN model in our context. We believe such insights will pave better way forward for the application of machine learning towards species identification and food safety.
Published: 2018
Full Text: View/download PDF

47. Competitive docking model for prediction of the human nicotinic acetylcholine receptor α7 binding of tobacco constituents

Author: Bohu Pan, Hao Ye, Huixiao Hong, Weida Tong, Hui Wen Ng, Carmine Leggett, Leihong Wu, Chandrabose Selvaraj, and Sugunadevi Sakkiah
Subjects: 0301 basic medicine, tobacco constituents, media_common.quotation_subject, In silico, Computational biology, Biology, Nicotine, 03 medical and health sciences, 0302 clinical medicine, mental disorders, medicine, Regulatory science, nicotinic acetylcholine receptor, Acetylcholine receptor, media_common, Addiction, prediction, molecular docking, Nicotinic acetylcholine receptor, 030104 developmental biology, Nicotinic agonist, Oncology, Docking (molecular), addiction, 030217 neurology & neurosurgery, Research Paper, medicine.drug
Abstract: // Hui Wen Ng 1, * , Carmine Leggett 2, * , Sugunadevi Sakkiah 1, * , Bohu Pan 1 , Hao Ye 1 , Leihong Wu 1 , Chandrabose Selvaraj 1 , Weida Tong 1 and Huixiao Hong 1 1 Division of Bioinformatics and Biostatistics, Office of Research, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR 72079, USA 2 Division of Non-clinical Science, Office of Science, Center for Tobacco Products, U.S. Food and Drug Administration, Silver Spring, MD 20993, USA * These authors made equal contributions to the work Correspondence to: Huixiao Hong, email: huixiao.hong@fda.hhs.gov Keywords: prediction; tobacco constituents; addiction; molecular docking; nicotinic acetylcholine receptor Received: August 28, 2017 Accepted: February 01, 2018 Epub: February 08, 2018 Published: March 30, 2018 ABSTRACT The detrimental health effects associated with tobacco use constitute a major public health concern. The addiction associated with nicotine found in tobacco products has led to difficulty in quitting among users. Nicotinic acetylcholine receptors (nAChRs) are the targets of nicotine and are responsible for addiction to tobacco products. However, it is unknown if the other >8000 tobacco constituents are addictive. Since it is time-consuming and costly to experimentally assess addictive potential of such larger number of chemicals, computationally predicting human nAChRs binding is important for in silico evaluation of addiction potential of tobacco constituents and needs structures of human nAChRs. Therefore, we constructed three-dimensional structures of the ligand binding domain of human nAChR α7 subtype and then developed a predictive model based on the constructed structures to predict human nAChR α7 binding activity of tobacco constituents. The predictive model correctly predicted 11 out of 12 test compounds to be binders of nAChR α7. The model is a useful tool for high-throughput screening of potential addictive tobacco constituents. These results could inform regulatory science research by providing a new validated predictive tool using cutting-edge computational methodology to high-throughput screen tobacco additives and constituents for their binding interaction with the human α7 nicotinic receptor. The tool represents a prediction model capable of screening thousands of chemicals found in tobacco products for addiction potential, which improves the understanding of the potential effects of additives.
Published: 2018
Full Text: View/download PDF

48. Advancing NGS quality control to enable measurement of actionable mutations in circulating tumor DNA

Author: Donald J. Johann, Erin L. Crawford, Thomas M. Blomquist, Bradley Austermiller, Natalia Novoradovskaya, Joshua Xu, Guangchun Chen, Nathan Haseley, Wendell D. Jones, Tim R. Mercer, Ira W. Deveson, Jennifer S. LoCoco, Aminah Wali, Quan Zhen Li, Leihong Wu, Rebecca Kusko, Todd Richmond, Daniel J. Craig, Tom Morrison, and James C. Willey
Subjects: Mutation, Computer science, Library preparation, Hybrid capture, Computational biology, Variant allele, medicine.disease_cause, Precision medicine, Biochemistry, Genetics and Molecular Biology (miscellaneous), Biochemistry, Computer Science Applications, Current practice, Circulating tumor DNA, Genetics, medicine, Radiology, Nuclear Medicine and imaging, Liquid biopsy, Biotechnology
Abstract: Summary The primary objective of the FDA-led Sequencing and Quality Control Phase 2 (SEQC2) project is to develop standard analysis protocols and quality control metrics for use in DNA testing to enhance scientific research and precision medicine. This study reports a targeted next-generation sequencing (NGS) method that will enable more accurate detection of actionable mutations in circulating tumor DNA (ctDNA) clinical specimens. To accomplish this, a synthetic internal standard spike-in was designed for each actionable mutation target, suitable for use in NGS following hybrid capture enrichment and unique molecular index (UMI) or non-UMI library preparation. When mixed with contrived ctDNA reference samples, internal standards enabled calculation of technical error rate, limit of blank, and limit of detection for each variant at each nucleotide position in each sample. True-positive mutations with variant allele fraction too low for detection by current practice were detected with this method, thereby increasing sensitivity.
Published: 2021
Full Text: View/download PDF

49. Multiple microRNAs function as self-protective modules in acetaminophen-induced hepatotoxicity in humans

Author: Kostiantyn Dreval, William B. Mattes, Yongli Guo, Linjuan Zeng, Yaqiong Jin, Weida Tong, Huixiao Hong, Sudeepa Bhattacharyya, Hong Fang, Bridgett Knox, Tieliu Shi, Leming Shi, Yong Wang, Wenming Xiao, Zhen Ren, Igor P. Pogribny, Leihong Wu, Volodymyr Tryndyak, Jinchun Sun, Si Chen, Baitang Ning, Pritmohinder S. Gill, Richard D. Beger, Yinting Chen, Xi Yang, William H. Tolleson, Sandra S. McCullough, Laura K. Schnackenberg, Lei Guo, Nan Mei, Dianke Yu, and Laura P. James
Subjects: Male, 0301 basic medicine, Health, Toxicology and Mutagenesis, In silico, Biology, Pharmacology, Transfection, Toxicology, Article, Cell Line, 03 medical and health sciences, microRNA, medicine, Humans, Child, Gene, Acetaminophen, Liver injury, Messenger RNA, digestive, oral, and skin physiology, HEK 293 cells, General Medicine, medicine.disease, body regions, MicroRNAs, HEK293 Cells, 030104 developmental biology, embryonic structures, Hepatocytes, Female, Chemical and Drug Induced Liver Injury, Drug Overdose, medicine.drug
Abstract: Acetaminophen (APAP) overdose is the leading cause of acute liver failure. Yet the mechanisms underlying adaptive tolerance toward APAP-induced liver injury are not fully understood. To better understand molecular mechanisms contributing to adaptive tolerance to APAP is an underpinning foundation for APAP-related precision medicine. In the current study, the mRNA and microRNA (miRNA) expression profiles derived from next generation sequencing data for APAP-treated (5 and 10 mM) Hep-aRG cells and controls were analyzed systematically. Putative miRNAs targeting key dysregulated genes involved in APAP hepatotoxicity were selected using in silico prediction algorithms, un-biased gene ontology, and network analyses. Luciferase reporter assays, RNA electrophoresis mobility shift assays, and miRNA pull-down assays were performed to investigate the role of miRNAs affecting the expression of dysregulated genes. Levels of selected miRNAs were measured in serum samples obtained from children with APAP overdose (58.6–559.4 mg/kg) and from healthy controls. As results, 2758 differentially expressed genes and 47 miRNAs were identified. Four of these miRNAs (hsa-miR-224-5p, hsa-miR-320a, hsa-miR-449a, and hsa-miR-877-5p) suppressed drug metabolizing enzyme (DME) levels involved in APAP-induced liver injury by downregulating HNF1A, HNF4A and NR1I2 expression. Exogenous transfection of these miRNAs into HepaRG cells effectively rescued them from APAP toxicity, as indicated by decreased alanine aminotransferase levels. Importantly, hsa-miR-320a and hsa-miR-877-5p levels were significantly elevated in serum samples obtained from children with APAP overdose compared to health controls. Collectively, these data indicate that hsa-miR-224-5p, hsa-miR-320a, hsa-miR-449a, and hsa-miR-877-5p suppress DME expression involved in APAP-induced hepatotoxicity and they contribute to an adaptive response in hepatocytes.
Published: 2017
Full Text: View/download PDF

50. Integrating Drug’s Mode of Action into Quantitative Structure–Activity Relationships for Improved Prediction of Drug-Induced Liver Injury

Author: Kristin McEuen, Zhichao Liu, Minjun Chen, Weida Tong, Leihong Wu, Scott M. Auerbach, Joshua Xu, Hong Fang, and Ruili Huang
Subjects: Models, Molecular, 0301 basic medicine, Drug, Quantitative structure–activity relationship, Computer science, General Chemical Engineering, media_common.quotation_subject, Molecular Conformation, Quantitative Structure-Activity Relationship, Computational biology, Library and Information Sciences, Pharmacology, Article, Molecular conformation, 03 medical and health sciences, Mode of action, media_common, Mechanism (biology), Computational Biology, Quantitative structure, General Chemistry, Computer Science Applications, 030104 developmental biology, Pharmaceutical Preparations, Test set, Chemical and Drug Induced Liver Injury
Abstract: Drug-induced liver injury (DILI) is complex in mechanism. Different drugs could undergo different mechanisms but result in the same DILI type while the same drug could lead to different DILI types via different mechanisms. Therefore, predicting a drug’s potential for DILI should take its underlying mechanisms into consideration. To achieve that, we constructed a novel approach by incorporating drug’s Mode of Action (MOA) into Quantitative Structure-Activity Relationship (QSAR) modeling. This MOA-DILI approach was examined using a dataset of 333 drugs. The drugs were first grouped according to their MOA profiles (positive or negative in each MOA) based on the Tox21 qHTS assays. QSAR models for individual MOA assays were developed and subsequently combined to obtain the MOA-DILI model. A hold-out testing strategy (222 drugs for training and 111 drugs as a test set) was employed, which yielded the predictive accuracy of 0.711. For comparison, the MOA-DILI model was directly compared with the standard QSAR approach using the same hold-out strategy, and the QSAR model yielded an accuracy of 0.662. To minimize the random chance in splitting training/test sets, the hold-out testing process was repeated 1000 times, and the observed difference in prediction accuracy between MOA-DILI and QSARs was statistically significant (P-value
Published: 2017
Full Text: View/download PDF

Catalog

Books, media, physical & digital resources

See catalog results

Searchworks

Select search scope, currently: Articles Catalog books, media & more in Jio Institute collections Articles journal articles & other e-resources

Search

Search Constraints

Refine your results

Search Limiters

Topic

Publication Year Range

Language

Publication Type

Journal

Database

Publisher

120 results on '"Leihong Wu"'

Search Results

Catalog

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources