1,548,259 results on '"software"'
Search Results
2. Modelling, solution and application of optimization techniques in HRES: From conventional to artificial intelligence
- Author
Saxena, Vivek, Kumar, Narendra, Manna, Saibal, Rajput, Saurabh Kumar, Agarwal, Kusum Lata, Diwania, Sourav, and Gupta, Varun
- Published
- 2025
- Full Text
- View/download PDF
3. A novel algorithm and software for 3D density gravity inversion
- Author
Chen, Wenjin, Tan, Xiaolong, and Liu, Yang
- Published
- 2025
- Full Text
- View/download PDF
4. A more flexible design for MDSplus Device drivers
- Author
Santoro, Fernando, Lane-Walsh, Stephen, Stillerman, Joshua, and Winkel, Mark
- Published
- 2025
- Full Text
- View/download PDF
5. Multi-Institutional Evaluation and Training of Breast Density Classification AI Algorithm Using ACR Connect and AI-LAB
- Author
Brink, Laura, Romero, Ricardo Amaya, Coombs, Laura, Tilkin, Mike, Mazaheri, Sina, Gichoya, Judy, Zaiman, Zachary, Trivedi, Hari, Medina, Adam, Bizzo, Bernardo C., Chang, Ken, Kalpathy-Cramer, Jayashree, Kalra, Mannudeep K., Astuto, Bruno, Ramirez, Carolina, Majumdar, Sharmila, Lee, Amie Y., Lee, Christoph I., Cross, Nathan M., Chen, Po-Hao, Ciancibello, Michael, Chiunda, Allan, Nachand, Douglas, Shah, Chintan, and Wald, Christoph
- Published
- 2025
- Full Text
- View/download PDF
6. PathwayPilot: A User-Friendly Tool for Visualizing and Navigating Metabolic Pathways
- Author
Moortele, Tibo Vande, Verschaffelt, Pieter, Huang, Qingyao, Doncheva, Nadezhda T., Holstein, Tanja, Jachmann, Caroline, Dawyndt, Peter, Martens, Lennart, Mesuere, Bart, and Van Den Bossche, Tim
- Published
- 2025
- Full Text
- View/download PDF
7. Python control of a high-resolution near-infrared spectrometer for undergraduate use
- Author
Heuvel-Horwitz, Joshua, Gross, Eisen C., and Sears, Trevor J.
- Published
- 2025
- Full Text
- View/download PDF
8. Modelling and optimizing secondary metabolites production in Spirodela polyrhiza using machine learning
- Author
Tan, Win Hung, Tong, C.Y., Chua, M.X., and Derek, C.J.C.
- Published
- 2024
- Full Text
- View/download PDF
9. WACARDIA: Graphical MATLAB software for Wireless Assessment of CARDiac Interoceptive Accuracy
- Author
Kleckner, Ian R. and Chung, Jacob J.
- Published
- 2024
- Full Text
- View/download PDF
10. Investigation of an irradiated CCD device: Building and testing a Charge Transfer Inefficiency correction pipeline using the Pyxel framework
- Author
Kelman, Bradley, Prod’homme, Thibaut, Skottfelt, Jesper, Hall, David, Lemmel, Frederic, Seibert, Constanze, Verhoeve, Peter, and Hubbard, Michael
- Published
- 2024
- Full Text
- View/download PDF
11. Taming the Rhinoceros: A brief history of a ubiquitous tool
- Author
Canizares, Galo
- Published
- 2024
- Full Text
- View/download PDF
12. Modelling of thermochemical processes of waste recycling: A review
- Author
Han, Bing, Kumar, Dileep, Pei, Yang, Norton, Michael, Adams, Scott D., Khoo, Sui Yang, and Kouzani, Abbas Z.
- Published
- 2024
- Full Text
- View/download PDF
13. SIPAS: A comprehensive susceptibility imaging process and analysis studio
- Author
Qiu, Lichu, Zhao, Zijun, and Bao, Lijun
- Published
- 2024
- Full Text
- View/download PDF
14. 3D printing traceability in healthcare using 3Diamond software
- Author
Capek, Lukas and Schwarz, Daniel
- Published
- 2024
- Full Text
- View/download PDF
15. Measuring and monitoring the transition to the circular economy of universities: CExUNV
- Author
Valls-Val, Karen, Ibáñez-Forés, Valeria, and Bovea, María D.
- Published
- 2024
- Full Text
- View/download PDF
16. Impact of different CAD software programs on marginal and internal fit of provisional crowns: An in vitro study
- Author
Rençber Kızılkaya, Ayşe and Kara, Aybuke
- Published
- 2024
- Full Text
- View/download PDF
17. Online Learning Insights in Software Development Teams: Stated Interest vs. Actual Participation
- Author
Petrescu, Manuela Andreea and Borza, Diana Laura
- Published
- 2024
- Full Text
- View/download PDF
18. Providing a framework for evaluation disease registry and health outcomes Software: Updating the CIPROS checklist
- Author
Shafiee, Fatemeh, Sarbaz, Masoume, Marouzi, Parviz, Banaye Yazdipour, Alireza, and Kimiafar, Khalil
- Published
- 2024
- Full Text
- View/download PDF
19. tttrlib: modular software for integrating fluorescence spectroscopy, imaging, and molecular modeling.
- Author
Peulen, Thomas-Otavio, Hemmen, Katherina, Greife, Annemarie, Webb, Benjamin, Felekyan, Suren, Sali, Andrej, Seidel, Claus, Sanabria, Hugo, and Heinze, Katrin
- Subjects
Software ,Spectrometry ,Fluorescence ,Models ,Molecular ,Image Processing ,Computer-Assisted - Abstract
SUMMARY: We introduce software for reading, writing and processing fluorescence single-molecule and image spectroscopy data and developing analysis pipelines to unify various spectroscopic analysis tools. Our software can be used for processing multiple experiment types, e.g. for time-resolved single-molecule spectroscopy, laser scanning microscopy, fluorescence correlation spectroscopy and image correlation spectroscopy. The software is file format agnostic and processes multiple time-resolved data formats and outputs. Our software eliminates the need for data conversion and mitigates data archiving issues. AVAILABILITY AND IMPLEMENTATION: tttrlib is available via pip (https://pypi.org/project/tttrlib/) and bioconda while the open-source code is available via GitHub (https://github.com/fluorescence-tools/tttrlib). Presented examples and additional documentation demonstrating how to implement in vitro and live-cell image spectroscopy analysis are available at https://docs.peulen.xyz/tttrlib and https://zenodo.org/records/14002224.
- Published
- 2025
20. A change language for ontologies and knowledge graphs.
- Author
Hegde, Harshad, Vendetti, Jennifer, Goutte-Gattat, Damien, Caufield, J, Graybeal, John, Harris, Nomi, Karam, Naouel, Kindermann, Christian, Matentzoglu, Nicolas, Overton, James, Musen, Mark, and Mungall, Christopher
- Subjects
Biological Ontologies ,Humans ,Software ,Natural Language Processing - Abstract
Ontologies and knowledge graphs (KGs) are general-purpose computable representations of some domain, such as human anatomy, and are frequently a crucial part of modern information systems. Most of these structures change over time, incorporating new knowledge or information that was previously missing. Managing these changes is a challenge, both in terms of communicating changes to users and providing mechanisms to make it easier for multiple stakeholders to contribute. To fill that need, we have created KGCL, the Knowledge Graph Change Language (https://github.com/INCATools/kgcl), a standard data model for describing changes to KGs and ontologies at a high level, and an accompanying human-readable Controlled Natural Language (CNL). This language serves two purposes: a curator can use it to request desired changes, and it can also be used to describe changes that have already happened, corresponding to the concepts of apply patch and diff commonly used for managing changes in text documents and computer programs. Another key feature of KGCL is that descriptions are at a high enough level to be useful and understood by a variety of stakeholders; e.g., ontology edits can be specified by commands like "add synonym arm to forelimb" or "move Parkinson disease under neurodegenerative disease". We have also built a suite of tools for managing ontology changes. These include an automated agent that integrates with and monitors GitHub ontology repositories and applies any requested changes, and a new component in the BioPortal ontology resource that allows users to make change requests directly from within the BioPortal user interface. Overall, the KGCL data model, its CNL, and associated tooling allow for easier management and processing of changes associated with the development of ontologies and KGs. Database URL: https://github.com/INCATools/kgcl.
- Published
- 2025
21. Automated Lesion and Feature Extraction Pipeline for Brain MRIs with Interpretability.
- Author
Eghbali, Reza, Nedelec, Pierre, Weiss, David, Bhalerao, Radhika, Xie, Long, Rudie, Jeffrey, Liu, Chunlei, Sugrue, Leo, and Rauschecker, Andreas
- Subjects
MRI pipeline ,Neuroradiology ,Radiomics ,Humans ,Magnetic Resonance Imaging ,Brain ,Image Processing ,Computer-Assisted ,Machine Learning ,Image Interpretation ,Computer-Assisted ,Software ,Neuroimaging - Abstract
This paper introduces the Automated Lesion and Feature Extraction (ALFE) pipeline, an open-source, Python-based pipeline that consumes MR images of the brain and produces anatomical segmentations, lesion segmentations, and human-interpretable imaging features describing the lesions in the brain. ALFE pipeline is modeled after the neuroradiology workflow and generates features that can be used by physicians for quantitative analysis of clinical brain MRIs and for machine learning applications. The pipeline uses a decoupled design which allows the user to customize the image processing, image registrations, and AI segmentation tools without the need to change the business logic of the pipeline. In this manuscript, we give an overview of ALFE, present the main aspects of ALFE pipeline design philosophy, and present case studies.
- Published
- 2025
22. The UCSC Genome Browser database: 2025 update
- Author
Perez, Gerardo, Barber, Galt P, Benet-Pages, Anna, Casper, Jonathan, Clawson, Hiram, Diekhans, Mark, Fischer, Clay, Gonzalez, Jairo Navarro, Hinrichs, Angie S, Lee, Christopher M, Nassar, Luis R, Raney, Brian J, Speir, Matthew L, van Baren, Marijke J, Vaske, Charles J, Haussler, David, Kent, W James, and Haeussler, Maximilian
- Subjects
Biological Sciences ,Bioinformatics and Computational Biology ,Genetics ,Human Genome ,Bioengineering ,Biotechnology ,2.6 Resources and infrastructure (aetiology) ,1.5 Resources and infrastructure (underpinning) ,Humans ,Genomics ,Databases ,Genetic ,Molecular Sequence Annotation ,Software ,Web Browser ,User-Computer Interface ,Internet ,Genome ,Human ,Animals ,Computational Biology ,Environmental Sciences ,Information and Computing Sciences ,Developmental Biology ,Biological sciences ,Chemical sciences ,Environmental sciences - Abstract
The UCSC Genome Browser (https://genome.ucsc.edu) is a widely utilized web-based tool for visualization and analysis of genomic data, encompassing over 4000 assemblies from diverse organisms. Since its release in 2001, it has become an essential resource for genomics and bioinformatics research. Annotation data available on Genome Browser includes both internally created and maintained tracks as well as custom tracks and track hubs provided by the research community. This last year's updates include over 25 new annotation tracks such as the gnomAD 4.1 track on the human GRCh38/hg38 assembly, the addition of three new public hubs, and significant expansions to the Genome Archive (GenArk) system for interacting with the enormous variety of assemblies. We have also made improvements to our interface, including updates to the browser graphic page, such as a new popup dialog feature that now displays item details without requiring navigation away from the main Genome Browser page. GenePred tracks have been upgraded with right-click options for zooming and precise navigation, along with enhanced mouseOver functions. Additional improvements include a new grouping feature for track hubs and hub description info links. A new tutorial focusing on Clinical Genetics has also been added to the UCSC Genome Browser.
- Published
- 2025
23. Genomes OnLine Database (GOLD) v.10: new features and updates
- Author
Mukherjee, Supratim, Stamatis, Dimitri, Li, Cindy Tianqing, Ovchinnikova, Galina, Kandimalla, Mahathi, Handke, Van, Reddy, Anuha, Ivanova, Natalia, Woyke, Tanja, Eloe-Fadrosh, Emiley A, Chen, I-Min A, Kyrpides, Nikos C, and Reddy, TBK
- Subjects
Biological Sciences ,Bioinformatics and Computational Biology ,Biotechnology ,Human Genome ,Genetics ,2.6 Resources and infrastructure (aetiology) ,Generic health relevance ,Metadata ,Genomics ,Databases ,Genetic ,Internet ,Software ,Humans ,User-Computer Interface ,Genome ,Animals ,Environmental Sciences ,Information and Computing Sciences ,Developmental Biology ,Biological sciences ,Chemical sciences ,Environmental sciences - Abstract
The Genomes OnLine Database (GOLD; https://gold.jgi.doe.gov/) at the Department of Energy Joint Genome Institute is a comprehensive online metadata repository designed to catalog and manage information related to (meta)genomic sequence projects. GOLD provides a centralized platform where researchers can access a wide array of metadata from its four organization levels namely Study, Organism/Biosample, Sequencing Project and Analysis Project. GOLD continues to serve as a valuable resource and has seen significant growth and expansion since its inception in 1997. With its expanded role as a collaborative platform, it not only actively imports data from other primary repositories like National Center for Biotechnology Information but also supports contributions from researchers worldwide. This collaborative approach has enriched the database with diverse datasets, creating a more integrated resource to enhance scientific insights. As genomic research becomes increasingly integral to various scientific disciplines, more researchers and institutions are turning to GOLD for their metadata needs. To meet this growing demand, GOLD has expanded by adding diverse metadata fields, intuitive features, advanced search capabilities and enhanced data visualization tools, making it easier for users to find and interpret relevant information. This manuscript provides an update and highlights the new features introduced over the last 2 years.
- Published
- 2025
24. BGC Atlas: a web resource for exploring the global chemical diversity encoded in bacterial genomes
- Author
Bağcı, Caner, Nuhamunada, Matin, Goyat, Hemant, Ladanyi, Casimir, Sehnal, Ludek, Blin, Kai, Kautsar, Satria A, Tagirdzhanov, Azat, Gurevich, Alexey, Mantri, Shrikant, von Mering, Christian, Udwary, Daniel, Medema, Marnix H, Weber, Tilmann, and Ziemert, Nadine
- Subjects
Microbiology ,Biological Sciences ,Genetics ,Genome ,Bacterial ,Internet ,Multigene Family ,Databases ,Genetic ,Bacteria ,Metagenomics ,Metagenome ,Biosynthetic Pathways ,Secondary Metabolism ,Software ,Environmental Sciences ,Information and Computing Sciences ,Developmental Biology ,Biological sciences ,Chemical sciences ,Environmental sciences - Abstract
Secondary metabolites are compounds not essential for an organism's development, but provide significant ecological and physiological benefits. These compounds have applications in medicine, biotechnology and agriculture. Their production is encoded in biosynthetic gene clusters (BGCs), groups of genes collectively directing their biosynthesis. The advent of metagenomics has allowed researchers to study BGCs directly from environmental samples, identifying numerous previously unknown BGCs encoding unprecedented chemistry. Here, we present the BGC Atlas (https://bgc-atlas.cs.uni-tuebingen.de), a web resource that facilitates the exploration and analysis of BGC diversity in metagenomes. The BGC Atlas identifies and clusters BGCs from publicly available datasets, offering a centralized database and a web interface for metadata-aware exploration of BGCs and gene cluster families (GCFs). We analyzed over 35 000 datasets from MGnify, identifying nearly 1.8 million BGCs, which were clustered into GCFs. The analysis showed that ribosomally synthesized and post-translationally modified peptides are the most abundant compound class, with most GCFs exhibiting high environmental specificity. We believe that our tool will enable researchers to easily explore and analyze the BGC diversity in environmental samples, significantly enhancing our understanding of bacterial secondary metabolites, and promote the identification of ecological and evolutionary factors shaping the biosynthetic potential of microbial communities.
- Published
- 2025
25. The secondary metabolism collaboratory: a database and web discussion portal for secondary metabolite biosynthetic gene clusters
- Author
Udwary, Daniel W, Doering, Drew T, Foster, Bryce, Smirnova, Tatyana, Kautsar, Satria A, and Mouncey, Nigel J
- Subjects
Biological Sciences ,Bioinformatics and Computational Biology ,Genetics ,Biotechnology ,Generic health relevance ,Multigene Family ,Internet ,Databases ,Genetic ,Secondary Metabolism ,Genome ,Bacterial ,Biosynthetic Pathways ,Bacteria ,Genome ,Archaeal ,Software ,Environmental Sciences ,Information and Computing Sciences ,Developmental Biology ,Biological sciences ,Chemical sciences ,Environmental sciences - Abstract
Secondary metabolites are small molecules produced by all corners of life, often with specialized bioactive functions with clinical and environmental relevance. Secondary metabolite biosynthetic gene clusters (BGCs) can often be identified within DNA sequences by various sequence similarity tools, but determining the exact functions of genes in the pathway and predicting their chemical products can often only be done by careful, manual comparative analysis. To facilitate this, we report the first release of the secondary metabolism collaboratory (SMC), which aims to provide a comprehensive, tool-agnostic repository of BGC sequence data drawn from all publicly available and user-submitted bacterial and archaeal genome and contig sources. On the website, users are provided a searchable catalog of putative BGCs identified from each source, along with visualizations of gene and domain annotations derived from multiple sequence analysis tools. SMC's data is also available through publicly-accessible application programming interface (API) endpoints to facilitate programmatic access. Users are encouraged to share their findings (and search for others') through comment posts on BGC and source pages. At the time of writing, SMC is the largest repository of BGC information, holding 13.1M BGC regions from 1.3M source sequences and growing, and can be found at https://smc.jgi.doe.gov.
- Published
- 2025
26. Identifying Strong Neoantigen MHC-I/II Binding Candidates for Targeted Immunotherapy with SINE.
- Author
Bendik, Joseph, Castro, Andrea, Califano, Joseph, Carter, Hannah, and Guo, Theresa
- Subjects
alternative splicing ,head and neck cancer ,melanoma ,neoantigen ,software ,targeted immunotherapy response ,Humans ,Antigens ,Neoplasm ,Immunotherapy ,Histocompatibility Antigens Class I ,Head and Neck Neoplasms ,Histocompatibility Antigens Class II ,Melanoma ,Protein Binding ,Squamous Cell Carcinoma of Head and Neck ,Peptides - Abstract
The discovery of tumor-derived neoantigens which elicit an immune response through major histocompatibility complex (MHC-I/II) binding has led to significant advancements in immunotherapy. While many neoantigens have been discovered through the identification of non-synonymous mutations, the rate of these is low in some cancers, including head and neck squamous cell carcinoma. Therefore, the identification of neoantigens through additional means, such as aberrant splicing, is necessary. To achieve this, we developed the splice isoform neoantigen evaluator (SINE) pipeline. Our tool documents peptides present on spliced or inserted genomic regions of interest using Patient Harmonic-mean Best Rank scores, calculating the MHC-I/II binding affinity across the complete human leukocyte antigen landscape. Here, we found 125 potentially immunogenic events and 9 principal binders in a cohort of head and neck cancer patients where the corresponding wild-type peptides display no MHC-I/II affinity. Further, in a melanoma cohort of patients treated with anti-PD1 therapy, the expression of immunogenic splicing events identified by SINE predicted response, potentially indicating the existence of immune editing in these tumors. Overall, we demonstrate SINE's ability to identify clinically relevant immunogenic neojunctions, thus acting as a useful tool for researchers seeking to understand the neoantigen landscape from aberrant splicing in cancer.
- Published
- 2024
27. Tree polynomials identify a link between co-transcriptional R-loops and nascent RNA folding
- Author
Liu, Pengyu, Lusk, Jacob, Jonoska, Nataša, and Vázquez, Mariel
- Subjects
Biological Sciences ,Bioinformatics and Computational Biology ,Genetics ,Human Genome ,1.1 Normal biological development and functioning ,RNA ,Computational Biology ,Transcription ,Genetic ,Nucleic Acid Conformation ,Algorithms ,Software ,RNA Folding ,R-Loop Structures ,Mathematical Sciences ,Information and Computing Sciences ,Bioinformatics - Abstract
R-loops are a class of non-canonical nucleic acid structures that typically form during transcription when the nascent RNA hybridizes the DNA template strand, leaving the non-template DNA strand unpaired. These structures are abundant in nature and play important physiological and pathological roles. Recent research shows that DNA sequence and topology affect R-loops, yet it remains unclear how these and other factors contribute to R-loop formation. In this work, we investigate the link between nascent RNA folding and the formation of R-loops. We introduce tree-polynomials, a new class of representations of RNA secondary structures. A tree-polynomial representation consists of a rooted tree associated with an RNA secondary structure together with a polynomial that is uniquely identified with the rooted tree. Tree-polynomials enable accurate, interpretable and efficient data analysis of RNA secondary structures without pseudoknots. We develop a computational pipeline for investigating and predicting R-loop formation from a genomic sequence. The pipeline obtains nascent RNA secondary structures from a co-transcriptional RNA folding software, and computes the tree-polynomial representations of the structures. By applying this pipeline to plasmid sequences that contain R-loop forming genes, we establish a strong correlation between the coefficient sums of tree-polynomials and the experimental probability of R-loop formation. Such strong correlation indicates that the pipeline can be used for accurate R-loop prediction. Furthermore, the interpretability of tree-polynomials allows us to characterize the features of RNA secondary structure associated with R-loop formation. In particular, we identify that branches with short stems separated by bulges and interior loops are associated with R-loops.
- Published
- 2024
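The abstract above does not reproduce the polynomial itself. As background, one construction of this kind (stated here as an assumption, following earlier bivariate tree-distinguishing polynomials that this representation builds on, and not necessarily the paper's exact definition) assigns to a rooted tree T:

```latex
P(T) =
\begin{cases}
  x, & \text{if } T \text{ is a single leaf},\\[4pt]
  y + \prod_{i=1}^{k} P(T_i), & \text{if the root of } T \text{ has child subtrees } T_1,\dots,T_k .
\end{cases}
```

Under any such definition, the coefficient sum that the abstract correlates with R-loop formation is simply the evaluation P(1, 1), since setting x = y = 1 sums all coefficients of the polynomial.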
28. Covalent Labeling Automated Data Analysis Platform for High Throughput in R (coADAPTr): A Proteome-Wide Data Analysis Platform for Covalent Labeling Experiments.
- Author
Shortt, Raquel, Pino, Lindsay, Chea, Emily, Ramirez, Carolina, Polasky, Daniel, Nesvizhskii, Alexey, and Jones, Lisa
- Subjects
Software ,Proteome ,Proteomics ,Mass Spectrometry ,Data Analysis ,Protein Footprinting ,Proteins ,High-Throughput Screening Assays - Abstract
Covalent labeling methods coupled to mass spectrometry have emerged in recent years for studying the higher order structure of proteins. Quantifying the extent of modification of proteins in multiple states (i.e., ligand free vs ligand-bound) can provide information on protein interaction sites and regions of conformational change. Though there are several software platforms that are used to quantify the extent of modification, the process can still be time-consuming, particularly for proteome-wide studies. Here, we present an open-source software for quantitation called Covalent labeling Automated Data Analysis Platform for high Throughput in R (coADAPTr). coADAPTr tackles the need for more efficient data analysis in covalent labeling mass spectrometry for techniques such as hydroxyl radical protein footprinting (HRPF). Traditional methods like Excel's Power Pivot (PP) are cumbersome and time-intensive, posing challenges for large-scale analyses. coADAPTr simplifies analysis by mimicking the functions used in the previous quantitation platform using PowerPivot in Microsoft Excel but with fewer steps, offering proteome-wide insights with enhanced graphical interpretations. Several features have been added to improve the fidelity and throughput compared to those of PowerPivot. These include filters to remove any duplicate data and the use of the arithmetic mean rather than the geometric mean for quantitation of the extent of modification. Validation studies confirm coADAPTr's accuracy and efficiency while processing data up to 200 times faster than conventional methods. Its open-source design and user-friendly interface make it accessible for researchers exploring intricate biological phenomena via HRPF and other covalent labeling MS methods. coADAPTr marks a significant leap in structural proteomics, providing a versatile and efficient platform for data interpretation. Its potential to transform the field lies in its seamless handling of proteome-wide data analyses, empowering researchers with a robust tool for deciphering complex structural biology data.
- Published
- 2024
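A minimal sketch of the quantitation step described in the coADAPTr abstract above: compute a fraction modified per peptide and summarize per protein with the arithmetic mean (rather than the geometric mean). The input layout, column names, and the per-peptide formula are illustrative assumptions, not coADAPTr's actual schema or code.

```python
# Illustrative covalent-labeling quantitation sketch (not coADAPTr itself).
from collections import defaultdict

# Hypothetical peptide-level intensities for modified and unmodified forms.
peptides = [
    {"protein": "P1", "peptide": "LVNELTEFAK", "modified": 2.0e6, "unmodified": 6.0e6},
    {"protein": "P1", "peptide": "YLYEIAR", "modified": 1.0e6, "unmodified": 9.0e6},
    {"protein": "P2", "peptide": "AEFVEVTK", "modified": 4.0e5, "unmodified": 3.6e6},
]

def fraction_modified(row):
    """Extent of modification for one peptide: modified / (modified + unmodified)."""
    total = row["modified"] + row["unmodified"]
    return row["modified"] / total if total > 0 else 0.0

per_protein = defaultdict(list)
for row in peptides:
    per_protein[row["protein"]].append(fraction_modified(row))

# Arithmetic mean (not geometric) of the peptide-level extents, per protein.
for protein, values in sorted(per_protein.items()):
    print(protein, round(sum(values) / len(values), 3))
```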
29. Functional Analysis of MS-Based Proteomics Data: From Protein Groups to Networks.
- Author
Locard-Paulet, Marie, Doncheva, Nadezhda, Morris, John, and Jensen, Lars
- Subjects
Bioinformatics ,Biological databases ,Cytoscape ,Functional enrichment ,Mass spectrometry ,Networks ,Protein groups ,Proteomics ,STRING ,Proteomics ,Mass Spectrometry ,Protein Interaction Maps ,Software ,Humans ,Databases ,Protein - Abstract
Mass spectrometry-based proteomics allows the quantification of thousands of proteins, protein variants, and their modifications, in many biological samples. These are derived from the measurement of peptide relative quantities, and it is not always possible to distinguish proteins with similar sequences due to the absence of protein-specific peptides. In such cases, peptide signals are reported in protein groups that can correspond to several genes. Here, we show that multi-gene protein groups have a limited impact on GO-term enrichment, but selecting only one gene per group affects network analysis. We thus present the Cytoscape app Proteo Visualizer (https://apps.cytoscape.org/apps/ProteoVisualizer) that is designed for retrieving protein interaction networks from STRING using protein groups as input and thus allows visualization and network analysis of bottom-up MS-based proteomics data sets.
- Published
- 2024
30. Consensus prediction of cell type labels in single-cell data with popV.
- Author
Ergen, Can, Xing, Galen, Xu, Chenling, Kim, Martin, Jayasuriya, Michael, McGeever, Erin, Oliveira Pisco, Angela, Streets, Aaron, and Yosef, Nir
- Subjects
Single-Cell Analysis ,Humans ,Algorithms ,Molecular Sequence Annotation ,Computational Biology ,Consensus ,Software - Abstract
Cell-type classification is a crucial step in single-cell sequencing analysis. Various methods have been proposed for transferring a cell-type label from an annotated reference atlas to unannotated query datasets. Existing methods for transferring cell-type labels lack proper uncertainty estimation for the resulting annotations, limiting interpretability and usefulness. To address this, we propose popular Vote (popV), an ensemble of prediction models with an ontology-based voting scheme. PopV achieves accurate cell-type labeling and provides uncertainty scores. In multiple case studies, popV confidently annotates the majority of cells while highlighting cell populations that are challenging to annotate by label transfer. This additional step helps to reduce the load of manual inspection, which is often a necessary component of the annotation process, and enables one to focus on the most problematic parts of the annotation, streamlining the overall annotation process.
- Published
- 2024
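A minimal sketch of consensus cell-type labeling with an uncertainty score, in the spirit of the popV abstract above. popV's actual scheme is an ontology-aware vote over several prediction models; the simplified version below is a plain majority vote whose certainty is the fraction of methods that agree, and all labels are hypothetical.

```python
# Simplified consensus labeling sketch (plain majority vote, not popV's ontology-based scheme).
from collections import Counter

def consensus_label(predictions):
    """Majority label plus an agreement score across the individual predictors."""
    counts = Counter(predictions)
    label, votes = counts.most_common(1)[0]
    return label, votes / len(predictions)

# Hypothetical per-cell outputs from four different annotation methods.
per_cell_predictions = {
    "cell_001": ["T cell", "T cell", "NK cell", "T cell"],
    "cell_002": ["B cell", "plasma cell", "B cell", "monocyte"],
}

for cell, preds in per_cell_predictions.items():
    label, certainty = consensus_label(preds)
    flag = "needs manual review" if certainty < 0.75 else "confident"
    print(cell, label, round(certainty, 2), flag)
```

Cells with low agreement are the ones the abstract suggests flagging for manual inspection rather than accepting the transferred label outright.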
31. Image processing tools for petabyte-scale light sheet microscopy data
- Author
Ruan, Xiongtao, Mueller, Matthew, Liu, Gaoxiang, Görlitz, Frederik, Fu, Tian-Ming, Milkie, Daniel E, Lillvis, Joshua L, Kuhn, Alexander, Gan Chong, Johnny, Hong, Jason Li, Herr, Chu Yi Aaron, Hercule, Wilmene, Nienhaus, Marc, Killilea, Alison N, Betzig, Eric, and Upadhyayula, Srigokul
- Subjects
Biological Sciences ,Bioengineering ,Biomedical Imaging ,Networking and Information Technology R&D (NITRD) ,1.4 Methodologies and measurements ,Generic health relevance ,Software ,Image Processing ,Computer-Assisted ,Microscopy ,Imaging ,Three-Dimensional ,Animals ,Humans ,Algorithms ,Technology ,Medical and Health Sciences ,Developmental Biology ,Biological sciences - Abstract
Light sheet microscopy is a powerful technique for high-speed three-dimensional imaging of subcellular dynamics and large biological specimens. However, it often generates datasets ranging from hundreds of gigabytes to petabytes in size for a single experiment. Conventional computational tools process such images far slower than the time to acquire them and often fail outright due to memory limitations. To address these challenges, we present PetaKit5D, a scalable software solution for efficient petabyte-scale light sheet image processing. This software incorporates a suite of commonly used processing tools that are optimized for memory and performance. Notable advancements include rapid image readers and writers, fast and memory-efficient geometric transformations, high-performance Richardson-Lucy deconvolution and scalable Zarr-based stitching. These features outperform state-of-the-art methods by over one order of magnitude, enabling the processing of petabyte-scale image data at the full teravoxel rates of modern imaging cameras. The software opens new avenues for biological discoveries through large-scale imaging experiments.
- Published
- 2024
32. Standardized and accessible multi-omics bioinformatics workflows through the NMDC EDGE resource
- Author
Kelliher, Julia M, Xu, Yan, Flynn, Mark C, Babinski, Michal, Canon, Shane, Cavanna, Eric, Clum, Alicia, Corilo, Yuri E, Fujimoto, Grant, Giberson, Cameron, Johnson, Leah YD, Li, Kaitlyn J, Li, Po-E, Li, Valerie, Lo, Chien-Chi, Lynch, Wendi, Piehowski, Paul, Prime, Kaelan, Purvine, Samuel, Rodriguez, Francisca, Roux, Simon, Shakya, Migun, Smith, Montana, Sarrafan, Setareh, Cholia, Shreyas, McCue, Lee Ann, Mungall, Chris, Hu, Bin, Eloe-Fadrosh, Emiley A, and Chain, Patrick SG
- Subjects
Biological Sciences ,Bioinformatics and Computational Biology ,Microbiome ,Networking and Information Technology R&D (NITRD) ,Data Science ,Genetics ,Human Genome ,1.5 Resources and infrastructure (underpinning) ,Generic health relevance ,Multi-omics ,Bioinformatics workflows ,Standardization ,Software ,Open-source ,Numerical and Computational Mathematics ,Computation Theory and Mathematics ,Biochemistry and cell biology ,Applied computing - Abstract
Accessible and easy-to-use standardized bioinformatics workflows are necessary to advance microbiome research from observational studies to large-scale, data-driven approaches. Standardized multi-omics data enables comparative studies, data reuse, and applications of machine learning to model biological processes. To advance broad accessibility of standardized multi-omics bioinformatics workflows, the National Microbiome Data Collaborative (NMDC) has developed the Empowering the Development of Genomics Expertise (NMDC EDGE) resource, a user-friendly, open-source web application (https://nmdc-edge.org). Here, we describe the design and main functionality of the NMDC EDGE resource for processing metagenome, metatranscriptome, natural organic matter, and metaproteome data. The architecture relies on three main layers (web application, orchestration, and execution) to ensure flexibility and expansion to future workflows. The orchestration and execution layers leverage best practices in software containers and accommodate high-performance computing and cloud computing services. Further, we have adopted a robust user research process to collect feedback for continuous improvement of the resource. NMDC EDGE provides an accessible interface for researchers to process multi-omics microbiome data using production-quality workflows to facilitate improved data standardization and interoperability.
- Published
- 2024
33. Roadmap on methods and software for electronic structure based simulations in chemistry and materials
- Author
Blum, Volker, Asahi, Ryoji, Autschbach, Jochen, Bannwarth, Christoph, Bihlmayer, Gustav, Blügel, Stefan, Burns, Lori A, Crawford, T Daniel, Dawson, William, de Jong, Wibe Albert, Draxl, Claudia, Filippi, Claudia, Genovese, Luigi, Giannozzi, Paolo, Govind, Niranjan, Hammes-Schiffer, Sharon, Hammond, Jeff R, Hourahine, Benjamin, Jain, Anubhav, Kanai, Yosuke, Kent, Paul RC, Larsen, Ask Hjorth, Lehtola, Susi, Li, Xiaosong, Lindh, Roland, Maeda, Satoshi, Makri, Nancy, Moussa, Jonathan, Nakajima, Takahito, Nash, Jessica A, Oliveira, Micael JT, Patel, Pansy D, Pizzi, Giovanni, Pourtois, Geoffrey, Pritchard, Benjamin P, Rabani, Eran, Reiher, Markus, Reining, Lucia, Ren, Xinguo, Rossi, Mariana, Schlegel, H Bernhard, Seriani, Nicola, Slipchenko, Lyudmila V, Thom, Alexander, Valeev, Edward F, Van Troeye, Benoit, Visscher, Lucas, Vlček, Vojtěch, Werner, Hans-Joachim, Williams-Young, David B, and Windus, Theresa L
- Subjects
Information and Computing Sciences ,Software Engineering ,electronic structure ,software ,future directions - Abstract
This Roadmap article provides a succinct, comprehensive overview of the state of electronic structure (ES) methods and software for molecular and materials simulations. Seventeen distinct sections collect insights by 51 leading scientists in the field. Each contribution addresses the status of a particular area, as well as current challenges and anticipated future advances, with a particular eye towards software related aspects and providing key references for further reading. Foundational sections cover density functional theory and its implementation in real-world simulation frameworks, Green’s function based many-body perturbation theory, wave-function based and stochastic ES approaches, relativistic effects and semiempirical ES theory approaches. Subsequent sections cover nuclear quantum effects, real-time propagation of the ES, challenges for computational spectroscopy simulations, and exploration of complex potential energy surfaces. The final sections summarize practical aspects, including computational workflows for complex simulation tasks, the impact of current and future high-performance computing architectures, software engineering practices, education and training to maintain and broaden the community, as well as the status of and needs for ES based modeling from the vantage point of industry environments. Overall, the field of ES software and method development continues to unlock immense opportunities for future scientific discovery, based on the growing ability of computations to reveal complex phenomena, processes and properties that are determined by the make-up of matter at the atomic scale, with high precision.
- Published
- 2024
34. Rapid characterization of crude oil by NMR relaxation using new user-friendly software
- Author
Canan, Talha Furkan, Ok, Salim, Al-Bazzaz, Waleed, Ponnuswamy, Shunmugavel, Fernandes, Michael, Al-Shamali, Mustafa, Qubian, Ali, and Sagidullin, Alexander
- Published
- 2022
- Full Text
- View/download PDF
35. spread.gl: visualizing pathogen dispersal in a high-performance browser application.
- Author
Li, Yimin, Bollen, Nena, Hong, Samuel, Brusselmans, Marius, Gambaro, Fabiana, Klaps, Joon, Suchard, Marc, Rambaut, Andrew, Lemey, Philippe, Dellicour, Simon, and Baele, Guy
- Subjects
Software ,SARS-CoV-2 ,Phylogeography ,Bayes Theorem ,COVID-19 ,Phylogeny ,Humans ,Web Browser - Abstract
MOTIVATION: Bayesian phylogeographic analyses are pivotal in reconstructing the spatio-temporal dispersal histories of pathogens. However, interpreting the complex outcomes of phylogeographic reconstructions requires sophisticated visualization tools. RESULTS: To meet this challenge, we developed spread.gl, an open-source, feature-rich browser application offering a smooth and intuitive visualization tool for both discrete and continuous phylogeographic inferences, including the animation of pathogen geographic dispersal through time. Spread.gl can render and combine the visualization of multiple layers that contain information extracted from the input phylogeny and diverse environmental data layers, enabling researchers to explore which environmental factors may have impacted pathogen dispersal patterns before conducting formal testing. We showcase the visualization features of spread.gl with representative examples, including the smooth animation of a phylogeographic reconstruction based on >17 000 SARS-CoV-2 genomic sequences. AVAILABILITY AND IMPLEMENTATION: Source code, installation instructions, example input data, and outputs of spread.gl are accessible at https://github.com/GuyBaele/SpreadGL.
- Published
- 2024
36. UCBShift 2.0: Bridging the Gap from Backbone to Side Chain Protein Chemical Shift Prediction for Protein Structures.
- Author
Ptaszek, Aleksandra, Li, Jie, Konrat, Robert, Platzer, Gerald, and Head-Gordon, Teresa
- Subjects
Proteins ,Protein Conformation ,Nuclear Magnetic Resonance ,Biomolecular ,Software ,Algorithms ,Machine Learning ,Models ,Molecular - Abstract
Chemical shifts are a readily obtainable NMR observable that can be measured with high accuracy, and because they are sensitive to conformational averages and the local molecular environment, they yield detailed information about protein structure in solution. To predict chemical shifts of protein structures, we introduced the UCBShift method that uniquely fuses a transfer prediction module, which employs sequence and structure alignments to select reference chemical shifts from an experimental database, with a machine learning model that uses carefully curated and physics-inspired features derived from X-ray crystal structures to predict backbone chemical shifts for proteins. In this work, we extend the UCBShift 1.0 method to side chain chemical shift prediction to perform whole protein analysis, which, when validated against well-defined test data shows higher accuracy and better reliability compared to the popular SHIFTX2 method. With the greater abundance of cleaned protein shift-structure data and the modularity of the general UCBShift algorithms, users can gain insight into different features important for residue-specific stabilizing interactions for protein backbone and side chain chemical shift prediction. We suggest several backward and forward applications of UCBShift 2.0 that can help validate AlphaFold structures and probe protein dynamics.
- Published
- 2024
37. VI-VS: calibrated identification of feature dependencies in single-cell multiomics.
- Author
Boyeau, Pierre, Bates, Stephen, Ergen, Can, Jordan, Michael, and Yosef, Nir
- Subjects
Single-Cell Analysis ,Software ,Machine Learning ,Humans ,Genomics ,Multiomics - Abstract
Unveiling functional relationships between various molecular cell phenotypes from data using machine learning models is a key promise of multiomics. Existing methods either use flexible but hard-to-interpret models or simpler, misspecified models. VI-VS (Variational Inference for Variable Selection) balances flexibility and interpretability to identify relevant feature relationships in multiomic data. It uses deep generative models to identify conditionally dependent features, with false discovery rate control. VI-VS is available as an open-source Python package, providing a robust solution to identify features more likely representing genuine causal relationships.
- Published
- 2024
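A short sketch of the false discovery rate control mentioned in the VI-VS abstract above, using the standard Benjamini-Hochberg procedure on placeholder per-feature p-values. How VI-VS actually derives feature-level significance (deep generative models and conditional independence testing) is not shown; the p-values and gene names below are assumptions for illustration.

```python
# Benjamini-Hochberg FDR control over hypothetical per-feature p-values.
def benjamini_hochberg(pvalues, alpha=0.1):
    """Indices of features declared dependent while controlling FDR at level alpha."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= alpha * rank / m:
            cutoff_rank = rank
    return set(order[:cutoff_rank])

feature_names = ["geneA", "geneB", "geneC", "geneD", "geneE"]
pvals = [0.001, 0.2, 0.012, 0.03, 0.8]  # hypothetical test results

selected = benjamini_hochberg(pvals, alpha=0.1)
print([feature_names[i] for i in sorted(selected)])
```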
38. Genetics of Latin American Diversity Project: Insights into population genetics and association studies in admixed groups in the Americas.
- Author
Borda, Victor, Loesch, Douglas, Guo, Bing, Laboulaye, Roland, Veliz-Otani, Diego, French, Jennifer, Leal, Thiago, Gogarten, Stephanie, Ikpe, Sunday, Gouveia, Mateus, Mendes, Marla, Abecasis, Gonçalo, Alvim, Isabela, Arboleda-Bustos, Carlos, Arboleda, Gonzalo, Arboleda, Humberto, Barreto, Mauricio, Barwick, Lucas, Bezzera, Marcos, Blangero, John, Borges, Vanderci, Caceres, Omar, Cai, Jianwen, Chana-Cuevas, Pedro, Chen, Zhanghua, Custer, Brian, Dean, Michael, Dinardo, Carla, Domingos, Igor, Duggirala, Ravindranath, Dieguez, Elena, Fernandez, Willian, Ferraz, Henrique, Gilliland, Frank, Guio, Heinner, Horta, Bernardo, Curran, Joanne, Johnsen, Jill, Kaplan, Robert, Kelly, Shannon, Kenny, Eimear, Konkle, Barbara, Kooperberg, Charles, Lescano, Andres, Lima-Costa, M, Loos, Ruth, Manichaikul, Ani, Meyers, Deborah, Naslavsky, Michel, Nickerson, Deborah, North, Kari, Padilla, Carlos, Preuss, Michael, Raggio, Victor, Reiner, Alexander, Rich, Stephen, Rieder, Carlos, Rienstra, Michiel, Rotter, Jerome, Rundek, Tatjana, Sacco, Ralph, Sanchez, Cesar, Sankaran, Vijay, Santos-Lobato, Bruno, Schumacher-Schuh, Artur, Scliar, Marilia, Silverman, Edwin, Sofer, Tamar, Lasky-Su, Jessica, Tumas, Vitor, Weiss, Scott, Mata, Ignacio, Hernandez, Ryan, Tarazona-Santos, Eduardo, and OConnor, Timothy
- Subjects
GLAD-match ,GWAS ,Latin America ,identity-by-descent ,imputation ,local ancestry ,migration ,population structure ,Humans ,Latin America ,Genetics ,Population ,Genome-Wide Association Study ,Haplotypes ,Algorithms ,Genetic Variation ,Software - Abstract
Latin Americans are underrepresented in genetic studies, increasing disparities in personalized genomic medicine. Despite available genetic data from thousands of Latin Americans, accessing and navigating the bureaucratic hurdles for consent or access remains challenging. To address this, we introduce the Genetics of Latin American Diversity (GLAD) Project, compiling genome-wide information from 53,738 Latin Americans across 39 studies representing 46 geographical regions. Through GLAD, we identified heterogeneous ancestry composition and recent gene flow across the Americas. Additionally, we developed GLAD-match, a simulated annealing-based algorithm, to match the genetic background of external samples to our database, sharing summary statistics (i.e., allele and haplotype frequencies) without transferring individual-level genotypes. Finally, we demonstrate the potential of GLAD as a critical resource for evaluating statistical genetic software in the presence of admixture. By providing this resource, we promote genomic research in Latin Americans and contribute to the promises of personalized medicine to more people.
- Published
- 2024
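A generic simulated-annealing sketch for cohort matching, in the spirit of the GLAD-match description above: iteratively swap reference samples in and out of a candidate subset so that the subset's mean ancestry profile approaches that of an external cohort, without ever exchanging individual-level genotypes. The objective function, move set, and all parameters are illustrative assumptions, not the published GLAD-match algorithm.

```python
# Generic simulated annealing for matching a reference subset to a target profile.
import math
import random

random.seed(0)

def mean_profile(indices, profiles):
    """Column-wise mean of the selected reference profiles."""
    k = len(profiles[0])
    return [sum(profiles[i][j] for i in indices) / len(indices) for j in range(k)]

def distance(a, b):
    """Squared Euclidean distance between two profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def anneal_match(reference, target, subset_size, steps=5000, t0=1.0, cooling=0.999):
    """Pick subset_size reference samples whose mean profile approaches target."""
    current = random.sample(range(len(reference)), subset_size)
    cost = distance(mean_profile(current, reference), target)
    best, best_cost, temp = list(current), cost, t0
    for _ in range(steps):
        candidate = list(current)
        swap_out = random.randrange(subset_size)
        swap_in = random.randrange(len(reference))
        if swap_in in candidate:
            continue
        candidate[swap_out] = swap_in
        new_cost = distance(mean_profile(candidate, reference), target)
        # Accept improvements always; accept worse moves with temperature-dependent probability.
        if new_cost < cost or random.random() < math.exp((cost - new_cost) / temp):
            current, cost = candidate, new_cost
            if cost < best_cost:
                best, best_cost = list(current), cost
        temp *= cooling
    return best, best_cost

# Toy data: per-sample ancestry proportions (three components) for 200 reference samples.
reference = []
for _ in range(200):
    raw = [random.random() for _ in range(3)]
    total = sum(raw)
    reference.append([x / total for x in raw])

target = [0.5, 0.3, 0.2]  # hypothetical mean ancestry of the external cohort
subset, final_cost = anneal_match(reference, target, subset_size=20)
print(len(subset), round(final_cost, 6))
```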
39. Longitudinal registration of T1-weighted breast MRI: A registration algorithm (FLIRE) and clinical application
- Author
Tong, Michelle W, Yu, Hon J, Sjaastad Andreassen, Maren M, Loubrie, Stephane, Rodríguez-Soto, Ana E, Seibert, Tyler M, Rakow-Penner, Rebecca, and Dale, Anders M
- Subjects
Biomedical and Clinical Sciences ,Clinical Sciences ,Oncology and Carcinogenesis ,Breast Cancer ,Women's Health ,Bioengineering ,Cancer ,Clinical Research ,Biomedical Imaging ,4.1 Discovery and preclinical testing of markers and technologies ,Humans ,Algorithms ,Breast Neoplasms ,Female ,Magnetic Resonance Imaging ,Breast ,Middle Aged ,Longitudinal Studies ,Neoadjuvant Therapy ,Image Interpretation ,Computer-Assisted ,Reproducibility of Results ,Adult ,Image Processing ,Computer-Assisted ,Aged ,Software ,Non-linear ,Registration ,Longitudinal ,Neoadjuvant chemotherapy ,T1 ,T(1) ,Biomedical Engineering ,Cognitive Sciences ,Nuclear Medicine & Medical Imaging ,Clinical sciences - Abstract
Purpose: MRI is commonly used to aid breast cancer diagnosis and treatment evaluation. For patients with breast cancer, neoadjuvant chemotherapy aims to reduce the tumor size and extent of surgery necessary. The current clinical standard to measure breast tumor response on MRI uses the longest tumor diameter. Radiologists also account for other tissue properties including tumor contrast or pharmacokinetics in their assessment. Accurate longitudinal image registration of breast tissue is critical to properly compare response to treatment at different timepoints. Methods: In this study, a deformable Fast Longitudinal Image Registration (FLIRE) algorithm was optimized for breast tissue. FLIRE was then compared to the publicly available software packages with high accuracy (DRAMMS) and fast runtime (Elastix). Patients included in the study received longitudinal T1-weighted MRI without fat saturation at two to six timepoints as part of asymptomatic screening (n = 27) or throughout neoadjuvant chemotherapy treatment (n = 32). T1-weighted images were registered to the first timepoint with each algorithm. Results: Alignment and runtime performance were compared using two-way repeated measure ANOVAs (P
- Published
- 2024
40. iSubGen generates integrative disease subtypes by pairwise similarity assessment
- Author
Fox, Natalie S, Tian, Mao, Markowitz, Alexander L, Haider, Syed, Li, Constance H, and Boutros, Paul C
- Subjects
Biological Sciences ,Bioinformatics and Computational Biology ,2.1 Biological and endogenous factors ,Humans ,Algorithms ,Neoplasms ,Software ,Computational Biology ,Proteomics ,Transcriptome ,Gene Expression Profiling ,CP: Cancer biology ,CP: Systems biology ,algorithm ,autoencoder ,biomarkers ,cancer biology ,data correlation ,data integration ,multi-omics ,pattern discovery ,subtype discovery ,system biology - Abstract
There are myriad types of biomedical data-molecular, clinical images, and others. When a group of patients with the same underlying disease exhibits similarities across multiple types of data, this is called a subtype. Existing subtyping approaches struggle to handle diverse data types with missing information. To improve subtype discovery, we exploited changes in the correlation-structure between different data types to create iSubGen, an algorithm for integrative subtype generation. iSubGen can accommodate any feature that can be compared with a similarity metric to create subtypes versatilely. It can combine arbitrary data types for subtype discovery, such as merging genetic, transcriptomic, proteomic, and pathway data. iSubGen recapitulates known subtypes across multiple cancers even with substantial missing data and identifies subtypes with distinct clinical behaviors. It performs equally with or superior to other subtyping methods, offering greater stability and robustness to missing data and flexibility to new data types. It is available at https://cran.r-project.org/web/packages/iSubGen.
- Published
- 2024
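An illustrative sketch of pairwise-similarity integration for subtype discovery, loosely following the iSubGen abstract above: build one patient-by-patient similarity matrix per data type, average whatever is available for each pair of patients (tolerating a missing data type), and cluster the combined matrix. iSubGen itself is an R package with a richer approach (including autoencoder-derived integrative features); the metric, missing-data handling, and clustering choices below are assumptions.

```python
# Pairwise-similarity integration sketch (not the iSubGen algorithm itself).
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

rng = np.random.default_rng(0)
n_patients = 12

# Two toy data types; patient 5 is "missing" the second data type.
expression = rng.normal(size=(n_patients, 50))
methylation = rng.normal(size=(n_patients, 30))
has_methylation = np.ones(n_patients, dtype=bool)
has_methylation[5] = False

def correlation_similarity(data):
    """Patient-by-patient similarity as Pearson correlation of feature vectors."""
    return np.corrcoef(data)

sims = [correlation_similarity(expression), correlation_similarity(methylation)]
masks = [np.ones(n_patients, dtype=bool), has_methylation]

# Average the similarity matrices over data types available for both patients in a pair.
combined = np.zeros((n_patients, n_patients))
counts = np.zeros((n_patients, n_patients))
for sim, mask in zip(sims, masks):
    pair_ok = np.outer(mask, mask).astype(bool)
    combined[pair_ok] += sim[pair_ok]
    counts[pair_ok] += 1
combined = combined / np.maximum(counts, 1)

# Convert similarity to distance and cluster into candidate subtypes.
dist = 1.0 - combined
dist = (dist + dist.T) / 2.0
np.fill_diagonal(dist, 0.0)
labels = fcluster(linkage(squareform(dist, checks=False), method="average"),
                  t=3, criterion="maxclust")
print(labels)
```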
41. Assessment of lesion insertion tool in pelvis PET/MR data with applications to attenuation correction method development.
- Author
Natsuaki, Yutaka, Leynes, Andrew, Wangerin, Kristen, Hamdi, Mahdjoub, Rajagopal, Abhejit, Kinahan, Paul, Laforest, Richard, Larson, Peder, Hope, Thomas, and James, Sara
- Subjects
PET reconstruction ,PET/CT ,PET/MR ,attenuation correction ,lesion insertion tool ,Humans ,Image Processing ,Computer-Assisted ,Magnetic Resonance Imaging ,Positron Emission Tomography Computed Tomography ,Male ,Multimodal Imaging ,Positron-Emission Tomography ,Pelvic Neoplasms ,Algorithms ,Radiotherapy Planning ,Computer-Assisted ,Female ,Pelvis ,Software ,Radiopharmaceuticals ,Phantoms ,Imaging ,Fluorodeoxyglucose F18 - Abstract
BACKGROUND: In modern positron emission tomography (PET) with multi-modality imaging (e.g., PET/CT and PET/MR), the attenuation correction (AC) is the single largest correction factor for image reconstruction. One way to assess AC methods and other reconstruction parameters is to utilize software-based simulation tools, such as a lesion insertion tool. Extensive validation of these simulation tools is required to ensure results of the study are clinically meaningful. PURPOSE: To evaluate different PET AC methods using a synthetic lesion insertion tool that simulates lesions in a patient cohort that has both PET/MR and PET/CT images. To further demonstrate how the lesion insertion tool may be used to extend knowledge of PET reconstruction parameters, including but not limited to AC. METHODS: Lesion quantitation is compared using conventional Dixon-based MR-based AC (MRAC) to that of using CT-based AC (CTAC, a ground truth). First, the pre-existing lesions were simulated in a similar environment; a total of 71 lesions were identified in 18 pelvic PET/MR patient images acquired with a time-of-flight simultaneous PET/MR scanner, and matched lesions were inserted contralaterally on the same axial slice. Second, synthetic lesions were inserted into four anatomic target locations in a cohort of four patients who didn't have any observed clinical lesions in the pelvis. RESULTS: The matched lesion insertions resulted in unity between the lesion error ratios (mean SUVs), demonstrating that the inserted lesions successfully simulated the original lesions. In the second study, the inserted lesions had distinct characteristics by target locations and demonstrated negative max-SUV%diff trends for bone-dominant sites across the patient cohort. CONCLUSIONS: The current work demonstrates that the applied lesion insertion tool can simulate uptake in pelvic lesions and their expected SUV values, and that the lesion insertion tool can be extended to evaluate further PET reconstruction corrections and algorithms and their impact on quantitation accuracy and precision.
- Published
- 2024
42. A retrospective analysis using comorbidity detecting algorithmic software to determine the incidence of International Classification of Diseases (ICD) code omissions and appropriateness of Diagnosis-Related Group (DRG) code modifiers.
- Author
Gabel, Eilon, Gal, Jonathan, Grogan, Tristan, and Hofer, Ira
- Subjects
Algorithms ,Clinical coding ,Diagnosis-related groups ,International classification of diseases ,Medical informatics applications ,Humans ,International Classification of Diseases ,Diagnosis-Related Groups ,Algorithms ,Retrospective Studies ,Comorbidity ,Software ,Electronic Health Records ,Male ,Female ,Middle Aged ,Adult - Abstract
BACKGROUND: The mechanism for recording International Classification of Diseases (ICD) and diagnosis related groups (DRG) codes in a patient's chart is through a certified medical coder who manually reviews the medical record at the completion of an admission. High-acuity ICD codes justify DRG modifiers, indicating the need for escalated hospital resources. In this manuscript, we demonstrate the value of rules-based computer algorithms that audit for the omission of administrative codes and quantify the downstream financial effects; demographic findings did not indicate significant disparities. METHODS: All study data were acquired via the UCLA Department of Anesthesiology and Perioperative Medicine's Perioperative Data Warehouse. The DataMart is a structured reporting schema that contains all the relevant clinical data entered into the EPIC (EPIC Systems, Verona, WI) electronic health record. Computer algorithms were created for eighteen disease states that met criteria for DRG modifiers. Each algorithm was run against all hospital admissions with completed billing from 2019. The algorithms scanned for the existence of disease, appropriate ICD coding, and DRG modifier appropriateness. Secondarily, the potential financial impact of ICD omissions was estimated by payor class and an analysis of ICD miscoding was done by ethnicity, sex, age, and financial class. RESULTS: Data from 34,104 hospital admissions were analyzed from January 1, 2019, to December 31, 2019. 11,520 (32.9%) hospital admissions were algorithm positive for a disease state with no corresponding ICD code. 1,990 (5.8%) admissions were potentially eligible for DRG modification/upgrade with an estimated lost revenue of $22,680,584.50. ICD code omission rates compared against reference groups (private payors, Caucasians, middle-aged patients) demonstrated significant p-values
- Published
- 2024
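A minimal sketch of the rules-based audit described in the abstract above: for each admission, a clinical rule decides whether a disease state was present, and the billed ICD codes are checked for a matching code; rule-positive admissions without one are flagged as potential omissions. The disease rules, thresholds, ICD prefixes, and admission fields are hypothetical illustrations, not the study's actual eighteen algorithms.

```python
# Hypothetical rules-based audit for ICD code omissions (illustration only).
RULES = {
    "acute_kidney_injury": {
        "icd_prefixes": ("N17",),
        "is_positive": lambda adm: adm["peak_creatinine"] >= 1.5 * adm["baseline_creatinine"],
    },
    "severe_malnutrition": {
        "icd_prefixes": ("E43",),
        "is_positive": lambda adm: adm["albumin"] is not None and adm["albumin"] < 2.4,
    },
}

def audit_admission(adm):
    """Return disease states that are rule-positive but have no matching billed ICD code."""
    omissions = []
    for disease, rule in RULES.items():
        if not rule["is_positive"](adm):
            continue
        if not any(code.startswith(rule["icd_prefixes"]) for code in adm["icd_codes"]):
            omissions.append(disease)
    return omissions

admissions = [
    {"id": "A1", "peak_creatinine": 3.1, "baseline_creatinine": 1.0,
     "albumin": 3.5, "icd_codes": ["I10", "E11.9"]},
    {"id": "A2", "peak_creatinine": 1.1, "baseline_creatinine": 1.0,
     "albumin": 2.0, "icd_codes": ["E43"]},
]

for adm in admissions:
    print(adm["id"], audit_admission(adm) or "no omissions found")
```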
43. MVP: a modular viromics pipeline to identify, filter, cluster, annotate, and bin viruses from metagenomes
- Author
Coclet, Clément, Camargo, Antonio Pedro, and Roux, Simon
- Subjects
Biological Sciences ,Bioinformatics and Computational Biology ,Genetics ,Bioengineering ,Networking and Information Technology R&D (NITRD) ,Biotechnology ,Generic health relevance ,Metagenome ,Genome ,Viral ,Metagenomics ,Viruses ,Software ,Virome ,Computational Biology ,Molecular Sequence Annotation ,viromics pipeline ,sequencing data ,phages ,viruses ,ecological studies - Abstract
While numerous computational frameworks and workflows are available for recovering prokaryote and eukaryote genomes from metagenome data, only a limited number of pipelines are designed specifically for viromics analysis. With many viromics tools developed in the last few years alone, it can be challenging for scientists with limited bioinformatics experience to easily recover, evaluate quality, annotate genes, dereplicate, assign taxonomy, and calculate relative abundance and coverage of viral genomes using state-of-the-art methods and standards. Here, we describe Modular Viromics Pipeline (MVP) v.1.0, a user-friendly pipeline written in Python and providing a simple framework to perform standard viromics analyses. MVP combines multiple tools to enable viral genome identification, characterization of genome quality, filtering, clustering, taxonomic and functional annotation, genome binning, and comprehensive summaries of results that can be used for downstream ecological analyses. Overall, MVP provides a standardized and reproducible pipeline for both extensive and robust characterization of viruses from large-scale sequencing data including metagenomes, metatranscriptomes, viromes, and isolate genomes. As a typical use case, we show how the entire MVP pipeline can be applied to a set of 20 metagenomes from wetland sediments using only 10 modules executed via command lines, leading to the identification of 11,656 viral contigs and 8,145 viral operational taxonomic units (vOTUs) displaying a clear beta-diversity pattern. Further, acting as a dynamic wrapper, MVP is designed to continuously incorporate updates and integrate new tools, ensuring its ongoing relevance in the rapidly evolving field of viromics. MVP is available at https://gitlab.com/ccoclet/mvp and as versioned packages in PyPi and Conda. IMPORTANCE: The significance of our work lies in the development of Modular Viromics Pipeline (MVP), an integrated and user-friendly pipeline tailored exclusively for viromics analyses. MVP stands out due to its modular design, which ensures easy installation, execution, and integration of new tools and databases. By combining state-of-the-art tools such as geNomad and CheckV, MVP provides high-quality viral genome recovery and taxonomy and host assignment, and functional annotation, addressing the limitations of existing pipelines. MVP's ability to handle diverse sample types, including environmental, human microbiome, and plant-associated samples, makes it a versatile tool for the broader microbiome research community. By standardizing the analysis process and providing easily interpretable results, MVP enables researchers to perform comprehensive studies of viral communities, significantly advancing our understanding of viral ecology and its impact on various ecosystems.
- Published
- 2024
44. Energy Aware Technology Mapping of Genetic Logic Circuits
- Author
Kubaczka, Erik, Gehri, Maximilian, Marlhens, Jérémie JM, Schwarz, Tobias, Molderings, Maik, Engelmann, Nicolai, Garcia, Hernan G, Hochberger, Christian, and Koeppl, Heinz
- Subjects
Biochemistry and Cell Biology ,Biological Sciences ,Genetics ,Affordable and Clean Energy ,Gene Regulatory Networks ,Synthetic Biology ,Energy Metabolism ,Software ,Models ,Genetic ,Entropy ,genetic design automation ,energy ,non-equilibrium ,thermodynamics ,synthetic biology ,gene-expression ,technology mapping ,metabolic burden ,computeraided design ,entropy production rate ,computer aided design ,Medicinal and Biomolecular Chemistry ,Biomedical Engineering ,Biochemistry and cell biology ,Bioinformatics and computational biology - Abstract
Energy and its dissipation are fundamental to all living systems, including cells. Insufficient abundance of energy carriers (as caused by the additional burden of artificial genetic circuits) shifts a cell's priority to survival, also impairing the functionality of the genetic circuit. Moreover, recent works have shown the importance of energy expenditure in information transmission. Despite living organisms being non-equilibrium systems, non-equilibrium models capable of accounting for energy dissipation and non-equilibrium response curves are not yet employed in genetic design automation (GDA) software. To this end, we introduce Energy Aware Technology Mapping, the automated design of genetic logic circuits with respect to energy efficiency and functionality. The basis for this is an energy aware non-equilibrium steady state model of gene expression, capturing characteristics like energy dissipation (which we link to the entropy production rate) and transcriptional bursting, relevant to eukaryotes as well as prokaryotes. Our evaluation shows that a genetic logic circuit's functional performance and energy efficiency are disjoint optimization goals. For our benchmark, energy efficiency improves by 37.2% on average when comparing to functionally optimized variants. We discover a linear increase in energy expenditure and overall protein expression with the circuit size, where Energy Aware Technology Mapping allows for designing genetic logic circuits with the energetic costs of circuits that are one to two gates smaller. Structural variants improve this further, while results show the Pareto dominance among structures of a single Boolean function. By incorporating energy demand into the design, Energy Aware Technology Mapping enables energy efficiency by design. This extends current GDA tools and complements approaches coping with burden in vivo.
- Published
- 2024
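For context on the abstract's link between energy dissipation and the entropy production rate: in a Markov (master-equation) model of promoter and expression states with stationary occupancies p_i and transition rates k_{ij}, the standard steady-state entropy production rate is

\dot{S}_{\mathrm{prod}} = \frac{1}{2} \sum_{i,j} \left( p_i k_{ij} - p_j k_{ji} \right) \ln \frac{p_i k_{ij}}{p_j k_{ji}} \ \ge\ 0 ,

which vanishes exactly when detailed balance holds and becomes positive when some transitions are driven (for example, by nucleotide hydrolysis). This is the textbook Schnakenberg form, offered only as background; the paper's own non-equilibrium steady-state model may parameterize dissipation differently.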
45. CHARMM at 45: Enhancements in Accessibility, Functionality, and Speed.
- Author
-
Hwang, Wonmuk, Austin, Steven, Blondel, Arnaud, Boittier, Eric, Boresch, Stefan, Buck, Matthias, Buckner, Joshua, Caflisch, Amedeo, Chang, Hao-Ting, Cheng, Xi, Choi, Yeol, Chu, Jhih-Wei, Crowley, Michael, Cui, Qiang, Damjanovic, Ana, Deng, Yuqing, Devereux, Mike, Ding, Xinqiang, Feig, Michael, Gao, Jiali, Glowacki, David, Gonzales, James, Hamaneh, Mehdi, Harder, Edward, Hayes, Ryan, Huang, Jing, Huang, Yandong, Hudson, Phillip, Im, Wonpil, Islam, Shahidul, Jiang, Wei, Jones, Michael, Käser, Silvan, Kearns, Fiona, Kern, Nathan, Klauda, Jeffery, Lazaridis, Themis, Lee, Jinhyuk, Lemkul, Justin, Liu, Xiaorong, Luo, Yun, MacKerell, Alexander, Major, Dan, Meuwly, Markus, Nam, Kwangho, Nilsson, Lennart, Ovchinnikov, Victor, Paci, Emanuele, Park, Soohyung, Pastor, Richard, Pittman, Amanda, Post, Carol, Prasad, Samarjeet, Pu, Jingzhi, Qi, Yifei, Rathinavelan, Thenmalarchelvi, Roe, Daniel, Roux, Benoit, Rowley, Christopher, Shen, Jana, Simmonett, Andrew, Sodt, Alexander, Töpfer, Kai, Upadhyay, Meenu, van der Vaart, Arjan, Vazquez-Salazar, Luis, Venable, Richard, Warrensford, Luke, Woodcock, H, Wu, Yujin, Brooks, Charles, Brooks, Bernard, and Karplus, Martin
- Subjects
Quantum Theory ,Molecular Dynamics Simulation ,Software - Abstract
Since its inception nearly half a century ago, CHARMM has played a central role in computational biochemistry and biophysics. Commensurate with developments in experimental research and advances in computer hardware, the range of methods and the applicability of CHARMM have also grown. This review summarizes major developments since 2009, when the last review of CHARMM was published. They include new, faster simulation engines; accessible user interfaces for convenient workflows; and a vast array of simulation and analysis methods spanning quantum mechanical, atomistic, and coarse-grained levels, as well as extensive coverage of force fields. In addition to providing a current snapshot of CHARMM development, this review may serve as a starting point for exploring relevant theories and computational methods for tackling contemporary and emerging problems in biomolecular systems. CHARMM is freely available for academic and nonprofit research at https://academiccharmm.org/program.
- Published
- 2024
46. MHConstructor: a high-throughput, haplotype-informed solution to the MHC assembly challenge.
- Author
-
Wade, Kristen, Suseno, Rayo, Kizer, Kerry, Williams, Jacqueline, Boquett, Juliano, Caillier, Stacy, Pollock, Nicholas, Renschen, Adam, Santaniello, Adam, Oksenberg, Jorge, Norman, Paul, Augusto, Danillo, and Hollenbach, Jill
- Subjects
De novo assembly ,Haplotype ,Human leukocyte antigen genes ,Major histocompatibility complex ,Short-read sequencing ,Haplotypes ,Humans ,Major Histocompatibility Complex ,Software ,High-Throughput Nucleotide Sequencing ,Algorithms - Abstract
The extremely high levels of genetic polymorphism within the human major histocompatibility complex (MHC) limit the usefulness of reference-based alignment methods for sequence assembly. We incorporate a short-read, de novo assembly algorithm into a workflow for novel application to the MHC. MHConstructor is a containerized pipeline designed for high-throughput, haplotype-informed, reproducible assembly of both whole-genome sequencing and target-capture short-read data in large population cohorts. To date, no other self-contained tool exists for generating de novo MHC assemblies from short-read data. MHConstructor facilitates widespread access to high-quality, alignment-free MHC sequence analysis.
- Published
- 2024
47. CoRAL accurately resolves extrachromosomal DNA genome structures with long-read sequencing.
- Author
-
Zhu, Kaiyuan, Jones, Matthew, Luebeck, Jens, Bu, Xinxin, Yi, Hyerim, Hung, King, Wong, Ivy, Zhang, Shu, Mischel, Paul, Chang, Howard, and Bafna, Vineet
- Subjects
Humans ,High-Throughput Nucleotide Sequencing ,Sequence Analysis ,DNA ,Gene Amplification ,Neoplasms ,DNA ,Genome ,Human ,Software - Abstract
Extrachromosomal DNA (ecDNA) is a central mechanism for focal oncogene amplification in cancer, occurring in ∼15% of early-stage cancers and ∼30% of late-stage cancers. ecDNAs drive tumor formation, evolution, and drug resistance by dynamically modulating oncogene copy number and rewiring gene-regulatory networks. Elucidating the genomic architecture of ecDNA amplifications is critical for understanding tumor pathology and developing more effective therapies. Paired-end short-read (Illumina) sequencing and mapping have been used to represent ecDNA amplifications with a breakpoint graph, in which the inferred architecture of ecDNA is encoded as a cycle in the graph. Traversals of breakpoint graphs have successfully predicted ecDNA presence in cancer samples. However, short-read technologies are intrinsically limited in identifying breakpoints, phasing complex rearrangements and internal duplications, and deconvolving cell-to-cell heterogeneity of ecDNA structures. Long-read technologies, such as those from Oxford Nanopore Technologies, have the potential to improve inference because longer reads map structural variants better and are more likely to span rearranged or duplicated regions. Here, we propose Complete Reconstruction of Amplifications with Long reads (CoRAL) for reconstructing ecDNA architectures from long-read data. CoRAL reconstructs likely cyclic architectures using quadratic programming that simultaneously optimizes parsimony of the reconstruction, explained copy number, and consistency of long-read mapping (a simplified form of this trade-off is sketched after this entry). Compared with previous short- and long-read-based tools, CoRAL substantially improves reconstructions in extensive simulations and in 10 datasets from previously characterized cell lines. As long-read usage becomes widespread, we anticipate that CoRAL will be a valuable tool for profiling the landscape and evolution of focal amplifications in tumors.
- Published
- 2024
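As a rough illustration of the three-way trade-off named above (parsimony, explained copy number, and long-read consistency), a simplified cycle-selection program over a breakpoint graph G = (V, E) with candidate cycles c \in C could be written as

\min_{x \ge 0,\ z \in \{0,1\}^{|C|}} \ \sum_{e \in E} w_e \Big( \mathrm{CN}_e - \sum_{c \in C} a_{ce}\, x_c \Big)^{2} + \lambda \sum_{c \in C} z_c - \mu \sum_{c \in C} r_c\, x_c \quad \text{s.t.} \quad x_c \le M z_c ,

where CN_e is the observed copy number of segment e, a_{ce} indicates whether cycle c traverses e, x_c is the copy count assigned to cycle c, z_c flags cycles in use (parsimony), r_c rewards cycles whose junctions are spanned by long reads, and w_e, \lambda, \mu, M are weights. All symbols and the exact form here are illustrative assumptions; CoRAL's actual quadratic program is specified in the paper.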
48. Memory-bound k-mer selection for large and evolutionarily diverse reference libraries.
- Author
-
Şapcı, Ali and Mirarab, Siavash
- Subjects
Algorithms ,Metagenomics ,Humans ,Gene Library ,Software ,Computational Biology ,Sequence Analysis ,DNA ,Databases ,Genetic ,Metagenome - Abstract
K-mer-based sequence matching is increasingly used in many bioinformatic applications, including metagenomic sequence classification. The accuracy of these downstream applications relies on the density of the reference databases, which are rapidly growing. Although the increased density provides hope for improvements in accuracy, scalability is a concern. Reference k-mers are kept in memory at query time, and storing all k-mers from these ever-expanding databases is fast becoming impractical. Several strategies for subsampling have been proposed, including minimizers and taxon-specific k-mers. However, we contend that these strategies are inadequate, especially when reference sets are taxonomically imbalanced, as most microbial libraries are. In this paper, we explore approaches for selecting a fixed-size subset of the k-mers present in an ultra-large dataset to include in a library such that read classification suffers the least (a toy version of budgeted, taxon-equitable selection is sketched after this entry). Our experiments demonstrate the limitations of existing approaches, especially for novel and poorly sampled groups. We propose a library construction algorithm called k-mer RANKer (KRANK) that combines several components, including a hierarchical selection strategy with adaptive size restrictions and an equitable coverage strategy. We implement KRANK in highly optimized code and combine it with the locality-sensitive hashing classifier CONSULT-II to build a taxonomic classification and profiling method. On several benchmarks, KRANK k-mer selection significantly reduces memory consumption with minimal loss in classification accuracy. We show, in extensive analyses based on CAMI benchmarks, that KRANK outperforms k-mer-based alternatives in taxonomic profiling and comes close to the best marker-based methods in accuracy.
- Published
- 2024
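To make the selection problem concrete, here is a toy Python sketch of choosing a fixed-budget k-mer library while giving every taxon an equitable share, so densely sampled taxa cannot crowd out rare ones. It only illustrates the budgeted, taxon-equitable idea; KRANK's hierarchical selection and adaptive size restrictions are more involved.

from collections import defaultdict
from itertools import cycle

def canonical(kmer):
    # Use the lexicographically smaller of the k-mer and its reverse complement.
    rc = kmer[::-1].translate(str.maketrans("ACGT", "TGCA"))
    return min(kmer, rc)

def select_kmers(references, k=31, budget=1_000_000):
    # references: dict mapping taxon -> list of genome sequences (strings).
    per_taxon = defaultdict(set)
    for taxon, genomes in references.items():
        for seq in genomes:
            for i in range(len(seq) - k + 1):
                per_taxon[taxon].add(canonical(seq[i:i + k]))
    # Round-robin across taxa: each taxon contributes one k-mer per turn
    # until the memory budget is reached or every taxon is exhausted.
    iterators = {t: iter(sorted(kmers)) for t, kmers in per_taxon.items()}
    selected = set()
    for taxon in cycle(sorted(per_taxon)):
        if len(selected) >= budget or not iterators:
            break
        it = iterators.get(taxon)
        if it is None:
            continue
        try:
            selected.add(next(it))
        except StopIteration:
            del iterators[taxon]
    return selected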
49. Utilizing non-invasive prenatal test sequencing data for human genetic investigation.
- Author
-
Liu, Siyang, Liu, Yanhong, Gu, Yuqin, Lin, Xingchen, Zhu, Huanhuan, Liu, Hankui, Xu, Zhe, Cheng, Shiyao, Lan, Xianmei, Li, Linxuan, Huang, Mingxi, Li, Hao, Nielsen, Rasmus, Davies, Robert, Albrechtsen, Anders, Chen, Guo-Bo, Qiu, Xiu, Jin, Xin, and Huang, Shujia
- Subjects
NIPT-human-genetics workflow ,allele frequency estimation ,cell-free DNA ,family relatedness ,genome-wide association analysis ,genotype imputation ,low-pass whole-genome sequencing ,non-invasive prenatal test ,population structure ,variant detection ,Humans ,Female ,Pregnancy ,Genome-Wide Association Study ,Noninvasive Prenatal Testing ,Prenatal Diagnosis ,Gene Frequency ,Algorithms ,Genotype ,Sequence Analysis ,DNA ,Polymorphism ,Single Nucleotide ,Software - Abstract
Non-invasive prenatal testing (NIPT) employs ultra-low-pass sequencing of maternal plasma cell-free DNA to detect fetal trisomy. Its global adoption has established NIPT as a large human genetic resource for exploring genetic variation and its associations with phenotypes. Here, we present methods for analyzing large-scale, low-depth NIPT data, including customized algorithms and software for genetic variant detection, genotype imputation, inference of family relatedness and population structure, and genome-wide association analysis of maternal genomes. Our results demonstrate accurate allele frequency estimation and high genotype imputation accuracy (R2 > 0.84) for NIPT sequencing depths from 0.1× to 0.3× (the dosage-R2 accuracy metric is sketched after this entry). We also achieve effective classification of duplicates and first-degree relatives, along with robust principal-component analysis. Additionally, we obtain R2 > 0.81 for estimating genetic effect sizes across genotyping and sequencing platforms with adequate sample sizes. These methods offer a robust theoretical and practical foundation for utilizing NIPT data in medical genetic research.
- Published
- 2024
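The abstract quotes imputation accuracy as R2; the usual definition is the squared Pearson correlation between imputed allele dosages and true genotypes at the same sites. A minimal sketch of that metric, independent of the authors' NIPT-human-genetics workflow:

import numpy as np

def dosage_r2(imputed_dosage, true_genotype):
    # imputed_dosage: expected alternate-allele counts in [0, 2];
    # true_genotype: hard calls in {0, 1, 2}. Returns squared Pearson correlation.
    x = np.asarray(imputed_dosage, dtype=float).ravel()
    y = np.asarray(true_genotype, dtype=float).ravel()
    r = np.corrcoef(x, y)[0, 1]
    return r ** 2

# Dosages that track the truth closely give R2 near 1.
print(dosage_r2([0.1, 0.9, 1.8, 1.1], [0, 1, 2, 1]))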
50. Visualizing scRNA-Seq data at population scale with GloScope.
- Author
-
Wang, Hao, Torous, William, Gong, Boying, and Purdom, Elizabeth
- Subjects
Batch effect detection and visualization ,Density estimation ,Single-cell sequencing data ,scRNA-Seq ,Single-Cell Analysis ,Software ,RNA-Seq ,Humans ,Computational Biology ,Animals ,Single-Cell Gene Expression Analysis - Abstract
Increasingly, scRNA-Seq studies explore cell populations across different samples and the effect of sample heterogeneity on organism-level phenotypes. However, relatively few bioinformatic methods have been developed that adequately address the variation between samples for such population-level analyses. We propose a framework for representing the entire single-cell profile of a sample, which we call a GloScope representation (a toy density-based sample representation is sketched after this entry). We apply GloScope to scRNA-Seq datasets from study designs ranging from 12 to over 300 samples and demonstrate how it allows researchers to perform essential bioinformatic tasks at the sample level, in particular visualization and quality-control assessment.
- Published
- 2024
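Below is a toy version of a density-based sample representation, assuming each sample's cells can be summarized by a Gaussian in a shared PCA space and samples compared with a symmetrized KL divergence; GloScope's actual estimators, latent space, and divergence choices are described in the paper.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import MDS

def gaussian_kl(mu0, cov0, mu1, cov1):
    # Closed-form KL divergence between two multivariate Gaussians.
    d = mu0.shape[0]
    inv1 = np.linalg.inv(cov1)
    diff = mu1 - mu0
    return 0.5 * (np.trace(inv1 @ cov0) + diff @ inv1 @ diff - d
                  + np.log(np.linalg.det(cov1) / np.linalg.det(cov0)))

def sample_divergence_map(cells_by_sample, n_pcs=10):
    # cells_by_sample: dict sample_id -> (n_cells, n_genes) expression matrix.
    ids = list(cells_by_sample)
    pca = PCA(n_components=n_pcs).fit(np.vstack([cells_by_sample[s] for s in ids]))
    stats = {}
    for s in ids:
        z = pca.transform(cells_by_sample[s])
        # Small ridge keeps the per-sample covariance invertible.
        stats[s] = (z.mean(axis=0), np.cov(z, rowvar=False) + 1e-6 * np.eye(n_pcs))
    div = np.zeros((len(ids), len(ids)))
    for i, a in enumerate(ids):
        for j, b in enumerate(ids):
            if i < j:
                d = gaussian_kl(*stats[a], *stats[b]) + gaussian_kl(*stats[b], *stats[a])
                div[i, j] = div[j, i] = d
    # Embed the sample-by-sample divergence matrix for visualization.
    coords = MDS(n_components=2, dissimilarity="precomputed").fit_transform(div)
    return ids, coords  # one 2-D point per sample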