225 results
Search Results
2. Formal Links between Feature Diversity and Phylogenetic Diversity
- Author
-
Kristina Wicke, Mike Steel, and Arne Ø. Mooers
- Subjects
0301 basic medicine ,0106 biological sciences ,Theoretical computer science ,Short paper ,Biology ,010603 evolutionary biology ,01 natural sciences ,03 medical and health sciences ,Simple (abstract algebra) ,Genetics ,Feature (machine learning) ,Ecology, Evolution, Behavior and Systematics ,Phylogeny ,030304 developmental biology ,Mathematics ,0303 health sciences ,Phylogenetic tree ,Biodiversity ,15. Life on land ,Cooperative game theory ,Shapley value ,Tree (data structure) ,Phylogenetic diversity ,030104 developmental biology ,Optimal distinctiveness theory ,Diversity (business) - Abstract
The extent to which phylogenetic diversity (PD) captures feature diversity (FD) is a topical and controversial question in biodiversity conservation. In this short paper, we formalize this question and establish a precise mathematical condition for FD (based on discrete characters) to coincide with PD. In this way, we make explicit the two main reasons why the two diversity measures might disagree for given data; namely, the presence of certain patterns of feature evolution and loss, and the use of temporal branch lengths for PD in settings that may not be appropriate (e.g., due to rapid evolution of certain features over short periods of time). Our article also explores the relationship between the “Fair Proportion” index of PD and a simple index of FD (both of which correspond to Shapley values in cooperative game theory). In a second mathematical result, we show that the two indices can take identical values for any phylogenetic tree, provided the branch lengths in the tree are chosen appropriately. [Evolutionary distinctiveness; feature diversity; phylogenetic diversity; Shapley value.]
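The Fair Proportion index mentioned in the abstract has a simple computational form: each branch length is divided equally among the leaves descending from that branch. A minimal Python sketch (the encoding of the tree as (length, descendant-leaf-set) pairs is an illustrative choice, not from the paper):

```python
def fair_proportion(edges):
    """Fair Proportion (evolutionary distinctiveness) index: each edge's
    length is shared equally among the leaves that descend from it."""
    fp = {}
    for length, leaves in edges:
        share = length / len(leaves)
        for leaf in leaves:
            fp[leaf] = fp.get(leaf, 0.0) + share
    return fp

# Tree ((A:1,B:1):1,C:2); with an internal edge of length 1 above {A,B}
edges = [(1.0, {"A"}), (1.0, {"B"}), (1.0, {"A", "B"}), (2.0, {"C"})]
fp = fair_proportion(edges)
print(fp["A"], fp["C"])  # -> 1.5 2.0
```

Note that the index values sum to the total PD of the tree (5.0 here), which is the "efficiency" property of Shapley values that the abstract alludes to.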
- Published
- 2020
3. Estimation of the final size of the COVID-19 epidemic
- Author
-
Milan Batista
- Subjects
Estimation ,Short paper ,Statistics ,medicine ,Logistic function ,medicine.disease_cause ,Logistic regression ,Coronavirus ,Mathematics - Abstract
In this short paper, the logistic growth model and classic susceptible-infected-recovered dynamic model are used to estimate the final size of the coronavirus epidemic.
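For the logistic part of the model, the final epidemic size is the curve's carrying capacity K. One classical way to estimate K from three equally spaced cumulative counts has a closed form; a sketch (this three-point identity is a standard result for logistic curves, not necessarily the paper's exact fitting procedure):

```python
import math

def logistic(t, K, r, t0):
    """Logistic growth curve: cumulative case count at time t."""
    return K / (1.0 + math.exp(-r * (t - t0)))

def estimate_final_size(y1, y2, y3):
    """Closed-form estimate of the carrying capacity K (the final epidemic
    size) from three cumulative counts taken at equally spaced times."""
    return y2 * (y1 * y2 + y2 * y3 - 2.0 * y1 * y3) / (y2 * y2 - y1 * y3)

# Synthetic outbreak with true final size K = 1000
K, r, t0 = 1000.0, 0.5, 10.0
y = [logistic(t, K, r, t0) for t in (8.0, 10.0, 12.0)]
K_hat = estimate_final_size(*y)
print(round(K_hat, 3))  # -> 1000.0
```

In practice one would fit K, r and t0 to all observed counts by least squares; the identity above recovers K exactly only for noise-free logistic data.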
- Published
- 2020
4. A rarefaction-without-resampling extension of PERMANOVA for testing presence-absence associations in the microbiome
- Author
-
Glen A. Satten and Yi-Juan Hu
- Subjects
Statistics and Probability ,Jaccard index ,Microbiota ,Confounding ,Rarefaction ,Original Papers ,Biochemistry ,Computer Science Applications ,UniFrac ,Computational Mathematics ,Distance matrix ,Overdispersion ,Computational Theory and Mathematics ,Research Design ,Resampling ,Statistics ,Covariate ,Humans ,Molecular Biology ,Gene Library ,Mathematics - Abstract
Motivation PERMANOVA is currently the most commonly used method for testing community-level hypotheses about microbiome associations with covariates of interest. PERMANOVA can test for associations that result from changes in which taxa are present or absent by using the Jaccard or unweighted UniFrac distance. However, such presence–absence analyses face a unique challenge: confounding by library size (total sample read count), which occurs when library size is associated with covariates in the analysis. It is known that rarefaction (subsampling to a common library size) controls this bias but at the potential costs of information loss and the introduction of a stochastic component into the analysis. Results Here, we develop a non-stochastic approach to PERMANOVA presence–absence analyses that aggregates information over all potential rarefaction replicates without actual resampling, when the Jaccard or unweighted UniFrac distance is used. We compare this new approach to three possible ways of aggregating PERMANOVA over multiple rarefactions obtained from resampling: averaging the distance matrix, averaging the (element-wise) squared distance matrix and averaging the F-statistic. Our simulations indicate that our non-stochastic approach is robust to confounding by library size and outperforms each of the stochastic resampling approaches. We also show that, when overdispersion is low, averaging the (element-wise) squared distance outperforms averaging the unsquared distance, currently implemented in the R package vegan. We illustrate our methods using an analysis of data on inflammatory bowel disease in which samples from case participants have systematically smaller library sizes than samples from control participants. 
Availability and implementation We have implemented all the approaches described above, including the function for calculating the analytical average of the squared or unsquared distance matrix, in our R package LDM, which is available on GitHub at https://github.com/yijuanhu/LDM. Supplementary information Supplementary data are available at Bioinformatics online.
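The paper's analytical averaging avoids resampling entirely; as a point of reference, the stochastic version it improves upon can be sketched in a few lines, i.e. Monte-Carlo averaging of the element-wise squared Jaccard distance over rarefaction replicates (all sizes and the replicate count below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def rarefy(counts, depth, rng):
    """Subsample each sample's reads (rows = samples) down to `depth`
    without replacement, via a multivariate hypergeometric draw."""
    out = np.zeros_like(counts)
    for i, row in enumerate(counts):
        out[i] = rng.multivariate_hypergeometric(row, depth)
    return out

def jaccard_distance(presence):
    """Pairwise Jaccard distance on a presence/absence matrix (samples x taxa)."""
    n = presence.shape[0]
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            inter = np.sum(presence[i] & presence[j])
            union = np.sum(presence[i] | presence[j])
            D[i, j] = D[j, i] = 1.0 - inter / union if union else 0.0
    return D

counts = rng.integers(0, 50, size=(6, 30))
depth = int(counts.sum(axis=1).min())
B = 200
# Monte-Carlo stand-in for the paper's analytical average of the
# element-wise squared distance over all rarefaction replicates
D2 = np.mean([jaccard_distance(rarefy(counts, depth, rng) > 0) ** 2
              for _ in range(B)], axis=0)
```

The averaged matrix D2 would then be handed to PERMANOVA; the paper's contribution is computing this average exactly, with no Monte-Carlo error or stochastic component.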
- Published
- 2021
5. A Rarefaction-Based Extension of the LDM for Testing Presence-Absence Associations in the Microbiome
- Author
-
Yi-Juan Hu, Glen A. Satten, and Andrea N. Lane
- Subjects
Statistics and Probability ,0303 health sciences ,030306 microbiology ,Computer science ,Confounding ,Rarefaction ,Sample (statistics) ,Extension (predicate logic) ,Residual ,Biochemistry ,Original Papers ,Computer Science Applications ,03 medical and health sciences ,Computational Mathematics ,Computational Theory and Mathematics ,Statistics ,Covariate ,Data analysis ,Microbiome ,Molecular Biology ,Relative species abundance ,030304 developmental biology ,Mathematics - Abstract
Motivation Many methods for testing association between the microbiome and covariates of interest (e.g. clinical outcomes, environmental factors) assume that these associations are driven by changes in the relative abundance of taxa. However, these associations may also result from changes in which taxa are present and which are absent. Analyses of such presence–absence associations face a unique challenge: confounding by library size (total sample read count), which occurs when library size is associated with covariates in the analysis. It is known that rarefaction (subsampling to a common library size) controls this bias, but at the potential cost of information loss as well as the introduction of a stochastic component into the analysis. Currently, there is a need for robust and efficient methods for testing presence–absence associations in the presence of such confounding, both at the community level and at the individual-taxon level, that avoid the drawbacks of rarefaction. Results We have previously developed the linear decomposition model (LDM) that unifies the community-level and taxon-level tests into one framework. Here, we present an extension of the LDM for testing presence–absence associations. The extended LDM is a non-stochastic approach that repeatedly applies the LDM to all rarefied taxa count tables, averages the residual sum-of-squares (RSS) terms over the rarefaction replicates, and then forms an F-statistic based on these average RSS terms. We show that this approach compares favorably to averaging the F-statistic from R rarefaction replicates, which can only be calculated stochastically. The flexible nature of the LDM allows discrete or continuous traits or interactions to be tested while allowing confounding covariates to be adjusted for. Our simulations indicate that our proposed method is robust to any systematic differences in library size and has better power than alternative approaches. 
We illustrate our method using an analysis of data on inflammatory bowel disease (IBD) in which cases have systematically smaller library sizes than controls. Availability and implementation The R package LDM is available on GitHub at https://github.com/yijuanhu/LDM in formats appropriate for Macintosh or Windows. Supplementary information Supplementary data are available at Bioinformatics online.
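The core move of the extended LDM, averaging RSS terms across replicates before forming a single F-statistic, can be illustrated with ordinary least squares on simulated rarefaction replicates (this toy regression stands in for the LDM decomposition; sizes and effect sizes are made up, and degrees-of-freedom constants are omitted from the F ratio):

```python
import numpy as np

rng = np.random.default_rng(1)

def rss_terms(Y, x):
    """Model and residual sums of squares from regressing Y on x (with intercept)."""
    X1 = np.column_stack([np.ones(len(x)), x])
    beta, *_ = np.linalg.lstsq(X1, Y, rcond=None)
    fitted = X1 @ beta
    rss_resid = np.sum((Y - fitted) ** 2)
    rss_model = np.sum((fitted - Y.mean(axis=0)) ** 2)
    return rss_model, rss_resid

# R rarefaction replicates of a (samples x taxa) response matrix
n, R = 40, 50
x = rng.normal(size=n)
replicates = [np.outer(x, [0.5, 0.0]) + rng.normal(size=(n, 2)) for _ in range(R)]

# Average the RSS terms over replicates, then form a single F-statistic,
# rather than averaging one F-statistic per replicate
terms = np.array([rss_terms(Y, x) for Y in replicates])
F = terms[:, 0].mean() / terms[:, 1].mean()
```

Averaging the RSS terms (numerator and denominator separately) is what makes the paper's statistic computable analytically, in contrast to averaging per-replicate F-statistics, which requires actual resampling.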
- Published
- 2020
6. The effect of statistical normalisation on network propagation scores
- Author
-
Sergio Picart-Armada, Alexandre Perera-Lluna, Wesley K. Thompson, Alfonso Buil, Universitat Politècnica de Catalunya. Doctorat en Enginyeria Biomèdica, Universitat Politècnica de Catalunya. Departament d'Enginyeria de Sistemes, Automàtica i Informàtica Industrial, and Universitat Politècnica de Catalunya. B2SLab - Bioinformatics and Biomedical Signals Laboratory
- Subjects
Computer science ,02 engineering and technology ,Biochemistry ,Interactome ,Diffusion ,Computational biology ,Protein-protein interaction ,0302 clinical medicine ,Informàtica [Àrees temàtiques de la UPC] ,0202 electrical engineering, electronic engineering, information engineering ,Protein function prediction ,Prospective Studies ,Protein Interaction Maps ,Mathematics ,Parametric statistics ,0303 health sciences ,Statistics ,Variance (accounting) ,Covariance ,Original Papers ,Graph ,Computer Science Applications ,Computational Mathematics ,Kernel method ,Computational Theory and Mathematics ,Null (SQL) ,Graph (abstract data type) ,Network analysis ,Mineria de dades ,Statistics and Probability ,Normalization (statistics) ,Network topology ,Biologia computacional ,03 medical and health sciences ,Permutation ,Interaction network ,020204 information systems ,Molecular Biology ,Data mining ,030304 developmental biology ,business.industry ,Null (mathematics) ,Computational Biology ,Proteins ,Kernel methods ,Pattern recognition ,Kernel, Funcions de ,Kernel functions ,Artificial intelligence ,business ,030217 neurology & neurosurgery - Abstract
Motivation Network diffusion and label propagation are fundamental tools in computational biology, with applications like gene–disease association, protein function prediction and module discovery. More recently, several publications have introduced a permutation analysis after the propagation process, due to concerns that network topology can bias diffusion scores. This opens the question of the statistical properties and the presence of bias of such diffusion processes in each of their applications. In this work, we characterized some common null models behind the permutation analysis and the statistical properties of the diffusion scores. We benchmarked seven diffusion scores on three case studies: synthetic signals on a yeast interactome, simulated differential gene expression on a protein–protein interaction network and prospective gene set prediction on another interaction network. For clarity, all the datasets were based on binary labels, but we also present theoretical results for quantitative labels. Results Diffusion scores starting from binary labels were affected by the label codification and exhibited a problem-dependent topological bias that could be removed by statistical normalization. Parametric and non-parametric normalization addressed both points by being codification-independent and by equalizing the bias. We identified and quantified two sources of bias (mean value and variance) that yielded performance differences when normalizing the scores. We provided closed formulae for both and showed how the null covariance is related to the spectral properties of the graph. Although none of the proposed scores systematically outperformed the others, normalization was preferred when the sought positive labels were not aligned with the bias. We conclude that the decision on bias removal should be problem and data-driven, i.e. based on a quantitative analysis of the bias and its relation to the positive entities.
Availability The code is publicly available at https://github.com/b2slab/diffuBench and the data underlying this article are available at https://github.com/b2slab/retroData Supplementary information Supplementary data are available at Bioinformatics online.
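The permutation-based statistical normalization discussed in the paper can be sketched as z-scoring raw diffusion scores against scores obtained from propagated label permutations (the regularised-Laplacian kernel and all sizes here are illustrative assumptions, not the paper's benchmark setup):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy undirected graph; a regularised-Laplacian kernel as diffusion operator
n = 30
A = (rng.random((n, n)) < 0.1).astype(float)
A = np.triu(A, 1)
A = A + A.T
L = np.diag(A.sum(1)) - A
K = np.linalg.inv(np.eye(n) + 0.5 * L)   # diffusion kernel (assumed choice)

y = np.zeros(n)
y[:3] = 1.0                              # binary input labels
raw = K @ y                              # raw diffusion scores

# Permutation normalisation: z-score each node's raw score against scores
# obtained by propagating randomly permuted label vectors
perms = np.array([K @ rng.permutation(y) for _ in range(1000)])
z = (raw - perms.mean(0)) / (perms.std(0) + 1e-12)
```

The mean and variance being subtracted and divided out here are exactly the two bias sources the paper derives closed formulae for, which lets the z-scores be computed without actually running permutations.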
- Published
- 2020
7. Model-based branching point detection in single-cell data by K-Branches clustering
- Author
-
F. Alexander Wolf, Nikolaos-Kosmas Chlis, and Fabian J. Theis
- Subjects
0301 basic medicine ,Statistics and Probability ,Computer science ,Cellular differentiation ,Gene Expression ,Inference ,computer.software_genre ,Models, Biological ,Biochemistry ,Branching (linguistics) ,Mice ,03 medical and health sciences ,Text mining ,Software ,Animals ,Cluster Analysis ,Humans ,Cellular development ,Progenitor cell ,Cluster analysis ,Molecular Biology ,Statistic ,Mathematics ,Sequence Analysis, RNA ,business.industry ,Gene Expression Profiling ,Model selection ,Cell Differentiation ,Branching points ,Original Papers ,Computer Science Applications ,Computational Mathematics ,030104 developmental biology ,Computational Theory and Mathematics ,Data mining ,Single-Cell Analysis ,business ,computer ,Algorithm ,Algorithms - Abstract
Motivation The identification of heterogeneities in cell populations using single-cell technologies such as single-cell RNA-Seq enables inference of cellular development and lineage trees. Several methods have been proposed for such inference from high-dimensional single-cell data. They typically assign each cell to a branch in a differentiation trajectory. However, they commonly assume specific geometries such as tree-like developmental hierarchies and lack statistically sound methods to decide on the number of branching events. Results We present K-Branches, a solution to the above problem by locally fitting half-lines to single-cell data, introducing a clustering algorithm similar to K-Means. These half-lines are proxies for branches in the differentiation trajectory of cells. We propose a modified version of the GAP statistic for model selection, in order to decide on the number of lines that best describe the data locally. In this manner, we identify the location and number of subgroups of cells that are associated with branching events and full differentiation, respectively. We evaluate the performance of our method on single-cell RNA-Seq data describing the differentiation of myeloid progenitors during hematopoiesis, single-cell qPCR data of mouse blastocyst development, single-cell qPCR data of human myeloid monocytic leukemia and artificial data. Availability and implementation An R implementation of K-Branches is freely available at https://github.com/theislab/kbranches. Supplementary information Supplementary data are available at Bioinformatics online.
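The geometric primitive behind K-Branches is the distance from a point to a half-line; a short sketch (in the full algorithm, assignment of cells to the nearest of K half-lines alternates with refitting the half-lines, in K-means fashion):

```python
import numpy as np

def halfline_distance(points, origin, direction):
    """Euclidean distance from each point to the half-line
    {origin + t * direction : t >= 0}; direction need not be unit length."""
    d = direction / np.linalg.norm(direction)
    v = points - origin
    t = np.clip(v @ d, 0.0, None)          # project, then clamp to the half-line
    proj = origin + t[:, None] * d
    return np.linalg.norm(points - proj, axis=1)

pts = np.array([[1.0, 1.0], [-2.0, 0.0]])
dist = halfline_distance(pts, np.array([0.0, 0.0]), np.array([1.0, 0.0]))
print(dist)  # -> [1. 2.]
```

The clamp at t = 0 is what distinguishes a half-line from a full line: the second point projects onto the origin rather than onto the line's backward extension.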
- Published
- 2016
8. Domain prediction with probabilistic directional context
- Author
-
Alejandro Ochoa and Mona Singh
- Subjects
Models, Molecular ,0301 basic medicine ,Statistics and Probability ,Computer science ,Protein domain ,Context (language use) ,computer.software_genre ,Markov model ,01 natural sciences ,Biochemistry ,Domain (software engineering) ,010104 statistics & probability ,03 medical and health sciences ,0302 clinical medicine ,Protein Domains ,Sequence Analysis, Protein ,Humans ,0101 mathematics ,Molecular Biology ,030304 developmental biology ,Mathematics ,0303 health sciences ,Sequence ,Models, Statistical ,Markov chain ,Probabilistic logic ,Computational Biology ,Original Papers ,Computer Science Applications ,Computational Mathematics ,030104 developmental biology ,Computational Theory and Mathematics ,Pairwise comparison ,Data mining ,Sequence Analysis ,computer ,Algorithm ,Algorithms ,Software ,030217 neurology & neurosurgery - Abstract
Motivation Protein domain prediction is one of the most powerful approaches for sequence-based function prediction. Although domain instances are typically predicted independently of each other, newer approaches have demonstrated improved performance by rewarding domain pairs that frequently co-occur within sequences. However, most of these approaches have ignored the order in which domains preferentially co-occur and have also not modeled domain co-occurrence probabilistically. Results We introduce a probabilistic approach for domain prediction that models ‘directional’ domain context. Our method is the first to score all domain pairs within a sequence while taking their order into account, even for non-sequential domains. We show that our approach extends a previous Markov model-based approach to additionally score all pairwise terms, and that it can be interpreted within the context of Markov random fields. We formulate our underlying combinatorial optimization problem as an integer linear program, and demonstrate that it can be solved quickly in practice. Finally, we perform extensive evaluation of domain context methods and demonstrate that incorporating context increases the number of domain predictions by ∼15%, with our approach dPUC2 (Domain Prediction Using Context) outperforming all competing approaches. Availability and Implementation dPUC2 is available at http://github.com/alexviiia/dpuc2. Supplementary information Supplementary data are available at Bioinformatics online.
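The paper's combinatorial problem, choosing a subset of candidate domains that maximizes individual scores plus directional pair-context scores, is solved there as an integer linear program; for tiny inputs the same objective can be brute-forced, which makes the model concrete (all domain names and scores below are hypothetical):

```python
from itertools import combinations

def best_domain_set(candidates, pair_score):
    """Brute-force stand-in for the paper's integer linear program: pick the
    subset of candidate domains maximizing the sum of individual scores plus
    directional (ordered) pair-context scores. `candidates` is a list of
    (name, start_position, score) tuples."""
    best, best_val = (), float("-inf")
    for r in range(len(candidates) + 1):
        for subset in combinations(candidates, r):
            val = sum(c[2] for c in subset)
            for i in range(len(subset)):
                for j in range(len(subset)):
                    if subset[i][1] < subset[j][1]:   # i occurs before j
                        val += pair_score.get((subset[i][0], subset[j][0]), 0.0)
            if val > best_val:
                best, best_val = subset, val
    return [c[0] for c in best], best_val

# Hypothetical example: a weak 'kinase' hit is rescued by a preceding 'SH2'
cands = [("SH2", 10, 2.0), ("kinase", 50, -1.0)]
pairs = {("SH2", "kinase"): 3.0}
names, val = best_domain_set(cands, pairs)
print(names, val)  # -> ['SH2', 'kinase'] 4.0
```

The directional scoring (rewarding SH2-then-kinase but not the reverse order) is the "directional context" of the title; the ILP makes the same search tractable for realistic numbers of candidates.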
- Published
- 2016
9. Variance Adaptive Shrinkage (vash): Flexible Empirical Bayes estimation of variances
- Author
-
Mengyin Lu and Matthew Stephens
- Subjects
0301 basic medicine ,Statistics and Probability ,Biochemistry ,03 medical and health sciences ,Bayes' theorem ,Statistics ,Variance estimation ,Range (statistics) ,Econometrics ,Animals ,Humans ,F-test of equality of variances ,Molecular Biology ,Inverse-gamma distribution ,Mathematics ,Estimation ,Genome ,Gene Expression Profiling ,Bayes Theorem ,Genomics ,Variance (accounting) ,Original Papers ,Expression (mathematics) ,Computer Science Applications ,Computational Mathematics ,R package ,030104 developmental biology ,Efficiency ,Distribution (mathematics) ,Computational Theory and Mathematics - Abstract
Motivation: Genomic studies often involve estimation of variances of thousands of genes (or other genomic units) from just a few measurements on each. For example, variance estimation is an important step in gene expression analyses aimed at identifying differentially expressed genes. A common approach to this problem is to use an Empirical Bayes (EB) method that assumes the variances among genes follow an inverse-gamma distribution. This distributional assumption is relatively inflexible; for example, it may not capture ‘outlying’ genes whose variances are considerably bigger than usual. Here we describe a more flexible EB method, capable of capturing a much wider range of distributions. Indeed, the main assumption is that the distribution of the variances is unimodal (or, as an alternative, that the distribution of the precisions is unimodal). We argue that the unimodal assumption provides an attractive compromise between flexibility, computational tractability and statistical efficiency. Results: We show that this more flexible approach provides competitive performance with existing methods when the variances truly come from an inverse-gamma distribution, and can outperform them when the distribution of the variances is more complex. In analyses of several human gene expression datasets from the Genotype Tissues Expression consortium, we find that our more flexible model often fits the data appreciably better than the single inverse gamma distribution. At the same time we find that in these data this improved model fit leads to only small improvements in variance estimates and detection of differentially expressed genes. Availability and Implementation: Our methods are implemented in an R package vashr available from http://github.com/mengyin/vashr. Contact: mstephens@uchicago.edu Supplementary information: Supplementary data are available at Bioinformatics online.
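As background for what vash generalizes, the single inverse-gamma EB baseline shrinks each gene's sample variance toward a common value, with a weight set by the prior degrees of freedom (a standard limma-style formula; the hyperparameters d0 and s0_sq below are fixed by hand rather than estimated from the data as an EB method would):

```python
import numpy as np

def eb_shrink_variances(s2, df, d0, s0_sq):
    """Inverse-gamma (scaled-inverse-chi^2) EB shrinkage: posterior mean of
    each gene's variance given its sample variance s2 on df degrees of
    freedom, under a prior with d0 degrees of freedom and scale s0_sq."""
    return (d0 * s0_sq + df * s2) / (d0 + df)

rng = np.random.default_rng(2)
true_var = 1.0
s2 = true_var * rng.chisquare(4, size=5000) / 4   # sample variances, df = 4
shrunk = eb_shrink_variances(s2, df=4, d0=10.0, s0_sq=1.0)
```

The shrunken estimates are far less dispersed than the raw ones; vash replaces the single inverse-gamma prior with a unimodal mixture, so that genuinely outlying variances are shrunk much less aggressively.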
- Published
- 2016
10. Succinct Colored de Bruijn Graphs
- Author
-
Paul S. Morley, Travis Gagie, Keith E. Belk, Alexander Bowe, Martin D. Muggli, Robert Raymond, Noelle R. Noyes, Simon J. Puglisi, and Christina Boucher
- Subjects
0301 basic medicine ,Statistics and Probability ,Theoretical computer science ,Genotyping Techniques ,Computer science ,Population ,Biochemistry ,De Bruijn graph ,03 medical and health sciences ,symbols.namesake ,0302 clinical medicine ,education ,Molecular Biology ,030304 developmental biology ,Mathematics ,Discrete mathematics ,De Bruijn sequence ,Sequence ,0303 health sciences ,De Bruijn index ,education.field_of_study ,Bacteria ,030306 microbiology ,Eukaryota ,Sequence Analysis, DNA ,Data structure ,Original Papers ,Tree (graph theory) ,Graph ,Computer Science Applications ,Computational Mathematics ,Tree traversal ,030104 developmental biology ,TheoryofComputation_MATHEMATICALLOGICANDFORMALLANGUAGES ,Computational Theory and Mathematics ,Colored ,symbols ,BEST theorem ,Algorithms ,Software ,030217 neurology & neurosurgery - Abstract
Motivation In 2012, Iqbal et al. introduced the colored de Bruijn graph, a variant of the classic de Bruijn graph, which is aimed at ‘detecting and genotyping simple and complex genetic variants in an individual or population’. Because they are intended to be applied to massive population level data, it is essential that the graphs be represented efficiently. Unfortunately, current succinct de Bruijn graph representations are not directly applicable to the colored de Bruijn graph, which requires additional information to be succinctly encoded as well as support for non-standard traversal operations. Results Our data structure dramatically reduces the amount of memory required to store and use the colored de Bruijn graph, with some penalty to runtime, allowing it to be applied in much larger and more ambitious sequence projects than was previously possible. Availability and Implementation https://github.com/cosmo-team/cosmo/tree/VARI Supplementary information Supplementary data are available at Bioinformatics online.
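Conceptually, a colored de Bruijn graph maps each k-mer to the set of samples ("colors") whose reads contain it; the succinct representation in the paper replaces exactly the kind of memory-hungry dictionary sketched here:

```python
from collections import defaultdict

def colored_debruijn(samples, k):
    """Plain hash-based colored de Bruijn graph: map each k-mer to the set
    of sample indices ('colors') whose reads contain it. This is only the
    abstraction; the paper's contribution is a succinct data structure
    that supports the same queries in far less memory."""
    colors = defaultdict(set)
    for c, reads in enumerate(samples):
        for read in reads:
            for i in range(len(read) - k + 1):
                colors[read[i:i + k]].add(c)
    return colors

g = colored_debruijn([["ACGTAC"], ["CGTACG"]], k=3)
print(sorted(g["CGT"]))  # -> [0, 1]
```

Graph edges follow from (k-1)-mer overlaps between keys; variant detection then looks for "bubbles" whose alternative paths carry different color sets.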
- Published
- 2016
11. FITTING OF HYPERELASTIC CONSTITUTIVE MODELS IN DIFFERENT SHEEP HEART REGIONS BASED ON BIAXIAL MECHANICAL PROPERTIES
- Author
-
Fulufhelo Nemavhola, Thanyani A Pandelani, and Harry Ngwangwa
- Subjects
Polynomial (hyperelastic model) ,medicine.medical_specialty ,medicine.anatomical_structure ,Myocardial tissue ,Ventricle ,Internal medicine ,Heart failure ,Hyperelastic material ,Family model ,medicine ,Cardiology ,medicine.disease ,Mathematics - Abstract
Heart failure remains one of the leading causes of death worldwide, especially among people over the age of 60 years. To develop effective therapy and suitable replacement materials for the heart muscle it is necessary to understand its biomechanical behaviour under load. This paper investigates the passive mechanical response of sheep myocardia excised from three different regions of the heart. Due to the relatively higher cost and huge ethical demands of acquiring and testing real animal heart models, this paper evaluates the fitting performance of five different constitutive models on the myocardial tissue responses. Ten sheep were sacrificed, and their hearts excised and transported within 3 h to the biomechanics testing laboratory. The upper sections of the hearts above the short axes were carefully dissected out. Tissues were dissected from the mid-sections of the left ventricle, mid-wall and right ventricle for each heart. The epicardia and endocardia were then carefully sliced off each tissue to leave the myocardia. Stress-strain curves were calculated, filtered and resampled. The results show that the Choi-Vito model provided the best fit to the LV, the polynomial (anisotropic) model to the RV, the Four-Fiber Family model to the RV, Holzapfel (2000) to the RV, Holzapfel (2005) to the RV and the Fung model to the LV.
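Fitting a constitutive model to stress-strain data reduces to least squares over the model's material parameters. A sketch with a one-dimensional Fung-type exponential response (the functional form, parameter values and grid search are all illustrative, not the paper's fitting procedure for biaxial data):

```python
import numpy as np

def fung_stress(strain, a, b):
    """One-dimensional Fung-type exponential stress response (illustrative)."""
    return a * (np.exp(b * strain) - 1.0)

# Synthetic stress-strain data generated with a = 2.0, b = 8.0
strain = np.linspace(0.0, 0.3, 50)
stress = fung_stress(strain, 2.0, 8.0)

# Coarse grid-search least squares standing in for a nonlinear optimizer
grid_a = np.linspace(0.5, 5.0, 46)
grid_b = np.linspace(1.0, 15.0, 57)
sse = [[np.sum((fung_stress(strain, a, b) - stress) ** 2) for b in grid_b]
       for a in grid_a]
ia, ib = np.unravel_index(np.argmin(sse), (len(grid_a), len(grid_b)))
print(grid_a[ia], grid_b[ib])  # recovers values near a = 2.0, b = 8.0
```

For real biaxial data the residual would be built from both loading directions, and goodness of fit compared across the five candidate models, as in the paper.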
- Published
- 2021
12. Covid-19 Epidemic Prediction in France : the Multimodal Case
- Author
-
Jean-Pierre Quadrat
- Subjects
Moment (mathematics) ,2019-20 coronavirus outbreak ,Coronavirus disease 2019 (COVID-19) ,Logarithm ,Differential equation ,Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) ,Statistics ,Mode (statistics) ,Mathematics - Abstract
In two previous papers we proposed models to estimate the Covid-19 epidemic when the number of daily positive cases has a bell-shaped form that we call a mode. We have observed that each Covid variant produces this type of epidemic shape at a different moment, resulting in a multimodal epidemic shape. We show in this document that each mode can still be estimated with the models described in the two previous papers, provided we replace the cumulated number of positive cases y by the cumulated number of positive cases reduced by a parameter P to be estimated. Therefore, denoting by z the logarithm of y − P, z approximately follows the differential equation ż = b − a·z^r, where a, b and r also have to be estimated from the observed data. We show the predictions obtained for the four French modes (April and November 2020, May and September 2021). The comparison between the predictions obtained before the containment decisions made by the French government and the data observed afterwards suggests the inefficiency of the epidemic lockdowns.
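The mode dynamics ż = b − a·z^r can be integrated numerically to see the saturation behaviour each mode exhibits; a forward-Euler sketch (parameter values are arbitrary, not fitted to the French data):

```python
def simulate_mode(z0, a, b, r, days, dt=0.1):
    """Forward-Euler integration of the mode model z' = b - a * z**r,
    where z is the log of the (shifted) cumulated positive cases."""
    z = z0
    steps_per_day = int(round(1 / dt))
    traj = [z]
    for _ in range(days * steps_per_day):
        z += dt * (b - a * z ** r)
        traj.append(z)
    return traj

# z grows and saturates at the equilibrium z* = (b/a)**(1/r)
traj = simulate_mode(z0=1.0, a=0.05, b=0.5, r=1.5, days=200)
z_star = (0.5 / 0.05) ** (1 / 1.5)   # about 4.64
```

Since z is a logarithm, the saturation of z at z* corresponds to the cumulated case count y levelling off at P + exp(z*), which is what yields a final-size prediction per mode.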
- Published
- 2021
13. A Bayesian Analysis of Steady-State Enzyme Data leads to Estimates of Rate Constants and Uncertainties in a Multi-Step Reaction
- Author
-
Ian Barr
- Subjects
Hybrid Monte Carlo ,Posterior probability ,Bayesian probability ,Kinetic isotope effect ,Sampling (statistics) ,Statistical physics ,Steady state (chemistry) ,Bayesian inference ,Confidence interval ,Mathematics - Abstract
The microscopic rate constants that govern an enzymatic reaction are only directly measured under certain experimental set-ups, such as stopped-flow, quenched-flow, or temperature-jump assays; the majority of enzymology proceeds from steady-state conditions, which leads to a set of more easily observable parameters such as kcat, KM, and observed kinetic isotope effects (Dkcat). This paper further develops a model from Toney (2013) to estimate microscopic rate constants from steady-state data for a set of reversible, four-step reactions. This paper uses the Bayesian modeling software Stan and demonstrates the benefits of Bayesian data analysis in the estimation of these rate constants. In contrast to the optimization methods often employed in the estimation of kinetic constants, a Bayesian treatment is better equipped to estimate the uncertainties of each parameter; sampling from the posterior distribution using Hamiltonian Monte Carlo immediately gives parameter estimates as the mean or median of the posterior, and also confidence intervals that express the uncertainty of each parameter.
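The benefit the abstract attributes to the Bayesian treatment, full uncertainty on each constant rather than a point estimate, can be seen even without Stan: on a coarse grid, the posterior over Michaelis-Menten parameters under a flat prior is computable directly (the data and all values below are synthetic, and this two-parameter toy stands in for the paper's four-step reversible model):

```python
import math

# Synthetic steady-state data: v = Vmax*S/(Km+S) + noise,
# generated with hypothetical true values Vmax = 10, Km = 2, sigma = 0.2
S = [0.5, 1.0, 2.0, 4.0, 8.0, 16.0]
v_obs = [2.05, 3.28, 5.10, 6.59, 8.05, 8.83]
sigma = 0.2

# Grid posterior over (Vmax, Km) with a flat prior: a deterministic
# stand-in for sampling the posterior with Hamiltonian Monte Carlo
grid = [(vm / 10.0, km / 20.0) for vm in range(80, 121) for km in range(20, 61)]
logp = [sum(-0.5 * ((v - vm * s / (km + s)) / sigma) ** 2
            for s, v in zip(S, v_obs))
        for vm, km in grid]
m = max(logp)
w = [math.exp(l - m) for l in logp]
Z = sum(w)
vmax_mean = sum(wi * g[0] for wi, g in zip(w, grid)) / Z
```

Posterior moments (means, medians, credible intervals) fall out of the weighted grid directly; HMC is needed only because grids become infeasible as the number of rate constants grows.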
- Published
- 2021
14. Modular assembly of dynamic models in systems biology
- Author
-
Michael Pan, Edmund J. Crampin, Peter J. Gawthrop, and Joseph Cursons
- Subjects
Metabolic Processes ,Cell signaling ,Computer science ,Xenopus ,Signal transduction ,computer.software_genre ,Infographics ,Systems Science ,Biochemistry ,Biology (General) ,Post-Translational Modification ,Phosphorylation ,Data Management ,Reusability ,Ecology ,Systems Biology ,Physics ,Signaling cascades ,Software Engineering ,Enzymes ,Computational Theory and Mathematics ,Modeling and Simulation ,Physical Sciences ,Thermodynamics ,Engineering and Technology ,Granularity ,Graphs ,Glycolysis ,Research Article ,Cell biology ,Computer and Information Sciences ,MAPK signaling cascades ,QH301-705.5 ,MAP Kinase Signaling System ,Systems biology ,Context (language use) ,Models, Biological ,Computer Software ,Cellular and Molecular Neuroscience ,Consistency (database systems) ,Genetics ,Animals ,Molecular Biology ,Ecology, Evolution, Behavior and Systematics ,Modularity (networks) ,Biology and life sciences ,business.industry ,Data Visualization ,Proteins ,Modular design ,Software framework ,Metabolism ,Enzymology ,Software engineering ,business ,Bond graph ,computer ,Mathematics - Abstract
It is widely acknowledged that the construction of large-scale dynamic models in systems biology requires complex modelling problems to be broken up into more manageable pieces. To this end, both modelling and software frameworks are required to enable modular modelling. While there has been consistent progress in the development of software tools to enhance model reusability, there has been a relative lack of consideration for how underlying biophysical principles can be applied to this space. Bond graphs combine the aspects of both modularity and physics-based modelling. In this paper, we argue that bond graphs are compatible with recent developments in modularity and abstraction in systems biology, and are thus a desirable framework for constructing large-scale models. We use two examples to illustrate the utility of bond graphs in this context: a model of a mitogen-activated protein kinase (MAPK) cascade to illustrate the reusability of modules and a model of glycolysis to illustrate the ability to modify the model granularity. Author summary: The biochemistry within a cell is complex, being composed of numerous biomolecules and reactions. In order to develop fully detailed mathematical models of cells, smaller submodels need to be constructed and connected together. Software and standards can assist in this endeavour, but challenges remain in ensuring that submodels are both consistent with each other and consistent with the fundamental conservation laws of physics. In this paper, we propose a new approach using bond graphs from engineering. In this approach, connections between models are defined using physical conservation laws. We show that this approach is compatible with current software approaches in the field, and can therefore be readily used to incorporate physical consistency into existing model integration methodologies.
We illustrate the utility of this approach in streamlining the development of models for a signalling network (the MAPK cascade) and a metabolic network (the glycolysis pathway). The advantage of this approach is that models can be developed in a scalable manner while also ensuring consistency with the laws of physics, enhancing the range of data available to train models. This approach can be used to quickly construct detailed and accurate models of cells, facilitating future advances in biotechnology and personalised medicine.
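A flavour of why bond graphs enforce physical consistency: a reaction's flux is written in terms of thermodynamic affinities, so it vanishes exactly at equilibrium regardless of parameter choices. A sketch of the Marcelin-de Donder flux form commonly used in bond-graph biochemistry (the constants below are illustrative, not from the paper's MAPK or glycolysis models):

```python
import math

R, T = 8.314, 310.0  # gas constant (J/mol/K), body temperature (K)

def bg_reaction_flux(kappa, x_subs, x_prods):
    """Bond-graph style reaction flux v = kappa*(exp(Af/RT) - exp(Ar/RT)),
    where the affinities are sums of species potentials mu_i = RT*ln(K_i*x_i).
    Each species is given as a (K_i, concentration) pair."""
    Af = sum(R * T * math.log(K_i * x) for K_i, x in x_subs)
    Ar = sum(R * T * math.log(K_i * x) for K_i, x in x_prods)
    return kappa * (math.exp(Af / (R * T)) - math.exp(Ar / (R * T)))

# Away from equilibrium the flux reduces to mass action: 2.0*2.0 - 1.0*2.0
v = bg_reaction_flux(1.0, [(2.0, 2.0)], [(1.0, 2.0)])   # about 2.0

# At equilibrium (equal affinities) the flux vanishes for any kappa
v_eq = bg_reaction_flux(1.0, [(2.0, 1.0)], [(1.0, 2.0)])
```

Because forward and reverse behaviour share the single rate parameter kappa, detailed balance cannot be violated by an inconsistent choice of rate constants, which is the consistency guarantee the paper builds model composition on.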
- Published
- 2021
15. Digitizing ECG image: new fully automated method and open-source software code
- Author
-
Hetal Patel, Natalie E. Coppa, Kazi T. Haq, Julian Fortune, and Larisa G. Tereshchenko
- Subjects
Ventricular gradient ,Fully automated ,Asynchronous communication ,business.industry ,Image (category theory) ,Code (cryptography) ,Pattern recognition ,Open source software ,Artificial intelligence ,Ecg lead ,business ,Confidence interval ,Mathematics - Abstract
Background: We aimed to develop and validate an automated, open-source code ECG-digitizing tool and to assess agreement of ECG measurements across three types of median beats, comprised of digitally recorded simultaneous and asynchronous ECG leads and digitized asynchronous ECG leads. Methods: We used data from clinical study participants (n=230; mean age 30±15 y; 25% female; 52% had cardiovascular disease) for whom both digitally recorded ECGs and ECGs printed on paper and then scanned were available, split into development (n=150) and validation (n=80) datasets. The agreement between ECG and VCG measurements on the digitally recorded time-coherent median beat, the representative asynchronous digitized beat, and the digitally recorded beat was assessed by Bland-Altman analysis. Results: Agreement between the digitally recorded and digitized representative beats was high [area-based spatial ventricular gradient (SVG) elevation bias 2.5 (95% limits of agreement [LOA] -7.9 to 13.0)°; precision 96.8%; intraclass correlation [ICC] 0.988; Lin's concordance coefficient ρc 0.97 (95% confidence interval [CI] 0.95-0.98)]. Agreement between digitally recorded asynchronous and time-coherent median beats was moderate for area-based VCG metrics [spatial QRS-T angle bias 1.4 (95% LOA -33.2 to 30.3)°; precision 94.8%; ICC 0.95; Lin's concordance coefficient ρc 0.90 (95% CI 0.82-0.95)], but poor for peak-based VCG metrics of global electrical heterogeneity. Conclusions: We developed and validated an open-source software tool for paper-ECG digitization. Asynchronous ECG leads are the primary source of disagreement in measurements on digitally recorded and digitized ECGs.
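The agreement statistics quoted in the Results (bias with 95% limits of agreement) come from a Bland-Altman analysis, which is short to compute; a sketch on synthetic paired measurements (the simulated bias and noise levels are arbitrary, not the study's values):

```python
import numpy as np

def bland_altman(a, b):
    """Bland-Altman agreement between paired measurements: mean bias and
    the 95% limits of agreement (bias +/- 1.96 * SD of the differences)."""
    diff = np.asarray(a, float) - np.asarray(b, float)
    bias = diff.mean()
    half_width = 1.96 * diff.std(ddof=1)
    return bias, bias - half_width, bias + half_width

rng = np.random.default_rng(3)
digital = rng.normal(40.0, 10.0, 200)              # e.g. an angle in degrees
digitized = digital + rng.normal(1.0, 2.0, 200)    # digitization adds bias + noise
bias, lo, hi = bland_altman(digitized, digital)
```

The bias recovers the systematic offset between the two measurement paths, and the interval (lo, hi) is the range within which 95% of individual disagreements are expected to fall.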
- Published
- 2021
16. Adherence to minimal experimental requirements for defining extracellular vesicles and their functions: a systematic review
- Author
-
Martin Wolf, Rodolphe Poupardin, and Dirk Strunk
- Subjects
Discrete mathematics ,Characterization methods ,Keyword search ,Research quality ,Extracellular vesicle ,Extracellular vesicles ,Mathematics - Abstract
Rigorous measures are required to cope with the advance of extracellular vesicle (EV) research, which grew from 183 to 2,309 studies/year between 2012 and 2020. The ‘MISEV’ guidelines requested standardized methods, thereby assuring and improving EV research quality. We investigated how EV research has improved over time. We conducted a keyword search of 5,093 accessible publications from the period 2012-2020 and analyzed the methodology used for EV isolation and characterization. We found significant improvement over the years, particularly regarding EV characterization, where recent papers used a higher number of methods and EV markers to check for quantity and purity. Interestingly, we also found that EV papers using more methods and EV markers were cited more frequently. Papers citing MISEV criteria were more likely to use a higher number of characterization methods. We therefore established a concise checklist summarizing MISEV criteria to support EV researchers in reaching the highest standards in the field.
- Published
- 2021
17. Correcting statistical bias in correlation-based kinship estimators
- Author
-
Shuang Song, Siting Li, Hongyu Zhao, Wei Jiang, and Xiangyu Zhang
- Subjects
Correlation coefficient ,Mean squared error ,Statistics ,Kinship ,Data analysis ,Estimator ,Heritability ,Spurious relationship ,Association mapping ,Mathematics - Abstract
Accurate estimation of relatedness is important for genetic data analyses, such as association mapping and heritability estimation based on data collected from genome-wide association studies. Inaccurate relatedness estimates may lead to spurious associations and biased heritability estimates. Individual-level genotype data are often used to estimate the kinship coefficient between individuals. The commonly used sample correlation-based genomic relationship matrix (scGRM) method estimates the kinship coefficient by calculating the average sample correlation coefficient over all single nucleotide polymorphisms (SNPs), where the observed allele frequencies are used to calculate both the expectations and variances of genotypes. Although this method is widely used, a substantial proportion of estimated kinship coefficients are negative, which is difficult to interpret. In this paper, through mathematical derivation, we show that there is indeed a bias in the kinship coefficient estimated by the scGRM method when the observed allele frequencies are regarded as true frequencies. This leads to a negative bias for the average kinship estimate among all individuals, which explains the negative estimated kinship coefficients. Based on this observation, we propose an unbiased estimation method, UKin, which can reduce the bias. We justify our improved method with rigorous mathematical proof. We have conducted simulations as well as two real data analyses to demonstrate that both the bias and the root mean square error in kinship coefficient estimation can be reduced by using UKin. Further simulations indicate that the power in association mapping can also be improved by using our unbiased kinship estimates to adjust for cryptic relatedness. Author summary: Inference of relatedness plays an important role in genetic data analysis. Many methods have been proposed to estimate kinship coefficients, including the commonly used genomic relationship matrix method.
However, a substantial proportion of the kinship coefficients estimated by this method are negative, which is difficult to interpret. In this paper, through mathematical derivation, we show that there is indeed a negative bias in this approach. To correct for this bias, we propose a new kinship coefficient estimation method, UKin, which is unbiased without requiring extra genetic information or adding computational complexity. The better performance of UKin in reducing bias and root mean square error is demonstrated through theory, simulations, and analysis of data from the young-onset breast cancer and familial intracranial aneurysm studies.
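The negative bias the authors describe is easy to reproduce: in the sample-correlation GRM, the rows of the standardized genotype matrix sum to zero by construction, which forces the average off-diagonal kinship estimate to about −1/n even for completely unrelated individuals. A hypothetical simulation sketch (not the UKin implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

def scgrm(G):
    """Sample-correlation GRM from an n x m genotype matrix (0/1/2 coding),
    standardizing by the observed allele frequencies, as in the scGRM method."""
    p = G.mean(axis=0) / 2.0                        # observed allele frequencies
    Z = (G - 2 * p) / np.sqrt(2 * p * (1 - p))
    return Z @ Z.T / G.shape[1]

n, m = 50, 5000
p_true = rng.uniform(0.1, 0.9, m)
G = rng.binomial(2, p_true, size=(n, m))            # unrelated individuals
K = scgrm(G)
off_mean = K[~np.eye(n, dtype=bool)].mean()
# off_mean comes out systematically negative (about -1/n), not zero
```

Because each column of `Z` is centered at the observed allele frequency, every row of `K` sums to exactly zero, so the off-diagonal entries must average to minus the mean diagonal divided by (n − 1); this is the bias UKin is designed to remove.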
- Published
- 2021
18. Perturbations both trigger and delay seizures due to generic properties of slow-fast relaxation oscillators
- Author
-
Jaroslav Hlinka and Alberto Pérez-Cervera
- Subjects
0301 basic medicine ,Bistability ,Physics::Medical Physics ,Population Dynamics ,Normal Distribution ,Perturbation (astronomy) ,Systems Science ,Epilepsy ,0302 clinical medicine ,Phase response ,Medicine and Health Sciences ,Statistical physics ,Biology (General) ,Physics ,Soil Perturbation ,Ecology ,Electroencephalography ,Dynamical Systems ,Análisis combinatorio ,Amplitude ,Computational Theory and Mathematics ,Neurology ,Modeling and Simulation ,Physical Sciences ,Engineering and Technology ,Research Article ,Computer and Information Sciences ,Dynamical systems theory ,Medicina ,QH301-705.5 ,Generic property ,Soil Science ,Context (language use) ,Models, Biological ,03 medical and health sciences ,Cellular and Molecular Neuroscience ,Seizures ,Industrial Engineering ,Genetics ,medicine ,Humans ,Ictal ,Control Theory ,Molecular Biology ,Relaxation (Physics) ,Ecology, Evolution, Behavior and Systematics ,Quantitative Biology::Neurons and Cognition ,Population Biology ,Relaxation oscillator ,Biology and Life Sciences ,Control Engineering ,Probability Theory ,Probability Distribution ,medicine.disease ,Computing Methods ,030104 developmental biology ,Earth Sciences ,030217 neurology & neurosurgery ,Mathematics - Abstract
The mechanisms underlying the emergence of seizures are among the most important unresolved issues in epilepsy research. In this paper, we study how perturbations, exogenous or endogenous, may promote or delay seizure emergence. To this aim, motivated by the increasingly adopted view of epileptic dynamics in terms of slow-fast systems, we perform a theoretical analysis of the phase response of a generic relaxation oscillator. As relaxation oscillators are effectively bistable systems on the fast time scale, it is intuitive that perturbations of the non-seizing state with a suitable direction and amplitude may cause an immediate transition to seizure. By contrast, and perhaps less intuitively, smaller-amplitude perturbations have been found to delay spontaneous seizure initiation. By studying the isochrons of relaxation oscillators, we show that this is a generic phenomenon, with the size of the delay depending on the slow flow component. Therefore, depending on perturbation amplitude, frequency and timing, a train of perturbations can increase, decrease or completely suppress the occurrence of seizures. This dependence lends itself to analysis and mechanistic understanding through the methods outlined in this paper. We illustrate this methodology by computing the isochrons, phase response curves and the response to perturbations in several epileptic models possessing different slow vector fields. While our theoretical results are applicable to any planar relaxation oscillator, in the motivating context of epilepsy they elucidate mechanisms of triggering and abating seizures, thus suggesting stimulation strategies with effects ranging from mere delay to full suppression of seizures. Author summary: Despite its simplicity, modelling epileptic dynamics as a slow-fast transition between low- and high-activity states mediated by a slow feedback variable is a relatively novel albeit fruitful approach.
This study is the first, to our knowledge, to characterize the response of such slow-fast models of the epileptic brain to perturbations by computing their isochrons. Beyond their numerical computation, we theoretically determine which factors shape the geometry of isochrons for planar slow-fast oscillators. As a consequence, we introduce a theoretical approach providing a clear understanding of the response of slow-fast oscillators to perturbations. Within the epilepsy context, this elucidates the origin of the contradictory role of interictal epileptiform discharges in the transition to seizure, manifested in both pro-convulsive and anti-convulsive effects depending on amplitude, frequency and timing. More generally, this paper provides a theoretical framework highlighting the role of the slow flow component in the response of relaxation oscillators to perturbations, pointing to general phenomena in slow-fast oscillators that are ubiquitous in biological systems.
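The analysis concerns generic planar slow-fast relaxation oscillators. A minimal sketch of one such system, the van der Pol oscillator in Liénard form, with an optional instantaneous kick to the fast variable standing in for a perturbation (parameters are illustrative, not those of the epileptic models studied):

```python
import numpy as np

def relax_osc(mu=10.0, dt=1e-3, T=50.0, kick_time=None, kick_size=0.0):
    """Slow-fast van der Pol oscillator: x' = mu*(x - x^3/3 + y), y' = -x/mu.
    An instantaneous kick to the fast variable x models a perturbation."""
    x, y = 0.5, 0.0
    xs = np.empty(int(T / dt))
    for i in range(xs.size):
        if kick_time is not None and abs(i * dt - kick_time) < dt / 2:
            x += kick_size
        x += dt * mu * (x - x**3 / 3 + y)   # fast subsystem
        y += dt * (-x / mu)                 # slow subsystem
        xs[i] = x
    return xs

unperturbed = relax_osc()
# comparing the jump times of relax_osc(kick_time=..., kick_size=...) against
# the unperturbed trace reveals the advance or delay of transitions discussed above
```

The fast variable jumps between the two branches of the cubic nullcline (reaching |x| ≈ 2), the bistable "non-seizing"/"seizing" picture of the abstract.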
- Published
- 2020
19. Single-cell data and correlation analysis support the independent double adder model in both Escherichia coli and Bacillus subtilis
- Author
-
Suckjoon Jun, Guillaume Le Treut, Dongyang Li, and Fangwei Si
- Subjects
Adder ,biology ,Cell division ,Replication Initiation ,Replication (statistics) ,biology.protein ,Computational biology ,Cell cycle ,Cell Cycle Protein ,FtsZ ,DnaA ,Mathematics - Abstract
The reference point for cell-size control in the cell cycle is a fundamental biological question. We previously reported that we were unable to reproduce the conclusions of Witz et al.’s eLife paper (Witz, van Nimwegen, and Julou 2019) entitled, “Initiation of chromosome replication controls both division and replication cycles in E. coli through a double-adder mechanism”, despite extensive efforts. In this ‘replication double adder’ (RDA) model, both replication and division cycles are determined via replication initiation as the sole implementation point of size control. Witz et al. justified the RDA model using a type of correlation analysis (the “I-value analysis”) that they developed. By contrast, we previously showed that, in both Escherichia coli and Bacillus subtilis, replication initiation and cell division are determined by balanced biosynthesis of key cell cycle proteins (e.g., DnaA for initiation and FtsZ for cell division) and their accumulation to their respective threshold numbers, which Witz et al. coined the ‘independent double adder’ (IDA) model. The adder phenotype is a natural quantitative consequence of these mechanistic principles. In a recent bioRxiv response to our report, Witz and colleagues explicitly confirmed two important limitations of the I-value analysis: (1) it is only applicable to non-overlapping cell cycles, wherein E. coli is known to deviate from the adder principle, and (2) it is only applicable to select biological models and, for example, cannot evaluate the IDA model. These limitations of the I-value analysis were not explained in the original eLife paper and were overlooked during the review process. In this report, we show using data analysis, mathematical modeling, and experiments why the I-value analysis - in its current implementation - cannot compare different biological models. Furthermore, the RDA model is incompatible with the adder principle and is not broadly supported by experimental data. 
For completeness, we also provide a detailed point-by-point response to Witz et al.’s response (Witz, Julou, and van Nimwegen 2020) in the Supplemental Information.
- Published
- 2020
20. Intra-Arterial Blood Pressure Measurement: Sources of Error and Solutions
- Author
-
Farhan Adam Mukadam, Bowya Baskaran, Sathya Subramani, Naveen Gangadharan, C. Surekha, and Suresh R. Devasahayam
- Subjects
Pressure measurement ,Amplitude ,Mean squared error ,law ,Control theory ,System of measurement ,Intensive care ,Natural frequency ,Filter (signal processing) ,Error detection and correction ,law.invention ,Mathematics - Abstract
Rationale: Intra-arterial blood pressure measurement is the cornerstone of hemodynamic monitoring in Intensive Care Units (ICUs). Accuracy of the measurement depends on the dynamic response of the measuring system, defined by its natural frequency (f_natural) and damping coefficient (Z_damping), which are estimated with a fast-flush test. Locating the experimentally measured f_natural and Z_damping on the plot in the original paper by Gardner (1981), which defined the acceptable limits for these two parameters, has long been the only way to determine the accuracy of the pressure measurement. In this paper, we extend the current understanding of the effect of poor dynamic response of the measurement system, enhance the usefulness of Gardner's plots by providing a numerical value for the error in pressure measurement (for a given set of conditions), depict the gradation of the error value as heat maps, and demonstrate the usefulness of a tunable filter for error correction. Objectives: (i) Estimation of the amplitude of error in pressure measurement through simulations based on real-world data, and development of heat maps for easy use by physicians to assess whether the recording conditions are optimal; (ii) a new method to correct the error. Methods and Results: Simulated blood pressure waveforms of various heart rates and pressure levels were passed through simulated measurement systems with varying f_natural and Z_damping. The numerical errors in systolic and diastolic pressures and the mean error in the measured pressure were used to generate heat maps denoting the errors for the various recording conditions, in the same plot as that of Gardner (1981). The performance of a tunable filter in correcting the error is demonstrated. Conclusions: In many clinical settings the measurement of intra-arterial pressure is prone to significant error. The proposed tunable filter is shown to improve the accuracy of intra-arterial pressure recording.
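The catheter-tubing-transducer system behind Gardner's plot is the classic second-order model; passing a synthetic pressure wave through an underdamped instance shows the systolic overshoot the heat maps quantify. A rough sketch (waveform and parameters are illustrative, not the authors' simulation):

```python
import numpy as np

def second_order_response(p, fs, fn, zeta):
    """Semi-implicit Euler integration of y'' + 2*zeta*wn*y' + wn^2*y = wn^2*p,
    the standard model of a fluid-filled pressure measurement system."""
    wn, dt = 2 * np.pi * fn, 1.0 / fs
    y, dy = p[0], 0.0
    out = np.empty_like(p)
    for i, pi in enumerate(p):
        dy += dt * (wn**2 * (pi - y) - 2 * zeta * wn * dy)
        y += dt * dy
        out[i] = y
    return out

fs = 1000.0
t = np.arange(0, 2, 1 / fs)
# crude arterial-like wave around 100 mmHg at 75 bpm plus one harmonic
p = 100 + 15 * np.sin(2 * np.pi * 1.25 * t) + 5 * np.sin(2 * np.pi * 2.5 * t)
m = second_order_response(p, fs, fn=5.0, zeta=0.15)    # poorly damped system
systolic_error = m[1000:].max() - p[1000:].max()        # overshoot in mmHg
```

With a low natural frequency and low damping, the harmonics near resonance are amplified, so the recorded systolic pressure overshoots the true one; sweeping `fn` and `zeta` over a grid and recording `systolic_error` is exactly the heat-map construction described above.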
- Published
- 2020
21. On Rank Deficiency in Phenotypic Covariance Matrices
- Author
-
F. Robin O’Keefe, Julie A. Meachen, and P. David Polly
- Subjects
education.field_of_study ,Rank (linear algebra) ,Covariance matrix ,Metric (mathematics) ,Population ,Statistics ,Entropy (information theory) ,Covariance ,education ,Eigenvalues and eigenvectors ,Standard deviation ,Mathematics - Abstract
This paper is concerned with rank deficiency in phenotypic covariance matrices: first establishing that it is a problem by measuring it, and then proposing methods to treat it. Significant rank deficiency can mislead current measures of whole-shape phenotypic integration, because they rely on eigenvalues of the covariance matrix, and highly rank-deficient matrices will have a large percentage of meaningless eigenvalues. This paper has three goals. The first is to examine a typical geometric morphometric data set and establish that its covariance matrix is rank deficient. We employ the concept of information, or Shannon, entropy to demonstrate that a sample of dire wolf jaws is highly rank deficient. The different sources of rank deficiency are identified; they include the Generalized Procrustes analysis itself, use of the correlation matrix, insufficient sample size, and phenotypic covariance. Only the last of these is of biological interest. Our second goal is to examine a test case where a change in integration is known, allowing us to document how rank deficiency affects two measures of whole-shape integration (eigenvalue standard deviation and standardized generalized variance). This test case utilizes the dire wolf data set from Part 1 and introduces another population that is 5,000 years older. Modularity models are generated and tested for both populations, showing that one population is more integrated than the other. We demonstrate that eigenvalue variance characterizes the integration change incorrectly, while the standardized generalized variance lacks sensitivity. Both metrics are impacted by the inclusion of many small eigenvalues arising from rank deficiency of the covariance matrix. We propose a modification of the standardized generalized variance, again based on information entropy, that considers only the eigenvalues carrying non-redundant information.
We demonstrate that this metric succeeds in identifying the integration change in the test case. The third goal of this paper is to generalize the new metric to the case of arbitrary sample size. This is done by normalizing the new metric to the amount of information present in a permuted covariance matrix. We term the resulting metric the ‘relative dispersion’; it is sample-size corrected. As a proof of concept we use the new metric to compare the dire wolf data set from the first part of this paper to a third data set comprising jaws of Smilodon fatalis. We demonstrate that the Smilodon jaw is much more integrated than the dire wolf jaw. Finally, this information entropy-based measure of integration allows comparison of whole-shape integration in dense semilandmark environments, permitting characterization of the information content of any given shape, a quantity we term ‘latent dispersion’.
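The information-entropy idea can be sketched directly: treat the normalized eigenvalues of the covariance matrix as a probability distribution and compute its Shannon entropy; exp(entropy) then behaves like an effective count of non-redundant dimensions. (A sketch of the concept only, not the authors' 'relative dispersion' with its permutation-based normalization.)

```python
import numpy as np

def eigen_entropy(cov):
    """Shannon entropy of the normalized eigenvalue spectrum of a covariance matrix."""
    lam = np.clip(np.linalg.eigvalsh(cov), 0.0, None)
    p = lam / lam.sum()
    p = p[p > 1e-12]                       # drop numerically-zero eigenvalues
    return -(p * np.log(p)).sum()

# unintegrated traits: all eigenvalues equal -> maximal entropy log(4)
h_unintegrated = eigen_entropy(np.eye(4))
# fully integrated (rank-1) covariance -> entropy near 0
h_integrated = eigen_entropy(np.ones((4, 4)))
```

Low entropy means variance is concentrated in few dimensions (high integration); a rank-deficient matrix padded with meaningless near-zero eigenvalues distorts eigenvalue-variance measures but, with the thresholding above, not the entropy.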
- Published
- 2020
22. Using Hawkes Processes to model imported and local malaria cases in near-elimination settings
- Author
-
Isobel Routledge, Samir Bhatt, Shengjie Lai, Marian-Andrei Rizoiu, Daniel J. Weiss, H. Juliette T. Unwin, Seth Flaxman, Swapnil Mishra, and Justin M. Cohen
- Subjects
0301 basic medicine ,Epidemiology ,Computer science ,01 natural sciences ,Disease Outbreaks ,law.invention ,Geographical Locations ,010104 statistics & probability ,Medical Conditions ,law ,Medicine and Health Sciences ,Biology (General) ,Protozoans ,Community based ,Ecology ,Statistical Models ,Simulation and Modeling ,Applied Mathematics ,Statistics ,Malarial Parasites ,Eukaryota ,Geography ,Transmission (mechanics) ,Computational Theory and Mathematics ,Modeling and Simulation ,Physical Sciences ,01 Mathematical Sciences, 06 Biological Sciences, 08 Information and Computing Sciences ,Disease transmission ,Algorithms ,Research Article ,Optimization ,China ,Asia ,Bioinformatics ,QH301-705.5 ,Mosquito Vectors ,Research and Analysis Methods ,03 medical and health sciences ,Cellular and Molecular Neuroscience ,Malaria transmission ,Parasitic Diseases ,Genetics ,medicine ,Humans ,Disease Eradication ,0101 mathematics ,Molecular Biology ,Environmental planning ,Ecology, Evolution, Behavior and Systematics ,Models, Statistical ,Organisms ,Biology and Life Sciences ,Outbreak ,Statistical model ,Tropical Diseases ,medicine.disease ,Parasitic Protozoans ,Malaria ,030104 developmental biology ,People and Places ,Eswatini ,Mathematics - Abstract
Developing new methods for modelling infectious disease outbreaks is important for monitoring transmission and developing policy. In this paper we propose using semi-mechanistic Hawkes processes for modelling malaria transmission in near-elimination settings. Hawkes processes are well-founded mathematical methods that enable us to combine the benefits of both statistical and mechanistic models to recreate and forecast disease transmission, beyond just malaria outbreak scenarios. These methods have been successfully used in numerous applications, such as social media and earthquake modelling, but are not yet widespread in epidemiology. By using domain-specific knowledge, we can both recreate transmission curves for malaria in China and Eswatini and disentangle the proportion of cases that are imported from those that are community-based. Author summary: This paper introduces a mathematically well-founded method for infectious disease outbreaks known as Hawkes processes. These semi-mechanistic models are relatively new to the infectious diseases toolkit and enable us to combine disease-specific information, such as the infectious profile, with statistical rigour to recreate temporal disease transmission. We show that these methods are very well suited to modelling malaria in communities close to eliminating malaria—in particular China and Eswatini—where we are able to disentangle the contributions of exogenous (imported) transmission and endogenous (person-to-person) transmission. This is particularly important for developing policy when countries are approaching elimination.
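A Hawkes process with exponential kernel can be simulated through its branching ("cluster") representation, which makes the imported/local decomposition the authors exploit explicit. A toy sketch (all parameters hypothetical):

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate_hawkes(mu=0.5, alpha=0.8, beta=1.0, T=200.0):
    """Branching simulation of a Hawkes process with intensity
    mu + sum_i alpha*exp(-beta*(t - t_i)).  Exogenous ('imported') events
    arrive as a Poisson(mu) stream; each event spawns Poisson(alpha/beta)
    offspring ('local' cases) with Exp(beta) delays.  Subcritical: alpha < beta."""
    queue = [(t, "imported") for t in rng.uniform(0, T, rng.poisson(mu * T))]
    events = []
    while queue:
        t, label = queue.pop()
        events.append((t, label))
        for d in rng.exponential(1 / beta, rng.poisson(alpha / beta)):
            if t + d < T:
                queue.append((t + d, "local"))
    return sorted(events)

events = simulate_hawkes()
n_imported = sum(1 for _, lab in events if lab == "imported")
n_local = len(events) - n_imported
# E[total] = mu*T / (1 - alpha/beta), so local (endogenous) cases dominate here
```

Fitting `mu`, `alpha`, `beta` to case timestamps, rather than simulating, is the inference direction of the paper; the labelled simulation just shows why the model separates exogenous from endogenous transmission.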
- Published
- 2020
23. Qualitative forecast and temporal evolution of the disease spreading using a simplified model and COVID-19 data for Italy
- Author
-
Simeone, Roberto
- Subjects
Coronavirus disease 2019 (COVID-19) ,Exponential growth ,Disease spreading ,Time evolution ,Applied mathematics ,Stage (hydrology) ,Growth rate ,Diffusion (business) ,Free diffusion ,Mathematics - Abstract
In a previous paper [1], a simplified SEIR model applied to COVID-19 cases detected in Italy, including the lockdown period, showed a good fit to the time evolution of the disease during the observed period. In this paper that model is applied to the initial data available for Italy in order to forecast, in a qualitative way, the time evolution of the disease spreading. The values obtained are to be considered indicative. The same model has been applied both to the data for Italy as a whole and to some Italian regions, generally finding good qualitative results. The only tuning parameter in the model is the ‘incubation period’ τ. In this modelization the tuning parameter, together with the calculated growth rate of the exponential curve used to approximate the early-stage data, is in strong relationship with the compartments' transfer rates. The relationships between the parameters simplify modeling by allowing a rough (not supported by statistical considerations) forecast of the time evolution, starting from the first period of growth of the diffusion. Conclusions: A simplified compartmental model that uses only the incubation period and the exponential growth rate as parameters is applied to the COVID-19 data for Italy over several periods of the initial growth of the diffusion, showing the different stages of the spread's evolution. The simplification is based on the strong protection measures that were in place in Italy during the lockdown period after the initial free diffusion.
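The compartmental structure referred to can be sketched as a forward-Euler SEIR integration, with σ = 1/τ the incubation-rate parameter (the parameter values below are illustrative, not the paper's fit):

```python
import numpy as np

def seir(beta=0.5, sigma=0.2, gamma=1/7, N=6.0e7, E0=100.0, days=180, dt=0.1):
    """Forward-Euler SEIR; sigma = 1/tau (incubation period), gamma = removal rate.
    Returns the cumulative number of ever-infected individuals over time."""
    S, E, I, R = N - E0, E0, 0.0, 0.0
    cum = np.empty(int(days / dt))
    for i in range(cum.size):
        new_exposed = beta * S * I / N * dt
        new_infectious = sigma * E * dt
        new_removed = gamma * I * dt
        S -= new_exposed
        E += new_exposed - new_infectious
        I += new_infectious - new_removed
        R += new_removed
        cum[i] = N - S
    return cum

cum = seir()
# early growth is exponential with rate r solving (r+sigma)(r+gamma) = beta*sigma:
# this is the relationship between the observed growth rate, tau and the transfer
# rates that the abstract refers to
```

Given an observed early exponential rate r and a chosen τ, that quadratic can be inverted to recover the transfer rates, which is the calibration shortcut the paper exploits.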
- Published
- 2020
24. Who dies from COVID-19? Post-hoc explanations of mortality prediction models using coalitional game theory, surrogate trees, and partial dependence plots
- Author
-
Yang, Russell
- Subjects
Chronic disease ,Coronavirus disease 2019 (COVID-19) ,Post hoc ,Ranking ,Statistics ,Decision rule ,Mortality prediction ,Game theory ,Mathematics - Abstract
As of early June, 2020, approximately 7 million COVID-19 cases and 400,000 deaths have been reported. This paper examines four demographic and clinical factors (age, time to hospital, presence of chronic disease, and sex) and utilizes Shapley values from coalitional game theory and machine learning to evaluate their relative importance in predicting COVID-19 mortality. The analyses suggest that out of the 4 factors studied, age is the most important in predicting COVID-19 mortality, followed by time to hospital. Sex and presence of chronic disease were both found to be relatively unimportant, and the two global interpretation techniques differed in ranking them. Additionally, this paper creates partial dependence plots to determine and visualize the marginal effect of each factor on COVID-19 mortality and demonstrates how local interpretation of COVID-19 mortality prediction can be applicable in a clinical setting. Lastly, this paper derives clinically applicable decision rules about mortality probabilities through a parsimonious 3-split surrogate tree, demonstrating that high-accuracy COVID-19 mortality prediction can be achieved with simple, interpretable models.
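The Shapley values from coalitional game theory used here average each factor's marginal contribution over all orderings; with only four factors the exact computation is trivial. A toy sketch with a hypothetical additive risk function (illustrative only, not the paper's fitted model):

```python
from itertools import permutations

def shapley_values(players, value):
    """Exact Shapley values: average marginal contribution over all orderings."""
    phi = {p: 0.0 for p in players}
    orders = list(permutations(players))
    for order in orders:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            phi[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: v / len(orders) for p, v in phi.items()}

# hypothetical additive mortality-risk contributions for the four factors studied
weights = {"age": 0.5, "time_to_hospital": 0.3, "chronic_disease": 0.1, "sex": 0.1}
risk = lambda coalition: sum(weights[p] for p in coalition)
phi = shapley_values(list(weights), risk)
# for an additive game, the Shapley value recovers each weight exactly
```

Real feature-attribution work replaces `risk` with a trained model's output and samples the orderings, but the exact computation above is what those approximations estimate.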
- Published
- 2020
25. A novel deterministic forecast model for COVID-19 epidemic based on a single ordinary integro-differential equation
- Author
-
Koehler-Rieper, Felix, Roehl, Claudius H. F., and De Micheli, Enrico
- Subjects
Prognostic variable ,Coronavirus disease 2019 (COVID-19) ,Reliability (computer networking) ,010102 general mathematics ,Complex system ,COVID-19 ,General Physics and Astronomy ,Sample (statistics) ,Regular Article ,deterministic model ,01 natural sciences ,Model dynamics ,010101 applied mathematics ,Integro-differential equation ,Single equation ,Applied mathematics ,0101 mathematics ,Focus (optics) ,Mathematics ,Numerical stability ,Variable (mathematics) - Abstract
In this paper we present a new approach to deterministic modelling of the COVID-19 epidemic. Our model dynamics are expressed by a single prognostic variable which satisfies an integro-differential equation. All unknown parameters are described by a single, time-dependent variable R(t). We show that our model has similarities to classic compartmental models, such as SIR, and that the variable R(t) can be interpreted as a generalized effective reproduction number. The advantages of our approach are the simplicity of having only one equation, the numerical stability due to the integral formulation, and the reliability that follows from formulating the model in terms of the most trustworthy statistical data variable: the number of cumulative diagnosed positive COVID-19 cases. Once this dynamic variable is calculated, other non-dynamic variables, such as the number of severe cases (hospital beds), the number of intensive-care cases (ICUs) and the fatalities, can be derived from it using a similarly stable, integral approach. The formulation with a single equation allows us to calculate from real data the values of the sample effective reproduction number, which can then be fitted. Extrapolated values of R(t) can be used in the model to make reliable forecasts, though under the assumption that measures for reducing infections are maintained. We have applied our model to more than 15 countries, and the ongoing results are available on a web-based platform [1]. In this paper, we focus on the data for two exemplary countries, Italy and Germany, and show that the model is capable of reproducing the course of the epidemic in the past and of forecasting its course over a period of four to five weeks with reasonable numerical stability.
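A discrete analogue of such a single-equation formulation is the renewal equation, in which today's new cases are R(t) times a weighted sum of recent cases. The sketch below illustrates the idea only, not the authors' integro-differential model; the infectivity weights are assumed:

```python
import numpy as np

def renewal(R, w, n0=10.0, T=60):
    """cases[t] = R(t) * sum_s w[s] * cases[t-1-s], with w a normalized
    infectivity (serial-interval) profile and R(t) the single free variable."""
    w = np.asarray(w, float)
    w = w / w.sum()
    cases = [n0]
    for t in range(1, T):
        past = cases[::-1][:len(w)]
        cases.append(R(t) * sum(wi * ci for wi, ci in zip(w, past)))
    return np.array(cases)

w = [0.1, 0.3, 0.4, 0.2]                   # assumed serial-interval weights
cases = renewal(lambda t: 1.5 if t < 30 else 0.7, w)
# growth while R(t) = 1.5 > 1, decay once R(t) drops to 0.7 < 1
```

Run forward, an extrapolated R(t) yields a forecast; run in reverse against observed cumulative cases, the same relation yields the sample effective reproduction number, mirroring the two uses described in the abstract.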
- Published
- 2020
26. Research Article Summary: Inferring change points in the COVID-19 spreading reveals the effectiveness of interventions
- Author
-
Viola Priesemann, F. Paul Spitzner, Michael Wilczek, Michael Wibral, Joao Pinheiro Neto, Johannes Zierenberg, and Jonas Dehning
- Subjects
Exponential growth ,Coronavirus disease 2019 (COVID-19) ,Social distance ,Psychological intervention ,Change points ,Econometrics ,Credible interval ,Inference ,Bayesian inference ,Mathematics - Abstract
As COVID-19 is rapidly spreading across the globe, short-term modeling forecasts provide time-critical information for decisions on containment and mitigation strategies. A main challenge for short-term forecasts is the assessment of key epidemiological parameters and how they change when first interventions show an effect. By combining an established epidemiological model with Bayesian inference, we analyze the time dependence of the effective growth rate of new infections. Focusing on the COVID-19 spread in Germany, we detect change points in the effective growth rate that correlate well with the times of publicly announced interventions. Thereby, we can quantify the effect of interventions, and we can incorporate the corresponding change points into forecasts of future scenarios and case numbers. Our code is freely available and can be readily adapted to any country or region. Introduction: When faced with the outbreak of a novel epidemic like COVID-19, rapid response measures are required by individuals as well as by society as a whole to mitigate the spread of the virus. During this initial, time-critical period, neither the central epidemiological parameters nor the effectiveness of interventions like cancellation of public events, school closings, and social distancing are known. Rationale: As one of the key epidemiological parameters, we infer the spreading rate λ from confirmed COVID-19 case numbers, using Germany as an example, by combining Bayesian inference based on Markov-chain Monte Carlo sampling with a class of SIR (Susceptible-Infected-Recovered) compartmental models from epidemiology. Our analysis characterizes the temporal change of the spreading rate and, importantly, allows us to identify potential change points and to provide short-term forecast scenarios based on various degrees of social distancing. A detailed description is provided in the accompanying paper, and the models, inference, and predictions are available on GitHub.
While we apply it to Germany, our approach can be readily adapted to other countries or regions. Results: In Germany, interventions to contain the outbreak were implemented in three steps over three weeks: around March 9, large public events like soccer matches were cancelled; on March 16, schools and childcare facilities as well as many non-essential stores were closed; one week later, on March 23, a far-reaching contact ban (“Kontaktsperre”), which included the prohibition of even small public gatherings as well as the further closing of restaurants and non-essential stores, was imposed by the government authorities. From the observed case numbers of COVID-19, we can quantify the impact of these measures on the disease spread (Fig. 0). Based on our analysis, which includes data until April 21, we have evidence of three change points: the first changed the spreading rate from λ0 = 0.43 (95% credible interval (CI): [0.35, 0.51]) to λ1 = 0.25 (CI: [0.20, 0.30]) and occurred around March 6 (CI: March 2 to March 9); the second change point resulted in λ2 = 0.15 (CI: [0.12, 0.20]) and occurred around March 15 (CI: March 13 to March 17). Both changes in λ slowed the spread of the virus, but still implied exponential growth (Fig. 0, red and orange traces). To contain the disease spread, and turn exponential growth into a decline of new cases, a further decrease in λ was necessary. Our analysis shows that this transition was reached by the third change point, which resulted in λ3 = 0.09 (CI: [0.06, 0.12]) around March 23 (CI: March 20 to March 25). With this third change point, λ transitioned below the critical value where the spreading rate λ balances the recovery rate μ, i.e. the effective growth rate λ* = λ − μ ≈ 0 (Fig. 0, gray traces). Importantly, λ* = 0 marks the watershed between exponential growth and decay.
Given the delay of approximately two weeks between an intervention and the first inference of the induced changes in λ*, future interventions such as lifting restrictions warrant careful consideration. Our detailed analysis shows that, in the current phase, reliable short- and long-term forecasts are very difficult, as they critically hinge on how the epidemiological parameters change in response to interventions: in Fig. 0, the three example scenarios already diverge quickly from each other, and consequently span a considerable range of future case numbers. Thus, any uncertainty in the magnitude of our social distancing in the past two weeks can have a major impact on the case numbers in the next two weeks. Beyond two weeks, the case numbers depend on our future behavior, for which we have to make explicit assumptions. In the main paper we illustrate how the precise magnitude and timing of potential change points impact the forecast of case numbers (Fig. 2). Conclusions: We developed a Bayesian framework to infer central epidemiological parameters and the timing and magnitude of intervention effects. Thereby, the efficiency of political and individual intervention measures for social distancing and containment can be assessed in a timely manner. We find evidence for a successive decrease of the spreading rate in Germany around March 6 and around March 15, which significantly reduced the magnitude of exponential growth but was not sufficient to turn growth into decay. Our analysis also shows that a further decrease of the spreading rate occurred around March 23, turning exponential growth into decay. Future interventions and liftings of restrictions can be modeled as additional change points, enabling short-term forecasts for case numbers. In general, our analysis code may help to infer the efficiency of measures taken in other countries and inform policy makers about tightening, loosening and selecting appropriate rules for containment.
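The scenario mechanics described can be sketched as an SIR integration with a piecewise-constant spreading rate, using the change-point values reported for Germany (μ, population and initial conditions are illustrative; the paper's actual model adds reporting delay and full Bayesian inference):

```python
import numpy as np

def sir_piecewise(lams, mu=0.125, N=8.3e7, I0=1000.0, T=60.0, dt=0.5):
    """SIR with a piecewise-constant spreading rate lambda(t);
    lams = sorted list of (start_day, lambda). Returns daily new infections."""
    S, I = N - I0, I0
    new_cases = []
    for step in range(int(T / dt)):
        t = step * dt
        lam = [l for start, l in lams if start <= t][-1]
        inf = lam * S * I / N * dt
        S -= inf
        I += inf - mu * I * dt
        new_cases.append(inf / dt)
    return np.array(new_cases)

# the three reported change points: lambda drops 0.43 -> 0.25 -> 0.15 -> 0.09
daily = sir_piecewise([(0, 0.43), (9, 0.25), (18, 0.15), (26, 0.09)])
# once lambda < mu, the effective growth rate lambda* = lambda - mu is negative
# and daily new cases decline, reproducing the turnaround described above
```

Each successive drop in λ flattens the curve; only the third, which pushes λ below μ, turns growth into decay, which is the qualitative conclusion of the analysis.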
- Published
- 2020
27. Basic prediction methodology for covid-19: estimation and sensitivity considerations
- Author
-
Tom Britton
- Subjects
Estimation ,Important conclusion ,Coronavirus disease 2019 (COVID-19) ,Prediction methods ,Statistics ,Range (statistics) ,Cumulative incidence ,Sensitivity (control systems) ,Mathematics - Abstract
Summary: The purpose of the present paper is to present simple estimation and prediction methods for basic quantities in an emerging epidemic like the ongoing covid-19 pandemic. The simple methods have the advantage that relations between basic quantities become more transparent, thus shedding light on which quantities have the biggest impact on predictions, with the additional conclusion that uncertainties in these quantities carry over to high uncertainty in predictions as well. A simple non-parametric prediction method for future cumulative case fatalities, as well as future cumulative incidence of infections (assuming a given infection fatality risk f), is presented. The method uses cumulative reported case fatalities up to the present time as input data. It is also described how the introduction of preventive measures of a given magnitude ρ will affect the two incidence predictions, using basic theory of epidemic models. This methodology is then reversed, thus enabling estimation of the preventive magnitude ρ and of the resulting effective reproduction number RE. However, the effects of preventive measures only start affecting case fatalities some 3-4 weeks later, so estimates are only available after this time has elapsed. The methodology is applicable in the early stage of an outbreak, before, say, 10% of the community have been infected. Besides giving simple estimation and prediction tools for an ongoing epidemic, another important conclusion lies in the observation that the two quantities f (infection fatality risk) and ρ (the magnitude of preventive measures) have a very big impact on predictions.
Further, both of these quantities currently have very high uncertainty: current estimates of f lie in the range 0.2% up to 2% ([9], [7]), and the overall effect of several combined preventive measures is clearly very uncertain. The two main findings of the paper are hence that a) any prediction involving f and/or some preventive measures contains a large amount of uncertainty (which is usually not acknowledged well enough), and b) obtaining more accurate estimates, in particular of f, should be highly prioritized. Seroprevalence testing of random samples in a community where the epidemic has ended is urgently needed.
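The central relation behind the prediction method, and the paper's point about the sensitivity to f, can be sketched as follows (a simplification that leaves the 3-4-week reporting delay implicit and omits the non-parametric details):

```python
# Cumulative infections up to time t are roughly the cumulative case
# fatalities observed a few weeks later, divided by the infection
# fatality risk f.
def cumulative_infections(cum_deaths_later, f):
    """Estimate cumulative infections from later cumulative deaths."""
    return cum_deaths_later / f

# The range f in [0.2%, 2%] cited in the abstract turns one observed
# death count into a tenfold range of infection estimates:
low = cumulative_infections(1000, f=0.02)    # f = 2%
high = cumulative_infections(1000, f=0.002)  # f = 0.2%
assert abs(high - 10 * low) < 1e-6
```

This makes finding a) concrete: a tenfold uncertainty in f propagates directly into a tenfold uncertainty in any incidence prediction built on it.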
- Published
- 2020
- Full Text
- View/download PDF
28. The reproductive number R0 of COVID-19 based on estimate of a statistical time delay dynamical system
- Author
-
Wenbin Chen, Jin Cheng, and Nian Shao
- Subjects
Coronavirus disease 2019 (COVID-19) ,Distribution (number theory) ,Applied mathematics ,Interval (mathematics) ,Growth rate ,Dynamical system ,Mathematics - Abstract
In this paper, we estimate the reproductive number R0 of COVID-19 based on the Wallinga and Lipsitch framework [11] and a novel statistical time delay dynamic system. We use the observed data reported in CCDC’s paper to estimate the distribution of the generation interval of the infection, and we apply the simulation results from the time delay dynamic system as well as released data from CCDC to fit the growth rate. The conclusion is: based on our Fudan-CCDC model, the growth rate r of COVID-19 lies approximately in [0.30, 0.32], which is larger than the growth rate of 0.1 estimated by CCDC [9], and the reproductive number R0 of COVID-19 is estimated as 3.25 ≤ R0 ≤ 3.4 if we simply use R = 1 + r ∗ Tc with Tc = 7.5, which is bigger than that of SARS. Some evolutions and predictions are listed.
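The abstract's interval follows directly from the stated back-of-envelope formula R0 = 1 + r · Tc:

```python
# R0 from the fitted growth rate r and the mean generation interval Tc,
# using the simple relation quoted in the abstract.
def r0_estimate(r, Tc=7.5):
    return 1.0 + r * Tc

lo, hi = r0_estimate(0.30), r0_estimate(0.32)
assert abs(lo - 3.25) < 1e-9    # lower end of the fitted range r = 0.30
assert abs(hi - 3.4) < 1e-9     # upper end of the fitted range r = 0.32
```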
- Published
- 2020
29. Mechanical model of muscle contraction. 6. Calculations of the tension exerted by a skeletal fiber during a shortening staircase
- Author
-
Sylvain Louvet
- Subjects
Myosin head ,Tension (physics) ,Fiber (mathematics) ,Orientation (geometry) ,Mathematical analysis ,Line (geometry) ,Head (vessel) ,Kinematics ,Constant (mathematics) ,Mathematics - Abstract
The accompanying Paper 1 tests a theoretical relationship between force and shortening velocity of a muscle fiber without justifying its validity. Paper 2 determines the kinematics and dynamics of a myosin II head during the working stroke (WS). Paper 3 imposes the Uniform law as a density representative of the orientation of the levers belonging to the WS heads. Building on these works, Papers 4 and 5 put into equations the evolution of the tension during the four phases of a length step. The present paper closes this series of six articles with two tasks. The first purpose is to apply the theoretical elements developed for a length step to a succession of identical length steps, otherwise known as a shortening staircase. With the values of the geometric and temporal parameters assigned to a myosin head in Papers 1 to 5, a correct adjustment is established between the theoretical tension deduced from our model and the experimental tension published in 1997 by a team of Italian researchers for nine shortening staircases performed on the same fiber. In particular, we obtain the equation of the tension reached at the end of each step (T*), which remains constant step by step as soon as the shortening of a half-sarcomere exceeds 17 nm. The second objective is to find and explain the equation of the Force-Velocity curve introduced ex abrupto in Paper 1: by decreasing the size and duration of the steps, the staircase tends towards a line of constant slope corresponding to a continuous shortening speed. By applying the methods of infinitesimal calculus to the different formulations leading to T*, we deduce the Force-Velocity relationship (see Supplement S6.L). And the circle is complete.
- Published
- 2019
30. Thermodynamic Measures of Human Brain Development from Fetal Stage to Adulthood
- Author
-
Jack A. Tuszynski, Hava T. Siegelmann, Edward A. Rietman, Marco Cavaglià, and Taylor S
- Subjects
0303 health sciences ,Human brain ,Measure (mathematics) ,Protein expression ,Gibbs free energy ,03 medical and health sciences ,symbols.namesake ,0302 clinical medicine ,Fetal Stage ,medicine.anatomical_structure ,Statistics ,symbols ,medicine ,030217 neurology & neurosurgery ,030304 developmental biology ,Mathematics - Abstract
This paper analyzes data obtained from tissue samples of human brains containing protein expression values. The data have been processed for their thermodynamic measure in terms of the Gibbs free energy of the corresponding protein-protein interaction networks. We have investigated the functional dependence of the Gibbs free energies on age and found consistent trends for most of the 16 main brain areas. The peak of the Gibbs energy values is found at birth, with a trend toward plateauing at the age of maturity. We have also compared the data for males and females and uncovered functional differences for some of the brain regions. Author Summary: In this paper we briefly outline the theoretical basis for a novel analysis of brain development in terms of a thermodynamic measure (Gibbs free energy) for the corresponding protein-protein interaction networks. We analyzed the overall developmental patterns for Gibbs free energy as a function of age across all brain regions. Of particular note was the significant upward trend in the fetal stages, which is generally followed by a sharp dip at birth and a plateau at maturity. We then compared the trends for female and male samples. A crossover pattern was observed for most of the brain regions, where the Gibbs free energy of the male samples was lower than that of the female samples at prenatal and neonatal ages, but higher at ages 8-40, finally converging in late adulthood.
- Published
- 2019
- Full Text
- View/download PDF
31. Experimental study of Young modulus of Attacus atlas, Vespa crabro, and Libellula depressa wings
- Author
-
Michał Landowski, Zuzanna Kunicka-Kowalska, and Krzysztof Sibilski
- Subjects
Wing ,biology ,Humidity ,Young's modulus ,Geometry ,Hymenoptera ,Bending ,Libellula depressa ,Moment of inertia ,biology.organism_classification ,symbols.namesake ,symbols ,Attacus atlas ,Mathematics - Abstract
This paper describes scientific research aimed at obtaining data for determining the Young's modulus of the wings of selected insect species. A small testing machine intended for three-point bending and equipped with instruments registering low forces was constructed for the needs of the experiment. The machine was used to perform numerous bending tests on the wings of three species of insects (obtained from a breeding farm): Attacus atlas, Vespa crabro, and Libellula depressa, in various air-humidity conditions. Values of the force and displacement obtained in the course of the tests were used to calculate Young's modulus. In order to do so, it was also necessary to obtain the moment of inertia of the wing cross-section. These values were measured on the basis of images obtained with a SEM microscope. The obtained results were averaged and presented with a breakdown by air-humidity conditions. It was observed that Young's modulus decreased with an increase in humidity; hence, calculations of the percentage decrease of this mechanical parameter were performed. The obtained results were compared with the observed structure, which was also examined under a light microscope. It transpired that the construction of a wing influences not only the mechanical values but also their susceptibility to changes occurring in the environment. Thereby, differences between Lepidoptera and Hymenoptera insects were also indicated within the aspect discussed in this paper.
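The calculation described (force, displacement, and SEM-derived moment of inertia combined into a modulus) matches the standard three-point-bending relation for a simply supported beam with a central load; the numerical values below are invented for illustration, not taken from the study:

```python
# E = F * L^3 / (48 * delta * I) for a centrally loaded, simply supported
# beam: span L, applied force F, midpoint deflection delta, and second
# moment of area I (here taken from SEM cross-section images).
def youngs_modulus(force_N, span_m, deflection_m, I_m4):
    return force_N * span_m**3 / (48.0 * deflection_m * I_m4)

# Hypothetical wing-scale numbers: 1 mN load, 5 mm span, 0.1 mm deflection.
E_Pa = youngs_modulus(force_N=1e-3, span_m=5e-3, deflection_m=1e-4, I_m4=1e-18)
E_GPa = E_Pa / 1e9          # convert Pa to GPa
assert abs(E_GPa - 26.0417) < 0.01
```

A humidity-induced drop in E would show up here as a larger measured deflection for the same force, all geometry held fixed.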
- Published
- 2018
32. An ODE-based mixed modelling approach for B- and T-cell dynamics induced by Varicella-Zoster Virus vaccines in adults shows higher T-cell proliferation with Shingrix compared to Varilrix
- Author
-
Nina Keersmaekers, Philippe Beutels, Benson Ogunjimi, Niel Hens, and Pierre Van Damme
- Subjects
Mixed model ,0303 health sciences ,Herpes Zoster Vaccine ,Vaccination schedule ,Immunogenicity ,Varicella zoster virus ,Ode ,Computational biology ,medicine.disease_cause ,Random effects model ,3. Good health ,Vaccination ,03 medical and health sciences ,0302 clinical medicine ,medicine ,030212 general & internal medicine ,030304 developmental biology ,Mathematics - Abstract
Clinical trials covering the immunogenicity of a vaccine aim to study the longitudinal dynamics of certain immune cells after vaccination. The corresponding immunogenicity datasets are mainly analyzed by the use of statistical (mixed effects) models. This paper proposes the use of mathematical ordinary differential equation (ODE) models, combined with a mixed effects approach. ODE models are capable of translating underlying immunological post-vaccination processes into mathematical formulas, thereby enabling a testable data analysis. Mixed models include both population-averaged parameters (fixed effects) and individual-specific parameters (random effects) for dealing with inter- and intra-individual variability, respectively. This paper models B-cell and T-cell datasets of a phase I/II, open-label, randomized, parallel-group study in which the immunogenicity of a new Herpes Zoster vaccine (Shingrix) is compared with the original Varicella Zoster Virus vaccine (Varilrix). Since few significant correlations were assessed between the B-cell datasets and T-cell datasets, each dataset was modeled separately. By following a general approach to both the formulation of several different models and the procedure of selecting the most suitable model, we were able to propose a mathematical ODE mixed-effects model for each dataset. As such, the use of ODE-based mixed effects models offers a suitable framework for handling longitudinal vaccine immunogenicity data. Moreover, it is possible to test differences in immunological processes between the two vaccines. We found that the Shingrix vaccination schedule led to a more pronounced proliferation of T-cells, without a difference in T-cell decay rate compared to the Varilrix vaccination schedule. Author summary: Upon vaccination, B-cells and T-cells are activated to induce an immune response against the vaccine antigen at hand.
In this paper, we study and compare the longitudinal dynamics of the specific immune response based on a vaccine trial in which the immunogenicity of a new Herpes Zoster vaccine (Shingrix) is compared with the original Varicella Zoster Virus vaccine (Varilrix). We combine the use of ordinary differential equations (ODEs), i.e. mathematical models which are used to describe the dynamics of the immune response, with advanced regression analyses enabling us to infer the model parameters describing these dynamics. The resulting ODE-based mixed effects models enable describing the immune response dynamics while allowing for both inter- and intra-individual variability, comparing the dynamics induced by the two vaccines, and studying the B- and T-cell interactions. We found a more pronounced proliferation of T-cells for the Shingrix vaccination schedule as compared to the Varilrix vaccination schedule. The proposed methodology offers a suitable framework for better understanding the immunogenicity of vaccines.
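As a purely hypothetical sketch of the kind of ODE such a model might contain (these are not the authors' actual equations; the antigen variable, parameter names, and all values are invented for illustration):

```python
# Toy model: antigen A drives T-cell proliferation at rate p while T cells
# decay at rate d. The paper's finding corresponds to a larger p for
# Shingrix with the same decay rate d.
def tcell_course(p, d=0.01, a0=1.0, ka=0.2, days=200, dt=0.1):
    A, T = a0, 0.0
    for _ in range(round(days / dt)):     # forward-Euler integration
        dA = -ka * A                      # antigen cleared exponentially
        dT = p * A - d * T                # proliferation minus decay
        A, T = A + dA * dt, T + dT * dt
    return T

# Equal decay rate, higher proliferation rate -> higher late T-cell level:
assert tcell_course(p=2.0) > tcell_course(p=1.0) > 0.0
```

In a mixed-effects setting, p and d would carry individual-level random effects on top of vaccine-group fixed effects, which is how inter- and intra-individual variability enters.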
- Published
- 2018
- Full Text
- View/download PDF
33. Inferring the ancestry of parents and grandparents from genetic data
- Author
-
Rasmus Nielsen, Yufeng Wu, Yiming Zhang, Jingwen Pei, and Kosakovsky Pond, Sergei L
- Subjects
0301 basic medicine ,Parents ,Heredity ,Single Nucleotide Polymorphisms ,Markov models ,Inference ,Population genetics ,WHOLE-GENOME ASSOCIATION ,Genome ,Mathematical Sciences ,0302 clinical medicine ,Databases, Genetic ,LOCAL-ANCESTRY ,Hidden Markov models ,Biology (General) ,Hidden Markov model ,Likelihood Functions ,0303 health sciences ,Ecology ,Simulation and Modeling ,Software Engineering ,Sampling (statistics) ,Grandparent ,Genomics ,Biological Sciences ,Markov Chains ,Pedigree ,ADMIXTURE ,Physical sciences ,Genetic Mapping ,Computational Theory and Mathematics ,Modeling and Simulation ,Engineering and Technology ,Research Article ,Computer and Information Sciences ,QH301-705.5 ,Bioinformatics ,Population ,Biology ,Markov model ,Research and Analysis Methods ,Cellular and Molecular Neuroscience ,Databases ,03 medical and health sciences ,Genetic ,Information and Computing Sciences ,Genetics ,Humans ,1000 Genomes Project ,Molecular Biology ,Preprocessing ,Ecology, Evolution, Behavior and Systematics ,030304 developmental biology ,Evolutionary Biology ,Population Biology ,Human Genome ,Biology and Life Sciences ,Genetic data ,Probability theory ,Grandparents ,030104 developmental biology ,Genetics, Population ,Haplotypes ,Evolutionary biology ,INFERENCE ,Mathematics ,Population Genetics ,030217 neurology & neurosurgery - Abstract
Inference of admixture proportions is a classical statistical problem in population genetics. Standard methods implicitly assume that both parents of an individual have the same admixture fraction. However, this is rarely the case in real data. In this paper we show that the distribution of admixture tract lengths in a genome contains information about the admixture proportions of the ancestors of an individual. We develop a Hidden Markov Model (HMM) framework for estimating the admixture proportions of the immediate ancestors of an individual, i.e. a type of decomposition of an individual’s admixture proportions into further subsets of ancestral proportions in the ancestors. Based on a genealogical model for admixture tracts, we develop an efficient algorithm for computing the sampling probability of the genome from a single individual, as a function of the admixture proportions of the ancestors of this individual. This allows us to perform probabilistic inference of admixture proportions of ancestors using only the genome of an extant individual. We perform extensive simulations to quantify the error in the estimation of ancestral admixture proportions under various conditions. To illustrate the utility of the method, we apply it to real genetic data. Author summary: Ancestry inference is an important problem in genetics and is used commercially by a number of companies, affecting millions of consumers of genetic ancestry tests. In this paper, we show that it is possible not only to estimate the ancestry fractions of an individual, but also, with some uncertainty, to estimate the ancestry fractions of an individual’s recent ancestors. For example, if an individual traces his/her ancestry 50% to Asia and 50% to Europe, it is possible to distinguish between the individual having two parents that each are 50:50 composites of Asian and European ancestry, or one parent from Asia and one from Europe. It is likewise possible to make inferences about grandparents.
We present a computationally efficient method for making such inferences called PedMix. PedMix is based on a probabilistic model for the descendant and the recent ancestors. PedMix infers admixture proportions of recent ancestors (parents, grandparents or even great grandparents) using whole-genome genetic variation data from a focal individual. Results on both simulated and real data show that PedMix performs reasonably well in most scenarios.
- Published
- 2018
34. Extrapolating Weak Selection in Evolutionary Games
- Author
-
Zhuoqun Wang and Rick Durrett
- Subjects
Computation ,media_common.quotation_subject ,01 natural sciences ,010305 fluids & plasmas ,03 medical and health sciences ,Game Theory ,0103 physical sciences ,Order (group theory) ,Applied mathematics ,Computer Simulation ,Point (geometry) ,Limit (mathematics) ,Selection, Genetic ,Selection (genetic algorithm) ,Probability ,030304 developmental biology ,Mathematics ,media_common ,Discrete mathematics ,0303 health sciences ,Models, Genetic ,Applied Mathematics ,Computational Biology ,Mathematical Concepts ,Infinity ,Biological Evolution ,Agricultural and Biological Sciences (miscellaneous) ,Markov Chains ,Genetics, Population ,Ranking ,Modeling and Simulation ,Mutation ,Mutation (genetic algorithm) - Abstract
This work is inspired by a 2013 paper from Arne Traulsen’s lab at the Max Planck Institute for Evolutionary Biology [10]. They studied the small mutation limit of evolutionary games. It has been shown that for 2×2 games the ranking of the strategies does not change as the strength of selection is increased [11]. The point of the 2013 paper is that when there are three or more strategies the ordering can change as selection is increased. Wu et al [10] did numerical computations for fixed N. Here, we instead let the strength of selection be β = c/N and let N → ∞ to obtain formulas for the invadability probabilities ϕij that determine the rankings. These formulas, which are integrals on [0, 1], are intractable calculus problems but can be easily evaluated numerically. Here, we concentrate on simple formulas for the ranking order when c is small or c is large.
- Published
- 2018
35. No cause for pause: new analyses of ramping and stepping dynamics in LIP (Rebuttal to Response to Reply to Comment on Latimer et al 2015)
- Author
-
Jonathan W. Pillow, Kenneth W. Latimer, and Alexander C. Huk
- Subjects
education.field_of_study ,business.industry ,Population ,Rebuttal ,Covariance ,Upper and lower bounds ,Moment (mathematics) ,Dynamics (music) ,Spike (software development) ,Fraction (mathematics) ,Artificial intelligence ,education ,business ,Algorithm ,Mathematics - Abstract
We recently presented a statistical comparison between two models of latent dynamics in macaque lateral intraparietal (LIP) area spike trains—a continuous ‘ramping’ (diffusion-to-bound) model, and a discrete ‘stepping’ model—and found that a substantial fraction of neurons (recorded in two different studies) were better supported by the stepping model (Latimer et al., 2015). Here, we respond to a recent challenge to the validity of these findings that focuses primarily on the possibility of a lower bound on LIP firing rates (Zylberberg & Shadlen, 2016). The paper in question proposed alternate formulations of the ramping model, and argued (via indirect analyses) that half the neurons in the population were better explained by the new model; if correct, this would lead to an even split in the number of neurons better explained by each model. These analyses, while interesting, do not alter the conclusions of our original paper. Here, we review the criticisms raised by Zylberberg & Shadlen and report several new analyses using models with lower bounds. First, we show that the stepping model continued to provide a better description of LIP spike trains when fit using only an early period of each trial. Second, we performed a direct model comparison between our stepping model and a ramping-with-baseline model proposed by Zylberberg & Shadlen; we found that (in a pleasing moment of agreement) roughly half the neurons were better explained by each model. Interestingly, inspection of the cells that switched classifications revealed that many did not strictly exhibit the classical ramping PSTHs that motivated these analyses in the first place. 
We also examined two other issues raised in recent discussions of LIP: (1) We show that a non-integrating model is consistent with some core aspects of behavioral data previously offered as evidence for continuous integration; and (2) We examine analyses based on the response covariance (“CorCE”), and show that it does not reliably distinguish ramping and stepping dynamics for our dataset. Taken together, these discussions highlight the value of data-driven characterizations of both neural and behavioral dynamics with appropriate statistical tools.
- Published
- 2017
36. Relative Citation Ratio (RCR): A new metric that uses citation rates to measure influence at the article level
- Author
-
George M. Santangelo, B. Ian Hutchins, James M. Anderson, and Xin Yuan
- Subjects
0301 basic medicine ,Computer science ,Biochemistry ,Scientific productivity ,Mathematical and Statistical Techniques ,Citation analysis ,Biology (General) ,GeneralLiterature_REFERENCE(e.g.,dictionaries,encyclopedias,glossaries) ,General Neuroscience ,Genomics ,Research Assessment ,Professions ,Medical Microbiology ,Physical Sciences ,Meta-Research Article ,Regression Analysis ,Metric (unit) ,General Agricultural and Biological Sciences ,Statistics (Mathematics) ,Statistical Distributions ,Network analysis ,QH301-705.5 ,Science Policy ,Research Grants ,Microbial Genomics ,Linear Regression Analysis ,Biology ,Bibliometrics ,Research and Analysis Methods ,Research Funding ,Microbiology ,General Biochemistry, Genetics and Molecular Biology ,03 medical and health sciences ,Genetics ,Statistical Methods ,Measure (data warehouse) ,Information retrieval ,General Immunology and Microbiology ,Biology and Life Sciences ,Probability Theory ,Field (geography) ,Subject-matter expert ,030104 developmental biology ,People and Places ,Scientists ,Population Groupings ,Microbiome ,Citation ,Mathematics - Abstract
Despite their recognized limitations, bibliometric assessments of scientific productivity have been widely adopted. We describe here an improved method to quantify the influence of a research article by making novel use of its co-citation network to field-normalize the number of citations it has received. Article citation rates are divided by an expected citation rate that is derived from the performance of articles in the same field and benchmarked to a peer comparison group. The resulting Relative Citation Ratio is article-level and field-independent, and provides an alternative to the invalid practice of using journal impact factors to identify influential papers. To illustrate one application of our method, we analyzed 88,835 articles published between 2003 and 2010 and found that the National Institutes of Health awardees who authored those papers occupy relatively stable positions of influence across all disciplines. We demonstrate that the values generated by this method strongly correlate with the opinions of subject matter experts in biomedical research and suggest that the same approach should be generally applicable to articles published in all areas of science. A beta version of iCite, our web tool for calculating Relative Citation Ratios of articles listed in PubMed, is available at https://icite.od.nih.gov. A new article-level metric, the Relative Citation Ratio, provides an alternative to the use of journal impact factors as a means of identifying influential papers. Author Summary: Academic researchers convey their discoveries to the scientific community by publishing papers in scholarly journals. In the biomedical sciences alone, this process now generates more than one million new reports each year.
The sheer volume of available information, together with the increasing specialization of many scientists, has contributed to the adoption of metrics, including journal impact factor and h-index, as signifiers of a researcher’s productivity or the significance of his or her work. Scientists and administrators agree that the use of these metrics is problematic, but in spite of this strong consensus, such judgments remain common practice, suggesting the need for a valid alternative. We describe here an improved method to quantify the influence of a research article by making novel use of its co-citation network—that is, the other papers that appear alongside it in reference lists—to field-normalize the number of times it has been cited, generating a Relative Citation Ratio (RCR). Since choosing to cite is the long-standing way in which scholars acknowledge the relevance of each other’s work, RCR can provide valuable supplemental information, either to decision makers at funding agencies or to others who seek to understand the relative outcomes of different groups of research investments.
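The core of the RCR computation can be sketched schematically (a simplification: the actual method derives the expected citation rate via a regression benchmarked to a peer comparison group, not the plain mean used here):

```python
# An article's citations per year (cpy) divided by an expected rate drawn
# from the articles in its co-citation network; RCR = 1 means the article
# is cited at its field-normalized expectation.
def relative_citation_ratio(article_cpy, cocitation_network_cpys):
    expected = sum(cocitation_network_cpys) / len(cocitation_network_cpys)
    return article_cpy / expected

# Hypothetical article cited 12 times/year; its co-cited neighbors average 6.
rcr = relative_citation_ratio(12.0, [4.0, 6.0, 8.0])
assert rcr == 2.0   # cited twice as often as its field-normalized expectation
```

The co-citation network is what makes the denominator field-specific: the comparison set is defined by the papers that appear alongside the article in reference lists, not by journal or keyword.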
- Published
- 2015
37. Perturbative formulation of general continuous-time Markov model of sequence evolution via insertions/deletions, Part III: Algorithm for first approximation
- Author
-
Dan Graur, Giddy Landan, and Kiyoshi Ezawa
- Subjects
Part iii ,Gapless playback ,Phylogenetic tree ,Ab initio ,Perturbation (astronomy) ,Pairwise comparison ,Markov model ,Indel ,Algorithm ,Mathematics - Abstract
Background: Insertions and deletions (indels) account for more nucleotide differences between two related DNA sequences than substitutions do, and thus it is imperative to develop a stochastic evolutionary model that enables us to reliably calculate the probability of sequence evolution through indel processes. In a separate paper (Ezawa, Graur and Landan 2015a), we established an ab initio perturbative formulation of a continuous-time Markov model of the evolution of an entire sequence via insertions and deletions. And we showed that, under a certain set of conditions, the ab initio probability of an alignment can be factorized into the product of an overall factor and contributions from regions (or local alignments) separated by gapless columns. Moreover, in another separate paper (Ezawa, Graur and Landan 2015b), we performed concrete perturbation analyses on all types of local pairwise alignments (PWAs) and some typical types of local multiple sequence alignments (MSAs). The analyses indicated that even the fewest-indel terms alone can quite accurately approximate the probabilities of local alignments, as long as the segments and the branches in the tree are of modest lengths. Results: To examine whether or not the fewest-indel terms alone can well approximate the alignment probabilities of more general types of local MSAs as well, and as a first step toward the automatic application of our ab initio perturbative formulation, we developed an algorithm that calculates the first approximation of the probability of a given MSA under a given parameter setting including a phylogenetic tree. The algorithm first chops the MSA into gapped and gapless segments, second enumerates all parsimonious indel histories potentially responsible for each gapped segment, and finally calculates their contributions to the MSA probability. We performed validation analyses using more than ten million local MSAs.
The results indicated that even the first approximation can quite accurately estimate the probability of each local MSA, as long as the gaps and tree branches are at most moderately long. Conclusions: The newly developed algorithm, called LOLIPOG, brought our ab initio perturbative formulation at least one step closer to a practically useful method to quite accurately calculate the probability of an MSA under a given biologically realistic parameter setting. [This paper and three other papers (Ezawa, Graur and Landan 2015a,b,c) describe a series of our efforts to develop, apply, and extend the ab initio perturbative formulation of a general continuous-time Markov model of indels.] List of abbreviations: HMM, hidden Markov model; indel, insertion/deletion; LHS, local history set; MSA, multiple sequence alignment; PAS, preserved ancestral site; PWA, pairwise alignment.
- Published
- 2015
- Full Text
- View/download PDF
38. Perturbative formulation of general continuous-time Markov model of sequence evolution via insertions/deletions, Part II: Perturbation analyses
- Author
-
Giddy Landan, Kiyoshi Ezawa, and Dan Graur
- Subjects
Smith–Waterman algorithm ,Mathematical optimization ,Ab initio ,Continuous time markov model ,Insertion deletion ,Perturbation (astronomy) ,Pairwise comparison ,Statistical physics ,Indel ,Markov model ,Mathematics - Abstract
Background: Insertions and deletions (indels) account for more nucleotide differences between two related DNA sequences than substitutions do, and thus it is imperative to develop a stochastic evolutionary model that enables us to reliably calculate the probability of sequence evolution through indel processes. In a separate paper (Ezawa, Graur and Landan 2015a), we established the theoretical basis of our ab initio perturbative formulation of a genuine evolutionary model, more specifically, a continuous-time Markov model of the evolution of an entire sequence via insertions and deletions. And we showed that, under some conditions, the ab initio probability of an alignment can be factorized into the product of an overall factor and contributions from regions (or local alignments) separated by gapless columns. Results: This paper describes how our ab initio perturbative formulation can be concretely used to approximately calculate the probabilities of all types of local pairwise alignments (PWAs) and some typical types of local multiple sequence alignments (MSAs). For each local alignment type, we calculated the fewest-indel contribution and the next-fewest-indel contribution to its probability, and we compared them under various conditions. We also derived a system of integral equations that can be numerically solved to give “exact solutions” for some common types of local PWAs. And we compared the obtained “exact solutions” with the fewest-indel contributions. The results indicated that even the fewest-indel terms alone can quite accurately approximate the probabilities of local alignments, as long as the segments and the branches in the tree are of modest lengths. Moreover, in the light of our formulation, we examined parameter regions where other indel models can safely approximate the correct evolutionary probabilities.
The analyses also suggested some modifications necessary for these models to improve the accuracy of their probability estimations. Conclusions: At least under modest conditions, our ab initio perturbative formulation can quite accurately calculate alignment probabilities under biologically realistic indel models. It also provides a sound reference point that other indel models can be compared to. [This paper and three other papers (Ezawa, Graur and Landan 2015a,b,c) describe a series of our efforts to develop, apply, and extend the ab initio perturbative formulation of a general continuous-time Markov model of indels.]
- Published
- 2015
- Full Text
- View/download PDF
39. A generalized distribution interpolated between the exponential and power law distributions and applied to the walking data of the pill bug (Armadillidium vulgare)
- Author
-
Shuji Shinohara, Hiroshi Okamoto, Takaharu Shokaku, Akika Utsumi, Ung-il Chung, Yoshihiro Nakajima, and Toru Moriyama
- Subjects
Exponential distribution ,Lévy flight ,Mathematical analysis ,Pareto distribution ,Frequency distribution ,Power law ,Generalized normal distribution ,Shape parameter ,Mathematics ,Exponential function - Abstract
To determine whether an organism's walking pattern is a Lévy walk or a Brownian walk, researchers have typically tested whether the frequency distribution of linear step lengths follows a power law distribution or an exponential distribution. However, there are many cases where actual data cannot be classified into either of these categories. In this paper, we propose a general distribution that includes the power law and exponential distributions as special cases. This distribution has two parameters: one represents the exponent, similar to the power law and exponential distributions, and the other is a shape parameter representing the shape of the distribution. By introducing this distribution, an intermediate distribution model can be interpolated between the power law and exponential distributions. In this study, the proposed distribution was fitted to the frequency distribution of step lengths calculated from the walking data of pill bugs. The autocorrelation coefficients were also calculated from the time-series data of the step lengths, and the relationship between the shape parameter and time dependency was investigated. The results showed that individuals whose step-length frequency distributions were closer to the power law distribution had stronger time dependence.
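The abstract does not give the closed form of the proposed distribution, but a standard family with exactly these properties (an exponent-like rate parameter plus a shape parameter interpolating between exponential and power-law behaviour) is the Tsallis q-exponential. The sketch below is an illustrative stand-in under that assumption, not the authors' exact definition.

```python
import numpy as np

def trapezoid(y, x):
    """Simple trapezoidal integration (avoids NumPy version differences)."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def q_exponential_pdf(x, lam, q):
    """Tsallis q-exponential density on [0, inf) for 1 <= q < 2.

    q -> 1 recovers the exponential density lam * exp(-lam * x);
    1 < q < 2 gives a power-law tail ~ x ** (-1 / (q - 1)).
    """
    x = np.asarray(x, dtype=float)
    if abs(q - 1.0) < 1e-12:
        return lam * np.exp(-lam * x)
    return (2.0 - q) * lam * (1.0 + (q - 1.0) * lam * x) ** (1.0 / (1.0 - q))

# Both limits of the family are properly normalized densities:
x = np.linspace(0.0, 200.0, 200_001)
area_exp = trapezoid(q_exponential_pdf(x, 1.0, 1.0), x)   # exponential limit
area_pow = trapezoid(q_exponential_pdf(x, 1.0, 1.3), x)   # power-law-tailed case
```

Fitting such a family to a step-length histogram lets the shape parameter place a trajectory on a continuum between Brownian-like and Lévy-like movement, which is the spirit of the paper's approach.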
- Published
- 2021
40. Is it time to use machine learning survival algorithms for survival and risk factors prediction instead of Cox proportional hazard regression? A comparative population-based study
- Author
-
Le-Dong Nhat-Nam, Nguyen Tran Minh Duc, Abdelrahman M Makram, Nguyen Tien Huy, Sara Morsy, Osama Gamal Hassan, Ahmad Helmy Zayan, and Truong Hong Hieu
- Subjects
Cart ,Proportional hazards model ,business.industry ,Statistical model ,Machine learning ,computer.software_genre ,Medical statistics ,Regression ,Brier score ,Covariate ,Artificial intelligence ,business ,computer ,Algorithm ,Predictive modelling ,Mathematics - Abstract
Purpose: Applying machine learning in medical statistics offers more accurate prediction models. In this paper, we aimed to compare the performance of the Cox Proportional Hazard model (CPH), Classification and Regression Trees (CART), and Random Survival Forest (RSF) in short- and long-term prediction in glioblastoma patients. Methods: We extracted glioblastoma cancer data from the Surveillance, Epidemiology, and End Results (SEER) database. We used CPH, CART, and RSF for the prediction of 1- to 10-year survival probabilities. The Brier score for each duration was calculated, and the model with the lowest score was considered the most accurate. Results: The cohort included 26473 glioblastoma patients divided into a training set (n = 18538) and a validation set (n = 7935). The average survival duration was seven months. For both short- and long-term predictions, RSF was the best algorithm, followed by CPH and CART. Conclusion: For big data, RSF was found to have the highest accuracy and best performance. Using an accurate statistical model for survival prediction and determination of prognostic factors will improve the care of cancer patients. However, further development of the R packages is needed to better illustrate the effect of each covariate on the survival probability.
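To make the evaluation metric concrete, here is a minimal Brier-score computation at a fixed horizon. It deliberately ignores censoring (a published SEER comparison would use a censoring-adjusted version), and all names and numbers are illustrative, not from the study.

```python
import numpy as np

def brier_score(pred_survival, event_time, horizon):
    """Mean squared error between the predicted survival probability at
    `horizon` and the observed alive (1) / dead (0) status at that time.
    Censoring is ignored here for simplicity."""
    alive = (np.asarray(event_time, dtype=float) > horizon).astype(float)
    pred = np.asarray(pred_survival, dtype=float)
    return float(np.mean((alive - pred) ** 2))

# Two toy patients: one survives past 12 months, one dies at month 3.
perfect = brier_score([1.0, 0.0], event_time=[24, 3], horizon=12)  # best case
naive = brier_score([0.5, 0.5], event_time=[24, 3], horizon=12)    # uninformative
```

A perfect model scores 0 and a coin-flip model scores 0.25, so "model with the lowest score" in the abstract means the one whose survival curves track observed status most closely at each horizon.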
- Published
- 2021
41. Internally Generated Time in the Rodent Hippocampus is Logarithmically Compressed
- Author
-
Michael E. Hasselmo, Stephen Charczynski, Rui Cao, Marc W. Howard, and John H. Bladon
- Subjects
Logarithmic scale ,General Immunology and Microbiology ,Scale (ratio) ,Logarithm ,Field (physics) ,General Neuroscience ,Population ,Differential Threshold ,Rodentia ,Bayes Theorem ,General Medicine ,Stimulus (physiology) ,Bayesian inference ,Hippocampus ,General Biochemistry, Genetics and Molecular Biology ,Receptive field ,Animals ,Algorithm ,Mathematics - Abstract
The Weber-Fechner law proposes that our perceived sensory input increases with physical input on a logarithmic scale. Hippocampal "time cells" carry a record of recent experience by firing sequentially during a circumscribed period of time after a triggering stimulus. Different cells have "time fields" at different delays up to at least tens of seconds. Past studies suggest that time cells represent a compressed timeline by demonstrating that fewer time cells fire late in the delay and their time fields are wider. This paper asks whether the compression of time cells obeys the Weber-Fechner Law. Time cells were studied with a hierarchical Bayesian model that simultaneously accounts for the firing pattern at the trial level, cell level, and population level. This procedure allows separate estimates of the within-trial receptive field width and the across-trial variability. The analysis at the trial level suggests the time cells represent an internally coherent timeline as a group. Furthermore, even after isolating across-trial variability, time field width increases linearly with delay. Finally, we find that the time cell population is distributed evenly on a logarithmic time scale. Together, these findings provide strong quantitative evidence that the internal neural temporal representation is logarithmically compressed and obeys a neural instantiation of the Weber-Fechner Law.
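The two quantitative signatures reported here (peaks distributed evenly on a logarithmic time axis, and field width growing linearly with delay) can be sketched with made-up numbers: log-uniform peaks form a geometric sequence, and a constant Weber fraction makes width proportional to delay. The specific peak range and Weber fraction below are hypothetical.

```python
import numpy as np

# Hypothetical time-field peaks tiling 0.5-20 s evenly on a log axis:
peaks = np.logspace(np.log10(0.5), np.log10(20.0), num=12)
log_gaps = np.diff(np.log(peaks))   # equal gaps <=> even tiling of log-time

# A constant Weber fraction (hypothetical value) makes the receptive-field
# width scale linearly with the delay at which the field peaks:
weber_fraction = 0.3
widths = weber_fraction * peaks
```

On a linear time axis, this same population looks "compressed": cells thin out and widen at long delays, exactly the qualitative pattern earlier time-cell studies reported.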
- Published
- 2021
42. G-IRAE: a Generalised approach for linking the total Impact of invasion to species’ Range, Abundance and per-unit Effects
- Author
-
Guillaume Latombe, Jane A. Catford, Franz Essl, Bernd Lenzner, David M. Richardson, John R. U. Wilson, and Melodie A. McGeoch
- Subjects
abundance ,Ecology ,vegetation management ,Range (biology) ,biological invasions ,Biome ,Species distribution ,Regression analysis ,Multiple species ,Unit (housing) ,Term (time) ,invasive alien plant species ,South Africa ,Abundance (ecology) ,Statistics ,impact ,Ecology, Evolution, Behavior and Systematics ,occupancy ,Mathematics - Abstract
The total impact of an alien species was conceptualised as the product of its range size, local abundance and per-unit effect in a seminal paper by Parker et al. (Biol Invasions 1:3-19, 1999). However, a practical approach for estimating the three components has been lacking. Here, we generalise the impact formula and, through use of regression models, estimate the relationship between the three components of impact, an approach we term GIRAE (Generalised Impact = Range size × Abundance × per-unit Effect). We discuss how GIRAE can be applied to multiple types of impact, including environmental impacts, damage and management costs. We propose two methods for applying GIRAE. The species-specific method computes the relationship between impact, range size, abundance and per-unit effect for a given species across multiple invaded sites or regions of different sizes. The multi-species method combines data from multiple species across multiple sites or regions to calculate a per-unit effect for each species and is computed using a single regression model. The species-specific method is more accurate, but it requires a large amount of data for each species and assumes a constant per-unit effect for a species across the invaded area. The multi-species method is more easily applicable and data-parsimonious, but assumes the same relationship between impact, range size and abundance for all considered species. We illustrate these methods using data about money spent managing plant invasions in different biomes of South Africa. We found clear differences between species in terms of money spent per unit area invaded, with per-unit expenditure varying substantially between biomes for some species; these insights are useful for monitoring and evaluating management.
GIRAE offers a versatile and practical method that can be applied to many different types of data to better understand and manage the impacts of biological invasions.The online version contains supplementary material available at 10.1007/s10530-022-02836-0.
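The species-specific method can be sketched as a log-linear regression across invaded sites: taking logs of Impact = per-unit Effect × Range × Abundance makes the per-unit effect the exponentiated intercept. Everything below (data, noise level, units) is synthetic, intended only to show the estimation logic.

```python
import numpy as np

rng = np.random.default_rng(0)
n_sites = 50
R = rng.uniform(1.0, 100.0, n_sites)   # range size per site (arbitrary units)
A = rng.uniform(0.1, 10.0, n_sites)    # local abundance per site
E_true = 2.5                           # per-unit effect we want to recover
I = E_true * R * A * np.exp(rng.normal(0.0, 0.01, n_sites))  # noisy impact

# log I = log E + b1 * log R + b2 * log A, fit by ordinary least squares;
# b1 = b2 = 1 corresponds to the pure multiplicative Parker et al. formula.
X = np.column_stack([np.ones(n_sites), np.log(R), np.log(A)])
beta, *_ = np.linalg.lstsq(X, np.log(I), rcond=None)
E_hat = float(np.exp(beta[0]))         # estimated per-unit effect
```

The multi-species variant described in the abstract would pool sites from all species into one such regression with a species-specific intercept (one per-unit effect per species) and shared slopes.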
- Published
- 2021
43. nQMaker: estimating time non-reversible amino acid substitution models
- Author
-
Robert Lanfear, Jennifer E. James, Hanon McShea, Bui Quang Minh, Joanna Masel, Cuong Cao Dang, and Le Sy Vinh
- Subjects
Set (abstract data type) ,Range (mathematics) ,Protein sequencing ,Phylogenetic tree ,Outgroup ,Feature (machine learning) ,Inference ,Scale (descriptive set theory) ,Computational biology ,Mathematics - Abstract
Amino acid substitution models are a key component in phylogenetic analyses of protein sequences. All amino acid models available to date are time-reversible, an assumption designed for computational convenience rather than biological reality. Another significant downside of time-reversible models is that they do not allow inference of rooted trees without outgroups. In this paper, we introduce a maximum likelihood approach, nQMaker, an extension of the recently published QMaker method, that allows the estimation of time non-reversible amino acid substitution models and rooted phylogenetic trees from a set of protein sequence alignments. We show that the non-reversible models estimated with nQMaker fit empirical alignments much better than pre-existing reversible models, across a wide range of datasets including mammals, birds, plants, fungi, and other taxa, and that the improvements in model fit scale with the size of the dataset. Notably, for the recently published plant and bird trees, these non-reversible models correctly recovered the commonly known root placements with very high statistical support, without the need to use an outgroup. We provide nQMaker as an easy-to-use feature in the IQ-TREE software (http://www.iqtree.org), allowing users to estimate non-reversible models and rooted phylogenies from their own protein datasets.
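Time-reversibility of a substitution model is the detailed-balance condition π_i Q_ij = π_j Q_ji on the rate matrix Q and stationary frequencies π. A tiny checker with toy 3-state matrices (not actual 20-state amino acid models) makes the distinction concrete; note that both example chains share the same uniform stationary distribution, yet only one is reversible.

```python
import numpy as np

def is_time_reversible(Q, pi, tol=1e-9):
    """Detailed balance: pi_i * Q_ij == pi_j * Q_ji for all i, j."""
    F = np.diag(pi) @ Q          # flux matrix; symmetric iff reversible
    return bool(np.allclose(F, F.T, atol=tol))

pi_uniform = np.array([1/3, 1/3, 1/3])

# Symmetric rates satisfy detailed balance (time-reversible):
Q_sym = np.array([[-2.0,  1.0,  1.0],
                  [ 1.0, -2.0,  1.0],
                  [ 1.0,  1.0, -2.0]])

# A purely cyclic flow A -> B -> C -> A violates detailed balance
# (non-reversible) despite having the same stationary distribution:
Q_cyclic = np.array([[-1.0,  1.0,  0.0],
                     [ 0.0, -1.0,  1.0],
                     [ 1.0,  0.0, -1.0]])
```

Relaxing the symmetry constraint is what lets a non-reversible model carry directional (and hence rooting) information that a reversible model mathematically cannot.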
- Published
- 2021
44. Improving reproducibility of proton MRS brain thermometry: theoretical and empirical approaches
- Author
-
J. John Mann, Zhengchao Dong, and Joshua T. Kantrowitz
- Subjects
Reproducibility ,Noise ,Proton ,Monte Carlo method ,Line (geometry) ,Algorithm ,Standard deviation ,Imaging phantom ,Weighting ,Mathematics - Abstract
Purpose: In 1H MRS-based thermometry of the brain, averaging temperatures measured from more than one reference peak offers several advantages, including improving the reproducibility, i.e., precision, of the measurement. This paper proposes theoretically and empirically optimal weighting factors to improve the weighted average of temperatures measured from three references. Methods: We first proposed concepts of equivalent noise and equivalent signal-to-noise ratio in terms of frequency measurement, and a concept of relative frequency that allows the combination of different peaks in a spectrum to improve the precision of frequency measurement. Based on these, we derived a theoretically optimal weighting factor and proposed an empirical weighting factor, both involving equivalent noise levels, for a weighted average of temperatures measured from three references, i.e., the singlets of NAA, Cr, and Ch, in the 1H MR spectrum. We assessed these two weighting factors by comparing their errors in temperature measurement with the errors of temperatures measured from individual references; we also compared them with two previously proposed weighting factors. Errors were defined as the standard deviations (SDs) in repeated measurements or in Monte Carlo studies. Results: Both the proposed theoretical and empirical weighting factors outperformed the two previously proposed weighting factors, as well as the three individual references, in all phantom and in vivo experiments. In phantom experiments with 4 Hz or 10 Hz line broadening, the theoretical weighting factor outperformed the empirical one, but the latter was superior in all other repeated and Monte Carlo tests performed on phantom and in vivo data. Conclusion: The proposed weighting factors are superior to the two previously proposed weighting factors and can improve the reproducibility of temperature measurement in 1H MRS-based thermometry.
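The paper's exact "equivalent noise" weights are not given in the abstract; the textbook analogue is inverse-variance weighting, shown below, which conveys why averaging three reference peaks beats relying on any single one. The temperatures and noise levels are illustrative numbers, not measured data.

```python
import numpy as np

def weighted_temperature(temps, noise_sd):
    """Inverse-variance weighted average of per-reference temperatures
    (e.g. from the NAA, Cr, and Ch singlets); noisier references get
    proportionally less weight."""
    temps = np.asarray(temps, dtype=float)
    var = np.asarray(noise_sd, dtype=float) ** 2
    w = (1.0 / var) / np.sum(1.0 / var)
    return float(np.sum(w * temps))

# Three references, the first twice as precise as the other two:
t_avg = weighted_temperature([37.0, 37.2, 36.8], noise_sd=[0.1, 0.2, 0.2])
```

With these numbers the combined estimate has variance 1/(1/0.1² + 1/0.2² + 1/0.2²), i.e. an SD of about 0.082, better than even the best single reference (0.1), which is the precision gain the paper quantifies.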
- Published
- 2021
45. A class of identifiable phylogenetic birth-death models
- Author
-
Jonathan Terhorst and Brandon Legried
- Subjects
Class (set theory) ,Phylogenetic tree ,Population Dynamics ,Population ,Dimension (graph theory) ,Parturition ,Biological Evolution ,Models, Biological ,Markov Chains ,Birth–death process ,Death ,Econometrics ,Piecewise ,Humans ,Tree (set theory) ,Constant (mathematics) ,Pandemics ,Phylogeny ,Mathematics - Abstract
In a striking result, Louca and Pennell (2020) recently proved that a large class of phylogenetic birth-death models are statistically unidentifiable from lineage-through-time (LTT) data: any pair of sufficiently smooth birth and death rate functions is "congruent" to an infinite collection of other rate functions, all of which have the same likelihood for any LTT vector of any dimension. As Louca and Pennell argue, this fact has distressing implications for the thousands of studies that have utilized birth-death models to study evolution. In this paper, we qualify their finding by proving that an alternative and widely used class of birth-death models is indeed identifiable. Specifically, we show that piecewise constant birth-death models can, in principle, be consistently estimated and distinguished from one another, given a sufficiently large extant time tree and some knowledge of the present-day population. Subject to mild regularity conditions, we further show that any unidentifiable birth-death model class can be arbitrarily closely approximated by a class of identifiable models. The sampling requirements needed for our results to hold are explicit, and are expected to be satisfied in many contexts such as the phylodynamic analysis of a global pandemic.
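For readers unfamiliar with the model class, "piecewise constant" means the birth and death rates are held fixed between a finite grid of change points; in LaTeX notation (the symbols here are generic, not the paper's):

```latex
% Piecewise constant rates on a grid 0 = t_0 < t_1 < \dots < t_m:
\lambda(t) = \lambda_i, \qquad \mu(t) = \mu_i
\qquad \text{for } t \in [t_{i-1}, t_i), \quad i = 1, \dots, m.
% Identifiability means distinct parameter vectors
% (\lambda_1, \mu_1, \dots, \lambda_m, \mu_m) yield distinct likelihoods
% given a sufficiently large extant time tree.
```

Restricting the smooth rate functions of Louca and Pennell's congruence classes to this finite-dimensional family is what breaks the unidentifiability.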
- Published
- 2021
46. Covid spirals: a phase diagram representation of COVID-19 effective reproduction number Rt
- Author
-
Kenneth W Pesenti and Raffaele Pesenti
- Subjects
Discrete mathematics ,2019-20 coronavirus outbreak ,Coronavirus disease 2019 (COVID-19) ,Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) ,Representation (systemics) ,Function (mathematics) ,Reproduction ,Spiral ,Phase diagram ,Mathematics
In this paper, we propose a phase diagram representation of the COVID-19 effective reproduction number Rt. Specifically, we express Rt as a function of the estimated number of infected individuals. This function traces a characteristic clockwise spiral that makes it easy to compare the evolution of the number of newly infected individuals at different dates and, possibly, provides some hints about the future progression of the infection.
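A minimal numerical sketch of the idea, using synthetic case counts and a crude ratio estimator of Rt (the paper's estimation procedure is more refined; the serial interval and window lengths below are assumptions):

```python
import numpy as np

tau = 7                                    # assumed serial interval, in days
t = np.arange(120, dtype=float)
new_cases = 100.0 * np.exp(0.05 * t)       # synthetic exponential growth

# Crude ratio estimator: Rt ~ cases today / cases one serial interval ago.
Rt = new_cases[tau:] / new_cases[:-tau]

# Phase-diagram coordinates: (currently infected, Rt). Over a full epidemic
# wave these points trace the clockwise spiral described in the paper; here
# the active pool is approximated by a ~2-week sum of new cases.
active = np.convolve(new_cases, np.ones(14), mode="valid")
```

For pure exponential growth the estimator is constant (exp(0.05 × 7) here), so the spiral degenerates to a single point on the Rt axis; waxing and waning waves are what open it into a spiral.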
- Published
- 2021
47. Framework estimation of stochastic gene activation using transcription average level
- Author
-
Feng Jiao, Liang Chen, and Genghong Lin
- Subjects
Regulation of gene expression ,System parameter ,Current (mathematics) ,Oscillation (cell signaling) ,Average level ,Transcription (software) ,Bijection, injection and surjection ,Biological system ,Gene ,Mathematics - Abstract
Gene activation is usually a non-Markovian process that has been modeled by various frameworks consisting of multiple rate-limiting steps. Understanding the exact activation framework for a gene of interest is a central problem in single-cell studies. In this paper, we focus on the dynamical data of the average transcription level M(t), which is typically neglected when deciphering gene activation. First, the smooth trend lines of M(t) data present rich, visually distinct dynamic features. Second, tractable analysis of M(t) allows the establishment of bijections between M(t) dynamics and regions of system-parameter space. Because of these two clear advantages, we can rule out frameworks that fail to capture M(t) features and further test potentially competent frameworks by fitting M(t) data. We implemented this procedure to determine an exact activation framework for a large number of mouse fibroblast genes under tumor necrosis factor induction; the cross-talk between the signaling and basal pathways is crucial to trigger the first peak of M(t), while the subsequent damped, gentle M(t) oscillation is regulated by the multi-step basal pathway. Moreover, the fitted parameters for the mouse genes tested revealed two distinct regulation scenarios for transcription dynamics. Taken together, we developed an efficient procedure for using traditional M(t) data to estimate gene activation frameworks and system parameters. This procedure, together with sophisticated single-cell transcription data, may facilitate a more accurate understanding of stochastic gene activation.
Author Summary: It has been suggested that genes randomly transit between inactive and active states, with mRNA produced only when a gene is active. The gene activation process has been modeled as a framework of multiple rate-limiting steps arranged sequentially, in parallel, or in combination. The numbers of system steps and the parameters can be predicted by computationally fitting sophisticated single-cell transcription data. However, current algorithms require a prior hypothetical framework of gene activation. We found that prior estimation of the framework can be achieved using the traditional dynamical data of the mRNA average level M(t), which present easily discriminated dynamical features. The theory regarding M(t) profiles allows us to confidently rule out inadequate frameworks and to determine optimal frameworks by fitting M(t) data. We successfully applied this procedure to a large number of mouse fibroblast genes and confirmed that M(t) is capable of providing a reliable estimation of gene activation frameworks and system parameters.
- Published
- 2021
48. Characterizing stochastic cell cycle dynamics in exponential growth
- Author
-
Teresa Lo, Dean Huang, Houra Merrikh, and Paul A. Wiggins
- Subjects
Exponential growth ,Stochastic modelling ,Simple (abstract algebra) ,Population ,Probability distribution ,Context (language use) ,State (functional analysis) ,Statistical physics ,Exponential function ,Mathematics - Abstract
Two powerful and complementary experimental approaches are commonly used to study the cell cycle and cell biology: one class of experiments characterizes the statistics (or demographics) of an unsynchronized exponentially-growing population, while the other captures cell cycle dynamics, either by time-lapse imaging of full cell cycles or by bulk experiments on synchronized populations. In this paper, we study the subtle relationship between observations in these two distinct experimental approaches. We begin with an existing model: a single-cell deterministic description of cell cycle dynamics in which cell states (i.e., periods or phases) have precise lifetimes. We then generalize this description to a stochastic model in which the states have stochastic lifetimes, described by arbitrary probability distribution functions. Our analyses of the demographics of an exponential culture reveal a simple and exact correspondence between the deterministic and stochastic models: the corresponding state lifetimes in the deterministic model are equal to the exponential means of the lifetimes in the stochastic model. An important implication is therefore that the demographics of an exponential culture will be well fit by a deterministic model even if the state timing is stochastic. Although we explore the implications of the models in the context of the Escherichia coli cell cycle, we expect both the models and the significance of the exponential-mean lifetimes to find many applications in the quantitative analysis of cell cycle dynamics in other biological systems.
- Published
- 2021
49. SeqDistK: a Novel Tool for Alignment-free Phylogenetic Analysis
- Author
-
Xiong Q, Guowei Huang, Wei Chen, Xia Lc, Wencheng Li, Xiaoping Liu, and Tian-Lai Huang
- Subjects
Sequence ,Tree (data structure) ,Ground truth ,Multiple sequence alignment ,Computational complexity theory ,Phylogenetic tree ,business.industry ,Pattern recognition ,Artificial intelligence ,Symmetric difference ,Cluster analysis ,business ,Mathematics - Abstract
Algorithms for constructing phylogenetic trees are fundamental to studying the evolution of viruses, bacteria, and other microbes. Established multiple-alignment-based algorithms are inefficient for large-scale metagenomic sequence data because of their reliance on inter-sequence correlation and their high computational complexity. In this paper, we present SeqDistK, a novel tool for alignment-free phylogenetic analysis. SeqDistK computes the dissimilarity matrix for phylogenetic analysis, incorporating seven k-mer based dissimilarity measures, namely d2, d2S, d2star, Euclidean, Manhattan, CVTree, and Chebyshev. Based on these dissimilarities, SeqDistK constructs a phylogenetic tree using the Unweighted Pair Group Method with Arithmetic Mean (UPGMA) algorithm. Using a gold standard dataset of 16S rRNA and its associated phylogenetic tree, we compared SeqDistK to Muscle, a multiple sequence aligner. We found SeqDistK was not only 38 times faster than Muscle but also more accurate. SeqDistK achieved the smallest symmetric difference between the inferred and ground truth trees, ranging from 13 to 18, while that of Muscle was 62. When the measures d2, d2star, d2S, and Euclidean with k-mer size k=5 were used, SeqDistK consistently inferred phylogenetic trees almost identical to the ground truth tree. We also performed clustering of 16S rRNA sequences using SeqDistK and found the clustering was highly consistent with known biological taxonomy. Among all the measures, d2S (k=5, M=2) showed the best accuracy, as it correctly clustered and classified all sample sequences. In summary, SeqDistK is a novel, fast, and accurate alignment-free tool for large-scale phylogenetic analysis. The SeqDistK software is freely available at https://github.com/htczero/SeqDistK.
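The pipeline described (k-mer count vectors → pairwise dissimilarity → UPGMA tree) can be sketched in a few lines. Only the plain Euclidean measure is shown, SciPy's average-linkage clustering stands in for the UPGMA step, and the toy sequences and k=3 are for brevity (the abstract's best results used k=5).

```python
from itertools import product

import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import pdist

def kmer_vector(seq, k=3):
    """Count vector over all 4**k DNA k-mers."""
    index = {"".join(p): i for i, p in enumerate(product("ACGT", repeat=k))}
    v = np.zeros(len(index))
    for i in range(len(seq) - k + 1):
        kmer = seq[i : i + k]
        if kmer in index:            # skip k-mers with ambiguous bases
            v[index[kmer]] += 1
    return v

# Two near-identical toy sequences and one unrelated one:
seqs = ["ACGTACGTACGTAAAT", "ACGTACGTACGTAAAC", "GGGGCCCCGGGGCCCC"]
X = np.vstack([kmer_vector(s) for s in seqs])

D = pdist(X, metric="euclidean")     # condensed pairwise dissimilarities
tree = linkage(D, method="average")  # average linkage == UPGMA
```

In the condensed distance vector, D[0] = d(seq0, seq1), D[1] = d(seq0, seq2), D[2] = d(seq1, seq2), so the two similar sequences are correctly the closest pair and get merged first by UPGMA.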
- Published
- 2021
50. Monomorphic ESS does not imply the stability of the corresponding polymorphic state in the replicator dynamics in matrix games under time constraints
- Author
-
József Garay and Tamás Varga
- Subjects
Replicator equation ,State (functional analysis) ,Statistical physics ,Stability (probability) ,Matrix games ,Mathematics - Abstract
One of the main results in the theory of classical evolutionary matrix games (Maynard Smith and Price 1973, Maynard Smith 1982) states that the monomorphic ESS condition implies the stability of the corresponding state of the polymorphic replicator dynamics (Hofbauer et al. 1979, Zeeman 1980). The picture was later refined by Cressman (1990), who introduced the strong stability concept, which says that if there is a monomorphic ESS, then stable polymorphism is established in polymorphic populations. In this paper we demonstrate with examples that this relationship generally does not hold in three or higher dimensions if the times associated with the interactions vary with the strategies of the participants.
- Published
- 2021