1,270 results for "EXPONENTIAL families (Statistics)"
2. A meta‐analysis of factors influencing the inactivation of Shiga toxin‐producing Escherichia coli O157:H7 in leafy greens.
- Author: Owade, Joshua Ombaka, Bergholz, Teresa M., and Mitchell, Jade
- Subjects: ESCHERICHIA coli O157:H7, MICROBIAL inactivation, FACTOR analysis, RANDOM forest algorithms, EXPONENTIAL families (Statistics)
- Abstract:
Recent advancements in modeling suggest that microbial inactivation in leafy greens follows a nonlinear pattern rather than simple first‐order kinetics. In this study, we evaluated 17 inactivation models commonly used to describe microbial decline and established the conditions that govern microbial survival on leafy greens. Through a systematic review of 65 articles, we extracted 530 datasets to model the fate of Shiga toxin‐producing Escherichia coli O157:H7 on leafy greens. Various factor analysis methods were employed to evaluate the impact of the identified conditions on survival metrics. A two‐parameter model (jm2) provided the best fit to most of the natural and antimicrobial‐induced persistence datasets, whereas the one‐parameter exponential model provided the best fit to fewer than 20% of the datasets. The jm2 model (adjusted R² = 0.89) also outperformed the exponential model (adjusted R² = 0.58) in fitting the pooled microbial survival data. For survival metrics, the model averaging approach generated higher values than the exponential model for log reduction times (LRTs) above 4 logs, suggesting that the exponential model may overpredict inactivation at later time points. The random forest technique revealed that temperature and inoculum size were common factors determining inactivation in both natural and antimicrobial‐induced die‐offs. The findings show the limitations of relying on the first‐order survival metric of 1 LRT and support considering nonlinear inactivation in produce safety decision‐making. [ABSTRACT FROM AUTHOR]
- Published: 2024
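The one‐parameter exponential (first‐order) model referred to in the abstract above describes survivors as N(t) = N0·exp(−k·t), so the time for a d‐log reduction is d·ln(10)/k. The minimal sketch below assumes a known rate constant k and computes log reductions and LRTs under that model only; it does not implement the jm2 model or the model‐averaging approach used in the study.

```python
import math

def log_reduction(k: float, t: float) -> float:
    """log10(N0 / N(t)) under first-order kinetics N(t) = N0 * exp(-k * t)."""
    return k * t / math.log(10)

def log_reduction_time(k: float, d: float = 1.0) -> float:
    """Time needed for a d-log10 reduction under first-order kinetics."""
    if k <= 0:
        raise ValueError("rate constant k must be positive")
    return d * math.log(10) / k

# Hypothetical rate constant (per day); under this model the 4-log time is
# exactly four times the 1-log time.
k = 0.5
print(log_reduction_time(k, 1.0), log_reduction_time(k, 4.0))
```

This strict linear scaling of LRTs with d is exactly what a nonlinear (for example, tailing) survival curve breaks, which is why extrapolating from a 1‐LRT can mislead at later time points.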
3. Real natural exponential families and generalized orthogonality.
- Author: Fakhfakh, Raouf and Hamza, Marwa
- Subjects: POLYNOMIAL operators, EXPONENTIAL families (Statistics), PROBABILITY measures, ORTHOGONAL polynomials, POLYNOMIALS
- Abstract:
In this article, we use the notion of generalized orthogonality for a sequence of polynomials introduced by Bryc, Fakhfakh, and Mlotkowski (2019) to extend the characterizations of Feinsilver, Meixner, and Shanbhag that are based on orthogonal polynomials. These new versions subsume the real natural exponential families (NEFs) whose variance function is a polynomial in the mean of arbitrary degree. We also relate generalized orthogonality to Sheffer systems. We show that the generalized orthogonality of Sheffer systems occurs if and only if the associated classical additive convolution semigroup of probability measures generates NEFs with polynomial variance function. In addition, we use the raising and lowering operators for quasi-monomial polynomials associated with NEFs to give a characterization of NEFs with polynomial variance function of arbitrary degree. [ABSTRACT FROM AUTHOR]
- Published: 2024
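For an NEF with cumulant function k(θ), the mean is m = k′(θ) and the variance is k″(θ); writing the variance in terms of m gives the variance function V(m). The sketch below checks this numerically for three standard NEFs whose variance functions are polynomials of degree 1 and 2 (Poisson, binomial with fixed n, gamma with fixed shape α). The values of n, α, and θ are hypothetical, and nothing here implements the generalized‐orthogonality results of the paper.

```python
import math

def variance_function(cumulant, theta: float, h: float = 1e-4):
    """Return (mean, variance) of the NEF with cumulant function `cumulant` at theta.

    Uses central finite differences: mean = k'(theta), variance = k''(theta).
    """
    k_plus, k0, k_minus = cumulant(theta + h), cumulant(theta), cumulant(theta - h)
    mean = (k_plus - k_minus) / (2 * h)
    var = (k_plus - 2 * k0 + k_minus) / h**2
    return mean, var

n, alpha = 10, 3.0  # hypothetical binomial size and gamma shape
families = {
    # name: (cumulant function k(theta), closed-form variance function V(m))
    "poisson": (math.exp, lambda m: m),                                            # degree 1
    "binomial": (lambda t: n * math.log1p(math.exp(t)), lambda m: m - m * m / n),  # degree 2
    "gamma": (lambda t: -alpha * math.log(-t), lambda m: m * m / alpha),           # degree 2, theta < 0
}
for name, (k, V) in families.items():
    theta = -0.7  # valid for all three (gamma requires theta < 0)
    m, v = variance_function(k, theta)
    print(f"{name}: m={m:.4f}  k''={v:.4f}  V(m)={V(m):.4f}")
```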
4. Sequential detection of transient signal by moving likelihood ratio statistic in an exponential family.
- Author: Wu, Yanhong
- Subjects: SIGNAL detection, EXPONENTIAL families (Statistics), RANDOM variables, MOVING average process
- Abstract:
We first consider the sequential detection of transient signals by generalizing the moving average chart to the exponential family, and we study the false detection probability (FDP) and power of detection (POD) in the steady state. Then a windowed adjusted signed (or modified directed) likelihood ratio chart is studied by treating it as an approximately normal random variable. In the multi-parameter exponential family, detection of a transient change in one of the canonical parameters, or in a function of the canonical parameters, is considered using the generalized adjusted signed likelihood ratio chart. Comparisons with window-restricted CUSUM and Shiryayev–Roberts (S-R) procedures show that the generalized signed likelihood ratio chart performs quite well. Several important examples, including mean or variance changes under the normal model and a real-time example, are used for illustration. [ABSTRACT FROM AUTHOR]
- Published: 2024
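In the simplest exponential‐family case, a normal mean with known σ and in‐control mean μ0, the signed root of the generalized likelihood ratio for a mean shift within the last w observations is √w·(x̄_w − μ0)/σ, that is, the standardized moving average. The sketch below assumes that model and a hypothetical alarm threshold; it does not implement the adjusted signed likelihood ratio or the FDP/POD calculations from the paper.

```python
import math
from collections import deque

def moving_signed_lr(xs, mu0: float, sigma: float, w: int):
    """Yield (t, z_t): signed root likelihood ratio for a mean shift in the last
    w observations, under N(mu, sigma^2) with sigma known.

    For this normal model z_t = sqrt(w) * (window_mean - mu0) / sigma; it is only
    computed once w observations are available.
    """
    window = deque(maxlen=w)
    total = 0.0
    for t, x in enumerate(xs):
        if len(window) == w:
            total -= window[0]  # remove the value the deque is about to evict
        window.append(x)
        total += x
        if len(window) == w:
            yield t, math.sqrt(w) * (total / w - mu0) / sigma

def first_alarm(xs, mu0: float, sigma: float, w: int, threshold: float):
    """Index of the first time z_t exceeds the threshold, or None."""
    for t, z in moving_signed_lr(xs, mu0, sigma, w):
        if z > threshold:
            return t
    return None

data = [0.0] * 20 + [2.0] * 5 + [0.0] * 20  # hypothetical transient bump at t = 20..24
print(first_alarm(data, mu0=0.0, sigma=1.0, w=5, threshold=3.0))  # → 23
```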
5. Variational Bayesian Approximation (VBA): Implementation and Comparison of Different Optimization Algorithms.
- Author: Fallah Mortezanejad, Seyedeh Azadeh and Mohammad-Djafari, Ali
- Subjects: OPTIMIZATION algorithms, PRACTICAL reason, MARKOV chain Monte Carlo, ALGORITHMS, PERFORMANCE theory, EXPONENTIAL families (Statistics)
- Abstract:
In any Bayesian computation, the first step is to derive the joint distribution of all the unknown variables given the observed data. Then, the computations have to be done. There are four general methods for performing them: joint MAP optimization; posterior expectation computations that require integration methods; sampling-based methods, such as MCMC, slice sampling, and nested sampling, for generating samples and numerically computing expectations; and finally, Variational Bayesian Approximation (VBA). In this last method, which is the focus of this paper, the objective is to approximate the joint posterior with a simpler distribution that allows for analytical computations. The main tool in VBA is the Kullback–Leibler divergence (KLD), used as the criterion for obtaining that approximation. Although this can in theory be done formally, for practical reasons we consider the case where the joint distribution is in the exponential family, and so is its approximation. In this case, the KLD becomes a function of the usual parameters or the natural parameters of the exponential family, and the problem becomes a parametric optimization. We therefore compare four optimization algorithms: general alternate functional optimization; parametric gradient-based optimization with the normal parameters; parametric gradient-based optimization with the natural parameters; and the natural gradient algorithm. We then study their relative performance on three examples to demonstrate the implementation of each algorithm and its efficiency. [ABSTRACT FROM AUTHOR]
- Published: 2024
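When both the target and its approximation are univariate Gaussians, the KLD has a closed form in the usual parameters, so VBA becomes a small parametric optimization. The toy sketch below assumes a Gaussian target (in practice the posterior is not itself Gaussian) and minimizes KL(q‖p) by plain gradient descent on (m, log s), in the spirit of the parametric gradient approach with the usual parameters; the learning rate and step count are arbitrary choices, and this is not the paper's implementation.

```python
import math

def kl_gauss(m: float, s: float, mu: float, sigma: float) -> float:
    """KL(q || p) for q = N(m, s^2) and p = N(mu, sigma^2)."""
    return math.log(sigma / s) + (s**2 + (m - mu) ** 2) / (2 * sigma**2) - 0.5

def fit_gaussian(mu: float, sigma: float, lr: float = 0.1, steps: int = 500):
    """Minimize KL(q || p) over (m, log s) with plain gradient descent."""
    m, log_s = 0.0, 0.0  # initial guess q = N(0, 1)
    for _ in range(steps):
        s = math.exp(log_s)
        grad_m = (m - mu) / sigma**2          # dKL/dm
        grad_log_s = -1.0 + s**2 / sigma**2   # dKL/d(log s)
        m -= lr * grad_m
        log_s -= lr * grad_log_s
    return m, math.exp(log_s)

m_hat, s_hat = fit_gaussian(mu=2.0, sigma=0.5)
print(m_hat, s_hat, kl_gauss(m_hat, s_hat, 2.0, 0.5))
```

Optimizing log s rather than s keeps the scale positive without constraints, which is one reason the choice of parametrization matters for gradient‐based VBA.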
6. Spatial effects analysis of natural forest canopy cover based on spaceborne LiDAR and geostatistics.
- Author: Jinge Yu, Li Xu, Qingtai Shu, Shaolong Luo, and Lei Xi
- Subjects: FOREST canopies, MACHINE learning, LIDAR, GEOLOGICAL statistics, LASER altimeters, RANDOM forest algorithms, EXPONENTIAL families (Statistics)
- Abstract:
Because of the high cost of manual surveys, analyzing spatial changes in forest structure at the regional scale is challenging. Spaceborne LiDAR can provide global-scale sampling and observation. Taking advantage of this, dense natural forest canopy cover (NFCC) observations, obtained by combining spaceborne LiDAR data, plot surveys, and a machine learning (ML) algorithm, were used as spatial attributes to analyze the spatial effects of NFCC. Specifically, based on the ATL08 (Land and Vegetation Height) product generated from Ice, Cloud and land Elevation Satellite-2/Advanced Topographic Laser Altimeter System (ICESat-2/ATLAS) data and 80 measured plots, NFCC values at the LiDAR footprint locations were predicted by the ML model. Based on the predicted NFCC, the spatial effects of NFCC were analyzed using Moran's I and the semi-variogram. The results showed that (1) the Random Forest (RF) model had the strongest predictive performance among the ML models built (R² = 0.75, RMSE = 0.09); (2) NFCC had a positive spatial correlation (Moran's I = 0.36), that is, the canopy cover (CC) of adjacent natural forest footprints had similar trends or values and showed a spatially clustered distribution; the spatial variation was described by the exponential model (C0 = 0.12 × 10⁻², C = 0.77 × 10⁻², A0 = 10,200 m); (3) topographic factors had significant effects on NFCC, with elevation having the largest effect, slope the second, and aspect the least; (4) the NFCC spatial distribution obtained by SGCS agreed well with the footprint NFCC (R² = 0.59). The predictions generated from the RF model constructed using ATL08 data offer a dependable data source for spatial effects analysis. [ABSTRACT FROM AUTHOR]
- Published: 2024
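The exponential semivariogram model mentioned in the abstract can be evaluated directly. The sketch below assumes the common form γ(h) = C0 + C·(1 − exp(−h/A0)) for h > 0, with C0 the nugget, C the partial sill, and A0 the range parameter. Some software instead writes exp(−3h/a) with a the practical range, or reports C0 + C as the sill, so mapping the reported values onto these symbols is an assumption.

```python
import math

def exponential_semivariogram(h: float, c0: float, c: float, a0: float) -> float:
    """gamma(h) = c0 + c * (1 - exp(-h / a0)) for h > 0, and 0 at h = 0 (nugget jump)."""
    if h == 0:
        return 0.0
    return c0 + c * (1.0 - math.exp(-h / a0))

# Values reported in the abstract; reading C as the partial sill is an assumption.
C0, C, A0 = 0.12e-2, 0.77e-2, 10200.0
for h in (1000, 10200, 30600, 100000):  # lag distances in metres
    print(h, exponential_semivariogram(h, C0, C, A0))
```

Under this form the curve reaches about 95% of the sill C0 + C at h = 3·A0, which is why 3·A0 is often quoted as the effective range.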
7. Interval estimation of the overlapping coefficients in an exponential family of distributions based on upper record values.
- Author: Dhaker, Hamza and El Adlouni, Salah-Eddine
- Subjects: SAMPLING (Process), CONFIDENCE intervals, COEFFICIENTS (Statistics), STATISTICAL sampling, EXPONENTIAL families (Statistics)
- Abstract:
This paper investigates interval estimation for measures of overlap, namely Matusita's measure, Weitzman's measure, and a measure based on the Kullback–Leibler divergence. Two sampling procedures are considered, namely simple random samples and upper record values, from two exponential populations. Bootstrap and series-approximation methods are used to construct confidence intervals for the overlap measures. To illustrate the performance of the likelihood-based confidence intervals obtained for these overlapping coefficients under our proposal, we carry out simulation studies, which yield adequate coverage frequencies, and compare the performance of various competing estimators. A real data set is analysed to exemplify our proposal. [ABSTRACT FROM AUTHOR]
- Published: 2024
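For two exponential densities with rates λ1 and λ2, Matusita's and Weitzman's overlap measures have closed forms, which makes plug‐in estimation from simple random samples straightforward. The sketch below computes both and adds a parametric percentile bootstrap interval around the MLEs 1/x̄. It covers only simple random samples; the record‐value scheme, the Kullback–Leibler‐based measure, and the series approximation from the paper are not shown, and the sample data are hypothetical.

```python
import math
import random
from statistics import mean

def matusita(l1: float, l2: float) -> float:
    """Matusita's overlap rho = integral of sqrt(f1 * f2) for Exp(l1), Exp(l2)."""
    return 2.0 * math.sqrt(l1 * l2) / (l1 + l2)

def weitzman(l1: float, l2: float) -> float:
    """Weitzman's overlap Delta = integral of min(f1, f2) for Exp(l1), Exp(l2)."""
    if l1 == l2:
        return 1.0
    a, b = min(l1, l2), max(l1, l2)
    x_star = math.log(b / a) / (b - a)  # the two densities cross here
    # min is the smaller-rate density on [0, x*] and the larger-rate density after x*
    return (1.0 - math.exp(-a * x_star)) + math.exp(-b * x_star)

def bootstrap_ci(x, y, measure, B=2000, alpha=0.05, seed=0):
    """Parametric percentile bootstrap CI for an overlap measure, using the MLEs 1/mean."""
    rng = random.Random(seed)
    l1, l2 = 1.0 / mean(x), 1.0 / mean(y)
    stats = []
    for _ in range(B):
        xb = [rng.expovariate(l1) for _ in x]
        yb = [rng.expovariate(l2) for _ in y]
        stats.append(measure(1.0 / mean(xb), 1.0 / mean(yb)))
    stats.sort()
    return stats[int(alpha / 2 * B)], stats[int((1 - alpha / 2) * B) - 1]

rng = random.Random(1)
x = [rng.expovariate(1.0) for _ in range(50)]  # hypothetical sample, rate 1
y = [rng.expovariate(2.0) for _ in range(50)]  # hypothetical sample, rate 2
print(weitzman(1.0, 2.0))  # analytic value for rates 1 and 2 is 3/4
print(bootstrap_ci(x, y, weitzman))
```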
8. Prior effective sample size for exponential family distributions with multiple parameters.
- Author: Tamanoi, Ryota
- Subjects: FAMILY size, SAMPLE size (Statistics), EXPONENTIAL families (Statistics), BAYESIAN analysis, REGRESSION analysis, LOGISTIC regression analysis
- Abstract:
The setting of priors is an important issue in Bayesian analysis. In particular, when external information is applied, a prior with too much information can dominate the posterior inferences. To prevent this effect, the effective sample size (ESS) can be used. Various ESSs have been proposed recently; however, all of them restrict the prior distributions to which they can be applied. For example, one ESS can only be used with a prior that can be approximated by a normal distribution, and another ESS cannot be applied when the parameters are multidimensional. We propose an ESS that can be applied to more prior distributions when the sampling model belongs to an exponential family (including the normal model and logistic regression models). This ESS is predictively consistent and can be used with multidimensional parameters. Using normally distributed data with Student's‐t priors, we confirm that this ESS behaves as well as an existing predictively consistent ESS for one‐parameter exponential families. As examples of multivariate parameters, ESSs for linear and logistic regression models are also discussed. [ABSTRACT FROM AUTHOR]
- Published: 2024
9. Identifiability constraints in generalized additive models.
- Author: Stringer, Alex
- Subjects: EXPONENTIAL families (Statistics), PARAMETER estimation, NONLINEAR regression, PARAMETERIZATION
- Abstract:
Identifiability constraints are necessary for parameter estimation when fitting models with nonlinear covariate associations. The choice of constraint affects the standard errors of the estimated curve. Centring constraints are often applied by default because they are thought to yield the lowest standard errors of any constraint, but this claim has not been investigated. We show that whether centring constraints are optimal depends on the response distribution and parameterization, and that for natural exponential family responses under the canonical parametrization, centring constraints are optimal only for Gaussian response. [ABSTRACT FROM AUTHOR]
- Published: 2024
10. On UMPS hypothesis testing.
- Author: Paindaveine, Davy
- Subjects: SYMMETRIC functions, HYPOTHESIS, EXPONENTIAL families (Statistics)
- Abstract:
For two-sided hypothesis testing in location families, the classical optimality criterion is the one leading to uniformly most powerful unbiased (UMPU) tests. Such optimal tests, however, can be constructed only in exponential models. We argue that if the base distribution is symmetric, then it is natural to consider uniformly most powerful symmetric (UMPS) tests, that is, tests that are uniformly most powerful in the class of level-α tests whose power function is symmetric. For single-observation models, we provide a condition ensuring the existence of UMPS tests and give their explicit form. When this condition is not met, UMPS tests may fail to exist, and we provide a weaker condition under which there exist UMP tests in the class of level-α tests whose power function is symmetric and U-shaped. In the multi-observation case, we obtain results in exponential models that also allow for non-location families. [ABSTRACT FROM AUTHOR]
- Published: 2024
11. A new hybrid odd exponential-Φ family: Properties and applications.
- Author: Mahdi, Ghanam A., Khaleel, Mundher A., Gemeay, Ahmed M., Nagy, M., Mansi, A. H., Hossain, Md. Moyazzem, and Hussam, Eslam
- Subjects: EXPONENTIAL families (Statistics), MAXIMUM likelihood statistics, CONTINUOUS distributions, PARAMETER estimation, FAMILIES
- Abstract:
The paper introduces the hybrid odd exponential-Φ (HOE-Φ) family, a novel framework for generating continuous distributions with an additional parameter. The statistical properties of this family are derived and explored in detail. Parameters are estimated by maximum likelihood. The efficacy and versatility of the proposed model are demonstrated through a comparative analysis of two distinct real-world datasets. [ABSTRACT FROM AUTHOR]
- Published: 2024
12. eRPCA: Robust Principal Component Analysis for Exponential Family Distributions.
- Author: Zheng, Xiaojun, Mak, Simon, Xie, Liyan, and Xie, Yao
- Subjects: PRINCIPAL components analysis, DISTRIBUTION (Probability theory), EXPONENTIAL families (Statistics), LOW-rank matrices, OPTIMIZATION algorithms, SPARSE matrices
- Abstract:
Robust principal component analysis (RPCA) is a widely used method for recovering low‐rank structure from data matrices corrupted by significant and sparse outliers. These corruptions may arise from occlusions, malicious tampering, or other causes of anomalies, and the joint identification of such corruptions with the low‐rank background is critical for process monitoring and diagnosis. However, existing RPCA methods and their extensions largely do not account for the underlying probability distribution of the data matrices, which in many applications is known and can be highly non‐Gaussian. We thus propose a new method called RPCA for exponential family distributions (eRPCA), which can perform the desired decomposition into low‐rank and sparse matrices when such a distribution falls within the exponential family. We present a novel alternating direction method of multipliers (ADMM) optimization algorithm for efficient eRPCA decomposition, under either its natural or canonical parametrization. The effectiveness of eRPCA is then demonstrated in two applications: the first for steel sheet defect detection and the second for crime activity monitoring in the Atlanta metropolitan area. [ABSTRACT FROM AUTHOR]
- Published: 2024
13. Parameter estimation and application of inverse Gaussian regression.
- Author: Nisa, Eva Khoirun and Miasary, Seftina Diyah
- Subjects: INVERSE Gaussian distribution, GAUSSIAN distribution, GAUSSIAN processes, MAXIMUM likelihood statistics, EXPONENTIAL families (Statistics), PARAMETER estimation, ENVIRONMENTAL sciences, SMALL business
- Abstract:
Some cases in environmental studies show that the response variable follows an exponential family distribution, for instance, an inverse Gaussian distribution. When such a positive response variable depends on a set of predictors, it can be modeled using inverse Gaussian regression. This study estimates the parameters of inverse Gaussian regression by Maximum Likelihood Estimation (MLE). Since the likelihood equations are nonlinear in the parameters, numerical methods are required; the methods used are Fisher scoring and Broyden–Fletcher–Goldfarb–Shanno (BFGS). In this study, we applied inverse Gaussian regression to resilience data for culinary micro, small, and medium enterprises (MSMEs) during extraordinary events. This offers an alternative application, considering that the inverse Gaussian has never been applied outside the survival field. This study aims to obtain the inverse Gaussian regression parameter estimators and to identify the factors that influence the resilience of culinary MSMEs. Fisher scoring and BFGS produce almost the same parameter estimates. The application of inverse Gaussian regression shows that cost and service time affect the resilience of culinary MSMEs during extraordinary events. [ABSTRACT FROM AUTHOR]
- Published: 2024
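Fisher scoring for a GLM reduces to iteratively reweighted least squares. For the inverse Gaussian, V(μ) = μ³; with a log link (an assumption here, since the abstract does not name the link; the canonical link is 1/μ²) dμ/dη = μ, so the weights are 1/μ and the working response is η + (y − μ)/μ. The dispersion parameter cancels from the coefficient updates. The pure‐Python sketch below is illustrative only; the small linear solver and the noise‐free example data are hypothetical, and the BFGS alternative is not shown.

```python
import math

def solve(A, b):
    """Solve the small linear system A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                for c in range(col, n + 1):
                    M[r][c] -= f * M[col][c]
    return [M[i][n] / M[i][i] for i in range(n)]

def fit_inverse_gaussian_log_link(X, y, max_iter=100, tol=1e-10):
    """Fisher scoring (IRLS) for inverse Gaussian regression with a log link.

    X is a list of rows whose first entry is 1.0 (intercept); y must be positive.
    Weights are mu^2 / V(mu) = 1 / mu; the working response is eta + (y - mu) / mu.
    """
    p = len(X[0])
    beta = [math.log(sum(y) / len(y))] + [0.0] * (p - 1)  # start at the overall mean
    for _ in range(max_iter):
        XtWX = [[0.0] * p for _ in range(p)]
        XtWz = [0.0] * p
        for row, yi in zip(X, y):
            eta = sum(b * xj for b, xj in zip(beta, row))
            mu = math.exp(eta)
            w = 1.0 / mu
            z = eta + (yi - mu) / mu
            for j in range(p):
                XtWz[j] += w * row[j] * z
                for k in range(p):
                    XtWX[j][k] += w * row[j] * row[k]
        new_beta = solve(XtWX, XtWz)
        if max(abs(a - b) for a, b in zip(new_beta, beta)) < tol:
            return new_beta
        beta = new_beta
    return beta

X = [[1.0, t / 10] for t in range(20)]           # intercept + one hypothetical predictor
y = [math.exp(0.5 + 0.8 * row[1]) for row in X]  # noise-free responses as a sanity check
print(fit_inverse_gaussian_log_link(X, y))       # recovers approximately [0.5, 0.8]
```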
14. Estimating multiplicity of infection, allele frequencies, and prevalences accounting for incomplete data.
- Author: Hashemi, Meraj and Schneider, Kristan A.
- Subjects: MISSING data (Statistics), GENE frequency, EXPONENTIAL families (Statistics), FISHER information, COMMUNICABLE diseases, DISTRIBUTION (Probability theory)
- Abstract:
Background: Molecular surveillance of infectious diseases allows pathogens to be monitored beyond the granularity of traditional epidemiological approaches and is well established for some of the most relevant infectious diseases, such as malaria. The presence of genetically distinct pathogenic variants within an infection, referred to as multiplicity of infection (MOI) or complexity of infection (COI), is common in malaria and similar infectious diseases. It is an important metric that scales with transmission intensity, potentially affects clinical pathogenesis, and is a confounding factor when monitoring the frequency and prevalence of pathogenic variants. Several statistical methods exist to estimate MOI and the frequency distribution of pathogen variants. However, a common problem is the quality of the underlying molecular data. If molecular assays fail non-randomly, MOI and the prevalence of pathogen variants are likely to be underestimated. Methods and findings: A statistical model is introduced that explicitly addresses data quality by assuming a probability with which a pathogen variant remains undetected in a molecular assay. This differs from the missing-at-random assumption, under which a molecular assay either performs perfectly or fails completely. The method is applicable to a single molecular marker and allows estimation of allele-frequency spectra, the distribution of MOI, and the probability that variants remain undetected (incomplete information). Based on the statistical model, expressions for the prevalence of pathogen variants are derived, and differences between frequency and prevalence are discussed. The usual desirable asymptotic properties of the maximum-likelihood estimator (MLE) are established by rewriting the model as an exponential family. The MLE has promising finite-sample properties in terms of bias and variance. The covariance matrix of the estimator is close to the Cramér–Rao lower bound (inverse Fisher information). Importantly, the estimator's variance is larger than that of a similar method that disregards incomplete information, but its bias is smaller. Conclusions: Although the model introduced here has convenient properties, in terms of mean squared error it does not outperform a simple standard method that neglects missing information. Thus, the new method is recommended only for data sets in which the molecular assays produced poor-quality results. This will be particularly true if the model is extended to accommodate information from multiple molecular markers at the same time, and incomplete information at one or more markers leads to a strong depletion of sample size. [ABSTRACT FROM AUTHOR]
- Published: 2024
15. eSVD-DE: cohort-wide differential expression in single-cell RNA-seq data using exponential-family embeddings.
- Author: Lin, Kevin Z., Qiu, Yixuan, and Roeder, Kathryn
- Subjects: EXPONENTIAL families (Statistics), BIOLOGICAL systems, MATRIX decomposition, RNA sequencing, FUZZY neural networks, GENES
- Abstract:
Background: Single-cell RNA-sequencing (scRNA) datasets are becoming increasingly popular in clinical and cohort studies, but there is a lack of methods to investigate differentially expressed (DE) genes among such datasets with numerous individuals. While numerous methods exist to find DE genes for scRNA data from a limited number of individuals, differential-expression testing for large cohorts of case and control individuals using scRNA data poses unique challenges due to the substantial effects of human variation, i.e., individual-level confounding covariates that are difficult to account for in the presence of sparsely observed genes. Results: We develop eSVD-DE, a matrix factorization that pools information across genes and removes confounding covariate effects, followed by a novel two-sample test of mean expression between case and control individuals. In general, differential testing after dimension reduction yields an inflation of Type I errors. However, we overcome this by testing for differences between the case and control individuals' posterior mean distributions via a hierarchical model. In previously published datasets from various biological systems, eSVD-DE has more accuracy and power than other DE methods typically repurposed for analyzing cohort-wide differential expression. Conclusions: eSVD-DE provides a novel and powerful way to test for DE genes among cohorts after performing a dimension reduction. Accurate identification of differential expression at the individual level, instead of the cell level, is important for linking scRNA-seq studies to our understanding of the human population. [ABSTRACT FROM AUTHOR]
- Published: 2024
16. An experimental comparison of classic statistical techniques on univariate time series forecasting.
- Author: Khan, Darakhshan Rizwan, Patankar, Archana B., and Khan, Aayisha
- Subjects: AIR quality indexes, FORECASTING, STATISTICAL smoothing, EXPONENTIAL families (Statistics), MOVING average process, TIME series analysis, AIR pollutants
- Abstract:
In today's world, there is a high demand for understanding and analyzing patterns in time series data in order to make accurate forecasts and predictions. Multiple univariate time series of the concentrations of various air pollutants, namely PM10, PM2.5, NH3, NOx, ozone, SO2, and CO, and of an air quality index (AQI) were used for this study. The research examines many classic statistical forecasting techniques and groups them into three categories: smoothing techniques, linear regressive techniques for stationary data, and linear regressive techniques for nonstationary data. Nine different approaches were used to understand the logic and effectiveness of statistical techniques: the simple, double, and triple exponential smoothing methods; the autoregressive method; the moving average method; the ARMA method; and the ARIMA family of methods, which includes ARIMA, SARIMA, and SARIMAX. The RMSE metric is used for evaluation, and it shows that SARIMAX outperformed the other techniques for PM2.5, PM10, NH3, ozone, and AQI, with RMSE values of 8.56, 6.72, 5.52, 3.51, and 20.88, respectively, whereas for NOx and SO2, SARIMA produced better results, with RMSE values of 9.8 and 2.48. Only for the CO time series did triple exponential smoothing perform best, with an RMSE of 0.36. To demonstrate the robustness of statistical techniques, the top-performing methods are compared with support vector regression. Furthermore, some potential future directions are discussed based on the results. [ABSTRACT FROM AUTHOR]
- Published: 2024
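Simple exponential smoothing, the first of the smoothing techniques listed above, keeps a level ℓ_t = α·y_t + (1 − α)·ℓ_{t−1} and uses it as the forecast for the next point; double and triple smoothing add trend and seasonal components. The sketch below computes one‑step‑ahead forecasts and the RMSE criterion used for evaluation, on a hypothetical series, and picks α by a simple grid search. Initializing the level at the first observation is one common choice among several.

```python
import math

def simple_exponential_smoothing(y, alpha):
    """One-step-ahead forecasts from simple exponential smoothing.

    The level starts at y[0]; forecasts[t] is the prediction of y[t] made at t - 1,
    so forecasts[0] is None (no forecast for the first point).
    """
    level = y[0]
    forecasts = [None]
    for t in range(1, len(y)):
        forecasts.append(level)                      # forecast of y[t] = current level
        level = alpha * y[t] + (1 - alpha) * level   # update with the new observation
    return forecasts

def rmse(actual, predicted):
    """Root mean squared error over the points that have a forecast."""
    pairs = [(a, p) for a, p in zip(actual, predicted) if p is not None]
    return math.sqrt(sum((a - p) ** 2 for a, p in pairs) / len(pairs))

series = [3.0, 4.0, 3.5, 5.0, 4.5, 6.0]  # hypothetical pollutant concentrations
best = min((rmse(series, simple_exponential_smoothing(series, a)), a)
           for a in (0.1, 0.3, 0.5, 0.7, 0.9))
print(best)  # (in-sample RMSE, chosen alpha)
```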
17. Robust Inference and Modeling of Mean and Dispersion for Generalized Linear Models.
- Author: Ponnet, Jolien, Segaert, Pieter, Van Aelst, Stefan, and Verdonck, Tim
- Subjects: DISTRIBUTION (Probability theory), INFERENCE (Logic), DISPERSION (Chemistry), EXPONENTIAL families (Statistics), LIKELIHOOD ratio tests
- Abstract:
Generalized Linear Models (GLMs) are a popular class of regression models when the responses follow a distribution in the exponential family. In real data, the variability often deviates from the relation imposed by the exponential family distribution, which results in over- or underdispersion. Dispersion effects may even vary across the data. Such datasets do not follow the traditional GLM distributional assumptions, leading to unreliable inference. Therefore, the family of double exponential distributions has been proposed, which models both the mean and the dispersion as functions of covariates in the GLM framework. Since standard maximum likelihood inference is highly susceptible to the possible presence of outliers, we propose the robust double exponential (RDE) estimator. Asymptotic properties and robustness of the RDE estimator are discussed. A generalized robust quasi-deviance measure is introduced, which constitutes the basis for a stable robust test. Simulations for binomial and Poisson models show the excellent performance of the RDE estimator and the corresponding robust tests. Penalized versions of the RDE estimator are developed for sparse estimation with high-dimensional data and for flexible estimation via generalized additive models (GAMs). Real data applications illustrate the relevance of robust inference for dispersion effects in GLMs and GAMs. Supplementary materials for this article are available online. [ABSTRACT FROM AUTHOR]
- Published: 2024
18. Divergences Induced by the Cumulant and Partition Functions of Exponential Families and Their Deformations Induced by Comparative Convexity.
- Author: Nielsen, Frank
- Subjects: EXPONENTIAL functions, EXPONENTIAL families (Statistics), INFORMATION theory, SMOOTHNESS of functions, ENERGY function, MACHINE learning
- Abstract:
Exponential families are statistical models that are workhorses in statistics, information theory, and machine learning, among other fields. An exponential family can either be normalized subtractively by its cumulant or free energy function, or equivalently normalized divisively by its partition function. Both the cumulant and partition functions are strictly convex and smooth functions inducing corresponding pairs of Bregman and Jensen divergences. It is well known that skewed Bhattacharyya distances between the probability densities of an exponential family amount to skewed Jensen divergences induced by the cumulant function between their corresponding natural parameters, and that in limit cases the sided Kullback–Leibler divergences amount to reverse-sided Bregman divergences. In this work, we first show that the α-divergences between non-normalized densities of an exponential family amount to scaled α-skewed Jensen divergences induced by the partition function. We then show how comparative convexity with respect to a pair of quasi-arithmetical means allows both convex functions and their arguments to be deformed, thereby defining dually flat spaces with corresponding divergences when ordinary convexity is preserved. [ABSTRACT FROM AUTHOR]
- Published: 2024
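The well‐known fact quoted in the abstract, that the Kullback–Leibler divergence between two members of an exponential family equals a reverse‐sided Bregman divergence of the cumulant function, can be checked numerically. For the Poisson family, the natural parameter is θ = log λ and the cumulant is F(θ) = e^θ, so KL(p_θ1 ‖ p_θ2) should equal B_F(θ2 : θ1) = F(θ2) − F(θ1) − (θ2 − θ1)·F′(θ1). The rates in the sketch are hypothetical; it illustrates only this classical identity, not the paper's deformed divergences.

```python
import math

def bregman(F, dF, a, b):
    """Bregman divergence B_F(a : b) = F(a) - F(b) - (a - b) * F'(b)."""
    return F(a) - F(b) - (a - b) * dF(b)

def kl_poisson_direct(lam1, lam2, kmax=200):
    """KL(Poisson(lam1) || Poisson(lam2)) by summing p * log(p / q) over k < kmax."""
    total = 0.0
    for k in range(kmax):
        log_p = k * math.log(lam1) - lam1 - math.lgamma(k + 1)
        log_q = k * math.log(lam2) - lam2 - math.lgamma(k + 1)
        total += math.exp(log_p) * (log_p - log_q)
    return total

lam1, lam2 = 3.0, 5.0
theta1, theta2 = math.log(lam1), math.log(lam2)
# Poisson cumulant in the natural parameter: F(theta) = exp(theta), and F' = exp as well.
print(kl_poisson_direct(lam1, lam2), bregman(math.exp, math.exp, theta2, theta1))
```

Swapping the arguments of the Bregman divergence gives a different number, reflecting the asymmetry of the KL divergence and why the "reverse‐sided" qualifier matters.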
19. ON VARIATIONAL INFERENCE AND MAXIMUM LIKELIHOOD ESTIMATION WITH THE λ-EXPONENTIAL FAMILY.
- Author: GUILMEAU, THOMAS, CHOUZENOUX, EMILIE, and ELVIRA, VÍCTOR
- Subjects: EXPONENTIAL families (Statistics), ALGORITHMS, ESTIMATION theory, MATHEMATICS, METHODOLOGY
- Abstract:
The λ-exponential family has recently been proposed to generalize the exponential family. While the exponential family is well understood and widely used, this is not yet the case for the λ-exponential family. However, many applications require models that are more general than the exponential family, and the λ-exponential family is often a good alternative. In this work, we propose a theoretical and algorithmic framework to solve variational inference and maximum likelihood estimation problems over the λ-exponential family. We give new sufficient optimality conditions for variational inference problems. Our conditions take the form of generalized moment-matching conditions and generalize existing similar results for the exponential family. We exhibit novel characterizations of the solutions of maximum likelihood estimation problems that recover optimality conditions in the case of the exponential family. For the resolution of both problems, we propose novel proximal-like algorithms that exploit the geometry underlying the λ-exponential family. These new theoretical and methodological insights are tested on numerical examples, showcasing their usefulness, especially on heavy-tailed target distributions. [ABSTRACT FROM AUTHOR]
- Published: 2024
20. Discussion of "Identifiability of latent-variable and structural-equation models: from linear to nonlinear".
- Author: Matsuda, Takeru
- Subjects: STRUCTURAL equation modeling, INDEPENDENT component analysis, EXPONENTIAL families (Statistics), SIGNAL processing, MACHINE learning, NONLINEAR theories
- Abstract:
This document discusses the identifiability theory for linear/nonlinear independent component analysis (ICA) models and structural equation models. It acknowledges Professor Hyvärinen for his contributions to machine learning and signal processing. The paper provides a comprehensive survey of the theory and application of nonlinear ICA and causal discovery, written in an accessible form for statisticians. It explores the identifiability of the nonlinear ICA model and discusses methods for solving it, such as exploiting temporal structure and learning via classification using neural networks. The text also mentions other statistical methods that utilize the idea of "learning via classification," such as bridge sampling and noise contrastive estimation. The author suggests that the exponential family interpretation can provide a theoretical basis for developing statistical methods that use pre-trained neural networks as feature extractors. [Extracted from the article]
- Published: 2024
21. Multilayer Exponential Family Factor models for integrative analysis and learning disease progression.
- Author: Wang, Qinxia and Wang, Yuanjia
- Subjects: DISEASE progression, PARKINSON'S disease, NEUROLOGICAL disorders, BIOMARKERS, EXPONENTIAL families (Statistics), LATENT variables
- Abstract:
Current diagnosis of neurological disorders often relies on late-stage clinical symptoms, which poses barriers to developing effective interventions at the premanifest stage. Recent research suggests that biomarkers and subtle changes in clinical markers may occur in a time-ordered fashion and can be used as indicators of early disease. In this article, we tackle the challenges of leveraging multidomain markers to learn early disease progression of neurological disorders. We propose to integrate heterogeneous types of measures from multiple domains (e.g. discrete clinical symptoms, ordinal cognitive markers, continuous neuroimaging, and blood biomarkers) using a hierarchical Multilayer Exponential Family Factor (MEFF) model, where the observations follow exponential family distributions with lower-dimensional latent factors. The latent factors are decomposed into shared factors across multiple domains and domain-specific factors, where the shared factors provide robust information to perform extensive phenotyping and partition patients into clinically meaningful and biologically homogeneous subgroups. Domain-specific factors capture the remaining unique variation in each domain. The MEFF model also captures the nonlinear trajectory of disease progression and orders critical events of neurodegeneration measured by each marker. To overcome computational challenges, we fit our model using approximate inference techniques for large-scale data. We apply the developed method to Parkinson's Progression Markers Initiative data to integrate biological, clinical, and cognitive markers arising from heterogeneous distributions. The model learns lower-dimensional representations of Parkinson's disease (PD) and the temporal ordering of the neurodegeneration of PD. [ABSTRACT FROM AUTHOR]
- Published
- 2024
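The basic building block of the MEFF model described above, observations from different exponential families that share low-dimensional latent factors, can be illustrated with a joint log-likelihood. This is a minimal sketch only, assuming a single layer, canonical links (logit for binary, log for counts, identity with unit variance for continuous markers) and hypothetical helper names such as `natural_param` and `joint_loglik`; it is not the authors' hierarchical model or their approximate inference procedure.

```python
import math

def softplus(eta):
    """Numerically stable log(1 + exp(eta)), the Bernoulli log-partition function."""
    return max(eta, 0.0) + math.log1p(math.exp(-abs(eta)))

def natural_param(loadings, factors, intercept):
    """Canonical parameter: intercept + sum_k loading_k * factor_k."""
    return intercept + sum(w * z for w, z in zip(loadings, factors))

def marker_loglik(y, eta, family):
    """Log-density of one observation under a canonical-link exponential family."""
    if family == "bernoulli":   # binary symptom, logit link
        return y * eta - softplus(eta)
    if family == "poisson":     # count marker, log link
        return y * eta - math.exp(eta) - math.lgamma(y + 1)
    if family == "gaussian":    # continuous marker, identity link, unit variance assumed
        return -0.5 * (y - eta) ** 2 - 0.5 * math.log(2 * math.pi)
    raise ValueError(f"unsupported family: {family}")

def joint_loglik(observations, shared_factors):
    """Sum over heterogeneous markers that all load on the same latent factors.

    Each observation is a tuple (y, family, loadings, intercept)."""
    return sum(
        marker_loglik(y, natural_param(loadings, shared_factors, intercept), family)
        for y, family, loadings, intercept in observations
    )
```

With all latent factors at zero, each marker's canonical parameter reduces to its intercept, which makes the contribution of each family easy to check by hand.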
22. Semi‐parametric generalized linear model for binomial data with varying cluster sizes.
- Author
-
Qi, Xinran and Szabo, Aniko
- Subjects
*NEWTON-Raphson method , *BINOMIAL distribution , *EXPECTATION-maximization algorithms , *DATA modeling , *DATA distribution , *EXPONENTIAL families (Statistics) - Abstract
The semi‐parametric generalized linear model (SPGLM) proposed by Rathouz and Gao assumes that the response is from a general exponential family with unspecified reference distribution and can be applied to model the distribution of binomial event‐count data with a constant cluster size. We extend SPGLM to model response distributions of binomial data with varying cluster sizes by assuming marginal compatibility. The proposed model combines a non‐parametric reference describing the within‐cluster dependence structure with a parametric density ratio characterizing the between‐group effect. It avoids making parametric assumptions about higher order dependence and is more parsimonious than non‐parametric models. We fit the SPGLM with an expectation–maximization Newton–Raphson algorithm to the boron acid mouse data set and compare estimates with existing methods. [ABSTRACT FROM AUTHOR]
- Published
- 2023
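The two ingredients named in the SPGLM abstract, a non-parametric reference distribution tilted by a parametric density ratio and marginal compatibility across cluster sizes, can be sketched for binomial-type counts. This is a minimal illustration, assuming the tilt has the exponential form f(y) ∝ f0(y) exp(θy) and that marginal compatibility is realized by hypergeometric thinning of exchangeable binary responses; the function names and the bisection solver are hypothetical, and this is not the authors' EM–Newton–Raphson fit.

```python
import math

def tilt(ref, theta):
    """Exponentially tilt a reference pmf on {0, ..., n}: f(y) proportional to ref[y] * exp(theta * y)."""
    weights = [p * math.exp(theta * y) for y, p in enumerate(ref)]
    total = sum(weights)
    return [w / total for w in weights]

def pmf_mean(pmf):
    return sum(y * p for y, p in enumerate(pmf))

def solve_theta(ref, target_mean, lo=-50.0, hi=50.0, iters=200):
    """Bisection for theta; the tilted mean increases in theta (its derivative is the tilted variance)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if pmf_mean(tilt(ref, mid)) < target_mean:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def marginalize(ref_big, n):
    """Reference pmf for cluster size n implied by one for size N >= n, assuming
    exchangeable binary responses (marginal compatibility): hypergeometric thinning."""
    big_n = len(ref_big) - 1
    out = [0.0] * (n + 1)
    for k, p_k in enumerate(ref_big):
        for y in range(n + 1):
            # math.comb returns 0 when the lower index exceeds the upper one
            out[y] += p_k * math.comb(k, y) * math.comb(big_n - k, n - y) / math.comb(big_n, n)
    return out
```

For example, with a size-3 reference `[0.5, 0.1, 0.1, 0.3]`, the implied size-1 reference is `[0.6, 0.4]`, and `solve_theta(ref, 2.0)` shifts the reference mean from 1.2 to 2.0 while keeping its shape up to the tilt.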
23. THE EXTENDED-EXPONENTIAL DISTRIBUTION: PROPERTIES, ESTIMATION METHODS, AND APPLICATIONS.
- Author
-
AL-MOFLEH, HAZEM, HUSSEIN, EKRAMY A., AFIFY, AHMED Z., ALNSSYAN, BADR, and ABDELLATIF, ASHRAF D.
- Subjects
EXPONENTIAL families (Statistics) ,DISTRIBUTION (Probability theory) ,MATHEMATICAL statistics ,ESTIMATION theory ,DATA science - Abstract
This paper presents a new flexible probability distribution, named the Khalil new generalized exponential (KNGEx) distribution, and introduces its mathematical properties. The hazard function of the KNGEx distribution can be increasing, decreasing, or inverted bathtub-shaped. The parameters of the distribution are estimated using eight classical methods, and simulation studies based on complete samples are conducted. Finally, two applications to medical and engineering data sets are presented. The analyzed data reveal that the proposed distribution could be very useful for describing and modeling both data sets compared with many other competing distributions. [ABSTRACT FROM AUTHOR]
- Published
- 2023
24. Optimal Investment–Consumption–Insurance Problem of a Family with Stochastic Income under the Exponential O-U Model.
- Author
-
Wang, Yang, Lin, Jianwei, Chen, Dandan, and Zhang, Jizhou
- Subjects
*INCOME , *CONSUMPTION (Economics) , *DYNAMIC programming , *UTILITY functions , *LIFE insurance , *EXPONENTIAL families (Statistics) , *STOCHASTIC control theory - Abstract
A household consumption and optimal portfolio problem pertinent to life insurance (LI) in a continuous-time setting is examined. The family receives a random income before the parents' retirement date. The price of the risky asset is driven by the exponential Ornstein–Uhlenbeck (O-U) process, which can better reflect the state of the financial market. If the parents pass away before their retirement date, the children have no work income, and LI can be purchased to hedge the loss of wealth due to the parents' accidental death. Meanwhile, the utility functions (UFs) of the parents and children are taken into account individually in relation to the uncertain lifetime. The family's objective is to maximize the weighted average of the corresponding utilities of the parents and children. The optimal strategies are obtained with a dynamic programming approach, solving the associated Hamilton–Jacobi–Bellman (HJB) equation by means of convex duality theory and the Legendre transform (LT). Finally, we examine how variations in the weight of the parents' UF and the coefficient of risk aversion affect the optimal policies. [ABSTRACT FROM AUTHOR]
- Published
- 2023
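The risky asset in this abstract follows an exponential O-U process. A minimal simulation sketch, assuming the common specification S_t = exp(X_t) with dX_t = κ(μ − X_t)dt + σ dW_t and κ > 0, uses the exact Gaussian transition of the O-U process so there is no discretization bias at grid points. The parameter names and values are hypothetical, and nothing here solves the paper's HJB problem.

```python
import math
import random

def simulate_exp_ou(x0, kappa, mu, sigma, dt, n_steps, seed=None):
    """Simulate S_t = exp(X_t), where dX_t = kappa * (mu - X_t) dt + sigma dW_t (kappa > 0).

    Uses the exact O-U transition:
    X_{t+dt} = mu + (X_t - mu) e^{-kappa dt} + sigma * sqrt((1 - e^{-2 kappa dt}) / (2 kappa)) * Z."""
    rng = random.Random(seed)
    decay = math.exp(-kappa * dt)
    step_sd = sigma * math.sqrt((1.0 - decay ** 2) / (2.0 * kappa))
    x = x0
    prices = [math.exp(x)]
    for _ in range(n_steps):
        x = mu + (x - mu) * decay + step_sd * rng.gauss(0.0, 1.0)
        prices.append(math.exp(x))
    return prices
```

Setting `sigma=0.0` gives the deterministic mean-reverting path exp(μ + (x0 − μ)e^{−κt}), which is a convenient sanity check; with `sigma > 0` every simulated price stays positive by construction.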
25. Optimal equivalence testing in exponential families.
- Author
-
Zhao, Renren and Paige, Robert L.
- Subjects
SADDLEPOINT approximations ,MONTE Carlo method ,EXPONENTIAL families (Statistics) - Abstract
We develop a uniformly most powerful unbiased (UMPU) two-sample equivalence test for a difference of canonical parameters in exponential families. This development involves a non-unique reparametrization, which we address via a novel characterization of all possible reparametrizations of interest in terms of a matrix group. Furthermore, our procedure involves an intractable conditional distribution, which we reproduce to a high degree of accuracy using saddlepoint approximations. Although the saddlepoint-based procedure depends on a non-unique reparametrization, we show that it is invariant under the choice of reparametrization. Our real data example considers the mean-to-variance ratio for normally distributed data, and we compare our result with six competing equivalence testing procedures for this ratio. Only our UMPU method finds evidence of equivalence, which is the expected result. We also perform a Monte Carlo simulation study showing that our UMPU method outperforms all competing methods: its empirical significance level is not statistically significantly different from the nominal 5% level in any simulation setting. [ABSTRACT FROM AUTHOR]
- Published
- 2023
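The saddlepoint step mentioned in this abstract can be illustrated with the Lugannani–Rice tail approximation on a case where the exact answer is known: the sum S of n i.i.d. Exp(1) variables, whose cumulant generating function is K(t) = −n log(1 − t). This is a sketch of the general technique under that assumption, not the authors' approximation to their conditional distribution; the function names are hypothetical.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()

def lugannani_rice_gamma_tail(s, n):
    """Lugannani-Rice approximation to P(S >= s), S = sum of n i.i.d. Exp(1).

    K(t) = -n log(1 - t), so the saddlepoint equation K'(t) = s gives t = 1 - n / s.
    The formula is singular at s = n (the mean), which this sketch does not handle."""
    t_hat = 1.0 - n / s
    k_hat = -n * math.log(1.0 - t_hat)
    w = math.copysign(math.sqrt(2.0 * (t_hat * s - k_hat)), t_hat)
    u = t_hat * math.sqrt(n / (1.0 - t_hat) ** 2)  # t_hat * sqrt(K''(t_hat))
    return 1.0 - _STD_NORMAL.cdf(w) + _STD_NORMAL.pdf(w) * (1.0 / u - 1.0 / w)

def exact_gamma_tail(s, n):
    """Exact P(S >= s) for integer n: exp(-s) * sum_{k < n} s^k / k!."""
    return math.exp(-s) * sum(s ** k / math.factorial(k) for k in range(n))
```

For n = 5, the approximation agrees with the exact gamma tail to well under 1% relative error at both s = 2 and s = 10, which is the kind of accuracy that makes saddlepoint methods attractive when the exact distribution is intractable.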
26. Attention in a Family of Boltzmann Machines Emerging From Modern Hopfield Networks.
- Author
-
Ota, Toshihiro and Karakida, Ryo
- Subjects
*HOPFIELD networks , *BOLTZMANN machine , *ARTIFICIAL neural networks , *EXPONENTIAL families (Statistics) , *ENERGY function - Abstract
Hopfield networks and Boltzmann machines (BMs) are fundamental energy-based neural network models. Recent studies on modern Hopfield networks have broadened the class of energy functions and led to a unified perspective on general Hopfield networks, including an attention module. In this letter, we consider the BM counterparts of modern Hopfield networks using the associated energy functions and study their salient properties from a trainability perspective. In particular, the energy function corresponding to the attention module naturally introduces a novel BM, which we refer to as the attentional BM (AttnBM). We verify that AttnBM has a tractable likelihood function and gradient for certain special cases and is easy to train. Moreover, we reveal the hidden connections between AttnBM and some single-layer models, namely the Gaussian–Bernoulli restricted BM and the denoising autoencoder with softmax units coming from denoising score matching. We also investigate BMs introduced by other energy functions and show that the energy function of dense associative memory models gives BMs belonging to exponential family harmoniums. [ABSTRACT FROM AUTHOR]
- Published
- 2023
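The attention link in this abstract rests on the modern Hopfield retrieval step, which replaces a query ξ by a softmax-weighted average of the stored patterns, ξ_new = X softmax(β Xᵀξ), i.e. a single attention read. The sketch below shows only that update in plain Python, with hypothetical pattern values and inverse temperature `beta`; it does not construct or train the AttnBM.

```python
import math

def softmax(z):
    """Softmax with max-subtraction for numerical stability."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def hopfield_update(patterns, xi, beta):
    """One modern Hopfield step: xi_new = sum_j softmax(beta * <x_j, xi>)_j * x_j."""
    scores = [beta * sum(p_d * x_d for p_d, x_d in zip(p, xi)) for p in patterns]
    weights = softmax(scores)
    return [sum(w * p[d] for w, p in zip(weights, patterns)) for d in range(len(xi))]
```

With a large `beta`, a noisy query is mapped almost exactly onto the closest stored pattern; with `beta = 0`, the update returns the plain average of the patterns.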
27. Natural Products from Singapore Soil-Derived Streptomycetaceae Family and Evaluation of Their Biological Activities.
- Author
-
Chin, Elaine-Jinfeng, Ching, Kuan-Chieh, Tan, Zann Y., Wibowo, Mario, Leong, Chung-Yan, Yang, Lay-Kien, Ng, Veronica W. P., Seow, Deborah C. S., Kanagasundaram, Yoganathan, and Ng, Siew-Bee
- Subjects
*ACTINOBACTERIA , *NATURAL products , *STREPTOMYCETACEAE , *METHICILLIN-resistant staphylococcus aureus , *GRAM-positive bacteria , *EXPONENTIAL families (Statistics) , *STAPHYLOCOCCUS aureus - Abstract
Natural products have long been used as a source of antimicrobial agents against various microorganisms. Actinobacteria are a group of bacteria best known for producing a wide variety of bioactive secondary metabolites, including many antimicrobial agents. In this study, four actinobacterial strains found in Singapore terrestrial soil were investigated as potential sources of new antimicrobial compounds. Large-scale cultivation and chemical and biological investigation led to the isolation of a previously undescribed tetronomycin A (1), which demonstrated inhibitory activity against the Gram-positive bacteria Staphylococcus aureus (SA) and methicillin-resistant Staphylococcus aureus (MRSA) (i.e., MIC90 of 2–4 μM and MBC90 of 9–12 μM), and of several known antimicrobial compounds, namely nonactin, monactin, dinactin, 4E-deacetylchromomycin A3, chromomycin A2, soyasaponin II, lysolipin I, tetronomycin, and naphthomevalin. Tetronomycin showed a two- to six-fold increase in antibacterial activity (i.e., MIC90 and MBC90 of 1–2 μM) compared with tetronomycin A (1), indicating that the presence of an oxy-methyl group at the C-27 position is important for antibacterial activity. [ABSTRACT FROM AUTHOR]
- Published
- 2023
28. On umbral properties of a family of hyperbolic-like functions appearing in magnetic transport problem.
- Author
-
Dattoli, Giuseppe, Khan, Subuhi, Haneef, Mehnaz, and Licciardi, Silvia
- Subjects
*HYPERBOLIC functions , *SPECIAL functions , *EXPONENTIAL families (Statistics) , *HYPERGEOMETRIC functions , *SOLENOIDS , *CALCULUS , *HYPERBOLIC differential equations - Abstract
Umbral operational techniques offer a sturdy mechanism for studying special functions and special polynomials. The techniques of umbral calculus are employed to derive properties of families of exponential-like functions and their hyperbolic forms. Generalized forms of Mittag-Leffler functions are used to solve technical problems concerning the transport of a charged beam in a solenoid magnet. The proposed method is flexible and has many advantages over standard computational techniques. [ABSTRACT FROM AUTHOR]
- Published
- 2023
29. Predictive ratio CUSUM (PRC): A Bayesian approach in online change point detection of short runs.
- Author
-
Bourazas, Konstantinos, Sobas, Frederic, and Tsiamyrtzis, Panagiotis
- Subjects
CHANGE-point problems ,STATISTICAL process control ,QUALITY control charts ,EXPONENTIAL families (Statistics) ,INFORMATION resources - Abstract
The online quality monitoring of a process with low-volume data is a very challenging task, and attention is most often placed on detecting when some of the underlying (unknown) process parameters experience a persistent shift. Self-starting methods, in both the frequentist and the Bayesian domains, aim to offer a solution. Adopting the latter perspective, we propose a general closed-form Bayesian scheme, where the testing procedure is built on a memory-based control chart that relies on the cumulative ratios of sequentially updated predictive distributions. The theoretical framework can accommodate any likelihood from the regular exponential family, and the use of conjugate analysis allows closed-form modeling. Power priors offer an axiomatic framework for incorporating different sources of information into the model, when available. A simulation study evaluates the performance against competitors and examines aspects of prior sensitivity. Technical details and algorithms are provided as . [ABSTRACT FROM AUTHOR]
- Published
- 2023
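The cumulative predictive-ratio idea in the PRC abstract can be sketched for the simplest conjugate case: normal data with known variance σ² and a normal prior on the mean, so the one-step-ahead predictive is N(m, σ² + v). At each step, the log ratio of the predictive density under a mean shifted by `delta` to the in-control predictive is accumulated CUSUM-style and reset at zero, and the posterior is then updated with the new observation. This is a simplified illustration under assumed settings (the function name, `delta`, the threshold `h` and the prior values are hypothetical), not the authors' PRC scheme, which covers general regular exponential family likelihoods and power priors.

```python
def prc_sketch(data, sigma=1.0, m0=0.0, v0=1.0, delta=1.5, h=2.5):
    """Return the 0-based index of the first alarm, or None if no alarm is raised."""
    m, v = m0, v0            # current normal posterior for the process mean
    s_stat = 0.0             # CUSUM of log predictive ratios
    for t, x in enumerate(data):
        pred_var = sigma ** 2 + v
        # log N(x; m + delta, pred_var) - log N(x; m, pred_var)
        log_ratio = delta * ((x - m) - delta / 2.0) / pred_var
        s_stat = max(0.0, s_stat + log_ratio)
        if s_stat > h:
            return t
        # conjugate normal update with the new observation (self-starting)
        prec = 1.0 / v + 1.0 / sigma ** 2
        m = (m / v + x / sigma ** 2) / prec
        v = 1.0 / prec
    return None
```

On the short run `[0.1, -0.2, 0.0, 0.3, -0.1, 2.0, 2.0, 2.0]`, the statistic stays at zero for the first five points and signals at index 6, the second shifted observation. Because the posterior keeps learning from shifted data, a persistent shift is progressively absorbed into the in-control predictive, which is one reason such charts are tuned for short runs.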
30. Group sequential tests: beyond exponential family models.
- Author
-
Tarima, Sergey and Flournoy, Nancy
- Subjects
LIKELIHOOD ratio tests ,EXPONENTIAL families (Statistics) ,DATA distribution ,SAMPLE size (Statistics) ,MAXIMUM likelihood statistics - Abstract
This manuscript considers group sequential tests powered for multiple ordered alternative hypotheses with a predetermined α-spending function. Theorem 1 shows that if a fixed-sample size likelihood ratio test is monotone with respect to a one-dimensional test statistic, then a group sequential test constructed by interim cumulative likelihood ratio tests is most powerful for a sequence of ordered alternatives, at a given α-spending function. This theorem extends Tarima and Flournoy (Metrika 85: 491-513, 2022) from the exponential family to non-exponential distributions with monotone likelihood ratio. A three-stage design powered for three ordered alternatives shows how the theory applies to uniform data. When the likelihood ratio is not monotone for finite sample sizes, locally most powerful tests can be constructed if a test is locally most powerful for a fixed sample size against a local alternative. A two-stage Cauchy example shows how such tests can be built using either a likelihood ratio test statistic or its MLE. Overall, if a parametric distribution of the data is either known or assumed, MLE-based group sequential tests powered for multiple ordered alternatives are most powerful for this set of hypotheses in either finite or local asymptotic settings. [ABSTRACT FROM AUTHOR]
- Published
- 2023
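The abstract's designs assume a predetermined α-spending function. As a sketch of what such a function does, the two standard Lan–DeMets choices below (O'Brien–Fleming-type and Pocock-type) return the cumulative type I error allowed by information fraction t, and the increments give the error newly spent at each look, for example in a three-stage design. The function names are hypothetical, and converting spent α into critical values requires the joint distribution of the interim statistics, which is omitted here.

```python
import math
from statistics import NormalDist

def obf_spending(t, alpha=0.05):
    """Lan-DeMets O'Brien-Fleming-type spending: 2 - 2 * Phi(z_{1 - alpha/2} / sqrt(t))."""
    if t <= 0:
        return 0.0
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return 2.0 - 2.0 * NormalDist().cdf(z / math.sqrt(t))

def pocock_spending(t, alpha=0.05):
    """Lan-DeMets Pocock-type spending: alpha * log(1 + (e - 1) * t)."""
    return alpha * math.log(1.0 + (math.e - 1.0) * t)

def alpha_increments(spending, fractions, alpha=0.05):
    """Type I error newly spent at each interim look (information fractions increasing to 1)."""
    spent, prev = [], 0.0
    for t in fractions:
        cum = spending(t, alpha)
        spent.append(cum - prev)
        prev = cum
    return spent
```

Both functions spend exactly α by t = 1; the O'Brien–Fleming-type choice spends very little early (about 0.0007 at t = 1/3 for α = 0.05), whereas the Pocock-type choice spends more evenly.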
31. Supervised Dimensionality Reduction of Proportional Data Using Exponential Family Distributions.
- Author
-
Masoudimansour, Walid and Bouguila, Nizar
- Subjects
DIMENSIONAL reduction algorithms ,DATA reduction ,TIME complexity ,HEURISTIC ,HEURISTIC algorithms ,EXPONENTIAL families (Statistics) ,DIMENSION reduction (Statistics) - Abstract
Most well-known supervised dimensionality reduction algorithms suffer from the curse of dimensionality when handling high-dimensional sparse data, due to ill-conditioned second-order statistics matrices. They also do not handle multi-modal data properly, since they construct neighborhood graphs that do not discriminate between multi-modal and single-modal classes of data. In this paper, a novel method that mitigates these problems is proposed. In the first step, assuming the data come from two classes, the data are projected into a low-dimensional space, which removes sparsity and drastically reduces the time complexity of any subsequent operation. The projected data of each class are modeled using a mixture of exponential family distributions, allowing multi-modal data to be modeled. A measure of the similarity between the two projected classes is used as the objective function of an optimization problem, which is then solved with a heuristic search algorithm to find the best separating projection. The experiments show that the proposed method outperforms the compared algorithms and provides a robust, effective solution to dimensionality reduction even in the presence of multi-modal and sparse data. [ABSTRACT FROM AUTHOR]
- Published
- 2023
32. Bivariate Discrete Odd Generalized Exponential Generator of Distributions for Count Data: Copula Technique, Mathematical Theory, and Applications.
- Author
-
Al-Essa, Laila A., Eliwa, Mohamed S., Shahen, Hend S., Khalil, Amal A., Alqifari, Hana N., and El-Morshedy, Mahmoud
- Subjects
*DISTRIBUTION (Probability theory) , *DATA distribution , *MAXIMUM likelihood statistics , *CONDITIONAL expectations , *BIVARIATE analysis , *EXPONENTIAL families (Statistics) , *STATISTICAL correlation - Abstract
In this article, a new family of bivariate discrete distributions, called the bivariate discrete odd generalized exponential-G family, is proposed based on the copula concept. Some distributional properties, including the joint probability mass function, joint survival function, joint failure rate function, median correlation coefficient, and conditional expectation, are derived. After proposing the general class, one special model of the new bivariate family is discussed in detail. The maximum likelihood approach is utilized to estimate the family parameters. A detailed simulation study is carried out to examine the bias and mean square error of maximum likelihood estimators. Finally, the importance of the new bivariate family is explained by means of two distinctive real data sets in various fields. [ABSTRACT FROM AUTHOR]
- Published
- 2023
33. Regression analysis for exponential family data in a finite population setup using two-stage cluster sample.
- Author
-
Sutradhar, Brajendra C.
- Subjects
*REGRESSION analysis , *ASYMPTOTIC normality , *EXPONENTIAL families (Statistics) , *LEAST squares , *CLUSTER sampling , *CLUSTER analysis (Statistics) , *CONFIDENCE intervals - Abstract
Over the last four decades, the cluster regression analysis in a finite population (FP) setup for an exponential family such as linear or binary data was done by using a two-stage cluster sample chosen from the FP but by treating the sample as though it is a single-stage cluster sample from a super-population (SP) which contains the FP as a hypothetical sample. Because the responses within a cluster in the FP are correlated, the aforementioned sample mis-specification makes the sample-based so-called GLS (generalized least square) estimators design biased and inconsistent. In this paper, we demonstrate for the exponential family data how to avoid the sampling mis-specification and accommodate the cluster correlations to obtain unbiased and consistent estimates for the FP parameters. The asymptotic normality of the regression estimators is also given for the construction of confidence intervals when needed. [ABSTRACT FROM AUTHOR]
- Published
- 2023
34. On some composite Kies families: distributional properties and saturation in Hausdorff sense.
- Author
-
Zaevski, Tsvetelin and Kyurkchiev, Nikolay
- Subjects
DISTRIBUTION (Probability theory) ,WEIBULL distribution ,FINANCIAL markets ,FAMILIES ,SENSES ,EXPONENTIAL families (Statistics) - Abstract
The stochastic literature contains several extensions of the exponential distribution which increase its applicability and flexibility. In the present article, some properties of a new power modified exponential family with an original Kies correction are discussed. This family is defined as a Kies distribution whose domain is transformed by another Kies distribution. Its probabilistic properties are investigated, and some limitations on the saturation in the Hausdorff sense are derived. Moreover, a semiclosed-form formula is obtained for this saturation. The tail behavior of these distributions is also examined using three different criteria inspired by financial markets, namely the VaR, AVaR, and expectile-based VaR. Some numerical experiments are also provided. [ABSTRACT FROM AUTHOR]
- Published
- 2023
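The composite family in this abstract is built from the Kies distribution. As a minimal sketch, assuming the common Kies parameterization on (0, 1), F(x) = 1 − exp(−λ (x/(1 − x))^β) with λ, β > 0, the CDF inverts in closed form, which gives the value-at-risk (VaR) at level p directly. The composite construction, AVaR and expectile-based VaR from the paper are not reproduced, and the function names are hypothetical.

```python
import math

def kies_cdf(x, lam, beta):
    """Kies CDF on (0, 1): F(x) = 1 - exp(-lam * (x / (1 - x)) ** beta)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    return 1.0 - math.exp(-lam * (x / (1.0 - x)) ** beta)

def kies_quantile(p, lam, beta):
    """Inverse CDF (VaR at level p): (x / (1 - x)) ** beta = -log(1 - p) / lam, so x = q / (1 + q)."""
    q = (-math.log(1.0 - p) / lam) ** (1.0 / beta)
    return q / (1.0 + q)
```

Because x/(1 − x) maps (0, 1) onto (0, ∞), the Kies law is a Weibull-type law transported to the unit interval, which is why the quantile has the simple form q/(1 + q).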
35. An inverse Sanov theorem for exponential families.
- Author
-
Macci, Claudio and Piccioni, Mauro
- Subjects
*MAXIMUM likelihood statistics , *EXPONENTIAL families (Statistics) , *LARGE deviations (Mathematics) - Abstract
We prove the large deviation principle (LDP) for posterior distributions arising from subfamilies of full exponential families, allowing misspecification of the model. Moreover, motivated by the so-called inverse Sanov Theorem (see e.g. [11,12]), we prove the LDP for the corresponding maximum likelihood estimator, and we study the relationship between rate functions. In our setting, even in the non-misspecified case, it is not true in general that the rate functions for posterior distributions and for maximum likelihood estimators are Kullback–Leibler divergences with exchanged arguments. • LDPs for posterior distributions of exponential families with misspecification • Comparison with the rate function for maximum likelihood estimators • Variational problems in the information geometry of KL divergence • LDPs for dual exponential families [ABSTRACT FROM AUTHOR]
- Published
- 2023
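The rate functions in this abstract are built from Legendre transforms and KL divergences. As background, for a full exponential family the Cramér rate function of the sample mean of the sufficient statistic is the Legendre transform of the cumulant function, and it coincides with a KL divergence. The sketch below checks this numerically for the Poisson family, with I(x) = sup_θ [θx − λ(e^θ − 1)] = x log(x/λ) − x + λ = KL(Poisson(x) ‖ Poisson(λ)), and shows that swapping the KL arguments changes the value. The function names are hypothetical, and the sketch does not reproduce the paper's posterior or MLE rate functions.

```python
import math

def poisson_log_mgf(theta, lam):
    """Cumulant generating function of Poisson(lam): lam * (e^theta - 1)."""
    return lam * (math.exp(theta) - 1.0)

def cramer_rate_numeric(x, lam, lo=-20.0, hi=20.0, iters=200):
    """Legendre transform sup_theta [theta * x - Lambda(theta)] by ternary search
    (the objective is concave in theta); requires x > 0 for an interior maximizer."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if m1 * x - poisson_log_mgf(m1, lam) < m2 * x - poisson_log_mgf(m2, lam):
            lo = m1
        else:
            hi = m2
    theta = 0.5 * (lo + hi)
    return theta * x - poisson_log_mgf(theta, lam)

def kl_poisson(a, b):
    """KL(Poisson(a) || Poisson(b)) = a log(a / b) - a + b."""
    return a * math.log(a / b) - a + b
```

For x = 3 and λ = 1, the numerical Legendre transform matches KL(Poisson(3) ‖ Poisson(1)) ≈ 1.296, while the reversed divergence KL(Poisson(1) ‖ Poisson(3)) ≈ 0.901, which illustrates why the order of the arguments matters when comparing rate functions.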
36. Information-Geometric Approach for a One-Sided Truncated Exponential Family.
- Author
-
Yoshioka, Masaki and Tanaka, Fuyuhiko
- Subjects
*EXPONENTIAL families (Statistics) , *MAXIMUM likelihood statistics , *RIEMANNIAN metric , *STATISTICAL models - Abstract
In information geometry, there has been extensive research on the deep connections between differential geometric structures, such as the Fisher metric and the α-connection, and the statistical theory for statistical models satisfying regularity conditions. However, the study of information geometry for non-regular statistical models is insufficient, and a one-sided truncated exponential family (oTEF) is one example of these models. In this paper, based on the asymptotic properties of maximum likelihood estimators, we provide a Riemannian metric for the oTEF. Furthermore, we demonstrate that the oTEF has an α = 1 parallel prior distribution and that the scalar curvature of a certain submodel, including the Pareto family, is a negative constant. [ABSTRACT FROM AUTHOR]
- Published
- 2023
37. Numerical solution of Kiefer-Weiss problems when sampling from continuous exponential families.
- Author
-
Novikov, Andrey, Novikov, Andrei, and Farkhshatov, Fahil
- Subjects
*SEQUENTIAL analysis , *CHARACTERISTIC functions , *EXPONENTIAL families (Statistics) , *GAUSSIAN distribution , *ERROR probability , *TEST design - Abstract
In this article, we deal with problems of testing hypotheses in the framework of sequential statistical analysis. The main concern is the optimal design and performance evaluation of sampling plans in Kiefer-Weiss problems. The goal of the Kiefer-Weiss problem is to design hypothesis tests that minimize the maximum average sample number over all parameter values, as opposed to sequential probability ratio tests (SPRTs), which minimize the average sample number only at the two hypothesis points, and to the classical fixed-sample-size test. For observations that follow a distribution from an exponential family of the continuous type, we provide algorithms for optimal design in the modified Kiefer-Weiss problem and obtain formulas for evaluating the performance of sequential tests by calculating the operating characteristic function, the average sample number, and some related characteristics. These formulas cover, as particular cases, the SPRTs and their truncated versions, as well as optimal finite-horizon sequential tests. In the setting of the original Kiefer-Weiss problem, we apply the method of our recent work (Sequential Analysis 2022, 41(2), 198–219) for numerical construction of the optimal tests. For the particular case of sampling from a normal distribution with a known variance, we make numerical comparisons of the Kiefer-Weiss solution with the SPRT and the fixed-sample-size test, provided that the three tests have the same error probabilities. All of the algorithms are implemented as computer code written in the R programming language and are available at the GitHub public repository (). Guidelines on adapting the program code to other exponential family distributions are provided. [ABSTRACT FROM AUTHOR]
- Published
- 2023
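The SPRT that this abstract uses as a benchmark can be sketched for its normal, known-variance case. The sketch below is Wald's test of H0: mean = μ0 against H1: mean = μ1 with the usual approximate boundaries log((1 − β)/α) and log(β/(1 − α)). It is not the Kiefer–Weiss optimal test, and the function name and default error rates are hypothetical.

```python
import math

def sprt_normal_mean(data, mu0, mu1, sigma, alpha=0.05, beta=0.05):
    """Wald's SPRT of H0: mean = mu0 vs H1: mean = mu1 with known sigma.

    Returns (decision, n_used); decision is "H0", "H1", or None if the data
    run out before either boundary is crossed."""
    upper = math.log((1 - beta) / alpha)   # crossing above accepts H1
    lower = math.log(beta / (1 - alpha))   # crossing below accepts H0
    llr = 0.0
    for n, x in enumerate(data, start=1):
        # log f1(x) / f0(x) for N(mu1, sigma^2) versus N(mu0, sigma^2)
        llr += (mu1 - mu0) / sigma ** 2 * (x - (mu0 + mu1) / 2)
        if llr >= upper:
            return "H1", n
        if llr <= lower:
            return "H0", n
    return None, len(data)
```

With μ0 = 0, μ1 = 1, σ = 1 and α = β = 0.05, both boundaries sit at about ±2.94, so the data `[1.2, 0.8, 1.5, 1.1, 0.9]` reach a decision for H1 after five observations, while three observations of −0.5 are enough to accept H0.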
38. A modified one stage multiple comparison procedure of exponential location parameters with the control under heteroscedasticity.
- Author
-
Wu, Shu-Fei
- Subjects
*MULTIPLE comparisons (Statistics) , *EXPERIMENTAL design , *DISTRIBUTION (Probability theory) , *STOCK exchanges , *CONFIDENCE intervals , *HETEROSCEDASTICITY , *EXPONENTIAL families (Statistics) - Abstract
In this paper, we present a modified one-stage multiple comparison procedure for exponential location parameters with the control under heteroscedasticity, including one-sided and two-sided confidence intervals that improve on the original procedure in coverage probability and average confidence length. These intervals can be used to identify a subset that includes all no-worse-than-the-control treatments in an experimental design, and to identify better-than-the-control, worse-than-the-control, and not-much-different-from-the-control products in agriculture, the stock market, and the pharmaceutical industry in terms of minimum guarantee lifetimes. A simulation comparison of the modified procedure with the original one is carried out in terms of confidence length and coverage probability. One example is given to demonstrate the proposed modified procedure. [ABSTRACT FROM AUTHOR]
- Published
- 2023
39. Asymptotic theory in network models with covariates and a growing number of node parameters.
- Author
-
Wang, Qiuping, Zhang, Yuan, and Yan, Ting
- Subjects
*ASYMPTOTIC normality , *MODEL theory , *MOMENTS method (Statistics) , *EXPONENTIAL families (Statistics) , *HETEROGENEITY - Abstract
We propose a general model that jointly characterizes degree heterogeneity and homophily in weighted, undirected networks. We present a moment estimation method using node degrees and homophily statistics. We establish consistency and asymptotic normality of our estimator using novel analysis. We apply our general framework to three applications, including both exponential family and non-exponential family models. Comprehensive numerical studies and a data example also demonstrate the usefulness of our method. [ABSTRACT FROM AUTHOR]
- Published
- 2023
40. Efficient Optimization of Partition Scan Statistics via the Consecutive Partitions Property.
- Author
-
Pehlivanian, Charles A. and Neill, Daniel B.
- Subjects
*SCAN statistic , *DYNAMIC programming , *STATISTICS , *EXPONENTIAL families (Statistics) , *COMBINATORIAL optimization - Abstract
We generalize the spatial and subset scan statistics from the single to the multiple subset case. The two main approaches to defining the log-likelihood ratio statistic in the single subset case, the population-based and expectation-based scan statistics, are considered, leading to risk partitioning and multiple cluster detection scan statistics, respectively. We show that, for distributions in a separable exponential family, the risk partitioning scan statistic can be expressed as a scaled f-divergence of the normalized count and baseline vectors, and the multiple cluster detection scan statistic as a sum of scaled Bregman divergences. In either case, however, maximization of the scan statistic by exhaustive search over all partitionings of the data requires exponential time. To make this optimization computationally feasible, we prove sufficient conditions under which the optimal partitioning is guaranteed to be consecutive. This Consecutive Partitions Property generalizes the linear-time subset scanning property from two partitions (the detected subset and the remaining data elements) to the multiple partition case. While the number of consecutive partitionings of n elements into t partitions scales as $O(n^{t-1})$, making exhaustive search over them computationally expensive for large t, we present a dynamic programming approach which identifies the optimal consecutive partitioning in $O(n^2 t)$ time, thus allowing for the exact and efficient solution of large-scale risk partitioning and multiple cluster detection problems. Finally, we demonstrate the detection performance and practical utility of partition scan statistics using simulated and real-world data. for this article are available online. [ABSTRACT FROM AUTHOR]
- Published
- 2023
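The Consecutive Partitions Property in this abstract generalizes linear-time subset scanning (LTSS). A minimal sketch of the two-partition case, assuming the expectation-based Poisson score F(S) = C log(C/B) + B − C for C > B (and 0 otherwise), where C and B are the subset's total count and baseline, sorts records by the priority c_i/b_i and evaluates only the n "top-k" prefixes instead of all 2^n subsets. The data and function names are hypothetical, and the paper's dynamic program over multiple consecutive partitions is not shown.

```python
import math

def ebp_score(c, b):
    """Expectation-based Poisson log-likelihood ratio for a subset with total count c and baseline b."""
    return c * math.log(c / b) + b - c if c > b else 0.0

def ltss_best_subset(counts, baselines):
    """Sort records by priority c_i / b_i (descending) and scan only the top-k prefixes,
    which suffices to find the highest-scoring subset for this score."""
    order = sorted(range(len(counts)), key=lambda i: counts[i] / baselines[i], reverse=True)
    best_score, best_k = 0.0, 0
    c = b = 0.0
    for k, i in enumerate(order, start=1):
        c += counts[i]
        b += baselines[i]
        score = ebp_score(c, b)
        if score > best_score:
            best_score, best_k = score, k
    return sorted(order[:best_k]), best_score
```

For counts `[10, 2, 8, 1]` with baselines `[4, 2, 4, 2]`, the scan evaluates four prefixes and returns records `[0, 2]` with score 18 log(2.25) − 10 ≈ 4.60. Apart from the sort, the work is linear in the number of records.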
41. Practical Network Modeling via Tapered Exponential-Family Random Graph Models.
- Author
-
Blackburn, Bart and Handcock, Mark S.
- Subjects
*RANDOM graphs , *EXPONENTIAL families (Statistics) , *SOCIAL network analysis , *DATA analysis - Abstract
Exponential-family Random Graph Models (ERGMs) have long been at the forefront of the analysis of relational data. The exponential-family form allows complex network dependencies to be represented. Models in this class are interpretable, flexible and have a strong theoretical foundation. The availability of powerful user-friendly open-source software allows broad accessibility and use. However, ERGMs sometimes suffer from a serious condition known as near-degeneracy, in which the model exhibits unrealistic probabilistic behavior or a severe lack-of-fit to real network data. Recently, Fellows and Handcock proposed a new model class, the Tapered ERGM, which circumvents the issue of near-degeneracy while maintaining the desirable features of ERGMs. However, the question of how to determine the proper amount of tapering needed for any model was heretofore left unanswered. This article develops a new methodology for how to determine the necessary level of tapering and as such provides a new approach to inference for the Tapered ERGM class. Noting that a Tapered ERGM can always be made nondegenerate, we offer data-driven approaches for determining the amount of tapering necessary. The mean-value parameter estimates are unaffected by tapering, and we show that the natural parameter estimates are numerically weakly varying by the level of tapering. We then apply the Tapered ERGM to two published networks to demonstrate its effectiveness in cases where typical ERGMs fail and present the case for Tapered ERGMs replacing ERGMs entirely. [ABSTRACT FROM AUTHOR]
- Published
- 2023
42. Computing Pseudolikelihood Estimators for Exponential-Family Random Graph Models.
- Author
-
SCHMID, CHRISTIAN S. and HUNTER, DAVID R.
- Subjects
*RANDOM graphs , *MATRIX inversion , *HESSIAN matrices , *MAXIMUM likelihood statistics , *ASYMPTOTIC normality , *EXPONENTIAL families (Statistics) - Abstract
The reputation of the maximum pseudolikelihood estimator (MPLE) for Exponential Random Graph Models (ERGM) has undergone a drastic change over the past 30 years. While first receiving broad support, mainly due to its computational feasibility and the lack of alternatives, general opinions started to change with the introduction of approximate maximum likelihood estimator (MLE) methods that became practicable due to increasing computing power and the introduction of MCMC methods. Previous comparison studies appear to yield contradictory results regarding the preference between these two point estimators; however, there is consensus that the prevailing method of obtaining an MPLE's standard error from the inverse Hessian matrix generally underestimates standard errors. We propose replacing the inverse Hessian matrix with an approximation of the Godambe matrix, which results in confidence intervals with appropriate coverage rates and, in addition, enables checking for model degeneracy. Our results also provide empirical evidence for the asymptotic normality of the MPLE under certain conditions. [ABSTRACT FROM AUTHOR]
- Published
- 2023
43. The exponential power-G family of distributions: Properties, simulations, regression modeling and applications.
- Author
-
Ferreira, Alexsandro Arruda and Cordeiro, Gauss Moutinho
- Subjects
REGRESSION analysis ,EXPONENTIAL families (Statistics) ,CENSORING (Statistics) ,MAXIMUM likelihood statistics - Abstract
The new exponential power-G family is introduced following Alzaatreh et al. (2013). Some of its main statistical properties are provided in terms of the exponentiated-G properties. Maximum likelihood estimation and simulations are addressed using the log-logistic as the baseline distribution. The log-exponential power log-logistic regression model is constructed and applied to censored data. The utility of the new models is demonstrated by means of two real data sets. [ABSTRACT FROM AUTHOR]
- Published
- 2023
44. Comparison of GEE and GLMM methods for longitudinal data (Case study: Determinants of the percentage of poor people in Indonesia, 2015-2019).
- Author
-
Sihombing, Pardomuan Robinson, Notodiputro, Khairil A., and Sartono, Bagus
- Subjects
*POOR people , *PANEL analysis , *LONGITUDINAL method , *GENERALIZED estimating equations , *PERCENTILES , *EXPONENTIAL families (Statistics) , *FOREIGN investments - Abstract
Extensions of the GLM to longitudinal data whose responses are not normally distributed (but still belong to the exponential family) and are correlated include the Generalized Estimating Equations (GEE) and the Generalized Linear Mixed-effects Model (GLMM). This study compares the GEE model with the GLMM on longitudinal data in modeling poor people in Indonesia in 2015-2019. The data come from publications of the Central Statistics Agency. Based on its smaller RMSE and AIC, the GLMM model is better than the GEE model in modeling the percentage of poor people in Indonesia. The Gini ratio, the rate of households in slums, and the percentage of informal workers have a significant positive effect on the percentage of poor people. Meanwhile, the percentage of households having access to HDI, economic growth, and domestic and foreign investment value have a significant negative impact on poor people. [ABSTRACT FROM AUTHOR]
- Published
- 2022
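The GEE/GLMM comparison above ranks models by RMSE and AIC. A minimal sketch of both criteria follows, assuming the fitted values and the maximized log-likelihood are already available from whatever software fitted the models; the function names are hypothetical. GEE is not a full-likelihood method, so AIC-type comparisons for it usually rely on a quasi-likelihood analogue (QIC); the `aic` helper below applies only to a model with a genuine log-likelihood.

```python
import math
from typing import Sequence


def rmse(observed: Sequence[float], fitted: Sequence[float]) -> float:
    """Root-mean-square error between observed and fitted responses."""
    if len(observed) != len(fitted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((y - y_hat) ** 2 for y, y_hat in zip(observed, fitted))
    return math.sqrt(squared / len(observed))


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion, 2k - 2 log L; smaller is better."""
    return 2.0 * n_params - 2.0 * log_likelihood
```

For both criteria, the model with the smaller value is preferred, which is the decision rule the abstract applies.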
45. Discussion of "A Tale of Two Datasets: Representativeness and Generalisability of Inference for Samples of Networks" by Pavel N. Krivitsky, Pietro Coletti, and Niel Hens.
- Author
Schweinberger, Michael and Fritz, Cornelius
- Subjects
HENS, NULL hypothesis, GOODNESS-of-fit tests, TEST scoring, SOCIAL accounting, EXPONENTIAL families (Statistics), POSSIBILITY
- Abstract
This article explores the challenges of making accurate inferences about populations from samples when the data are dependent, as with network, spatial, and temporal data. It emphasizes the need for statistical research that is both driven by real-world applications and grounded in statistical principles. The article discusses the assumption of local dependence and suggests relaxing it to account for overlapping social circles. It also discusses regression-type goodness-of-fit diagnostics based on residuals and score tests: the authors propose using residuals to test whether an additional network feature should be added to a model, and a score test to test the composite null hypothesis. These diagnostics apply to an augmented exponential family with a canonical parameter vector and a sufficient statistic vector. The article also highlights the importance of out-of-sample assessments and scalable inference methods in network analysis. [Extracted from the article]
- Published
- 2023
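The score test mentioned in this discussion has a compact form in an exponential family. The score for an added term is its observed sufficient statistic minus its expected value, and the information is the covariance of the sufficient statistics. The sketch below is a toy illustration, not the discussants' procedure. It tests whether a triangle term should be added to an edges-only model for a sample of small undirected graphs, computing expectations by exact enumeration. This is only feasible for a handful of nodes; realistic network models need simulation such as MCMC. All function names are hypothetical.

```python
import itertools
from typing import Iterable, List, Set, Tuple

Edge = Tuple[int, int]


def all_dyads(n: int) -> List[Edge]:
    """Unordered node pairs (i < j) of an undirected graph on n nodes."""
    return list(itertools.combinations(range(n), 2))


def count_triangles(n: int, edges: Set[Edge]) -> int:
    """Number of triangles; edges must be stored as (i, j) with i < j."""
    return sum(
        1
        for i, j, k in itertools.combinations(range(n), 3)
        if (i, j) in edges and (i, k) in edges and (j, k) in edges
    )


def null_moments(n: int, p: float) -> Tuple[float, float, float, float]:
    """Exact E[T], Var(E), Var(T), Cov(T, E) for edge count E and triangle
    count T when every dyad is an independent Bernoulli(p) edge (the
    edges-only model). Enumerates all 2^(n(n-1)/2) graphs, so keep n small."""
    dyads = all_dyads(n)
    se = st = see = stt = ste = 0.0
    for bits in itertools.product((0, 1), repeat=len(dyads)):
        edges = {pair for pair, b in zip(dyads, bits) if b}
        e, t = len(edges), count_triangles(n, edges)
        w = p ** e * (1.0 - p) ** (len(dyads) - e)  # probability of this graph
        se += w * e
        st += w * t
        see += w * e * e
        stt += w * t * t
        ste += w * t * e
    return st, see - se ** 2, stt - st ** 2, ste - st * se


def triangle_score_statistic(n: int, sample: Iterable[Iterable[Edge]]) -> float:
    """Score statistic for adding a triangle term to an edges-only model
    fitted to independent graphs on the same n nodes (approximately
    chi-square with 1 df under the null when the sample is large)."""
    if n < 3:
        raise ValueError("need at least 3 nodes for triangles")
    graphs = [{tuple(sorted(e)) for e in g} for g in sample]
    if not graphs:
        raise ValueError("sample must contain at least one graph")
    p_hat = sum(len(g) for g in graphs) / (len(graphs) * len(all_dyads(n)))
    if not 0.0 < p_hat < 1.0:
        raise ValueError("null MLE is on the boundary; score test undefined")
    mean_t, var_e, var_t, cov_te = null_moments(n, p_hat)
    # Score for the triangle parameter at 0: observed minus expected count.
    u = sum(count_triangles(n, g) for g in graphs) - len(graphs) * mean_t
    # Efficient information: Var(T) with the edge (nuisance) parameter
    # partialled out, as required for a composite null.
    v = len(graphs) * (var_t - cov_te ** 2 / var_e)
    return u * u / v
```

For example, `triangle_score_statistic(4, [[(0, 1), (1, 2), (0, 2), (2, 3)], [(0, 1), (2, 3)]])` returns 0.0, because the sample contains exactly as many triangles as the fitted edges-only model expects. Larger values would favor adding the triangle term.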
46. A New Third-Order Family of Multiple Root-Findings Based on Exponential Fitted Curve.
- Author
Kanwar, Vinay, Cordero, Alicia, Torregrosa, Juan R., Rajput, Mithil, and Behl, Ramandeep
- Subjects
NONLINEAR equations, ITERATIVE methods (Mathematics), EXPONENTIAL families (Statistics), MULTIPLICITY (Mathematics)
- Abstract
In this paper, we present a new third-order family of iterative methods for computing the multiple roots of nonlinear equations when the multiplicity (m ≥ 1) is known in advance. There is a plethora of third-order point-to-point methods in the literature, but our methods are based on a geometric derivation and converge to the required zero even when the derivative becomes zero or close to zero in its vicinity. We use the exponential fitted curve and tangency conditions to develop our schemes. The well-known Chebyshev, Halley, super-Halley, and Chebyshev–Halley methods are special members of our family for m = 1. Complex dynamics techniques allow us to see the relation between the members of the family of iterative schemes and the width of the basins of attraction of simple and multiple roots on quadratic polynomials. Several applied problems are considered to demonstrate the performance of our methods and to compare them with existing ones. Based on the numerical results, we conclude that our methods perform better than earlier methods, even for multiple roots of high multiplicity. [ABSTRACT FROM AUTHOR]
- Published
- 2023
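For orientation, the sketch below implements a textbook Halley-type iteration for a root of known multiplicity m. It is obtained by applying Halley's method to f(x)^(1/m) and reduces to classical Halley, one of the special members named in the abstract, when m = 1. It is not the authors' exponentially fitted family. It divides by f'(x), so it lacks the robustness to vanishing derivatives that the abstract claims for the new schemes. The function name and default tolerances are assumptions.

```python
from typing import Callable

Func = Callable[[float], float]


def halley_multiple_root(f: Func, df: Func, d2f: Func, x0: float, m: int,
                         tol: float = 1e-12, max_iter: int = 50) -> float:
    """Halley-type iteration for a root of known multiplicity m:
    x <- x - f / (((m + 1) / (2m)) f' - f f'' / (2 f')).
    Equivalent to Halley's method on f**(1/m); classical Halley when m == 1."""
    if m < 1:
        raise ValueError("multiplicity m must be >= 1")
    x = x0
    for _ in range(max_iter):
        fx = f(x)
        if fx == 0.0:
            return x  # landed exactly on the root
        dfx = df(x)
        if dfx == 0.0:
            # This plain variant, unlike the paper's schemes, needs f'(x) != 0.
            raise ZeroDivisionError("f'(x) vanished before convergence")
        denom = (m + 1) / (2 * m) * dfx - fx * d2f(x) / (2 * dfx)
        step = fx / denom
        x -= step
        if abs(step) < tol:
            return x
    raise RuntimeError("no convergence within max_iter iterations")
```

For f(x) = (x - 1)^3 (x + 2), calling the function with m = 3 and x0 = 2 converges to the triple root x = 1 within a few iterations, whereas plain Newton's method would converge only linearly there.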
47. gplsim: An R Package for Generalized Partially Linear Single-index Models.
- Author
Tianhai Zu and Yan Yu
- Subjects
AIR pollution, EXPONENTIAL families (Statistics)
- Abstract
Generalized partially linear single-index models (GPLSIMs) are important tools in nonparametric regression. They extend popular generalized linear models to allow flexible nonlinear dependence on some predictors while overcoming the "curse of dimensionality." We develop an R package, gplsim, that implements efficient spline estimation of GPLSIMs, as proposed by Yu and Ruppert (2002) and Yu et al. (2017), for a response variable from a general exponential family. The package builds on the popular mgcv package for generalized additive models (GAMs) and provides functions that allow users to fit GPLSIMs with various link functions, select the smoothing parameter λ by generalized cross-validation or alternative criteria, and visualize the estimated unknown univariate function of the single-index term. In this paper, we discuss the implementation of gplsim in detail and illustrate its use through a sine-bump simulation study with various links and a real-data application to air pollution data. [ABSTRACT FROM AUTHOR]
- Published
- 2023
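To make the model class concrete: a GPLSIM relates the mean response to covariates x, which enter through a single index, and z, which enter linearly, as E(y | x, z) = g^{-1}(η(α'x) + β'z), with α rescaled to unit length (plus a sign convention) for identifiability. The sketch below only evaluates this mean for given α, β, η, and inverse link. Estimating η with splines, which is what gplsim does, is not shown, and the function names are hypothetical rather than part of the package's API.

```python
import math
from typing import Callable, Sequence


def gplsim_mean(x: Sequence[float], z: Sequence[float],
                alpha: Sequence[float], beta: Sequence[float],
                eta: Callable[[float], float],
                inv_link: Callable[[float], float]) -> float:
    """Mean response g^{-1}(eta(alpha'x) + beta'z) of a GPLSIM.

    alpha is rescaled to unit length; by convention its first component
    should be positive (not enforced here)."""
    norm = math.sqrt(sum(a * a for a in alpha))
    if norm == 0.0:
        raise ValueError("alpha must be non-zero")
    index = sum(a / norm * xi for a, xi in zip(alpha, x))  # single index alpha'x
    linear = sum(b * zi for b, zi in zip(beta, z))          # linear part beta'z
    return inv_link(eta(index) + linear)


def logistic(u: float) -> float:
    """Inverse of the logit link, e.g. for a Bernoulli response."""
    return 1.0 / (1.0 + math.exp(-u))
```

Only η is nonparametric, and it is a function of one scalar index, which is how the model sidesteps the curse of dimensionality mentioned in the abstract.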
48. The Marshall-Olkin Odd Exponential Half Logistic-G Family of Distributions: Properties and Applications.
- Author
Oluyede, Broderick and Chipepa, Fastel
- Subjects
EXPONENTIAL families (Statistics), HAZARD function (Statistics), MAXIMUM likelihood statistics
- Abstract
We develop a new family of distributions, referred to as the Marshall-Olkin odd exponential half logistic-G, which can be expressed as a linear combination of the exponential-G family of distributions. The family can handle heavy-tailed data and has non-monotonic hazard rate functions. We also conduct a simulation study to assess the performance of the proposed model. Real data examples demonstrate the usefulness of the proposed model in comparison with several existing models. [ABSTRACT FROM AUTHOR]
- Published
- 2023
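The abstract does not state the family's CDF, so the sketch below shows only the Marshall-Olkin step it builds on. Given a baseline CDF G and α > 0, the transformed CDF is F(x) = G(x) / (1 - (1 - α)(1 - G(x))), and the hazard becomes h_G(x) / (1 - (1 - α)(1 - G(x))). The odd exponential half logistic-G baseline itself is not reproduced. Any baseline CDF and PDF can be plugged in, and the names below are hypothetical.

```python
import math
from typing import Callable

Func = Callable[[float], float]


def mo_cdf(base_cdf: Func, alpha: float) -> Func:
    """Marshall-Olkin CDF F(x) = G(x) / (1 - (1 - alpha) * (1 - G(x)))."""
    if alpha <= 0:
        raise ValueError("alpha must be positive")

    def cdf(x: float) -> float:
        g = base_cdf(x)
        return g / (1.0 - (1.0 - alpha) * (1.0 - g))

    return cdf


def mo_hazard(base_pdf: Func, base_cdf: Func, alpha: float) -> Func:
    """Marshall-Olkin hazard h_G(x) / (1 - (1 - alpha) * (1 - G(x))),
    where h_G = g / (1 - G) is the baseline hazard."""
    if alpha <= 0:
        raise ValueError("alpha must be positive")

    def hazard(x: float) -> float:
        surv = 1.0 - base_cdf(x)
        return base_pdf(x) / (surv * (1.0 - (1.0 - alpha) * surv))

    return hazard


# Example baseline: exponential distribution with rate 1.
def exp_cdf(x: float) -> float:
    return 1.0 - math.exp(-x)


def exp_pdf(x: float) -> float:
    return math.exp(-x)
```

With the exponential(1) baseline, α = 1 returns the baseline unchanged, and the hazard at x = 0 equals 1/α, so α < 1 raises the initial hazard and α > 1 lowers it. This extra shape parameter is what the Marshall-Olkin step contributes to the composed family.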
49. Generic Unsupervised Optimization for a Latent Variable Model With Exponential Family Observables.
- Author
Mousavi, Hamid, Drefs, Jakob, Hirschberger, Florian, and Lücke, Jörg
- Subjects
LATENT variables, EXPONENTIAL families (Statistics), GAUSSIAN distribution, MAXIMUM likelihood statistics
- Abstract
Latent variable models (LVMs) represent observed variables by parameterized functions of latent variables. Prominent examples of LVMs for unsupervised learning are probabilistic PCA and probabilistic sparse coding, which both assume a weighted linear summation of the latents to determine the mean of a Gaussian distribution for the observables. In many cases, however, observables do not follow a Gaussian distribution. For unsupervised learning, LVMs that assume specific non-Gaussian observables (e.g., Bernoulli or Poisson) have therefore been considered. Even for specific choices of distributions, parameter optimization is challenging, and only a few previous contributions have considered LVMs with more generally defined observable distributions. In this contribution, we consider LVMs that are defined for a range of different distributions, i.e., observables can follow any (regular) distribution of the exponential family. Furthermore, the novel class of LVMs presented here is defined for binary latents, and it uses maximization in place of summation to link the latents to observables. To derive an optimization procedure, we follow an expectation maximization approach for maximum likelihood parameter estimation. We then show, as our main result, that a set of very concise parameter update equations can be derived which feature the same functional form for all exponential family distributions. The derived generic optimization can consequently be applied (without further derivations) to different types of metric data (Gaussian and non-Gaussian) as well as to different types of discrete data. Moreover, the derived optimization equations can be combined with a recently suggested variational acceleration which is likewise generically applicable to the LVMs considered here. Thus, the combination maintains the generic and direct applicability of the derived optimization procedure but, crucially, enables efficient scalability.
We numerically verify our analytical results using different observable distributions, and, furthermore, discuss some potential applications such as learning of variance structure, noise type estimation and denoising. [ABSTRACT FROM AUTHOR]
- Published
- 2023
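As a rough illustration of the model class, not of the paper's update equations, the sketch below assumes that each observable's mean is the maximum weight among the active binary latents instead of a weighted sum. The same means are then scored under two exponential-family observables, Poisson and Bernoulli. The floor value, the function names, and the choice to apply the maximum to the mean parameter are all assumptions.

```python
import math
from typing import List, Sequence


def max_linked_means(s: Sequence[int], W: Sequence[Sequence[float]],
                     floor: float = 1e-6) -> List[float]:
    """Mean of observable d = max of W[h][d] over latents h with s[h] == 1.

    Assumption: `floor` (hypothetical) is used when no latent is active."""
    n_obs = len(W[0])
    means = []
    for d in range(n_obs):
        active = [W[h][d] for h in range(len(s)) if s[h]]
        means.append(max(active) if active else floor)
    return means


def poisson_loglik(y: Sequence[int], mu: Sequence[float]) -> float:
    """Poisson log-likelihood: sum of y log(mu) - mu - log(y!); needs mu > 0."""
    return sum(yi * math.log(m) - m - math.lgamma(yi + 1) for yi, m in zip(y, mu))


def bernoulli_loglik(y: Sequence[int], mu: Sequence[float]) -> float:
    """Bernoulli log-likelihood; needs 0 < mu < 1."""
    return sum(math.log(m) if yi else math.log(1.0 - m) for yi, m in zip(y, mu))
```

Only the final scoring function changes between observable types, while the latent-to-mean mapping is shared. This loosely mirrors the genericity across the exponential family that the abstract emphasizes.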
50. Reproducing Kernels and New Approaches in Compositional Data Analysis.
- Author
Binglin Li, Changwon Yoon, and Jeongyoun Ahn
- Subjects
PROJECTIVE geometry, DATA analysis, EXPONENTIAL families (Statistics), SPHERICAL harmonics, FUNCTION spaces, HILBERT space, SIMPLEX algorithm, SUPPORT vector machines
- Abstract
Compositional data, such as human gut microbiomes, consist of non-negative variables where only the relative values of these variables are available. Analyzing compositional data requires careful treatment of the geometry of the data. A common geometrical approach to understanding such data is through a regular simplex. The majority of existing approaches rely on log-ratio or power transformations to address the inherent simplicial geometry. In this work, based on the key observation that compositional data are projective, we reinterpret the compositional domain as a group quotient of a sphere, leveraging the intrinsic connection between projective and spherical geometry. This interpretation enables us to understand the function spaces on the compositional domain in terms of those on a sphere, and furthermore, to utilize spherical harmonics theory for constructing a compositional Reproducing Kernel Hilbert Space (RKHS). The construction of RKHS for compositional data opens up new research avenues for future methodology developments, particularly introducing well-developed kernel methods to compositional data analysis. We demonstrate the wide applicability of the proposed theoretical framework with examples of nonparametric density estimation, kernel exponential family, and support vector machine for compositional data. [ABSTRACT FROM AUTHOR]
- Published
- 2023
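One way to see the projective and spherical connection described above: after a composition is rescaled to sum to 1, the coordinate-wise square root places it on the non-negative part of the unit sphere, and squaring maps any sphere point back while ignoring signs. The sketch below assumes that the relevant group is these coordinate sign flips. It does not reproduce the paper's RKHS construction, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def closure(x: Sequence[float]) -> List[float]:
    """Rescale a non-negative vector to sum to 1 (a point of the simplex)."""
    total = sum(x)
    if total <= 0 or any(v < 0 for v in x):
        raise ValueError("need non-negative entries with a positive sum")
    return [v / total for v in x]


def simplex_to_sphere(x: Sequence[float]) -> List[float]:
    """Square-root map: the closed composition lands on the unit sphere,
    in its non-negative orthant. Rescaling x does not change the result."""
    return [math.sqrt(v) for v in closure(x)]


def sphere_to_simplex(y: Sequence[float]) -> List[float]:
    """Square each coordinate and re-close. Signs are discarded, so every
    sign-flipped copy of y maps to the same composition."""
    return closure([v * v for v in y])
```

Because `closure` absorbs any positive rescaling, only relative values matter, which is the projective property the abstract starts from.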
Discovery Service for Jio Institute Digital Library