3,014 results for "EXPECTATION-maximization algorithms"
Search Results
2. High‐Dimensional Overdispersed Generalized Factor Model With Application to Single‐Cell Sequencing Data Analysis.
- Author
-
Nie, Jinyu, Qin, Zhilong, and Liu, Wei
- Subjects
EXPECTATION-maximization algorithms ,RANDOM matrices ,FACTOR analysis ,NONLINEAR analysis ,SEQUENCE analysis - Abstract
The current high‐dimensional linear factor models fail to account for the different types of variables, while high‐dimensional nonlinear factor models often overlook the overdispersion present in mixed‐type data. However, overdispersion is prevalent in practical applications, particularly in fields like biomedical and genomics studies. To address this practical demand, we propose an overdispersed generalized factor model (OverGFM) for performing high‐dimensional nonlinear factor analysis on overdispersed mixed‐type data. Our approach incorporates an additional error term to capture the overdispersion that cannot be accounted for by factors alone. However, this introduces significant computational challenges due to the involvement of two high‐dimensional latent random matrices in the nonlinear model. To overcome these challenges, we propose a novel variational EM algorithm that integrates Laplace and Taylor approximations. This algorithm provides iterative explicit solutions for the complex variational parameters and is proven to possess excellent convergence properties. We also develop a criterion based on the singular value ratio to determine the optimal number of factors. Numerical results demonstrate the effectiveness of this criterion. Through comprehensive simulation studies, we show that OverGFM outperforms state‐of‐the‐art methods in terms of estimation accuracy and computational efficiency. Furthermore, we demonstrate the practical merit of our method through its application to two datasets from genomics. To facilitate its usage, we have integrated the implementation of OverGFM into the R package GFM. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
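The variational EM in OverGFM is specific to that model, but the E/M alternation it builds on is classical EM for a linear Gaussian factor model. As a hedged illustration (the function name, dimensions, and iteration count are mine, not the paper's), here is EM for probabilistic PCA, the simplest such factor model:

```python
import numpy as np

def ppca_em(X, k, n_iter=200, seed=0):
    """EM for probabilistic PCA (a linear Gaussian factor model).

    X is (n, p) and assumed centered; returns the loading matrix W (p, k)
    and the isotropic noise variance s2.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    W = rng.standard_normal((p, k))
    s2 = 1.0
    for _ in range(n_iter):
        # E step: posterior moments of the latent factors z_i
        M = W.T @ W + s2 * np.eye(k)                # (k, k)
        Minv = np.linalg.inv(M)
        Ez = X @ W @ Minv                           # (n, k), rows are E[z_i]
        Ezz = n * s2 * Minv + Ez.T @ Ez             # sum_i E[z_i z_i^T]
        # M step: maximize the expected complete-data log-likelihood
        W = X.T @ Ez @ np.linalg.inv(Ezz)
        s2 = (np.sum(X ** 2) - 2 * np.sum(Ez * (X @ W))
              + np.trace(Ezz @ W.T @ W)) / (n * p)
    return W, s2
```

OverGFM replaces the exact posterior moments of this E step with Laplace/Taylor-approximated variational ones to cope with nonlinearity and overdispersion; the overall alternation is the same.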
3. Variational Bayesian EM Algorithm for Quantile Regression in Linear Mixed Effects Models.
- Author
-
Wang, Weixian and Tian, Maozai
- Subjects
EXPECTATION-maximization algorithms ,FIXED effects model ,GIBBS sampling ,BAYESIAN analysis ,QUANTILE regression ,DATA analysis - Abstract
This paper extends the normal-beta prime (NBP) prior to Bayesian quantile regression in linear mixed effects models and conducts Bayesian variable selection for the fixed effects of the model. The choice of hyperparameters in the NBP prior is crucial, and we employed the Variational Bayesian Expectation–Maximization (VBEM) for model estimation and variable selection. The Gibbs sampling algorithm is a commonly used Bayesian method, and it can also be combined with the EM algorithm, denoted as GBEM. The results from our simulation and real data analysis demonstrate that both the VBEM and GBEM algorithms provide robust estimates for the hyperparameters in the NBP prior, reflecting the sparsity level of the true model. The VBEM and GBEM algorithms exhibit comparable accuracy and can effectively select important explanatory variables. The VBEM algorithm stands out in terms of computational efficiency, significantly reducing the time and resource consumption in the Bayesian analysis of high-dimensional, longitudinal data. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
4. On regime changes in text data using hidden Markov model of contaminated vMF distribution.
- Author
-
Zhang, Yingying, Sarkar, Shuchismita, Chen, Yuanyuan, and Zhu, Xuwen
- Subjects
FINANCIAL statements ,MARKOV processes ,EXPECTATION-maximization algorithms - Abstract
This paper presents a novel methodology for analyzing temporal directional data with scatter and heavy tails. A hidden Markov model with a contaminated von Mises-Fisher emission distribution is developed. The model is implemented using a forward and backward selection approach that provides additional flexibility for contaminated as well as non-contaminated data. The utility of the method for finding homogeneous time blocks (regimes) is demonstrated in several experimental settings and on two real-life text data sets containing presidential addresses and corporate financial statements, respectively. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
5. Dynamic Bayesian networks for spatiotemporal modeling and its uncertainty in tradeoffs and synergies of ecosystem services: a case study in the Tarim River Basin, China.
- Author
-
Hu, Yang, Xue, Jie, Zhao, Jianping, Feng, Xinlong, Sun, Huaiwei, Tang, Junhu, and Chang, Jingjing
- Subjects
MACHINE learning ,BAYESIAN analysis ,EXPECTATION-maximization algorithms ,CARBON sequestration ,ECOSYSTEM services - Abstract
Ecosystem services (ESs) refer to the benefits that humans obtain from ecosystems. These services are subject to environmental changes and human interventions, which introduce a significant level of uncertainty. Traditional ES modeling approaches often employ Bayesian networks, but they fall short in capturing spatiotemporal dynamic change processes. To address this limitation, dynamic Bayesian networks (DBNs) have emerged as stochastic models capable of incorporating uncertainty and capturing dynamic changes. Consequently, DBNs have found increasing application in ES modeling. However, the structure and parameter learning of DBNs present complexities within the field of ES modeling. To mitigate the reliance on expert knowledge, this study proposes an algorithm for structure and parameter learning, integrating the InVEST (Integrated Valuation of Ecosystem Services and Trade-Offs) model with DBNs to develop a comprehensive understanding of the spatiotemporal dynamics and uncertainty of ESs in the Tarim River Basin, China from 2000 to 2020. The study further evaluates the tradeoffs and synergies among four key ecosystem services: water yield, habitat quality, sediment delivery ratio, and carbon storage and sequestration. 
The findings show that (1) the proposed structure learning and parameter learning algorithm for DBNs, including the hill-climb algorithm, linear analysis, the Markov blanket, and the EM algorithm, effectively address subjective factors that can influence model learning when dealing with uncertainty; (2) significant spatial heterogeneity is observed in the supply of ESs within the Tarim River Basin, with notable changes in habitat quality, water yield, and sediment delivery ratios occurring between 2000–2005, 2010–2015, and 2015–2020, respectively; (3) tradeoffs exist between water yield and habitat quality, as well as between soil conservation and carbon sequestration, while synergies are found among habitat quality, soil retention, and carbon sequestration. The land-use type emerges as the most influential factor affecting the tradeoffs and synergies of ESs. This study serves to validate the capacity of DBNs in addressing spatiotemporal dynamic changes and establishes an improved research methodology for ES modeling that considers uncertainty. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
6. Blood supply chain location-inventory problem considering incentive programs: comparison and analysis of NSGA-II, NRGA and electromagnetic algorithms.
- Author
-
Alikhani, Tayebeh, Dezfoulian, Hamidreza, and Samouei, Parvaneh
- Subjects
EXPECTATION-maximization algorithms ,METAHEURISTIC algorithms ,SUPPLY chain management ,SUPPLY chain disruptions ,STATISTICAL hypothesis testing - Abstract
Problem: Blood is a scarce, perishable substance with a limited shelf life, and blood supply chain management is a vital subject. The aim is therefore to design an efficient supply chain network that balances blood supply and demand, particularly under shortage conditions. One effective way to compensate for blood shortages is to run incentive programs at the right times to encourage blood donation. The novel aspect of this study is a new mathematical model for designing a blood supply chain network that locates temporary centers for collecting donated blood and schedules incentive programs in the right periods, with the goals of balancing blood supply and demand and minimizing the network's cost. Method: In this paper, four methods are used, at different problem sizes, to solve the proposed model: the augmented epsilon constraint (AEC) method for small instances, and the electromagnetic algorithm (EM), the non-dominated ranked genetic algorithm (NRGA), and the non-dominated sorting genetic algorithm (NSGA-II) for large instances, owing to the inherent complexity of the problem. Results: The performance of the algorithms was analyzed on four standard indicators, and their outputs were compared using statistical hypothesis tests at the 0.05 significance level. On three of the indicators (SNS, MID, and TIME), the NSGA-II algorithm outperformed the NRGA and EM algorithms, indicating its superiority, especially in solution time, one of the most significant indicators for metaheuristic algorithms. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
7. A class of distorted Gaussian copulas: theories and applications.
- Author
-
Shao, Hui and Zhang, Zhe George
- Subjects
EXPECTATION-maximization algorithms ,PARAMETER estimation ,TRANSACTION costs ,DEFAULT (Finance) ,VALUATION - Abstract
This study introduces a novel copula class, referred to as the distorted GAB copula (hereafter, dGAB copula), as an alternative to the Gaussian copula, which has shown limitations in capturing tail dependence. Much like the Gaussian copula, the dGAB copula can be uniquely determined by its bivariate marginal copulas and offers effective tail dependence modeling capabilities. To demonstrate its practical applicability, we showcase its use in the valuation of basket default swaps. Furthermore, we propose a parameter estimation approach based on the EM algorithm tailored to the dGAB copula. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
8. Community detection in multiplex continuous weighted nodes networks using an extension of the stochastic block model.
- Author
-
El Haj, Abir
- Subjects
EXPECTATION-maximization algorithms ,STOCHASTIC models ,SOCIAL networks ,MULTIPLEXING - Abstract
The stochastic block model (SBM) is a probabilistic model aimed at clustering individuals within a simple network based on their social behavior. This network consists of individuals and edges representing the presence or absence of relationships between each pair of individuals. This paper aims to extend the traditional stochastic block model to accommodate multiplex weighted nodes networks. These networks are characterized by multiple relationship types occurring simultaneously among network individuals, with each individual associated with a weight representing its influence in the network. We introduce an inference method utilizing a variational expectation-maximization algorithm to estimate model parameters and classify individuals. Finally, we demonstrate the effectiveness of our approach through applications using simulated and real data, highlighting its main characteristics. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
9. A Generalized Finite Mixture Model for Asymmetric Probability Distribution Observations.
- Author
-
Lu, Cheng, Liu, Jie, Meng, Xianghua, Zhu, Liangcong, and Wei, Xiaodong
- Subjects
SKEWNESS (Probability theory) ,GAUSSIAN mixture models ,DISTRIBUTION (Probability theory) ,EXPECTATION-maximization algorithms ,EQUATIONS - Abstract
The finite mixture model is a useful probabilistic model for representing the probability distributions of observations. The Gaussian mixture model (GMM) is a widely used instance whose parameters are usually estimated by the well-known EM algorithm. But when probability distributions contain asymmetric modes, a GMM requires many more constituent components to achieve satisfactory accuracy. Therefore, a generalized finite mixture model is proposed. First, it adopts the derivative λ-PDF, which represents asymmetric probability distributions well, as the probability density. However, the differential operations and the likelihood equations in the M step of the EM algorithm are scarcely tractable because of the complex expression of the derivative λ-PDF. Second, a pseudo EM method is proposed to avoid this difficulty: it finds pseudo maximum likelihood estimates of the parameters by utilizing the moment matching principle, estimating the parameters more easily by solving a series of moment matching equations. Finally, four examples verify that the proposed generalized finite mixture model has an advantage in representing observations with asymmetric probability distributions. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
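The moment matching idea behind the pseudo EM method can be illustrated on a much simpler case than the derivative λ-PDF. The sketch below (my own example, not from the paper) fits a gamma distribution by matching its first two moments, sidestepping the likelihood equations exactly as the paper's M step does:

```python
import random

def gamma_moment_match(xs):
    """Method-of-moments estimates for a gamma(shape, scale) sample.

    Matching the first two moments gives shape = mean^2 / var and
    scale = var / mean, avoiding the gamma likelihood equations
    (which would require the digamma function).
    """
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    return mean * mean / var, var / mean  # (shape, scale)

random.seed(1)
sample = [random.gammavariate(3.0, 2.0) for _ in range(50000)]
shape, scale = gamma_moment_match(sample)
```

In the paper's setting, a system of such moment matching equations replaces the intractable M-step maximization for each mixture component.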
10. Modeling multiple‐criterion diagnoses by heterogeneous‐instance logistic regression.
- Author
-
Yang, Chun‐Hao, Li, Ming‐Han, Wen, Shu‐Fang, and Chang, Sheng‐Mao
- Subjects
ALZHEIMER'S disease ,MILD cognitive impairment ,EXPECTATION-maximization algorithms ,COGNITION ,LOGISTIC regression analysis - Abstract
Mild cognitive impairment (MCI) is a prodromal stage of Alzheimer's disease (AD) that causes a significant burden in caregiving and medical costs. Clinically, the diagnosis of MCI is determined by the impairment statuses of five cognitive domains. If one of these cognitive domains is impaired, the patient is diagnosed with MCI, and if two out of the five domains are impaired, the patient is diagnosed with AD. In medical records, most of the time, the diagnosis of MCI/AD is given, but not the statuses of the five domains. We may treat the domain statuses as missing variables. This diagnostic procedure relates MCI/AD status modeling to multiple‐instance learning, where each domain resembles an instance. However, traditional multiple‐instance learning assumes common predictors among instances, but in our case, each domain is associated with different predictors. In this article, we generalized the multiple‐instance logistic regression to accommodate the heterogeneity in predictors among different instances. The proposed model is dubbed heterogeneous‐instance logistic regression and is estimated via the expectation‐maximization algorithm because of the presence of the missing variables. We also derived two variants of the proposed model for the MCI and AD diagnoses. The proposed model is validated in terms of its estimation accuracy, latent status prediction, and robustness via extensive simulation studies. Finally, we analyzed the National Alzheimer's Coordinating Center‐Uniform Data Set using the proposed model and demonstrated its potential. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
11. Statistical inference of multi-state transition model for longitudinal data with measurement error and heterogeneity.
- Author
-
Qin, Jiajie and Guan, Jing
- Subjects
MARKOV chain Monte Carlo ,RANDOM matrices ,EXPECTATION-maximization algorithms ,MEASUREMENT errors ,INFERENTIAL statistics ,SOCIAL medicine ,MATRIX effect ,COVARIANCE matrices - Abstract
The multi-state transition model is typically used to analyze longitudinal data in medicine and sociology. However, variables in longitudinal studies are usually error-prone, and random effects are heterogeneous, which results in biased estimates of the parameters of interest. This article estimates the parameters of the multi-state transition model for longitudinal data with measurement error and heterogeneous random effects, and further considers the case where the covariate related to the covariance matrix of the random effects is also error-prone when the covariate in the transition model is error-prone. We model the covariance matrix of the random effects through the modified Cholesky decomposition and propose a pseudo-likelihood method based on the Monte Carlo expectation-maximization algorithm and a Bayesian method based on Markov chain Monte Carlo to compute the estimates. We obtain the asymptotic properties and evaluate the finite-sample performance of the proposed method by simulation, which is good in terms of bias, RMSE, and the coverage rate of confidence intervals. In addition, we apply the proposed method to the MFUS data. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
12. A composite Bayesian approach for quantile curve fitting with non-crossing constraints.
- Author
-
Wang, Qiao and Cai, Zhongheng
- Subjects
CURVE fitting ,EXPECTATION-maximization algorithms ,BAYESIAN analysis ,SMOOTHNESS of functions ,QUANTILES ,GIBBS sampling - Abstract
To fit a set of quantile curves, Bayesian simultaneous quantile curve fitting methods face challenges in properly specifying a feasible formulation and efficiently accommodating the non-crossing constraints. In this article, we propose a new minimization problem and develop its corresponding Bayesian analysis. The new minimization problem imposes two penalties to control not only the smoothness of the fitted quantile curves but also the differences between quantile curves. This enables direct inference on differences of quantile curves and facilitates improved information sharing among quantiles. After adopting a B-spline approximation for the positive smoothing functions in the minimization problem, we specify the pseudo composite asymmetric Laplace likelihood and derive its priors. The computation algorithms, including partially collapsed Gibbs sampling for the model parameters and a Monte Carlo Expectation-Maximization algorithm for the penalty parameters, are provided to carry out the proposed approach. Extensive simulation studies show that, compared with other candidate methods, the proposed approach yields more robust estimation. The advantages of the proposed approach are most pronounced for extreme quantiles, heavy-tailed random errors, and inference on the differences of quantiles. We also demonstrate the relative performance of the proposed approach and competing methods through two real data analyses. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
13. Joint Beamforming Design and User Clustering Algorithm in NOMA-Assisted ISAC Systems.
- Author
-
Yang, Qingqing, Tang, Runpeng, and Peng, Yi
- Subjects
EXPECTATION-maximization algorithms ,GAUSSIAN mixture models ,FRACTIONAL programming ,COVARIANCE matrices ,EUCLIDEAN distance ,CLUSTERING algorithms - Abstract
To enhance the performance of non-orthogonal multiple access (NOMA)-assisted integrated sensing and communication (ISAC) systems in multi-user distributed scenarios, an improved Gaussian mixture model (GMM)-based user clustering algorithm is proposed. This algorithm is tailored for ISAC systems, significantly improving bandwidth reuse gains and reducing serial interference. First, using the sum of squared errors (SSE), the algorithm reduces sensitivity to the initial cluster center locations, improving clustering accuracy. Then, direction weight factors are introduced based on the base station position, along with a penalty function involving users' Euclidean distances and sensing power. Modifications to the EM algorithm in calculating posterior probabilities and updating the covariance matrix help align user clusters with the characteristics of NOMA-ISAC systems. This improves users' interference resistance, lowers decoding difficulty, and optimizes the system's sensing capabilities. Finally, a fractional programming (FP) approach addresses the non-convex joint beamforming design problem, enhancing power and channel gains and achieving co-optimization of the sensing and communication signals. The simulation results show that, under the improved GMM user clustering algorithm and FP optimization, the NOMA-ISAC system improves user spectral efficiency by 4.3% and base station beam intensity by 5.4% compared to traditional ISAC systems. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
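For reference, the baseline the paper modifies is standard EM for a GMM, whose E step computes posterior probabilities (responsibilities) and whose M step updates the weights, means, and covariance matrices. A minimal numpy sketch, with initialization and regularization choices that are mine rather than the paper's:

```python
import numpy as np

def gmm_em(X, k, n_iter=100):
    """Standard EM for a Gaussian mixture model (the unmodified baseline)."""
    n, d = X.shape
    pi = np.full(k, 1.0 / k)                       # mixing weights
    # spread the initial means along the first coordinate for stability
    idx = np.argsort(X[:, 0])[np.linspace(0, n - 1, k).astype(int)]
    mu = X[idx].copy()
    cov = np.stack([np.cov(X.T) + 1e-6 * np.eye(d)] * k)
    for _ in range(n_iter):
        # E step: log density of each point under each component
        logr = np.empty((n, k))
        for j in range(k):
            diff = X - mu[j]
            prec = np.linalg.inv(cov[j])
            quad = np.einsum('ni,ij,nj->n', diff, prec, diff)
            logdet = np.linalg.slogdet(cov[j])[1]
            logr[:, j] = np.log(pi[j]) - 0.5 * (quad + logdet
                                                + d * np.log(2 * np.pi))
        # normalize in log space to get posterior probabilities
        resp = np.exp(logr - logr.max(axis=1, keepdims=True))
        resp /= resp.sum(axis=1, keepdims=True)
        # M step: update weights, means, and covariance matrices
        nk = resp.sum(axis=0)
        pi = nk / n
        mu = (resp.T @ X) / nk[:, None]
        for j in range(k):
            diff = X - mu[j]
            cov[j] = (resp[:, j, None] * diff).T @ diff / nk[j] \
                     + 1e-6 * np.eye(d)
    return pi, mu, cov, resp
```

The paper's improvements intervene exactly in the two commented steps: the posterior-probability computation and the covariance update.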
14. An extended Markov-switching model approach to latent heterogeneity in departmentalized manpower systems.
- Author
-
Ossai, Everestus O., Nduka, Uchenna C., Madukaife, Mbanefo S., Udom, Akaninyene U., and Ugwu, Samson O.
- Subjects
WORKFORCE planning ,EXPECTATION-maximization algorithms ,MARKOV processes ,LABOR supply ,PRODUCTION planning - Abstract
In recent work on manpower planning, interest has grown in modeling manpower systems in a departmentalized framework. This, as a form of disaggregation, may solve the problem of observable heterogeneity but not latent heterogeneity; rather, it opens up other aspects of latent heterogeneity hitherto unaccounted for in classical (non-departmentalized) manpower models. In this article, a multinomial Markov-switching model is formulated for investigating latent heterogeneity in intra-departmental and inter-departmental transitions in departmentalized manpower systems. The formulation incorporates extensions of the mover-stayer principle, resulting in several competing models. The best manpower model is chosen based on the optimum number of hidden states, established by using the iterative Expectation-Maximization algorithm to estimate the model parameters and a search procedure to compare model performance. The illustration establishes the usefulness of the model formulation in highlighting hidden disparities in personnel transitions in a departmentalized manpower system and in avoiding wrong model specification. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
15. Zero-Inflated Binary Classification Model with Elastic Net Regularization.
- Author
-
Xin, Hua, Lio, Yuhlong, Chen, Hsien-Ching, and Tsai, Tzong-Ru
- Subjects
MACHINE learning ,MAXIMUM likelihood statistics ,EXPECTATION-maximization algorithms ,OPEN-ended questions ,DIABETES - Abstract
Zero inflation and overfitting can reduce the accuracy of machine learning models for characterizing binary data sets. A zero-inflated Bernoulli (ZIBer) model can be the right model for characterizing zero-inflated binary data sets, but when it is used, overcoming overfitting remains an open question. To mitigate overfitting in the ZIBer model, the minus log-likelihood function of the ZIBer model with an elastic net regularization rule as an overfitting penalty is proposed as the loss function. An estimation procedure to minimize the loss function is developed in this study using the gradient descent method (GDM) with a momentum term in the learning rate. The proposed estimation method has two advantages. First, it is a general method that simultaneously uses L1- and L2-norm penalty terms and includes the ridge and least absolute shrinkage and selection operator (lasso) methods as special cases. Second, the momentum learning rate accelerates the convergence of the GDM and enhances the computational efficiency of the proposed estimation procedure. The parameter selection strategy is studied, and the performance of the proposed method is evaluated using Monte Carlo simulations. A diabetes example is used as an illustration. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
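The ZIBer loss itself is not reproduced here; as a hedged stand-in, the sketch below minimizes an elastic-net-penalized logistic negative log-likelihood with gradient descent plus a momentum term, the same GDM ingredients the paper combines. All names and penalty weights are illustrative assumptions:

```python
import numpy as np

def momentum_gd_logistic(X, y, lam1=0.01, lam2=0.01, lr=0.1,
                         beta=0.9, n_iter=500):
    """Gradient descent with a momentum term on an elastic-net penalized
    logistic negative log-likelihood (a stand-in for the ZIBer loss)."""
    n, p = X.shape
    w = np.zeros(p)
    v = np.zeros(p)
    for _ in range(n_iter):
        pr = 1.0 / (1.0 + np.exp(-X @ w))           # predicted probabilities
        grad = X.T @ (pr - y) / n                   # gradient of the NLL
        grad += lam1 * np.sign(w) + 2 * lam2 * w    # L1 + L2 (sub)gradient
        v = beta * v + (1 - beta) * grad            # momentum smooths gradients
        w -= lr * v
    return w

def nll(X, y, w, lam1=0.01, lam2=0.01):
    """Elastic-net penalized logistic negative log-likelihood."""
    z = X @ w
    return (np.mean(np.log1p(np.exp(z)) - y * z)
            + lam1 * np.sum(np.abs(w)) + lam2 * np.sum(w ** 2))
```

Setting lam1 = 0 recovers ridge-style penalization and lam2 = 0 recovers lasso-style penalization, mirroring the special cases noted in the abstract.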
16. A flexible time-varying coefficient rate model for panel count data.
- Author
-
Sun, Dayu, Guo, Yuanyuan, Li, Yang, Sun, Jianguo, and Tu, Wanzhu
- Subjects
SEXUALLY transmitted diseases ,EXPECTATION-maximization algorithms ,NUMERICAL integration ,DATA modeling ,SIEVES - Abstract
Panel count regression is often required in recurrent event studies, where the interest is in modeling the event rate. Existing rate models are unable to handle time-varying covariate effects due to theoretical and computational difficulties. Mean models provide a viable alternative but are subject to the constraints of the monotonicity assumption, which tends to be violated when covariates fluctuate over time. In this paper, we present a new semiparametric rate model for panel count data along with related theoretical results. For model fitting, we present an efficient EM algorithm with three different methods for variance estimation. The algorithm allows us to sidestep the challenges of numerical integration and the difficulties of the iterative convex minorant algorithm. We show that the estimators are consistent and asymptotically normally distributed. Simulation studies confirm excellent finite-sample performance. To illustrate, we analyze data from a real clinical study of behavioral risk factors for sexually transmitted infections. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
17. Multiple arbitrarily inflated negative binomial regression model and its application.
- Author
-
Abusaif, Ihab and Kuş, Coşkun
- Subjects
NEGATIVE binomial distribution ,MONTE Carlo method ,POISSON distribution ,EXPECTATION-maximization algorithms ,REGRESSION analysis ,BINOMIAL distribution - Abstract
This paper introduces a novel modification of the negative binomial distribution, which serves as a generalization encompassing both negative binomial and zero-inflated negative binomial distributions. This innovative distribution offers flexibility by accommodating an arbitrary number of inflation points at various locations. The paper explores key distributional properties associated with this modified distribution. Additionally, this study proposes several estimators designed to obtain estimates for the unknown parameters. Furthermore, the paper introduces a new count regression model that utilizes the modified distribution. To assess the performance of the proposed distribution and the count regression model, a comprehensive Monte Carlo simulation study is conducted. In the final stage of the paper, a real-world dataset is scrutinized to ascertain the superiority of the proposed model. This empirical analysis contributes to validating the practical applicability and effectiveness of the newly introduced distribution in comparison to existing models. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
18. Methods for Estimating Parameters of the Common Cause Failure Model Based on Data with Uncertainty.
- Author
-
Nguyen, Huu Du and Gouno, Evans
- Subjects
MAXIMUM likelihood statistics ,DATA augmentation ,COMPUTER simulation ,EXPECTATION-maximization algorithms ,DATA modeling ,PROBABILITY theory - Abstract
In this paper, we propose a new method to deal with uncertain data in the context of Common Cause Failure (CCF) analysis. Uncertain CCF data refer to the data for which the number of components involved in the failure events is not exactly known. We introduce a new formalism to describe uncertain CCF data to avoid subjective probabilities for the number of failed components in each CCF event that are used in classical methods such as the impact vector method. The parameters of the α -factor model are estimated using the maximum likelihood method relying on properties of the nested Dirichlet distribution and grouped Dirichlet distribution. A data augmentation technique with an expectation-maximization algorithm is also developed for some schemes of data with uncertainty. Finally, we evaluate the performance of the proposed method through numerical simulations and illustrate its application using an example from the literature. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
19. Jobs-housing balance and travel patterns among different occupations as revealed by Hidden Markov mixture models: the case of Hong Kong.
- Author
-
Zhang, Feiyang, Loo, Becky P. Y., Lan, Hui, Chan, Antoni B., and Hsiao, Janet H.
- Subjects
HIDDEN Markov models ,EXPECTATION-maximization algorithms ,TRAFFIC congestion ,MARKOV processes ,CITIES & towns - Abstract
The spatial mismatch between jobs and housing in cities creates long daily travels that exacerbate climate change, air pollution, and traffic congestion. Yet not enough research on occupational differences has been done. This study first applies the hidden Markov mixture model (H3M) to model travel patterns for different occupation groups in Hong Kong. Then, the variational Bayesian hierarchical EM algorithm is used to identify common lifestyle clusters. Next, a binary logistic regression is developed to examine whether the lifestyle clusters can be explained by jobs-housing balance. This study is among the first to consider travel patterns as a Markov process and apply H3M to examine jobs-housing balance by fine-grained occupation group. The method is transferable and universally applicable, and the results provide occupation-specific insights on jobs-housing balance in an Asian context. The research findings suggest that different occupation groups have different travel patterns in Hong Kong. Two lifestyle clusters, "balanced and compact activity space" and "work-oriented and extensive travels", are unveiled. Notably, the latter is associated with a lower level of jobs-housing balance. Some occupations in the quaternary industry ("information and communications", "profession, science and technology", "real estate", and "finance and insurance") have a more serious jobs-housing imbalance. The paper concludes with a discussion on improving the occupation-specific jobs-housing balance in accordance with Hong Kong's future development goals. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
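Treating a travel sequence as a Markov process means its likelihood under a hidden Markov model is computed with the forward algorithm, the core recursion inside any HMM mixture such as H3M. A minimal discrete-emission version follows; the matrices and the symbol encoding are illustrative assumptions, not taken from the study:

```python
import numpy as np

def hmm_forward(obs, pi, A, B):
    """Scaled forward algorithm: log-likelihood of an observation sequence
    under a discrete-emission hidden Markov model.

    pi: (S,) initial state distribution; A: (S, S) transition matrix;
    B: (S, V) emission matrix; obs: sequence of symbol indices.
    """
    alpha = pi * B[:, obs[0]]
    c = alpha.sum()                     # scaling constant avoids underflow
    loglik = np.log(c)
    alpha = alpha / c
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]   # propagate and weight by emission
        c = alpha.sum()
        loglik += np.log(c)
        alpha /= c
    return loglik
```

An HMM mixture such as H3M assigns each individual's sequence to the component HMM under which this log-likelihood is highest (softly, via an EM-style procedure).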
20. A Truncated Mixture Transition Model for Interval-Valued Time Series.
- Author
-
Luo, Yun and González-Rivera, Gloria
- Subjects
RATE of return on stocks ,EXPECTATION-maximization algorithms ,GAUSSIAN distribution ,TIME series analysis ,HETEROSCEDASTICITY - Abstract
We propose a model for interval-valued time series that specifies the conditional joint distribution of the upper and lower bounds as a mixture of truncated bivariate normal distributions. It preserves the interval natural order and provides great flexibility on capturing potential conditional heteroscedasticity and non-Gaussian features. The standard expectation maximization (EM) algorithm applied to truncated mixtures does not provide a closed-form solution in the M step. A new EM algorithm solves this problem. The model applied to the interval-valued IBM daily stock returns exhibits superior performance over competing models in-sample and out-of-sample evaluation. A trading strategy showcases the usefulness of our approach. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
21. Change point detection for a skew normal distribution based on the Q-function.
- Author
-
Du, Yang and Cheng, Weihu
- Subjects
SKEWNESS (Probability theory) ,ASYMPTOTIC distribution ,GAUSSIAN distribution ,EXPECTATION-maximization algorithms ,CORPORATE finance ,CHANGE-point problems - Abstract
In this paper, we enhance change point detection in skew normal distribution models by integrating the EM algorithm's Q-function with the modified information criterion (MIC). The new QMIC framework improves sensitivity and accuracy in detecting changes, outperforming the MIC alone and the traditional Bayesian information criterion (BIC). Because deriving analytic asymptotic distributions is complex, bootstrap simulations were used to determine critical values at various significance levels. Extensive simulations demonstrate that QMIC offers superior detection capabilities. We apply the QMIC method to two stock market datasets, successfully identifying multiple change points and highlighting its effectiveness for real-world financial data analysis. [ABSTRACT FROM AUTHOR]
- Published
- 2024
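For orientation, the generic criterion-based scan that MIC and QMIC refine can be sketched for a normal mean-shift model scored by plain BIC; the model, parameter counts, and data below are illustrative, and the entry's QMIC replaces the log-likelihood term with the EM Q-function.

```python
import numpy as np

def bic_changepoint(x):
    """Score a single mean/variance shift in an i.i.d. normal model by BIC.

    Scans every interior split point; returns the best split index, the
    BIC of the best change model, and the BIC of the no-change model.
    """
    n = len(x)
    def neg2loglik(seg):
        # -2 * maximized normal log-likelihood of a segment
        v = max(np.var(seg), 1e-12)
        return len(seg) * (np.log(2.0 * np.pi * v) + 1.0)
    bic0 = neg2loglik(x) + 2.0 * np.log(n)  # parameters: one mean, one variance
    best = min(
        (neg2loglik(x[:k]) + neg2loglik(x[k:]) + 5.0 * np.log(n), k)
        for k in range(2, n - 2)
    )  # parameters: two means, two variances, one change location
    return best[1], best[0], bic0

rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(0.0, 1.0, 100), rng.normal(2.0, 1.0, 100)])
k, bic_change, bic_none = bic_changepoint(x)
```

A change is declared when the penalized criterion with a change beats the no-change model; in practice the critical value comes from the bootstrap, as in the entry above.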
22. A Highly Efficient Compressive Sensing Algorithm Based on Root-Sparse Bayesian Learning for RFPA Radar.
- Author
-
Wang, Ju, Shan, Bingqi, Duan, Song, and Zhang, Qin
- Subjects
EXPECTATION-maximization algorithms ,COMPUTATIONAL complexity ,PARAMETER estimation ,RADAR ,ALGORITHMS - Abstract
Off-grid issues and high computational complexity are two major challenges for sparse Bayesian learning (SBL)-based compressive sensing (CS) algorithms used in random frequency pulse interval agile (RFPA) radar. This paper therefore proposes an off-grid CS algorithm for RFPA radar based on Root-SBL. To cope with off-grid issues, we derive a root-solving formula for the velocity parameters, inspired by the Root-SBL algorithm and applicable to RFPA radar, enabling the proposed algorithm to solve directly for the velocity parameters of targets during the fine search stage. To keep the computation feasible, the algorithm uses a simple single-level hierarchical prior distribution model and employs the derived root-solving formula to avoid refining the velocity grid. During the fine search stage, it combines a fixed-point strategy with the Expectation-Maximization algorithm to update the hyperparameters, further reducing computational complexity. In implementation, the algorithm first updates hyperparameters based on the single-level prior distribution to obtain approximate values for the range and velocity parameters during the coarse search stage; in the fine search stage, it performs a grid search only in the range dimension and uses the derived root-solving formula to solve directly for the target velocity parameters. Simulation results demonstrate that the proposed algorithm maintains low computational complexity while exhibiting stable parameter estimation performance in various multi-target off-grid scenarios. [ABSTRACT FROM AUTHOR]
- Published
- 2024
23. STAREG: Statistical replicability analysis of high throughput experiments with applications to spatial transcriptomic studies.
- Author
-
Li, Yan, Zhou, Xiang, Chen, Rui, Zhang, Xianyang, and Cao, Hongyuan
- Subjects
FALSE discovery rate ,EXPECTATION-maximization algorithms ,TRANSCRIPTOMES ,STATISTICS ,DATA analysis - Abstract
Replicable signals from different yet conceptually related studies provide stronger scientific evidence and more powerful inference. We introduce STAREG, a statistical method for replicability analysis of high throughput experiments, and apply it to analyze spatial transcriptomic studies. STAREG uses summary statistics from multiple studies of high throughput experiments and models the joint distribution of p-values, accounting for the heterogeneity of different studies. It effectively controls the false discovery rate (FDR) and achieves higher power by borrowing information across studies. Moreover, it provides different rankings of important genes. Combining the EM algorithm with the pool-adjacent-violators algorithm (PAVA), STAREG scales to datasets with millions of genes without any tuning parameters. Analyzing two pairs of spatially resolved transcriptomic datasets, we are able to make biological discoveries that otherwise cannot be obtained by using existing methods. Author summary: Irreplicable research wastes time, money, and resources. An estimated $28 billion is spent every year in the United States alone on preclinical research that cannot be replicated. Possible causes of irreplicable research include experimental design, laboratory practices, and data analysis; we focus on data analysis. The past two decades have witnessed the expansion and increased availability of genomic data from high-throughput experiments. Due to privacy concerns or logistic reasons, raw data can be difficult to access, but summary data such as p-values are readily available. We introduce STAREG, which jointly analyzes p-values from multiple genomic datasets that target the same scientific question with different populations or different technologies, allowing more convincing and robust findings. STAREG is computationally scalable with solid statistical analysis. Moreover, it is versatile, platform-independent, and only requires p-values as input. By analyzing data sets from spatially resolved transcriptomic studies, we make biological discoveries that otherwise cannot be obtained with existing methods. [ABSTRACT FROM AUTHOR]
- Published
- 2024
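The PAVA building block mentioned in this entry is simple enough to sketch in full: the least-squares isotonic (non-decreasing) fit obtained by merging adjacent violating blocks. This is a generic implementation, not the STAREG code.

```python
def pava(y):
    """Pool-adjacent-violators algorithm.

    Returns the non-decreasing sequence closest to y in least squares.
    Each block stores (sum, count); whenever a new value breaks
    monotonicity of block means, adjacent blocks are pooled.
    """
    blocks = []
    for v in y:
        blocks.append([v, 1])
        # merge while the previous block's mean exceeds the new one's
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    out = []
    for s, c in blocks:
        out.extend([s / c] * c)
    return out

fit = pava([1.0, 3.0, 2.0, 4.0])
```

The violating pair (3, 2) is pooled to its mean 2.5, giving the monotone fit [1.0, 2.5, 2.5, 4.0].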
24. Joint modeling of an outcome variable and integrated omics datasets using GLM-PO2PLS.
- Author
-
Gu, Zhujie, Uh, Hae-Won, Houwing-Duistermaat, Jeanine, and el Bouhaddani, Said
- Subjects
MAXIMUM likelihood statistics ,ASYMPTOTIC distribution ,EXPECTATION-maximization algorithms ,DATA integration ,DOWN syndrome - Abstract
In many studies of human diseases, multiple omics datasets are measured. Typically, these omics datasets are studied one at a time in relation to the disease, so the relationships among the omics are overlooked. Modeling the joint part of multiple omics and its association with the disease outcome will provide insights into the complex molecular basis of the disease. Several dimension reduction methods that jointly model multiple omics, as well as two-stage approaches that model the omics and outcome in separate steps, are available; holistic one-stage models for both omics and outcome are lacking. In this article, we propose a novel one-stage method that jointly models an outcome variable with omics. We establish the model identifiability and develop EM algorithms to obtain maximum likelihood estimators of the parameters for normally and Bernoulli distributed outcomes. Test statistics are proposed to infer the association between the outcome and omics, and their asymptotic distributions are derived. Extensive simulation studies are conducted to evaluate the proposed model. The method is illustrated by modeling Down syndrome as the outcome and methylation and glycomics as the omics datasets. We show that our model provides more insight by jointly considering methylation and glycomics. [ABSTRACT FROM AUTHOR]
- Published
- 2024
25. Range-based volatility modeling in financial markets using a family of scale mixtures of Birnbaum-Saunders distribution.
- Author
-
Tamandi, Mostafa, Desmond, Anthony F., and Jamalizadeh, Ahad
- Subjects
MONTE Carlo method ,EXPECTATION-maximization algorithms ,FINANCIAL markets ,MARKET volatility ,MIXTURES - Abstract
We propose scale mixtures of Birnbaum-Saunders distributions as a new class of positively skewed and leptokurtic distributions and use it to model volatility in stock markets. To estimate the model parameters, we develop an Expectation-Conditional-Maximization algorithm. The numerical performance of the proposed methodology is evaluated by means of Monte Carlo simulations. Application of the new model to volatility modeling is illustrated with real-life data. [ABSTRACT FROM AUTHOR]
- Published
- 2024
26. Diagnostic analytics for a GARCH model under skew-normal distributions.
- Author
-
Liu, Yonghui, Wang, Jing, Yao, Zhao, Liu, Conan, and Liu, Shuangzhe
- Subjects
GARCH model ,MONTE Carlo method ,MAXIMUM likelihood statistics ,DATA analytics ,EXPECTATION-maximization algorithms ,CURVATURE - Abstract
In this paper, a generalized autoregressive conditional heteroskedasticity model under skew-normal distributions is studied. A maximum likelihood approach is taken, and the parameters in the model are estimated with the expectation-maximization algorithm. Statistical diagnostics are carried out using the local influence technique, with the normal curvature and diagnostic results established for the model under four perturbation schemes to identify possible influential observations. A simulation study is conducted to evaluate the performance of the proposed method, and a real-world application is presented as an illustrative example. [ABSTRACT FROM AUTHOR]
- Published
- 2024
27. Prevalence estimates for COVID-19-related health behaviors based on the cheating detection triangular model.
- Author
-
Hsieh, Shu-Hui, Perri, Pier Francesco, and Hoffmann, Adrian
- Subjects
HEALTH behavior ,EXPECTATION-maximization algorithms ,SOCIAL desirability ,SOCIAL influence ,NONRESPONSE (Statistics) - Abstract
Background: Survey studies in medical and health sciences predominantly apply a conventional direct questioning (DQ) format to gather private and highly personal information. If the topic under investigation is sensitive or even stigmatizing, such as COVID-19-related health behaviors and adherence to non-pharmaceutical interventions in general, DQ surveys can lead to nonresponse and untruthful answers due to the influence of social desirability bias (SDB). These effects seriously threaten the validity of the results obtained, potentially leading to distorted prevalence estimates for behaviors for which the prevalence in the population is unknown. While this issue cannot be completely avoided, indirect questioning techniques (IQTs) offer a means to mitigate the harmful influence of SDB by guaranteeing the confidentiality of individual responses. The present study aims at assessing the validity of a recently proposed IQT, the Cheating Detection Triangular Model (CDTRM), in estimating the prevalence of COVID-19-related health behaviors while accounting for cheaters who disregard the instructions. Methods: In an online survey of 1,714 participants in Taiwan, we obtained CDTRM prevalence estimates via an Expectation-Maximization algorithm for three COVID-19-related health behaviors with different levels of sensitivity. The CDTRM estimates were compared to DQ estimates and to available official statistics provided by the Taiwan Centers for Disease Control. Additionally, the CDTRM allowed us to estimate the share of cheaters who disregarded the instructions and adjust the prevalence estimates for the COVID-19-related health behaviors accordingly. Results: For a behavior with low sensitivity, CDTRM and DQ estimates were expectedly comparable and in line with official statistics. However, for behaviors with medium and high sensitivity, CDTRM estimates were higher and thus presumably more valid than DQ estimates. 
Analogously, the estimated cheating rate increased with higher sensitivity of the behavior under study. Conclusions: Our findings strongly support the assumption that the CDTRM successfully controlled for the validity-threatening influence of SDB in a survey on three COVID-19-related health behaviors. Consequently, the CDTRM appears to be a promising technique to increase estimation validity compared to conventional DQ for health-related behaviors, and sensitive attributes in general, for which a strong influence of SDB is to be expected. [ABSTRACT FROM AUTHOR]
- Published
- 2024
28. Addressing dispersion in mis‐measured multivariate binomial outcomes: A novel statistical approach for detecting differentially methylated regions in bisulfite sequencing data.
- Author
-
Zhao, Kaiqiong, Oualkacha, Karim, Zeng, Yixiao, Shen, Cathy, Klein, Kathleen, Lakhal‐Chaieb, Lajmi, Labbe, Aurélie, Pastinen, Tomi, Hudson, Marie, Colmegna, Inés, Bernatsky, Sasha, and Greenwood, Celia M. T.
- Subjects
EXPECTATION-maximization algorithms ,DNA methylation ,MEASUREMENT errors ,CELL communication ,RHEUMATOID arthritis - Abstract
Motivated by a DNA methylation application, this article addresses the problem of fitting and inferring a multivariate binomial regression model for outcomes that are contaminated by errors and exhibit extra‐parametric variations, also known as dispersion. While dispersion in univariate binomial regression has been extensively studied, addressing dispersion in the context of multivariate outcomes remains a complex and relatively unexplored task. The complexity arises from a noteworthy data characteristic observed in our motivating dataset: non‐constant yet correlated dispersion across outcomes. To address this challenge and account for possible measurement error, we propose a novel hierarchical quasi‐binomial varying coefficient mixed model, which enables flexible dispersion patterns through a combination of additive and multiplicative dispersion components. To maximize the Laplace‐approximated quasi‐likelihood of our model, we further develop a specialized two‐stage expectation‐maximization (EM) algorithm, where a plug‐in estimate for the multiplicative scale parameter enhances the speed and stability of the EM iterations. Simulations demonstrated that our approach yields accurate inference for smooth covariate effects and exhibits excellent power in detecting non‐zero effects. Additionally, we applied our proposed method to investigate the association between DNA methylation, measured across the genome through targeted custom capture sequencing of whole blood, and levels of anti‐citrullinated protein antibodies (ACPA), a preclinical marker for rheumatoid arthritis (RA) risk. Our analysis revealed 23 significant genes that potentially contribute to ACPA‐related differential methylation, highlighting the relevance of cell signaling and collagen metabolism in RA. We implemented our method in the R Bioconductor package called "SOMNiBUS." [ABSTRACT FROM AUTHOR]
- Published
- 2024
29. Expectation-Maximization enables Phylogenetic Dating under a Categorical Rate Model.
- Author
-
Mai, Uyen, Charvel, Eduardo, and Mirarab, Siavash
- Subjects
DISTRIBUTION (Probability theory) ,EXPECTATION-maximization algorithms ,LOGNORMAL distribution ,NONPARAMETRIC statistics ,GAMMA distributions - Abstract
Dating phylogenetic trees to obtain branch lengths in time units is essential for many downstream applications but has remained challenging. Dating requires inferring substitution rates that can change across the tree. While information can be assumed for a small subset of nodes, from the fossil record or from sampling times (for fast-evolving organisms), inferring the ages of the other nodes essentially requires extrapolation and interpolation. Assuming a distribution of branch rates, we can formulate dating as a constrained maximum likelihood (ML) estimation problem. While ML dating methods exist, their accuracy degrades in the face of model misspecification, where the assumed parametric statistical distribution of branch rates vastly differs from the true distribution. Notably, most existing methods assume rigid, often unimodal, branch rate distributions. A second challenge is that the likelihood function involves an integral over the continuous domain of the rates, often leading to difficult non-convex optimization problems. To tackle both challenges, we propose a new method called Molecular Dating using Categorical-models (MD-Cat). MD-Cat uses a categorical model of rates inspired by non-parametric statistics and can approximate a large family of models by discretizing the rate distribution into k categories. Under this model, we can use the Expectation-Maximization algorithm to co-estimate rate categories and branch lengths in time units. Our model makes fewer assumptions about the true distribution of branch rates than parametric models such as the Gamma or LogNormal distribution. Our results on simulated and real datasets of Angiosperms and HIV, across a wide selection of rate distributions, show that MD-Cat is often more accurate than the alternatives, especially on datasets with exponential or multimodal rate distributions. [ABSTRACT FROM AUTHOR]
- Published
- 2024
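The discretization idea can be sketched independently of the dating machinery: approximate a continuous rate distribution (here LogNormal) by k equal-probability categories, each represented by the quantile at the midpoint of its probability slice. This is an illustrative scheme, not necessarily the exact construction used by MD-Cat.

```python
import math
from statistics import NormalDist

def categorical_rates(k, mu=0.0, sigma=1.0):
    """Approximate a LogNormal(mu, sigma) branch-rate distribution by k
    equal-probability categories via quantile midpoints: category i gets
    the rate at probability (i + 0.5) / k of the continuous distribution."""
    nd = NormalDist(mu, sigma)
    return [math.exp(nd.inv_cdf((i + 0.5) / k)) for i in range(k)]

rates = categorical_rates(4)
```

With the continuous density replaced by k atoms, the integral over rates in the likelihood becomes a finite sum, which is what makes an EM treatment of the rate categories tractable.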
30. Estimating transition intensity rate on interval-censored data using semi-parametric with EM algorithm approach.
- Author
-
Qian, Chen, Srivastava, Deo Kumar, Pan, Jianmin, Hudson, Melissa M., and Rai, Shesh N.
- Subjects
EXPECTATION-maximization algorithms ,CENSORING (Statistics) ,PARAMETRIC modeling ,CHILDHOOD cancer ,CARDIOTOXICITY ,PARAMETER estimation - Abstract
Phase IV clinical trials are designed to monitor long-term side effects of medical treatment. For instance, childhood cancer survivors treated with chest radiation and/or anthracycline are often at risk of developing cardiotoxicity during adulthood. Often the primary focus of a study is estimating the cumulative incidence of a particular outcome of interest, such as cardiotoxicity. However, it is challenging to evaluate patients continuously, and usually this information is collected through cross-sectional surveys by following patients longitudinally. This leads to interval-censored data, since the exact time of the onset of the toxicity is unknown. Rai et al. computed the transition intensity rate using a parametric model and estimated parameters with a maximum likelihood approach in an illness-death model. However, such an approach may not be suitable if the underlying parametric assumptions do not hold. This manuscript proposes a semi-parametric model, with a logit relationship for the treatment intensities in the two groups, to estimate the transition intensity rates within the context of an illness-death model. The parameters are estimated using an EM algorithm with profile likelihood. Results from the simulation studies suggest that the proposed approach is easy to implement and yields results comparable to the parametric model. [ABSTRACT FROM AUTHOR]
- Published
- 2024
31. A Penalized Empirical Likelihood Approach for Estimating Population Sizes under the Negative Binomial Regression Model.
- Author
-
Ji, Yulu and Liu, Yang
- Subjects
CHI-square distribution ,ASYMPTOTIC normality ,EXPECTATION-maximization algorithms ,CONFIDENCE intervals ,ASYMPTOTIC distribution ,BINOMIAL distribution - Abstract
In capture–recapture experiments, the presence of overdispersion and heterogeneity necessitates the use of the negative binomial regression model for inferring population sizes. However, within this model, existing methods based on likelihood and ratio regression for estimating the dispersion parameter often face boundary and nonidentifiability issues. These problems can result in nonsensically large point estimates and unbounded upper limits of confidence intervals for the population size. We present a penalized empirical likelihood technique that solves these two problems by imposing a half-normal prior on the population size. Based on the proposed approach, a maximum penalized empirical likelihood estimator with asymptotic normality and a penalized empirical likelihood ratio statistic with an asymptotic chi-square distribution are derived. To improve numerical performance, we present an effective expectation-maximization (EM) algorithm. In the M-step, optimization of the model parameters is achieved by fitting a standard negative binomial regression model via the R function glm.nb(). This approach ensures the convergence and reliability of the numerical algorithm. Using simulations, we analyze several synthetic datasets to illustrate three advantages of our methods in finite-sample cases: complete mitigation of the boundary problem, more efficient maximum penalized empirical likelihood estimates, and more precise penalized empirical likelihood ratio interval estimates compared to the estimates obtained without the penalty. These advantages are further demonstrated in a case study estimating the abundance of black bears (Ursus americanus) at the U.S. Army's Fort Drum Military Installation in northern New York. [ABSTRACT FROM AUTHOR]
- Published
- 2024
32. Penalised estimation of partially linear additive zero-inflated Bernoulli regression models.
- Author
-
Lu, Minggen, Li, Chin-Shang, and Wagner, Karla D.
- Subjects
ASYMPTOTIC efficiencies ,ASYMPTOTIC normality ,EXPECTATION-maximization algorithms ,REGRESSION analysis ,SPLINES ,NONPARAMETRIC estimation - Abstract
We develop a practical and computationally efficient penalised estimation approach for fitting partially linear additive models to zero-inflated binary outcome data. To facilitate estimation, B-splines are employed to approximate the unknown nonparametric components. A two-stage iterative expectation-maximisation (EM) algorithm is proposed to calculate the penalised spline estimates. Large-sample properties, including the uniform convergence and optimal rate of convergence of the functional estimators and the asymptotic normality and efficiency of the regression coefficient estimators, are established. Further, two variance-covariance estimation approaches are proposed to provide reliable Wald-type inference for the regression coefficients. We conducted an extensive Monte Carlo study to evaluate the numerical properties of the proposed penalised methodology and compare it to the competing spline method [Li and Lu, 'Semiparametric Zero-Inflated Bernoulli Regression with Applications', Journal of Applied Statistics, 49, 2845–2869]. The methodology is further illustrated by an egocentric network study. [ABSTRACT FROM AUTHOR]
- Published
- 2024
33. EM algorithm for generalized Ridge regression with spatial covariates.
- Author
-
Obakrim, Said, Ailliot, Pierre, Monbet, Valérie, and Raillard, Nicolas
- Subjects
TOEPLITZ matrices ,EXPECTATION-maximization algorithms ,COVARIANCE matrices ,GAUSSIAN distribution ,MULTICOLLINEARITY ,ALGORITHMS - Abstract
The generalized Ridge penalty is a powerful tool for dealing with multicollinearity and high‐dimensionality in regression problems. The generalized Ridge regression can be derived as the mean of a posterior distribution with a Normal prior and a given covariance matrix. The covariance matrix controls the structure of the coefficients, which depends on the particular application. For example, it is appropriate to assume that the coefficients have a spatial structure when the covariates are spatially correlated. This study proposes an Expectation‐Maximization algorithm for estimating generalized Ridge parameters whose covariance structure depends on specific parameters. We focus on three cases: diagonal (when the covariance matrix is diagonal with constant elements), Matérn, and conditional autoregressive covariances. A simulation study is conducted to evaluate the performance of the proposed method, and then the method is applied to predict ocean wave heights using wind conditions. [ABSTRACT FROM AUTHOR]
- Published
- 2024
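The Bayesian reading in this entry gives a one-line closed form once the prior covariance is fixed: with prior beta ~ N(0, Omega / lam), the posterior mean is (X'X + lam * inv(Omega))^{-1} X'y. A sketch of that closed form follows; the EM layer that estimates the covariance parameters (e.g. a Matérn structure) is omitted, and lam and the data are illustrative.

```python
import numpy as np

def generalized_ridge(X, y, Omega, lam=1.0):
    """Generalized Ridge estimate as a posterior mean.

    Solves (X'X + lam * inv(Omega)) beta = X'y.  Omega encodes the
    assumed structure of the coefficients (identity recovers ordinary
    Ridge; a spatial covariance ties neighbouring coefficients together).
    """
    A = X.T @ X + lam * np.linalg.inv(Omega)
    return np.linalg.solve(A, X.T @ y)

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 3))
beta_true = np.array([1.0, -1.0, 0.5])
y = X @ beta_true + 0.1 * rng.normal(size=200)
# identity prior covariance: ordinary Ridge with small shrinkage
beta_hat = generalized_ridge(X, y, np.eye(3), lam=0.5)
```

The entry's EM algorithm iterates between this closed-form mean and updates of the parameters inside Omega.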
34. Yang Feng and Jiajin Sun's contribution to the Discussion of 'Root and community inference on the latent growth process of a network' by Crane and Xu.
- Author
-
Feng, Yang and Sun, Jiajin
- Subjects
EXPECTATION-maximization algorithms ,SAMPLING (Process) ,TREE size ,TREE growth ,RANDOM forest algorithms ,GIBBS sampling - Published
- 2024
35. Research on an Autonomous Localization Method for Trains Based on Pulse Observation in a Tunnel Environment.
- Author
-
Shi, Jianqiang, Zhang, Youpeng, Chen, Guangwu, and Si, Yongbo
- Subjects
EXPECTATION-maximization algorithms ,NOISE control ,KALMAN filtering ,CULVERTS ,TUNNELS - Abstract
China's rail transit system is developing rapidly, but achieving seamless high-precision localization of trains throughout an entire route in closed environments such as tunnels and culverts still faces significant challenges. Traditional localization technologies cannot meet current demands, so this paper proposes an autonomous localization method for trains based on pulse observation in a tunnel environment. First, the Letts criterion is used to eliminate abnormal gyro data, the CEEMDAN method is employed for signal decomposition, and the decomposed signals are classified using the continuous mean square error and norm methods. Noise reduction is then performed using forward linear filtering and dynamic threshold filtering, maximizing the retention of the effective signal components. A SINS/OD integrated localization model is established, and an observation equation is constructed based on velocity matching, resulting in a complex 18-dimensional state space model. Finally, the EM algorithm is used to address non-line-of-sight and multipath errors. The optimized model is then applied in the Kalman filter to better adapt to the system's observation conditions. By dynamically adjusting the noise covariance, the localization system maintains continuous high-precision position output in a tunnel environment. [ABSTRACT FROM AUTHOR]
- Published
- 2024
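The predict/update recursion that an EM-adjusted noise covariance feeds into can be sketched in the scalar case: a textbook random-walk Kalman filter, not the paper's 18-dimensional SINS/OD model, with illustrative noise variances q and r.

```python
import random

def kalman_1d(zs, q=1e-3, r=0.25):
    """Scalar random-walk Kalman filter.

    State = position, observation = position + noise; q and r are the
    process and observation noise variances.  Adapting these variances
    (e.g. via EM, as in the entry above) changes the gain k below.
    """
    x, p = zs[0], 1.0              # initial state estimate and variance
    est = []
    for z in zs:
        p = p + q                  # predict: random walk adds process noise
        k = p / (p + r)            # Kalman gain
        x = x + k * (z - x)        # update with the innovation
        p = (1.0 - k) * p
        est.append(x)
    return est

random.seed(3)
zs = [5.0 + random.gauss(0.0, 0.5) for _ in range(300)]  # noisy constant signal
est = kalman_1d(zs)
```

A small q relative to r yields a small steady-state gain, so the filter averages heavily over past observations and the estimate is far less noisy than the raw measurements.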
36. Maximum likelihood estimation for semiparametric regression models with interval-censored multistate data.
- Author
-
Gu, Yu, Zeng, Donglin, Heiss, Gerardo, and Lin, D Y
- Subjects
RANDOM effects model ,MAXIMUM likelihood statistics ,REGRESSION analysis ,COVARIANCE matrices ,DISEASE progression ,EXPECTATION-maximization algorithms - Abstract
Interval-censored multistate data arise in many studies of chronic diseases, where the health status of a subject can be characterized by a finite number of disease states and the transition between any two states is only known to occur over a broad time interval. We relate potentially time-dependent covariates to multistate processes through semiparametric proportional intensity models with random effects. We study nonparametric maximum likelihood estimation under general interval censoring and develop a stable expectation-maximization algorithm. We show that the resulting parameter estimators are consistent and that the finite-dimensional components are asymptotically normal with a covariance matrix that attains the semiparametric efficiency bound and can be consistently estimated through profile likelihood. In addition, we demonstrate through extensive simulation studies that the proposed numerical and inferential procedures perform well in realistic settings. Finally, we provide an application to a major epidemiologic cohort study. [ABSTRACT FROM AUTHOR]
- Published
- 2024
37. Leveraging independence in high-dimensional mixed linear regression.
- Author
-
Wang, Ning, Deng, Kai, Mai, Qing, and Zhang, Xin
- Subjects
CONDITIONED response ,EXPECTATION-maximization algorithms ,LATENT variables ,ENCYCLOPEDIAS & dictionaries ,CELL lines - Abstract
We address the challenge of estimating regression coefficients and selecting relevant predictors in the context of mixed linear regression in high dimensions, where the number of predictors greatly exceeds the sample size. Recent advancements in this field have centered on incorporating sparsity-inducing penalties into the expectation-maximization (EM) algorithm, which seeks to maximize the conditional likelihood of the response given the predictors. However, existing procedures often treat predictors as fixed or overlook their inherent variability. In this paper, we leverage the independence between the predictor and the latent indicator variable of mixtures to facilitate efficient computation and also achieve synergistic variable selection across all mixture components. We establish the non-asymptotic convergence rate of the proposed fast group-penalized EM estimator to the true regression parameters. The effectiveness of our method is demonstrated through extensive simulations and an application to the Cancer Cell Line Encyclopedia dataset for the prediction of anticancer drug sensitivity. [ABSTRACT FROM AUTHOR]
- Published
- 2024
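The sparsity-inducing penalties this entry folds into the EM M-step typically act through the soft-thresholding operator S(z, t) = sign(z) * max(|z| - t, 0); a minimal sketch of that building block (generic, not the paper's group-penalized update):

```python
def soft_threshold(z, t):
    """Soft-thresholding operator: shrinks z toward zero by t and sets it
    exactly to zero when |z| <= t, which is how lasso-type penalties
    produce sparse coefficient updates inside penalized EM iterations."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0
```

Coordinates whose (weighted) least-squares updates fall below the threshold are zeroed out, which is what performs variable selection across mixture components.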
38. Factor-augmented transformation models for interval-censored failure time data.
- Author
-
Li, Hongxi, Li, Shuwei, Sun, Liuquan, and Song, Xinyuan
- Subjects
MAXIMUM likelihood statistics ,ALZHEIMER'S disease ,FACTOR analysis ,INTERVAL analysis ,LATENT variables ,MULTICOLLINEARITY ,EXPECTATION-maximization algorithms - Abstract
Interval-censored failure time data frequently arise in various scientific studies where each subject experiences periodical examinations for the occurrence of the failure event of interest, and the failure time is only known to lie in a specific time interval. In addition, collected data may include multiple observed variables with a certain degree of correlation, leading to severe multicollinearity issues. This work proposes a factor-augmented transformation model to analyze interval-censored failure time data while reducing model dimensionality and avoiding multicollinearity elicited by multiple correlated covariates. We provide a joint modeling framework by comprising a factor analysis model to group multiple observed variables into a few latent factors and a class of semiparametric transformation models with the augmented factors to examine their and other covariate effects on the failure event. Furthermore, we propose a nonparametric maximum likelihood estimation approach and develop a computationally stable and reliable expectation-maximization algorithm for its implementation. We establish the asymptotic properties of the proposed estimators and conduct simulation studies to assess the empirical performance of the proposed method. An application to the Alzheimer's Disease Neuroimaging Initiative (ADNI) study is provided. An R package ICTransCFA is also available for practitioners. Data used in preparation of this article were obtained from the ADNI database. [ABSTRACT FROM AUTHOR]
- Published
- 2024
39. Variational Estimation for Multidimensional Generalized Partial Credit Model.
- Author
-
Cui, Chengyu, Wang, Chun, and Xu, Gongjun
- Subjects
ITEM response theory ,EXPECTATION-maximization algorithms ,MAXIMUM likelihood statistics ,PSYCHOMETRICS ,DATA analysis - Abstract
Multidimensional item response theory (MIRT) models have generated increasing interest in the psychometrics literature. Efficient approaches for estimating MIRT models with dichotomous responses have been developed, but constructing an equally efficient and robust algorithm for polytomous models has received limited attention. To address this gap, this paper presents a novel Gaussian variational estimation algorithm for the multidimensional generalized partial credit model. The proposed algorithm demonstrates both fast and accurate performance, as illustrated through a series of simulation studies and two real data analyses. [ABSTRACT FROM AUTHOR]
- Published
- 2024
40. Censored autoregressive regression models with Student‐t innovations.
- Author
-
Valeriano, Katherine A. L., Schumacher, Fernanda L., Galarza, Christian E., and Matos, Larissa A.
- Subjects
EXPECTATION-maximization algorithms ,MISSING data (Statistics) ,STATISTICS ,AUTOREGRESSIVE models ,STOCHASTIC approximation - Abstract
Copyright of Canadian Journal of Statistics is the property of Wiley-Blackwell and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
- Published
- 2024
41. Robust joint modelling of sparsely observed paired functional data.
- Author
-
Zhou, Huiya, Yan, Xiaomeng, and Zhou, Lan
- Subjects
TYPE I supernovae ,MEASUREMENT errors ,EXPECTATION-maximization algorithms ,GAUSSIAN distribution ,LIGHT curves - Abstract
Copyright of Canadian Journal of Statistics is the property of Wiley-Blackwell and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
- Published
- 2024
- Full Text
- View/download PDF
42. First-order multivariate integer-valued autoregressive model with multivariate mixture distributions.
- Author
-
Yu, Weiyang and Zheng, Haitao
- Subjects
EXPECTATION-maximization algorithms ,AUTOREGRESSIVE models ,CRIME - Abstract
Univariate integer-valued time series have been extensively studied, but the literature on multivariate integer-valued time series models is quite limited, and the complex correlation structure among such series is rarely discussed. In this study, we propose a first-order multivariate integer-valued autoregressive model that characterizes the correlation among multivariate integer-valued time series with greater flexibility. Under general conditions, we establish the stationarity and ergodicity of the proposed model. Within this framework, we discuss models with multivariate Poisson-lognormal and multivariate geometric-logitnormal distributions and their corresponding properties. An estimation method based on the EM algorithm is developed for the model parameters, and extensive simulation studies evaluate its effectiveness. Finally, a real crime dataset is analysed to demonstrate the advantage of the proposed model in comparison with other models. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
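The first-order integer-valued autoregression in entry 42 is built on binomial thinning. A univariate Poisson toy version can be simulated as follows (a sketch under assumed parameter names, not the authors' multivariate implementation):

```python
import numpy as np

def simulate_poisson_inar1(alpha, lam, n, rng=None):
    """Simulate X_t = alpha ∘ X_{t-1} + eps_t, where ∘ is binomial
    thinning and eps_t ~ Poisson(lam)."""
    rng = np.random.default_rng(rng)
    x = np.empty(n, dtype=int)
    x[0] = rng.poisson(lam / (1 - alpha))   # start near the stationary mean
    for t in range(1, n):
        survivors = rng.binomial(x[t - 1], alpha)  # alpha ∘ X_{t-1}
        x[t] = survivors + rng.poisson(lam)
    return x

series = simulate_poisson_inar1(alpha=0.4, lam=3.0, n=500, rng=0)
```

For 0 < alpha < 1 the chain is stationary with mean lam / (1 - alpha), here 5.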
43. A two‐step estimation procedure for semiparametric mixture cure models.
- Author
-
Musta, Eni, Patilea, Valentin, and Van Keilegom, Ingrid
- Subjects
MAXIMUM likelihood statistics ,EXPECTATION-maximization algorithms ,SURVIVAL analysis (Biometry) ,PARAMETRIC modeling ,SAMPLE size (Statistics) - Abstract
In survival analysis, cure models have been developed to account for the presence of cured subjects that will never experience the event of interest. Mixture cure models with a parametric model for the incidence and a semiparametric model for the survival of the susceptibles are particularly common in practice. Because of the latent cure status, maximum likelihood estimation is performed via the iterative EM algorithm. Here, we focus on the cure probabilities and propose a two‐step procedure to improve upon the maximum likelihood estimator when the sample size is not large. The new method is based on presmoothing by first constructing a nonparametric estimator and then projecting it on the desired parametric class. We investigate the theoretical properties of the resulting estimator and show through an extensive simulation study for the logistic‐Cox model that it outperforms the existing method. Practical use of the method is illustrated through two melanoma datasets. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
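The two-step presmoothing idea in entry 43 — first estimate the incidence nonparametrically, then project it onto the parametric (logistic) class — can be caricatured with a crude least-squares projection on the logit scale. The paper's projection uses a likelihood-type distance; everything below is an illustrative assumption:

```python
import numpy as np

def project_to_logistic(x, pi_hat, eps=1e-6):
    """Second step of a presmoothing procedure: project nonparametric
    incidence estimates pi_hat(x_i) onto the logistic class
    pi(x; g) = 1 / (1 + exp(-(g0 + g1 x))) by regressing logit(pi_hat)
    on x (a crude stand-in for the paper's projection)."""
    p = np.clip(pi_hat, eps, 1 - eps)
    z = np.log(p / (1 - p))                      # logit transform
    X = np.column_stack([np.ones_like(x), x])
    gamma, *_ = np.linalg.lstsq(X, z, rcond=None)
    return gamma

# If the presmoothed estimates are exactly logistic, the projection
# recovers the underlying coefficients.
x = np.linspace(-2.0, 2.0, 50)
pi_hat = 1.0 / (1.0 + np.exp(-(0.5 + 1.0 * x)))
gamma = project_to_logistic(x, pi_hat)
```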
44. Estimating the prevalence of osteoporosis using ranked-based methodologies and Manitoba's population-based BMD registry.
- Author
-
Omidvar, Sedigheh, Jafari Jozani, Mohammad, Nematollahi, Nader, and Leslie, Wiliam D.
- Subjects
METABOLIC bone disorders ,BONE density ,OSTEOPOROSIS in women ,EXPECTATION-maximization algorithms ,OSTEOPOROSIS - Abstract
Osteoporosis is a metabolic bone disorder characterized by reduced bone mineral density (BMD) and deterioration of bone microarchitecture. Osteoporosis is highly prevalent among women over 50, leading to skeletal fragility and risk of fracture. Early diagnosis and treatment of those at high risk of fracture are essential to avoid the morbidity, mortality, and economic burden of preventable fractures. The province of Manitoba established a BMD testing program in 1997. The Manitoba BMD registry is now the largest population-based BMD registry in the world, with detailed information on fracture outcomes and other covariates for over 160,000 BMD assessments. In this paper, we develop a number of methodologies based on ranked-set type sampling designs to estimate the prevalence of osteoporosis among women aged 50 and older in the province of Manitoba. We use a parametric approach based on finite mixture models, as well as the usual approaches using simple random and stratified sampling designs. Results are obtained under perfect and imperfect ranking scenarios, with sampling and ranking costs incorporated into the study. We find that rank-based methodologies offer cost-efficient means of monitoring the prevalence of osteoporosis. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
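The ranked-set designs of entry 44 keep, from each small comparison set, exactly one order statistic per rank. A balanced, perfect-ranking sketch with a toy prevalence estimate (function and variable names are assumptions, not the authors' code):

```python
import numpy as np

def ranked_set_sample(population, set_size, cycles, rng=None):
    """Balanced ranked-set sample under perfect ranking: in each cycle,
    for every rank r, draw `set_size` units, sort them, keep the r-th."""
    rng = np.random.default_rng(rng)
    kept = []
    for _ in range(cycles):
        for r in range(set_size):
            units = rng.choice(population, size=set_size, replace=False)
            kept.append(np.sort(units)[r])   # r-th order statistic
    return np.array(kept)

# Toy prevalence estimate: fraction of a latent N(0, 1) trait below -1,
# standing in for, e.g., a BMD score below a diagnostic threshold.
population = np.random.default_rng(1).standard_normal(50_000)
rss = ranked_set_sample(population, set_size=3, cycles=200, rng=2)
prevalence_hat = np.mean(rss < -1.0)
```

Because each rank is represented equally often, the indicator mean is unbiased for the population prevalence while typically having smaller variance than a simple random sample of the same size.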
45. The spike-and-slab lasso and scalable algorithm to accommodate multinomial outcomes in variable selection problems.
- Author
-
Leach, Justin M., Yi, Nengjun, and Aban, Inmaculada
- Subjects
DISTRIBUTION (Probability theory) ,PARAMETER estimation ,GENERALIZATION ,ALGORITHMS ,MIXTURES ,EXPECTATION-maximization algorithms - Abstract
Spike-and-slab prior distributions are used to impose variable selection in Bayesian regression-style problems with many possible predictors. These priors are a mixture of two zero-centered distributions with differing variances, resulting in different shrinkage levels on parameter estimates based on whether they are relevant to the outcome. The spike-and-slab lasso assigns mixtures of double exponential distributions as priors for the parameters. This framework was initially developed for linear models, later extended to generalized linear models, and shown to perform well in scenarios requiring sparse solutions. Standard formulations of generalized linear models cannot immediately accommodate categorical outcomes with > 2 categories, i.e., multinomial outcomes, and require modifications to model specification and parameter estimation. Such modifications are relatively straightforward in a classical setting but require additional theoretical and computational considerations in Bayesian settings, which can depend on the choice of prior distributions for the parameters of interest. While previous developments of the spike-and-slab lasso focused on continuous, count, and/or binary outcomes, we generalize the spike-and-slab lasso to accommodate multinomial outcomes, developing both the theoretical basis for the model and an expectation-maximization algorithm to fit the model. To our knowledge, this is the first generalization of the spike-and-slab lasso to allow for multinomial outcomes. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
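The EM algorithm in entry 45 relies on an E-step that computes, for each coefficient, the posterior probability of the slab component of the spike-and-slab double-exponential mixture. A sketch of that single quantity (names and scale values assumed):

```python
import numpy as np

def laplace_density(beta, scale):
    """Double-exponential (Laplace) density centered at zero."""
    return np.exp(-np.abs(beta) / scale) / (2.0 * scale)

def slab_probability(beta, theta, s0, s1):
    """E-step quantity of a spike-and-slab lasso EM: posterior probability
    that coefficient `beta` comes from the slab (scale s1) rather than
    the spike (scale s0), given prior inclusion probability theta."""
    slab = theta * laplace_density(beta, s1)
    spike = (1.0 - theta) * laplace_density(beta, s0)
    return slab / (slab + spike)
```

A large coefficient is attributed almost surely to the slab (little shrinkage), while a coefficient near zero is attributed to the spike (heavy shrinkage), which is what induces the selection behavior.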
46. Seismic Evaluation Based on Poisson Hidden Markov Models—The Case of Central and South America.
- Author
-
Georgakopoulou, Evangelia, Tsapanos, Theodoros M., Makrides, Andreas, Scordilis, Emmanuel, Karagrigoriou, Alex, Papadopoulou, Alexandra, and Karastathis, Vassilios
- Subjects
HIDDEN Markov models ,EXPECTATION-maximization algorithms ,EARTHQUAKES ,MARKOV processes ,STOCHASTIC processes - Abstract
A study of earthquake seismicity is undertaken over the areas of Central and South America, the tectonics of which are of great interest. The whole territory is divided into 10 seismic zones based on seismotectonic characteristics, as in previously published studies. The earthquakes used in the present study are extracted from the catalogs of the International Seismological Center, cover the period 1900–2021, and are restricted to shallow depths (≤60 km) and a magnitude of M ≥ 4.5. Fore- and aftershocks are removed according to Reasenberg's technique. The paper evaluates earthquake occurrence probabilities in the seismic zones covering parts of Central and South America by implementing a hidden Markov model (HMM) fitted with the EM algorithm. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
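For entry 46, the likelihood of a Poisson hidden Markov model is evaluated by the scaled forward recursion, which is also the workhorse of the EM (Baum-Welch) fit. A hedged numpy sketch (parameter names and shapes are assumptions):

```python
import numpy as np

def poisson_hmm_loglik(counts, trans, lam, init):
    """Log-likelihood of an m-state Poisson HMM via the scaled forward
    recursion: `trans` is the m x m transition matrix, `lam` the
    state-wise Poisson rates, `init` the initial distribution."""
    counts = np.asarray(counts)
    # Log-factorials 0..max(count), for the Poisson emission densities.
    logfact = np.cumsum(np.log(np.arange(1, counts.max() + 1, dtype=float)))
    logfact = np.concatenate(([0.0], logfact))
    emis = np.exp(counts[:, None] * np.log(lam) - lam - logfact[counts][:, None])
    alpha = init * emis[0]
    loglik = 0.0
    for t in range(len(counts)):
        if t > 0:
            alpha = (alpha @ trans) * emis[t]
        c = alpha.sum()
        loglik += np.log(c)
        alpha /= c                      # scale to avoid underflow
    return loglik
```

With a single state the recursion collapses to a sum of Poisson log-densities, which gives an easy correctness check.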
47. Group Network Hawkes Process.
- Author
-
Fang, Guanhua, Xu, Ganggang, Xu, Haochen, Zhu, Xuening, and Guan, Yongtao
- Subjects
EXPECTATION-maximization algorithms ,DATA analysis - Abstract
In this work, we study the event occurrences of individuals interacting in a network. To characterize the dynamic interactions among the individuals, we propose a group network Hawkes process (GNHP) model whose network structure is observed and fixed. In particular, we introduce a latent group structure among individuals to account for heterogeneous user-specific characteristics. A maximum likelihood approach is proposed to simultaneously cluster individuals in the network and estimate the model parameters. A fast EM algorithm is subsequently developed by using the branching representation of the proposed GNHP model. Theoretical properties of the resulting estimators of group memberships and model parameters are investigated under settings in which the number of latent groups G is either over-specified or correctly specified. A data-driven criterion that can consistently identify the true G under mild conditions is derived. Extensive simulation studies and an application to a dataset collected from Sina Weibo are used to illustrate the effectiveness of the proposed methodology. Supplementary materials for this article are available online. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
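The Hawkes intensities underlying entry 47 can be simulated by Ogata's thinning algorithm. A single-node, exponential-kernel toy version (not the authors' grouped network model; names are assumptions):

```python
import numpy as np

def simulate_hawkes(mu, alpha, beta, horizon, rng=None):
    """Simulate a univariate Hawkes process with intensity
    lambda(t) = mu + alpha * sum_{t_i < t} exp(-beta (t - t_i))
    by Ogata's thinning; requires alpha < beta for stability."""
    rng = np.random.default_rng(rng)
    events = []
    t, excite = 0.0, 0.0   # excite = alpha * sum of kernel terms at time t
    while True:
        lam_bar = mu + excite                  # bounds the decaying intensity
        w = rng.exponential(1.0 / lam_bar)
        excite *= np.exp(-beta * w)            # decay excitation to t + w
        t += w
        if t > horizon:
            break
        if rng.uniform() <= (mu + excite) / lam_bar:   # accept w.p. λ(t)/λ̄
            events.append(t)
            excite += alpha                    # intensity jumps at an event
    return np.array(events)

ev = simulate_hawkes(mu=0.5, alpha=0.8, beta=2.0, horizon=200.0, rng=42)
```

With branching ratio alpha / beta = 0.4, the long-run event rate is mu / (1 - 0.4) ≈ 0.83 per unit time.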
48. Estimation in a general mixture of Markov jump processes.
- Author
-
Frydman, Halina and Surya, Budhi Arta
- Subjects
MARKOV processes ,STOCHASTIC matrices ,JUMP processes ,FISHER information ,EXPECTATION-maximization algorithms - Abstract
- Published
- 2024
- Full Text
- View/download PDF
49. Generalised shot-noise representations of stochastic systems driven by non-Gaussian Lévy processes.
- Author
-
Godsill, Simon, Kontoyiannis, Ioannis, and Tapia Costa, Marcos
- Subjects
MARKOV chain Monte Carlo ,LEVY processes ,STOCHASTIC systems ,STOCHASTIC differential equations ,LIFE sciences ,LATENT variables ,EXPECTATION-maximization algorithms - Abstract
We consider the problem of obtaining effective representations for the solutions of linear, vector-valued stochastic differential equations (SDEs) driven by non-Gaussian pure-jump Lévy processes, and we show how such representations lead to efficient simulation methods. The processes considered constitute a broad class of models that find application across the physical and biological sciences, mathematics, finance, and engineering. Motivated by important relevant problems in statistical inference, we derive new, generalised shot-noise simulation methods whenever a normal variance-mean (NVM) mixture representation exists for the driving Lévy process, including the generalised hyperbolic, normal-gamma, and normal tempered stable cases. Simple, explicit conditions are identified for the convergence of the residual of a truncated shot-noise representation to a Brownian motion in the case of the pure Lévy process, and to a Brownian-driven SDE in the case of the Lévy-driven SDE. These results provide Gaussian approximations to the small jumps of the process under the NVM representation. The resulting representations are of particular importance in state inference and parameter estimation for Lévy-driven SDE models, since the resulting conditionally Gaussian structures can be readily incorporated into latent variable inference methods such as Markov chain Monte Carlo, expectation-maximisation, and sequential Monte Carlo. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
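The normal variance-mean mixture representation underlying entry 49 says an increment is conditionally Gaussian given a subordinator draw. A sketch of sampling such increments (the `subordinator` callable and parameter names are assumptions; a gamma subordinator gives the variance-gamma / normal-gamma case mentioned in the abstract):

```python
import numpy as np

def nvm_increments(mu, beta, sigma, subordinator, rng=None):
    """Draw increments X = mu + beta*V + sigma*sqrt(V)*Z of a normal
    variance-mean mixture, where V comes from a user-supplied
    subordinator sampler and Z is standard normal."""
    rng = np.random.default_rng(rng)
    v = np.asarray(subordinator(rng))
    z = rng.standard_normal(v.shape)
    return mu + beta * v + sigma * np.sqrt(v) * z

# Variance-gamma example: V ~ Gamma(shape=2, scale=0.5), so E[V] = 1.
x = nvm_increments(
    mu=0.0, beta=0.1, sigma=1.0,
    subordinator=lambda rng: rng.gamma(shape=2.0, scale=0.5, size=100_000),
    rng=0,
)
```

The moments follow from conditioning: E[X] = mu + beta E[V] and Var(X) = sigma² E[V] + beta² Var(V), here 0.1 and 1.005.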
50. Identification of switched gated recurrent unit neural networks with a generalized Gaussian distribution.
- Author
-
Bai, Wentao, Guo, Fan, Gu, Suhang, Yan, Chao, Jiang, Chunli, and Zhang, Haoyu
- Subjects
HYBRID systems ,RECURRENT neural networks ,NONLINEAR dynamical systems ,EXPECTATION-maximization algorithms ,PROBLEM solving - Abstract
Due to the limitations of the model itself, the performance of switched autoregressive exogenous (SARX) models will face potential threats when modeling nonlinear hybrid dynamic systems. To address this problem, a robust identification approach of the switched gated recurrent unit (SGRU) model is developed in this paper. Firstly, all submodels of the SARX model are replaced by gated recurrent unit neural networks. The obtained SGRU model has stronger nonlinear fitting ability than the SARX model. Secondly, this paper departs from the conventional Gaussian distribution assumption for noise, opting instead for a generalized Gaussian distribution. This enables the proposed model to achieve stable prediction performance under the influence of different noises. Notably, no prior assumptions are imposed on the knowledge of operating modes in the proposed switched model. Therefore, the EM algorithm is used to solve the problem of parameter estimation with hidden variables in this paper. Finally, two simulation experiments are performed. By comparing the nonlinear fitting ability of the SGRU model with the SARX model and the prediction performance of the SGRU model under different noise distributions, the effectiveness of the proposed approach is verified. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
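Entry 50 replaces the Gaussian noise assumption with a generalized Gaussian distribution, whose shape parameter controls tail heaviness. Its log-density is simple to write down (parameter names assumed; beta = 2 recovers the Gaussian, beta = 1 the Laplace):

```python
import math

def gen_gaussian_logpdf(x, mu, alpha, beta):
    """Log-density of the generalized Gaussian distribution
    f(x) = beta / (2 alpha Gamma(1/beta)) * exp(-(|x - mu|/alpha)^beta),
    with scale alpha > 0 and shape beta > 0."""
    norm = math.log(beta) - math.log(2 * alpha) - math.lgamma(1.0 / beta)
    return norm - (abs(x - mu) / alpha) ** beta
```

Setting beta = 2 and alpha = sqrt(2) recovers the standard normal log-density, a convenient sanity check.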