87 results for "Grace Y. Yi"
Search Results
2. Regularized matrix-variate logistic regression with response subject to misclassification
- Author
-
Junhan Fang and Grace Y. Yi
- Subjects
Statistics and Probability, Binary response, Applied Mathematics, Inference, Logistic regression, Matrix (mathematics), Random variate, Covariate, Statistics, Medical imaging, SCAD, Statistics, Probability and Uncertainty, Mathematics - Abstract
Matrix-variate logistic regression is useful for characterizing the relationship between a binary response and complex-featured matrix-variate covariates, which commonly arise in medical imaging research. However, standard inference procedures based on such a model are impaired by response misclassification as well as by inactive covariates. It is imperative to account for misclassification effects and to select active covariates when employing matrix-variate logistic regression to analyze such data. In this paper, we develop penalized unbiased estimating functions using the smoothly clipped absolute deviation (SCAD) penalty to address both the sparsity of matrix-variate data and the effects of response misclassification. The proposed methods are justified both theoretically and numerically. We analyze the Breast Cancer Wisconsin data with the proposed methods.
- Published
- 2022
3. A conversation with Nancy Reid
- Author
-
Radu V. Craiu and Grace Y. Yi
- Subjects
Statistics and Probability, Statistics, Probability and Uncertainty - Published
- 2022
4. Is 14-Days a Sensible Quarantine Length for COVID-19? Examinations of Some Associated Issues with a Case Study of COVID-19 Incubation Times
- Author
-
Grace Y. Yi, Yuan Bian, Yasin Khadem Charvadeh, and Wenqing He
- Subjects
Statistics and Probability, Coronavirus disease 2019 (COVID-19), Distribution (economics), Biochemistry, Genetics and Molecular Biology (miscellaneous), Article, Incubation period, Quantile estimation, Quarantine, Pandemic, Incubation, Profile likelihood, Actuarial science, Incubation times, COVID-19, Geography, Infectious disease (medical specialty), Quarantine time, Biostatistics - Abstract
To confine the spread of an infectious disease, setting a sensible quarantine time is crucial, and doing so requires a good understanding of the distribution of the disease's incubation times. For the ongoing COVID-19 pandemic, 14 days has commonly been taken as the quarantine time to curb the virus spread while balancing the impacts of COVID-19 on diverse aspects of society, including public health, the economy, and humanitarian concerns. However, setting a sensible quarantine time is not trivial, and it depends on various underlying factors. In this article, we examine the distribution of the COVID-19 incubation time using likelihood-based methods. Our study is carried out on a dataset of 178 COVID-19 cases dated from January 20, 2020 to February 29, 2020, with information on exposure periods and dates of symptom onset. To cover a range of possible scenarios, we employ different models to describe COVID-19 incubation times. Our findings suggest that, statistically, a 14-day quarantine may not be long enough to keep the probability of an early release of infected individuals small. While the study data are not large enough to pin down a definitively acceptable quarantine time, and in practice decision-makers must also weigh related social and economic concerns, our study demonstrates useful methods for determining a reasonable quarantine time from a statistical standpoint. Further, it reveals some of the complexity involved in fully understanding the COVID-19 incubation time distribution. Supplementary Information The online version contains supplementary material available at 10.1007/s12561-021-09320-8.
- Published
- 2021
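The quarantine-length analysis in the entry above hinges on a tail probability of the incubation-time distribution. As a rough illustration of that idea (a sketch using hypothetical simulated data and a lognormal model; this is not the paper's dataset or its exact likelihood methods), one can fit a lognormal by maximum likelihood and evaluate the probability that an incubation period exceeds 14 days:

```python
import numpy as np
from math import erf, log, sqrt

def lognormal_mle(times):
    """MLE for a lognormal model: mean and s.d. of the log incubation times."""
    logs = np.log(times)
    return logs.mean(), logs.std()

def prob_exceeds(q, mu, sigma):
    """P(T > q) when log(T) ~ N(mu, sigma^2), via the normal CDF."""
    z = (log(q) - mu) / sigma
    return 0.5 * (1.0 - erf(z / sqrt(2.0)))

# Hypothetical incubation times (in days); the lognormal parameters
# below are illustrative, not estimates from the study's 178 cases.
rng = np.random.default_rng(0)
times = rng.lognormal(mean=1.6, sigma=0.45, size=500)

mu, sigma = lognormal_mle(times)
p14 = prob_exceeds(14.0, mu, sigma)  # probability incubation exceeds 14 days
```

A small but nonzero `p14` is exactly the "probability of an early release" that the abstract argues should be controlled when choosing a quarantine length.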
5. Polya tree Monte Carlo method
- Author
-
Haoxin Zhuang, Liqun Diao, and Grace Y. Yi
- Subjects
Statistics and Probability, Computational Mathematics, Computational Theory and Mathematics, Applied Mathematics - Published
- 2023
6. Imputation and likelihood methods for matrix‐variate logistic regression with response misclassification
- Author
-
Grace Y. Yi and Junhan Fang
- Subjects
Statistics and Probability, Logistic regression, Matrix (mathematics), Random variate, Statistics, Imputation (statistics), Statistics, Probability and Uncertainty, Mathematics - Published
- 2021
7. Special Issue on Neuroimaging data analysis: Guest Editors' Introduction
- Author
-
Farouk S. Nathoo, Linglong Kong, and Grace Y. Yi
- Subjects
Statistics and Probability, Cognitive science, Neuroimaging, Statistics, Probability and Uncertainty, Mathematics - Published
- 2021
8. Estimation and hypothesis testing with error‐contaminated survival data under possibly misspecified measurement error models
- Author
-
Ying Yan and Grace Y. Yi
- Subjects
Statistics and Probability, Estimation, Proportional hazards model, Estimation theory, Survival data, Statistics, Errors-in-variables models, Statistics, Probability and Uncertainty, Statistical hypothesis testing, Mathematics - Published
- 2021
9. De-noising analysis of noisy data under mixed graphical models
- Author
-
Li-Pang Chen and Grace Y. Yi
- Subjects
Statistics and Probability, Statistics, Probability and Uncertainty - Published
- 2022
10. Sufficient dimension reduction for survival data analysis with error-prone variables
- Author
-
Li-Pang Chen and Grace Y. Yi
- Subjects
Statistics and Probability, Statistics, Probability and Uncertainty - Published
- 2022
11. Variable selection for proportional hazards models with high‐dimensional covariates subject to measurement error
- Author
-
Ao Yuan, Grace Y. Yi, and Baojiang Chen
- Subjects
Statistics and Probability, Survival data, Observational error, Proportional hazards model, Covariate, Statistics, Feature selection, High dimensional, Statistics, Probability and Uncertainty, Mathematics - Published
- 2020
12. Matrix-variate logistic regression with measurement error
- Author
-
Grace Y. Yi and Junhan Fang
- Subjects
Statistics and Probability, Observational error, Applied Mathematics, General Mathematics, Logistic regression, Agricultural and Biological Sciences (miscellaneous), Matrix (mathematics), Random variate, Applied mathematics, Statistics, Probability and Uncertainty, General Agricultural and Biological Sciences, Mathematics - Abstract
Measurement error in covariates has been extensively studied in many conventional regression settings where covariate information is typically expressed in a vector form. However, there has been little work on error-prone matrix-variate data, which commonly arise from studies with imaging, spatial-temporal structures, etc. We consider analysis of error-contaminated matrix-variate data. We particularly focus on matrix-variate logistic measurement error models. We examine the biases induced from naive analysis which ignores measurement error in matrix-variate data. Two measurement error correction methods are developed to adjust for measurement error effects. The proposed methods are justified both theoretically and empirically. We analyse an electroencephalography dataset with the proposed methods.
- Published
- 2020
13. Semiparametric methods for left-truncated and right-censored survival data with covariate measurement error
- Author
-
Grace Y. Yi and Li-Pang Chen
- Subjects
Statistics and Probability, Observational error, Data collection, Proportional hazards model, Inference, Estimator, Robustness (computer science), Covariate, Statistics, Survival analysis, Mathematics - Abstract
Many methods have been developed for analyzing survival data, which are commonly right-censored. These methods, however, are challenged by complex features pertinent to the data collection as well as the nature of the data themselves. In particular, biased samples caused by left-truncation (or length-biased sampling) and measurement error often accompany survival analysis. While such data frequently arise in practice, little work has been available to address these features simultaneously. In this paper, we explore valid inference methods for handling left-truncated and right-censored survival data with measurement error under the widely used Cox model. We first exploit a flexible estimator for the survival model parameters which does not require specification of the baseline hazard function. To improve efficiency, we further develop an augmented nonparametric maximum likelihood estimator. We establish asymptotic results and examine the efficiency and robustness issues for the proposed estimators. The proposed methods enjoy the appealing feature that the distributions of the covariates and of the truncation times are left unspecified. Numerical studies are reported to assess the finite sample performance of the proposed methods.
- Published
- 2020
14. Causal inference with noisy data: Bias analysis and estimation approaches to simultaneously addressing missingness and misclassification in binary outcomes
- Author
-
Di Shu and Grace Y. Yi
- Subjects
Statistics and Probability, Estimation, Models, Statistical, Epidemiology, Computer science, Average treatment effect, Inverse probability weighting, Estimator, Binary number, Models, Theoretical, Missing data, Outcome (probability), Causality, Bias, Causal inference, Econometrics, Humans, Computer Simulation - Abstract
Causal inference has been widely conducted in various fields, and many methods have been proposed for different settings. However, for noisy data with both mismeasurements and missing observations, those methods often break down. In this paper, we consider the problem in which binary outcomes are subject to both missingness and misclassification, with interest in estimating the average treatment effect (ATE). We examine the asymptotic biases caused by ignoring missingness and/or misclassification and establish the intrinsic connections between missingness effects and misclassification effects on the estimation of the ATE. We develop valid weighted estimation methods to simultaneously correct for missingness and misclassification effects. To provide protection against model misspecification, we further propose a doubly robust correction method which yields consistent estimators when either the treatment model or the outcome model is misspecified. Simulation studies are conducted to assess the performance of the proposed methods. An application to smoking cessation data is reported to illustrate the use of the proposed methods.
- Published
- 2019
15. Variable selection via the composite likelihood method for multilevel longitudinal data with missing responses and covariates
- Author
-
Haocheng Li, Wenqing He, Di Shu, and Grace Y. Yi
- Subjects
Statistics and Probability, Quasi-maximum likelihood, Computer science, Longitudinal data, Applied Mathematics, Smoking prevention, Computation, Feature selection, Missing data, Computational Mathematics, Computational Theory and Mathematics, Statistics, Covariate, Missing not at random - Abstract
Longitudinal data with multilevel structures are commonly collected when following up subjects in clusters over a period of time. Missing values and variable selection issues are common for such data. Biased results may be produced if the incompleteness of the data is ignored in the analysis. On the other hand, incorporating a large number of irrelevant covariates into inferential procedures may lead to difficulty in computation and interpretation. A unified penalized composite likelihood framework is developed to handle data with both missingness and variable selection issues. It flexibly handles situations where responses and covariates are missing, though not necessarily simultaneously, under a missing-not-at-random assumption. The method is justified both rigorously with theoretical results and numerically with simulation studies. The method is also applied to the Waterloo Smoking Prevention Project data.
- Published
- 2019
16. R package for analysis of data with mixed measurement error and misclassification in covariates: augSIMEX
- Author
-
Grace Y. Yi and Qihuang Zhang
- Subjects
Statistics and Probability, Generalized linear model, Data collection, Observational error, Applied Mathematics, Inference, R package, Modeling and Simulation, Statistics, Covariate, Data analysis, Statistics, Probability and Uncertainty, Mathematics - Abstract
Measurement error and misclassification arise commonly in various data collection processes. It is well-known that ignoring these features in the data analysis usually leads to biased inference. With the generalized linear model setting, Yi et al. [Functional and structural methods with mixed measurement error and misclassification in covariates. J Am Stat Assoc. 2015;110:681–696] developed inference methods to adjust for the effects of measurement error in continuous covariates and misclassification in discrete covariates simultaneously for the scenario where validation data are available. The augmented simulation-extrapolation (SIMEX) approach they developed generalizes the usual SIMEX method which is only applicable to handle continuous error-prone covariates. To implement this method, we develop an R package, augSIMEX, for public use. Simulation studies are conducted to illustrate the use of the algorithm. This package is available at CRAN.
- Published
- 2019
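The augmented SIMEX approach described in the entry above extends the classical SIMEX idea: deliberately add extra simulated noise at levels λ > 0, track how the naive estimate degrades, and extrapolate back to λ = -1, where measurement error would vanish. A minimal sketch of classical SIMEX for a linear-regression slope, written in Python with hypothetical simulated data (the augSIMEX package itself is in R and additionally handles misclassified discrete covariates):

```python
import numpy as np

def simex_slope(w, y, sigma_u, lambdas=(0.5, 1.0, 1.5, 2.0), B=50, rng=None):
    """Classical SIMEX correction of a simple-regression slope when the
    covariate w = x + u is observed with known error s.d. sigma_u."""
    rng = np.random.default_rng(rng)
    lam_grid = [0.0] + list(lambdas)
    mean_slopes = []
    for lam in lam_grid:
        slopes = []
        for _ in range(B if lam > 0 else 1):
            # add extra noise so total error variance is (1 + lam) * sigma_u^2
            w_lam = w + np.sqrt(lam) * sigma_u * rng.standard_normal(w.size)
            slopes.append(np.polyfit(w_lam, y, 1)[0])
        mean_slopes.append(np.mean(slopes))
    # quadratic extrapolation back to lambda = -1 (no measurement error)
    coef = np.polyfit(lam_grid, mean_slopes, 2)
    return np.polyval(coef, -1.0)

# hypothetical simulated data with additive covariate measurement error
rng = np.random.default_rng(1)
n, beta, sigma_x, sigma_u = 5000, 2.0, 1.0, 0.8
x = sigma_x * rng.standard_normal(n)
y = beta * x + 0.5 * rng.standard_normal(n)
w = x + sigma_u * rng.standard_normal(n)     # error-prone observation of x

naive = np.polyfit(w, y, 1)[0]               # attenuated toward zero
corrected = simex_slope(w, y, sigma_u, rng=2)  # closer to beta
```

The naive slope is attenuated by the reliability ratio sigma_x² / (sigma_x² + sigma_u²); the SIMEX extrapolation recovers much of the lost signal, though the quadratic extrapolant is only an approximation.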
17. Analysis of panel data with misclassified covariates
- Author
-
Grace Y. Yi, Wenqing He, and Feng He
- Subjects
Statistics and Probability ,Computer science ,Applied Mathematics ,Covariate ,Econometrics ,Identifiability ,Markov model ,Panel data - Published
- 2019
18. Marginal analysis of bivariate mixed responses with measurement error and misclassification
- Author
-
Qihuang Zhang and Grace Y. Yi
- Subjects
Statistics and Probability, Observational error, Models, Statistical, Epidemiology, Cost-Benefit Analysis, Binary number, Bivariate analysis, Causality, Mice, Marginal Analysis, Health Information Management, Bias, Statistics, Animals, Computer Simulation, Generalized estimating equation, Mathematics, Genome-Wide Association Study - Abstract
Bivariate responses with mixed continuous and binary variables arise commonly in applications such as clinical trials and genetic studies. Statistical methods based on jointly modeling continuous and binary variables have been available. However, such methods ignore the effects of response mismeasurement, a ubiquitous feature in applications. It is well documented that in many settings, ignoring mismeasurement in variables usually results in biased estimation. In this paper, we consider the setting with a bivariate outcome vector which contains a continuous component and a binary component, both subject to mismeasurement. We propose estimating equation approaches to handle measurement error in the continuous response and misclassification in the binary response simultaneously. The proposed estimators are consistent and robust to certain model misspecification, provided regularity conditions hold. Extensive simulation studies confirm that the proposed methods successfully correct the biases resulting from the errors-in-variables under various settings. The proposed methods are applied to analyze the outbred Carworth Farms White mice data arising from a genome-wide association study.
- Published
- 2021
19. Feature screening with large-scale and high-dimensional survival data
- Author
-
Wenqing He, Grace Y. Yi, and Raymond J. Carroll
- Subjects
Statistics and Probability, Computer science, Big data, General Biochemistry, Genetics and Molecular Biology, Covariate, Feature (machine learning), Quality (business), Dimension (data warehouse), Proportional Hazards Models, Genome, General Immunology and Microbiology, Applied Mathematics, Scale (chemistry), General Medicine, Genomics, Regression, Sample size determination, Sample Size, Data mining, General Agricultural and Biological Sciences - Abstract
Data of huge size present great challenges in modeling, inference, and computation. In handling big data, much attention has been directed to settings with "large p, small n", where p represents the number of variables and n stands for the sample size; relatively less work has addressed problems where p and n are both large, though data with this feature have now become more accessible than before. The big volume of data does not automatically ensure good quality of inferences, because a large number of unimportant variables may be collected in the process of gathering informative variables. To carry out valid statistical analysis, it is imperative to screen out noisy variables that have no predictive value for explaining the outcome variable. In this paper, we develop a screening method for handling large-sized survival data, where the sample size n is large and the dimension p of covariates is of non-polynomial order of the sample size n, the so-called NP-dimension. We rigorously establish theoretical results for the proposed method and conduct numerical studies to assess its performance. Our research offers multiple extensions of existing work and enlarges the scope of high-dimensional data analysis. The proposed method capitalizes on the connections among useful regression settings and offers a computationally efficient screening procedure. It can be applied to different situations with large-scale data, including genomic data.
- Published
- 2021
20. Ensembling Imbalanced-Spatial-Structured Support Vector Machine
- Author
-
Xin Liu, Wenqing He, Grace Y. Yi, and Glenn Bauman
- Subjects
Statistics and Probability, Economics and Econometrics, Spatial correlation, Computer science, Machine learning, Imaging data, Imbalanced data, Special case, Structured support vector machine, Ensemble, Classification, Support vector machine, Local consistency, Artificial intelligence, Statistics, Probability and Uncertainty, Classifier (UML) - Abstract
The support vector machine (SVM) and its extensions have been widely used in various areas. However, these methods cannot effectively handle imbalanced data with spatial association. The ensembling imbalanced-spatial-structured support vector machine (EISS-SVM) method is proposed to handle such data. Not only does the proposed method accommodate the relationship between the response and predictors, but it also accounts for the spatial correlation in the data, which may be imbalanced. The EISS-SVM classifier embraces the usual SVM as a special case. Numerical studies show satisfactory performance of the proposed method, and analysis results are reported for its application to imaging data from an ongoing prostate cancer study conducted in Canada.
- Published
- 2021
21. Characterizing the COVID-19 dynamics with a new epidemic model: Susceptible-exposed-asymptomatic-symptomatic-active-removed
- Author
-
Grace Y. Yi, Pingbo Hu, and Wenqing He
- Subjects
Statistics and Probability ,Statistics, Probability and Uncertainty - Abstract
The coronavirus disease 2019 (COVID-19), caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has spread stealthily and presented a tremendous threat to the public. It is important to investigate the transmission dynamics of COVID-19 to help understand the impact of the disease on public health and the economy. In this article, we develop a new epidemic model that utilizes a set of ordinary differential equations with unknown parameters to delineate the transmission process of COVID-19. The model accounts for asymptomatic infections as well as the lag between symptom onset and the confirmation date of infection. To reflect the transmission potential of an infected case, we derive the …
- Published
- 2020
22. Analysis of noisy survival data with graphical proportional hazards measurement error models
- Author
-
Li-Pang Chen and Grace Y. Yi
- Subjects
Statistics and Probability, Flexibility (engineering), Models, Statistical, General Immunology and Microbiology, Proportional hazards model, Computer science, Applied Mathematics, Inference, General Medicine, Survival Analysis, General Biochemistry, Genetics and Molecular Biology, Covariate, Statistics, Feature (machine learning), Errors-in-variables models, Graphical model, General Agricultural and Biological Sciences - Abstract
In survival data analysis, the Cox proportional hazards (PH) model is perhaps the most widely used model for featuring the dependence of survival times on covariates. While many inference methods have been developed under such a model or its variants, those models are not adequate for handling data with complex structured covariates. High-dimensional survival data often entail several features: (1) many covariates are inactive in explaining the survival information, (2) active covariates are associated in a network structure, and (3) some covariates are error-contaminated. To handle such survival data, we propose graphical PH measurement error models and develop inferential procedures for the parameters of interest. Our proposed models significantly enlarge the scope of the usual Cox PH model and have great flexibility in characterizing survival data. Theoretical results are established to justify the proposed methods. Numerical studies are conducted to assess the performance of the proposed methods.
- Published
- 2020
23. A Bayesian hierarchical copula model
- Author
-
Liqun Diao, Haoxin Zhuang, and Grace Y. Yi
- Subjects
Statistics and Probability, Sampling scheme, MCMC, Dependence modeling, Computation, Bayesian probability, Estimator, Markov chain Monte Carlo, Copula (probability theory), Bayesian hierarchical modeling, Statistics, Probability and Uncertainty, Algorithm, Bayesian hierarchical model, Mathematics - Abstract
Dependent data with hierarchical structures arise commonly in a variety of applications, and analysis of such data is often challenging due to the complexity of modeling dependence structures and the computational intensity. In this paper, we propose a Bayesian hierarchical copula model (BHCM) to accommodate hierarchical structures of dependent data, where the subject-level dependence is modeled by a copula-based model and the hierarchical structure is described using random dependence parameters. We introduce a layer-by-layer sampling scheme for conducting Bayesian inference. Our proposed BHCM enjoys the flexibility of modeling various complex association structures while retaining manageable computation. Extensive simulation studies show that our proposed estimators outperform conventional likelihood-based estimators in a variety of finite sample settings. We apply the BHCM to analyze the Vertebral Column dataset from the UCI Machine Learning Repository.
- Published
- 2020
24. Model selection and model averaging for analysis of truncated and censored data with measurement error
- Author
-
Li-Pang Chen and Grace Y. Yi
- Subjects
Statistics and Probability, Model selection, Focus information criterion, Survival analysis, Covariate, Statistical inference, Left-truncation, Mathematics, Observational error, Estimator, Model averaging, Data mining, Statistics, Probability and Uncertainty, Measurement error - Abstract
Model selection plays a critical role in statistical inference, and a large literature has been devoted to this topic. Despite extensive research attention on model selection, research gaps remain. An important but relatively unexplored problem concerns truncated and censored data with measurement error. Although analysis of left-truncated and right-censored (LTRC) data has received extensive interest in survival analysis, there has been no research on model selection for LTRC data with measurement error. In this paper, we take up this important problem and develop inferential procedures to handle model selection for LTRC data with measurement error in covariates. Our development employs the local model misspecification framework ([6]; [10]) and emphasizes the use of the focus information criterion (FIC). We develop valid estimators using the model averaging scheme and establish theoretical results to justify the validity of our methods. Numerical studies are conducted to assess the performance of the proposed methods.
- Published
- 2020
25. Estimation of Causal Effect Measures in the Presence of Measurement Error in Confounders
- Author
-
Grace Y. Yi and Di Shu
- Subjects
Statistics and Probability, Estimation, Confounding, Absolute risk reduction, Estimator, Odds ratio, Biochemistry, Genetics and Molecular Biology (miscellaneous), Causal inference, Relative risk, Statistics, Mathematics - Abstract
The odds ratio, risk ratio, and the risk difference are important measures for assessing comparative effectiveness of available treatment plans in epidemiological studies. Estimation of these measures, however, is often challenged by the presence of error-contaminated confounders. In this article, by adapting two correction methods for measurement error effects applicable to the noncausal context, we propose valid methods which consistently estimate the causal odds ratio, causal risk ratio, and the causal risk difference for settings with error-prone confounders. Furthermore, we develop a bootstrap-based procedure to construct estimators with improved asymptotic efficiency. Numerical studies are conducted to assess the performance of the proposed methods.
- Published
- 2018
26. Simultaneous variable selection and estimation for multivariate multilevel longitudinal data with both continuous and binary responses
- Author
-
Di Shu, Yukun Zhang, Haocheng Li, and Grace Y. Yi
- Subjects
Statistics and Probability, Estimation, Mixed model, Multivariate statistics, Estimation theory, Computer science, Applied Mathematics, Feature selection, Random effects model, Computational Mathematics, Computational Theory and Mathematics, Covariate, Data mining, Curse of dimensionality - Abstract
Complex structured data settings are studied where outcomes are multivariate and multilevel and are collected longitudinally. Multivariate outcomes include both continuous and discrete responses. In addition, the data contain a large number of covariates but only some of them are important in explaining the dynamic features of the responses. To delineate the complex association structures of the responses, a model with correlated random effects is proposed. To handle the large dimensionality of covariates, a simultaneous variable selection and parameter estimation method is developed. To implement the method, a computationally feasible algorithm is described. The proposed method is evaluated empirically by simulation studies and illustrated by analyzing the data arising from the Waterloo Smoking Prevention Project.
- Published
- 2018
27. Causal inference with measurement error in outcomes: Bias analysis and estimation methods
- Author
-
Di Shu and Grace Y. Yi
- Subjects
Statistics and Probability, Epidemiology, Average treatment effect, Bias, Health Information Management, Consistent estimator, Statistics, Humans, Computer Simulation, Probability, Mathematics, Models, Statistical, Observational error, Inverse probability weighting, Estimator, Outcome (probability), Research Design, Causal inference, Errors-in-variables models, Smoking Cessation - Abstract
Inverse probability weighting estimation has been popularly used to consistently estimate the average treatment effect. Its validity, however, is challenged by the presence of error-prone variables. In this paper, we explore the inverse probability weighting estimation with mismeasured outcome variables. We study the impact of measurement error for both continuous and discrete outcome variables and reveal interesting consequences of the naive analysis which ignores measurement error. When a continuous outcome variable is mismeasured under an additive measurement error model, the naive analysis may still yield a consistent estimator; when the outcome is binary, we derive the asymptotic bias in a closed-form. Furthermore, we develop consistent estimation procedures for practical scenarios where either validation data or replicates are available. With validation data, we propose an efficient method for estimation of average treatment effect; the efficiency gain is substantial relative to usual methods of using validation data. To provide protection against model misspecification, we further propose a doubly robust estimator which is consistent even when either the treatment model or the outcome model is misspecified. Simulation studies are reported to assess the performance of the proposed methods. An application to a smoking cessation dataset is presented.
- Published
- 2017
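The IPW estimator this record discusses, and the paper's observation that additive mean-zero error in a continuous outcome leaves the naive analysis consistent, can be illustrated with a toy simulation. Everything below is an illustrative sketch, not the paper's code: the variable names, parameter values, and the use of the true (rather than fitted) propensity score are assumptions made for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000

x = rng.normal(size=n)                    # confounder
p = 1 / (1 + np.exp(-0.5 * x))            # true propensity score
a = rng.binomial(1, p)                    # treatment indicator
y = 2.0 * a + x + rng.normal(size=n)      # outcome; true ATE = 2

# Horvitz-Thompson style IPW estimate of the average treatment effect.
def ipw_ate(a, y, p):
    return np.mean(a * y / p) - np.mean((1 - a) * y / (1 - p))

ate_true_outcome = ipw_ate(a, y, p)

# Mismeasured outcome under an additive, mean-zero error model: the naive
# IPW analysis remains asymptotically unbiased, as the abstract notes.
y_star = y + rng.normal(scale=0.5, size=n)
ate_naive = ipw_ate(a, y_star, p)
```

Both estimates land near the true value 2. With a binary outcome the analogous naive analysis would instead be biased, which is the case the paper treats in closed form.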
28. Analysis of case-control data with interacting misclassified covariates
- Author
-
Wenqing He and Grace Y. Yi
- Subjects
Statistics and Probability ,Misclassification ,Computer science ,Prospective logistic regression ,Inference ,Case-control study ,Logistic regression ,Health outcomes ,01 natural sciences ,3. Good health ,Computer Science Applications ,Replicated measurements ,010104 statistics & probability ,03 medical and health sciences ,0302 clinical medicine ,Interaction term ,Covariate ,Econometrics ,Feature (machine learning) ,030212 general & internal medicine ,0101 mathematics ,Statistics, Probability and Uncertainty ,lcsh:Probabilities. Mathematical statistics ,Case control data ,lcsh:QA273-280 - Abstract
Case-control studies are important and useful methods for studying health outcomes and many methods have been developed for analyzing case-control data. Those methods, however, are vulnerable to mismeasurement of variables; biased results are often produced if such a feature is ignored. In this paper, we develop an inference method for handling case-control data with interacting misclassified covariates. We use the prospective logistic regression model to feature the development of the disease. To characterize the misclassification process, we consider a practical situation where replicated measurements of error-prone covariates are available. Our work is motivated in part by a breast cancer case-control study where two binary covariates are subject to misclassification. Extensions to other settings are outlined.
- Published
- 2017
29. Correction to: Is 14-Days a Sensible Quarantine Length for COVID-19? Examinations of Some Associated Issues with a Case Study of COVID-19 Incubation Times
- Author
-
Yasin Khadem Charvadeh, Grace Y. Yi, Yuan Bian, and Wenqing He
- Subjects
Statistics and Probability ,Correction ,Biochemistry, Genetics and Molecular Biology (miscellaneous) - Published
- 2021
30. A Class of Weighted Estimating Equations for Semiparametric Transformation Models with Missing Covariates
- Author
-
Nancy Reid, Yang Ning, and Grace Y. Yi
- Subjects
Statistics and Probability ,05 social sciences ,Inference ,Estimator ,Estimating equations ,Missing data ,01 natural sciences ,010104 statistics & probability ,Robustness (computer science) ,0502 economics and business ,Statistics ,Covariate ,Transformation models ,0101 mathematics ,Statistics, Probability and Uncertainty ,050205 econometrics ,Mathematics ,Parametric statistics - Abstract
In survival analysis, covariate measurements often contain missing observations; ignoring this feature can lead to invalid inference. We propose a class of weighted estimating equations for right-censored data with missing covariates under semiparametric transformation models. Time-specific and subject-specific weights are accommodated in the formulation of the weighted estimating equations. We establish unified results for estimating missingness probabilities that cover both parametric and non-parametric modelling schemes. To improve estimation efficiency, the weighted estimating equations are augmented by a new set of unbiased estimating equations. The resultant estimator has the so-called ‘double robustness’ property and is optimal within a class of consistent estimators.
- Published
- 2017
31. Analysis of panel data under hidden mover-stayer models
- Author
-
Feng He, Wenqing He, and Grace Y. Yi
- Subjects
Statistics and Probability ,education.field_of_study ,Epidemiology ,Computer science ,Smoking prevention ,Population ,Inference ,Latent variable ,computer.software_genre ,01 natural sciences ,010104 statistics & probability ,03 medical and health sciences ,0302 clinical medicine ,Expectation–maximization algorithm ,Econometrics ,030212 general & internal medicine ,Data mining ,State (computer science) ,0101 mathematics ,education ,computer ,Panel data - Abstract
Analysis of panel data is often challenged by the presence of heterogeneity and state misclassification. In this paper, we propose a hidden mover-stayer model that accommodates heterogeneity in a population consisting of two subpopulations, movers and stayers, while simultaneously accounting for state misclassification. We develop an inference procedure based on the expectation-maximization algorithm by treating the mover-stayer indicator and the underlying true states as latent variables. We evaluate the performance of the proposed method and investigate the impact of ignoring misclassification through simulation studies. The proposed method is applied to analyze data arising from the Waterloo Smoking Prevention Project. Copyright © 2017 John Wiley & Sons, Ltd.
- Published
- 2017
32. A class of flexible models for analysis of complex structured correlated data with application to clustered longitudinal data
- Author
-
Wenqing He, Grace Y. Yi, and Haocheng Li
- Subjects
Statistics and Probability ,010104 statistics & probability ,Computer science ,Longitudinal data ,Data mining ,0101 mathematics ,Statistics, Probability and Uncertainty ,computer.software_genre ,01 natural sciences ,computer ,Class (biology) ,Generalized linear mixed model - Published
- 2017
33. Genetic association studies with bivariate mixed responses subject to measurement error and misclassification
- Author
-
Grace Y. Yi and Qihuang Zhang
- Subjects
Statistics and Probability ,Likelihood Functions ,Observational error ,Epidemiology ,Computer science ,Univariate ,Bivariate analysis ,Outcome (probability) ,Generalized linear mixed model ,Phenotype ,Bias ,Expectation–maximization algorithm ,Statistics ,Feature (machine learning) ,Computer Simulation ,Genetic Association Studies ,Genetic association - Abstract
In genetic association studies, mixed effects models have been widely used to detect pleiotropy effects, which occur when one gene affects multiple phenotype traits. In particular, bivariate mixed effects models are useful for describing the association of a gene with a continuous trait and a binary trait. However, such models are inadequate for data with response mismeasurement, a characteristic that is often overlooked. It has been well studied that in univariate settings, ignoring mismeasurement in variables usually results in biased estimation. In this paper, we consider the setting with a bivariate outcome vector which contains a continuous component and a binary component, both subject to mismeasurement. We propose an induced likelihood approach and an EM algorithm method to handle measurement error in the continuous response and misclassification in the binary response simultaneously. Simulation studies confirm that the proposed methods successfully remove the bias induced by the response mismeasurement.
- Published
- 2019
34. Multiclass analysis and prediction with network structured covariates
- Author
-
Wenqing He, Qihuang Zhang, Grace Y. Yi, and Li-Pang Chen
- Subjects
Multiclassification ,Statistics and Probability ,Logistic regression model ,Computer science ,Machine learning ,computer.software_genre ,01 natural sciences ,Normal distribution ,010104 statistics & probability ,03 medical and health sciences ,Exponential family ,Data acquisition ,Covariate ,0101 mathematics ,Special case ,030304 developmental biology ,Structure (mathematical logic) ,0303 health sciences ,business.industry ,Computer Science Applications ,F-score ,Network structure ,Classification methods ,Artificial intelligence ,lcsh:Probabilities. Mathematical statistics ,Statistics, Probability and Uncertainty ,lcsh:QA273-280 ,Focus (optics) ,business ,computer - Abstract
Technological advances in data acquisition are leading to the production of complex structured data sets. Recent developments in classification with multiclass responses make it possible to incorporate the dependence structure of predictors. The available methods, however, are hindered by restrictive requirements: they typically assume a common network structure for the predictors of all subjects, without taking into account the heterogeneity existing in different classes, and they mainly focus on the case where the distribution of the predictors is normal. In this paper, we propose classification methods which address these limitations. Our methods are flexible in handling possibly class-dependent network structures of variables and allow the predictors to follow a distribution in the exponential family, which includes normal distributions as a special case. Our methods are computationally easy to implement. Numerical studies are conducted to demonstrate the satisfactory performance of the proposed methods.
- Published
- 2019
35. Dynamic tilted current correlation for high dimensional variable screening
- Author
-
Grace Y. Yi, Wenqing He, Xin Liu, and Bangxin Zhao
- Subjects
Statistics and Probability ,Clustering high-dimensional data ,Numerical Analysis ,020206 networking & telecommunications ,02 engineering and technology ,01 natural sciences ,010104 statistics & probability ,Variable (computer science) ,Consistency (statistics) ,Ordinary least squares ,0202 electrical engineering, electronic engineering, information engineering ,0101 mathematics ,Statistics, Probability and Uncertainty ,Projection (set theory) ,Spurious relationship ,Algorithm ,Independence (probability theory) ,Curse of dimensionality ,Mathematics - Abstract
Variable screening is a commonly used procedure in high dimensional data analysis to reduce dimensionality and ensure the applicability of available statistical methods. Such a procedure is complicated and computationally burdensome because spurious correlations commonly exist among predictor variables, while important predictor variables may not have large marginal correlations with the response variable. To circumvent these issues, in this paper, we develop a new screening technique, the “dynamic tilted current correlation screening” (DTCCS), for high dimensional variable screening. DTCCS is capable of selecting the most relevant predictors within a finite number of steps, and it includes the popular sure independence screening (SIS) method and the high-dimensional ordinary least squares projection (HOLP) approach as special cases. The DTCCS technique has sure screening and consistency properties which are justified theoretically and demonstrated numerically. A real example of gene expression data is analyzed using the proposed DTCCS procedure.
- Published
- 2021
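DTCCS itself is not reproduced in this record, but the abstract names sure independence screening (SIS) as a special case, and that baseline is simple to sketch: rank predictors by absolute marginal correlation with the response and keep the top d. The toy data and the cutoff d below are made up for illustration.

```python
import numpy as np

def sis_screen(X, y, d):
    """Sure independence screening: keep the d predictors with the
    largest absolute marginal correlation with the response."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    corr = (Xc * yc[:, None]).sum(axis=0) / (
        np.sqrt((Xc ** 2).sum(axis=0)) * np.sqrt((yc ** 2).sum())
    )
    return np.argsort(-np.abs(corr))[:d]

# Toy high-dimensional example: only predictors 0 and 1 are active.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 1000))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(size=200)
kept = sis_screen(X, y, d=10)
```

With n = 200 and p = 1000, the two active predictors are retained with near certainty, while a method like DTCCS would additionally guard against spurious correlations among the screened set.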
36. mgee2: An R package for marginal analysis of longitudinal ordinal data with misclassified responses and covariates
- Author
-
Yuliang Xu, Shuo Shuo Liu, and Grace Y. Yi
- Subjects
Statistics and Probability ,Numerical Analysis ,Statistics, Probability and Uncertainty - Published
- 2021
37. Missing Data Mechanisms for Analysing Longitudinal Data with Incomplete Observations in Both Responses and Covariates
- Author
-
Grace Y. Yi and Haocheng Li
- Subjects
Statistics and Probability ,05 social sciences ,Estimator ,Inference ,Missing data ,01 natural sciences ,010104 statistics & probability ,Robustness (computer science) ,Joint probability distribution ,0502 economics and business ,Covariate ,Statistics ,Econometrics ,Pairwise comparison ,Imputation (statistics) ,0101 mathematics ,Statistics, Probability and Uncertainty ,050205 econometrics ,Mathematics - Abstract
Missing observations in both responses and covariates arise frequently in longitudinal studies. When missing data are missing not at random, inferences under the likelihood framework often require joint modelling of the response and covariate processes, as well as the missing data processes associated with incompleteness of responses and covariates. Specification of these four joint distributions is a nontrivial issue from the perspectives of both modelling and computation. To get around this problem, we employ pairwise likelihood formulations, which avoid the specification of third- or higher-order association structures. In this paper, we consider three specific missing data mechanisms which lead to further simplified pairwise likelihood (SPL) formulations. Under these missing data mechanisms, inference methods based on SPL formulations are developed. The resultant estimators are consistent, and enjoy better robustness and computational convenience. The performance is evaluated empirically through simulation studies. Longitudinal data from the National Population Health Survey and the Waterloo Smoking Prevention Project are analysed to illustrate the usage of our methods.
- Published
- 2016
38. Shrinkage and pretest estimators for longitudinal data analysis under partially linear models
- Author
-
Baojiang Chen, S. Ejaz Ahmed, Shakhawat Hossain, and Grace Y. Yi
- Subjects
Statistics and Probability ,Longitudinal data ,05 social sciences ,Linear model ,Mean and predicted response ,Estimator ,Parameter space ,01 natural sciences ,010104 statistics & probability ,Dimension (vector space) ,Likelihood-ratio test ,0502 economics and business ,Statistics ,Econometrics ,0101 mathematics ,Statistics, Probability and Uncertainty ,050205 econometrics ,Shrinkage ,Mathematics - Abstract
In this paper, we develop marginal analysis methods for longitudinal data under partially linear models. We employ the pretest and shrinkage estimation procedures to estimate the mean response parameters as well as the association parameters, which may be subject to certain restrictions. We provide the analytic expressions for the asymptotic biases and risks of the proposed estimators, and investigate their performance relative to the unrestricted semiparametric least-squares estimator (USLSE). We show that if the dimension of the association parameters exceeds two, the risk of the shrinkage estimators is strictly less than that of the USLSE in most of the parameter space. On the other hand, the risk of the pretest estimator depends on the validity of the restrictions on the association parameters. A simulation study is conducted to evaluate the performance of the proposed estimators relative to that of the USLSE. A real data example is used to illustrate the practical usefulness of the proposed estimation procedures.
- Published
- 2016
39. Weighted causal inference methods with mismeasured covariates and misclassified outcomes
- Author
-
Grace Y. Yi and Di Shu
- Subjects
Statistics and Probability ,Adult ,Male ,Epidemiology ,Computer science ,Average treatment effect ,Logistic regression ,01 natural sciences ,010104 statistics & probability ,03 medical and health sciences ,0302 clinical medicine ,Statistics ,Covariate ,Humans ,Computer Simulation ,030212 general & internal medicine ,0101 mathematics ,Exercise ,Aged ,Probability ,Estimation ,Observational error ,Smokers ,Inverse probability weighting ,Middle Aged ,Health Surveys ,Causality ,Causal inference ,Female ,Smoking Cessation ,Estimation methods ,Algorithms - Abstract
Inverse probability weighting (IPW) estimation has been widely used in causal inference. Its validity relies on the important condition that the variables are precisely measured. This condition, however, is often violated, which distorts the IPW method and thus yields biased results. In this paper, we study the IPW estimation of average treatment effects for settings with mismeasured covariates and misclassified outcomes. We develop estimation methods to correct for measurement error and misclassification effects simultaneously. Our discussion covers a broad scope of treatment models, including typically assumed logistic regression models and general treatment assignment mechanisms. Satisfactory performance of the proposed methods is demonstrated by extensive numerical studies.
- Published
- 2018
40. A corrected profile likelihood method for survival data with covariate measurement error under the Cox model
- Author
-
Ying Yan and Grace Y. Yi
- Subjects
Statistics and Probability ,Observational error ,Additive error ,Proportional hazards model ,05 social sciences ,01 natural sciences ,010104 statistics & probability ,Survival data ,Berkson error model ,0502 economics and business ,Covariate ,Statistics ,0101 mathematics ,Statistics, Probability and Uncertainty ,050205 econometrics ,Mathematics - Abstract
In survival analysis, covariate measurement error has been studied extensively for the Cox model. In this article, we propose a corrected profile likelihood approach, and show that many existing methods can be unified by our approach. Furthermore, we extend our discussion to general measurement error and Berkson models, as opposed to the classical additive error model that has been widely used in the literature. We investigate the impact of model misspecification of the measurement error process and uncover interesting findings. Empirical studies are carried out to illustrate the usage of the proposed methods and to assess their performance. The Canadian Journal of Statistics 43: 454–480; 2015 © 2015 Statistical Society of Canada
- Published
- 2015
41. Parametric Regression Analysis with Covariate Misclassification in Main Study/Validation Study Designs
- Author
-
Grace Y. Yi, Donna Spiegelman, Ying Yan, and Xiaomei Liao
- Subjects
Statistics and Probability ,Computer science ,media_common.quotation_subject ,Inference ,Survey sampling ,Validation Studies as Topic ,Machine learning ,computer.software_genre ,01 natural sciences ,010104 statistics & probability ,03 medical and health sciences ,0302 clinical medicine ,Covariate ,Feature (machine learning) ,Humans ,Quality (business) ,030212 general & internal medicine ,0101 mathematics ,media_common ,Parametric statistics ,Observational error ,business.industry ,Regression analysis ,General Medicine ,Regression Analysis ,Female ,Artificial intelligence ,Statistics, Probability and Uncertainty ,business ,computer - Abstract
Measurement error and misclassification have long been a concern in many fields, including medicine, administrative health care data, epidemiology, and survey sampling. It is known that measurement error and misclassification may seriously degrade the quality of estimation and inference, and should be avoided whenever possible. However, in practice, it is inevitable that measurements contain error for a variety of reasons. It is thus necessary to develop statistical strategies to cope with this issue. Although many inference methods have been proposed in the literature to address mismeasurement effects, some important issues remain unexplored; for example, it is generally unclear how the available methods perform relative to each other. In this paper, capitalizing on the unique feature of discrete variables, we consider settings with misclassified binary covariates and investigate issues concerning covariate misclassification; our development parallels available strategies for handling measurement error in continuous covariates. Under a unified framework, we examine a number of valid inferential procedures for practical settings where a validation study, either internal or external, is available besides a main study. Furthermore, we compare the relative performance of these methods and make practical recommendations.
- Published
- 2017
42. A weighted composite likelihood approach for analysis of survey data under two-level models
- Author
-
Haocheng Li, Grace Y. Yi, and J. N. K. Rao
- Subjects
Statistics and Probability ,010104 statistics & probability ,03 medical and health sciences ,Quasi-maximum likelihood ,030503 health policy & services ,Statistics ,Survey data collection ,0101 mathematics ,Statistics, Probability and Uncertainty ,0305 other medical science ,01 natural sciences ,Mathematics - Published
- 2017
43. Joint modeling of survival data and mismeasured longitudinal data using the proportional odds model
- Author
-
Grace Y. Yi, Juan Xiong, and Wenqing He
- Subjects
Statistics and Probability ,Observational error ,Survival data ,Longitudinal data ,Applied Mathematics ,Statistics ,Econometrics ,Ordered logit ,Joint (geology) ,Survival analysis ,Mathematics - Published
- 2014
44. Inverse-probability-of-treatment weighted estimation of causal parameters in the presence of error-contaminated and time-dependent confounders
- Author
-
Grace Y. Yi and Di Shu
- Subjects
Statistics and Probability ,Estimation ,Observational error ,Biometry ,Time Factors ,Confounding ,Inference ,Marginal structural model ,General Medicine ,Logistic regression ,01 natural sciences ,010104 statistics & probability ,03 medical and health sciences ,0302 clinical medicine ,Inverse probability ,Research Design ,Causal inference ,Statistics ,Multivariate Analysis ,Regression Analysis ,030212 general & internal medicine ,0101 mathematics ,Statistics, Probability and Uncertainty ,Mathematics ,Probability - Abstract
Inverse-probability-of-treatment weighted (IPTW) estimation has been widely used to consistently estimate the causal parameters in marginal structural models, with time-dependent confounding effects adjusted for. Just like other causal inference methods, the validity of IPTW estimation typically requires the crucial condition that all variables are precisely measured. However, this condition is often violated in practice for various reasons. It has been well documented that ignoring measurement error often leads to biased inference results. In this paper, we consider the IPTW estimation of the causal parameters in marginal structural models in the presence of error-contaminated and time-dependent confounders. We explore several methods to correct for the effects of measurement error on the estimation of causal parameters. Numerical studies are reported to assess the finite sample performance of the proposed methods.
- Published
- 2016
45. ipwErrorY: An R Package for Estimation of Average Treatment Effect with Misclassified Binary Outcome
- Author
-
Di Shu and Grace Y. Yi
- Subjects
Statistics and Probability ,Estimation ,Numerical Analysis ,R package ,Average treatment effect ,Binary outcome ,Statistics ,Statistics, Probability and Uncertainty ,Mathematics - Published
- 2019
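This record gives no abstract, but the kind of correction such a method involves, adjusting outcome proportions for known sensitivity and specificity before forming the average treatment effect, can be sketched. The simulation below is illustrative only (randomized treatment, assumed-known misclassification rates); it is not the ipwErrorY package's algorithm or interface.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100000
sens, spec = 0.90, 0.85   # assumed-known sensitivity and specificity

# Randomized treatment, so the true ATE is the risk difference 0.6 - 0.4 = 0.2.
a = rng.binomial(1, 0.5, size=n)
y = rng.binomial(1, np.where(a == 1, 0.6, 0.4))

# Nondifferential misclassification: P(Y*=1|Y=1) = sens, P(Y*=1|Y=0) = 1 - spec.
y_star = np.where(y == 1,
                  rng.binomial(1, sens, size=n),
                  rng.binomial(1, 1 - spec, size=n))

def corrected_mean(ystar_mean, sens, spec):
    # Invert E[Y*] = sens * mu + (1 - spec) * (1 - mu) for mu = E[Y].
    return (ystar_mean - (1 - spec)) / (sens + spec - 1)

naive_ate = y_star[a == 1].mean() - y_star[a == 0].mean()
corr_ate = (corrected_mean(y_star[a == 1].mean(), sens, spec)
            - corrected_mean(y_star[a == 0].mean(), sens, spec))
```

The naive contrast is attenuated by the factor sens + spec − 1 = 0.75, while the corrected contrast recovers the true value 0.2.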
46. swgee: An R Package for Analyzing Longitudinal Data with Response Missingness and Covariate Measurement Error
- Author
-
Juan Xiong and Grace Y. Yi
- Subjects
Statistics and Probability ,Numerical Analysis ,R package ,Observational error ,Longitudinal data ,Statistics ,Covariate ,Statistics, Probability and Uncertainty ,Missing data ,Mathematics - Published
- 2019
47. Marginal analysis of longitudinal ordinal data with misclassification in both response and covariates
- Author
-
Changbao Wu, Zhijian Chen, and Grace Y. Yi
- Subjects
Statistics and Probability ,Estimation ,Ordinal data ,Association (object-oriented programming) ,Inference ,General Medicine ,Joint probability distribution ,Statistics ,Covariate ,Econometrics ,Statistics, Probability and Uncertainty ,Categorical variable ,Parametric statistics ,Mathematics - Abstract
Marginal methods have been widely used for the analysis of longitudinal ordinal and categorical data. These models do not require full parametric assumptions on the joint distribution of repeated response measurements but specify only the marginal mean and association structures. However, inference results obtained from these methods often incur serious bias when variables are subject to error. In this paper, we tackle the problem that misclassification exists in both response and categorical covariate variables. We develop a marginal method for misclassification adjustment, which utilizes second-order estimating functions and a functional modeling approach, and can yield consistent estimates and valid inference for mean and association parameters. We propose a two-stage estimation approach for cases in which validation data are available. Our simulation studies show good performance of the proposed method under a variety of settings. Although the proposed method is phrased to data with a longitudinal design, it also applies to correlated data arising from clustered and family studies, in which association parameters may be of scientific interest. The proposed method is applied to analyze a dataset from the Framingham Heart Study as an illustration.
- Published
- 2013
48. Simultaneous model selection and estimation for mean and association structures with clustered binary data
- Author
-
Grace Y. Yi and Xin Gao
- Subjects
Statistics and Probability ,Model selection ,Covariate ,Binary data ,Applied mathematics ,Estimator ,Feature selection ,Estimating equations ,Statistics, Probability and Uncertainty ,Generalized estimating equation ,Selection (genetic algorithm) ,Mathematics - Abstract
This paper investigates the property of the penalized estimating equations when both the mean and association structures are modelled. To select variables for the mean and association structures sequentially, we propose a hierarchical penalized generalized estimating equations (HPGEE2) approach. The first set of penalized estimating equations is solved for the selection of significant mean parameters. Conditional on the selected mean model, the second set of penalized estimating equations is solved for the selection of significant association parameters. The hierarchical approach is designed to accommodate possible model constraints relating the inclusion of covariates into the mean and the association models. This two-step penalization strategy enjoys a compelling advantage of easing computational burdens compared to solving the two sets of penalized equations simultaneously. HPGEE2 with a smoothly clipped absolute deviation (SCAD) penalty is shown to have the oracle property for the mean and association models. The asymptotic behavior of the penalized estimator under this hierarchical approach is established. An efficient two-stage penalized weighted least square algorithm is developed to implement the proposed method. The empirical performance of the proposed HPGEE2 is demonstrated through Monte-Carlo studies and the analysis of a clinical data set. Copyright © 2013 John Wiley & Sons Ltd
- Published
- 2013
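The SCAD penalty used in this record (and in several others above) has a standard closed form due to Fan and Li (2001), which is easy to evaluate. The sketch below uses their customary a = 3.7; it only computes the penalty itself and is not the HPGEE2 procedure.

```python
import numpy as np

def scad_penalty(theta, lam, a=3.7):
    """Smoothly clipped absolute deviation (SCAD) penalty of Fan & Li
    (2001); a = 3.7 is their usual default."""
    t = np.abs(theta)
    return np.where(
        t <= lam,
        lam * t,                                      # lasso-like near zero
        np.where(
            t <= a * lam,
            (2 * a * lam * t - t ** 2 - lam ** 2) / (2 * (a - 1)),
            lam ** 2 * (a + 1) / 2,                   # constant for large |theta|
        ),
    )

penalties = scad_penalty(np.array([0.0, 0.5, 10.0]), lam=1.0)
```

The penalty matches the lasso (λ|θ|) near zero, transitions quadratically, and is constant at λ²(a + 1)/2 beyond aλ; this flatness for large coefficients is what underlies the oracle property cited in the abstract.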
49. Marginal methods for clustered longitudinal binary data with incomplete covariates
- Author
-
Grace Y. Yi, Richard J. Cook, Xiao-Hua Zhou, and Baojiang Chen
- Subjects
Statistics and Probability ,Applied Mathematics ,Estimator ,Sample (statistics) ,Estimating equations ,Missing data ,Article ,Data set ,Binary data ,Covariate ,Statistics ,Econometrics ,Statistics, Probability and Uncertainty ,Generalized estimating equation ,Mathematics - Abstract
Many analyses for incomplete longitudinal data are directed to examining the impact of covariates on the marginal mean responses. We consider the setting in which longitudinal responses are collected from individuals nested within clusters. We discuss methods for assessing covariate effects on the mean and association parameters when covariates are incompletely observed. Weighted first and second order estimating equations are constructed to obtain consistent estimates of mean and association parameters when covariates are missing at random. Empirical studies demonstrate that estimators from the proposed method have negligible finite sample biases in moderate samples. An application to the National Alzheimer's Coordinating Center (NACC) Uniform Data Set (UDS) demonstrates the utility of the proposed method.
- Published
- 2012
50. Likelihood-based and marginal inference methods for recurrent event data with covariate measurement error
- Author
-
Jerald F. Lawless and Grace Y. Yi
- Subjects
Statistics and Probability ,Recurrent event ,Covariate ,Statistics ,Inference ,Statistics, Probability and Uncertainty ,Humanities ,Mathematics - Abstract
Recurrent event data arise commonly in medical and public health studies. The analysis of such data has received extensive research attention and various methods have been developed in the literature. Depending on the focus of scientific interest, the methods may be broadly classified as intensity-based counting process methods, mean function-based estimating equation methods, and the analysis of times to events or times between events. These methods and models cover a wide variety of practical applications. However, there is a critical assumption underlying those methods: variables need to be correctly measured. Unfortunately, this assumption is frequently violated in practice. It is quite common that some covariates are subject to measurement error. It is well known that covariate measurement error can substantially distort inference results if it is not properly taken into account. In the literature, there has been extensive research concerning measurement error problems in various settings. However, with recurrent events, there is little discussion on this topic. It is the objective of this paper to address this important issue. In this paper, we develop inferential methods which account for measurement error in covariates for models with multiplicative intensity functions or rate functions. Both likelihood-based inference and robust inference based on estimating equations are discussed. The Canadian Journal of Statistics 40: 530–549; 2012 © 2012 Statistical Society of Canada
- Published
- 2012