38 results for "Mallick, Bani"
Search Results
2. Bayesian Semiparametric Inference for the Accelerated Failure-Time Model
- Author
- Kuo, Lynn and Mallick, Bani
- Published
- 1997
3. Semiparametric Bayesian Analysis of Matched Case-Control Studies with Missing Exposure
- Author
- Sinha, Samiran, Mukherjee, Bhramar, Ghosh, Malay, Mallick, Bani K., and Carroll, Raymond J.
- Published
- 2005
4. Bayesian Estimation of Correlation Matrices of Longitudinal Data.
- Author
- Ghosh, Riddhi Pratim, Mallick, Bani, and Pourahmadi, Mohsen
- Subjects
- BAYESIAN analysis, AUTOCORRELATION (Statistics), PARAMETERIZATION, COVARIANCE matrices, MATHEMATICAL equivalence
- Abstract
Estimation of correlation matrices is a challenging problem due to the notorious positive-definiteness constraint and high-dimensionality. Reparameterizing Cholesky factors of correlation matrices in terms of angles or hyperspherical coordinates where the angles vary freely in the range [0, π) has become popular in the last two decades. However, it has not been used in Bayesian estimation of correlation matrices perhaps due to lack of clear statistical relevance and suitable priors for the angles. In this paper, we show for the first time that for longitudinal data these angles are the inverse cosine of the semi-partial correlations (SPCs). This simple connection makes it possible to introduce physically meaningful selection and shrinkage priors on the angles or correlation matrices with emphasis on selection (sparsity) and shrinking towards longitudinal structure. Our method deals effectively with the positive-definiteness constraint in posterior computation. We compare the performance of our Bayesian estimation based on angles with some recent methods based on partial autocorrelations through simulation and apply the method to data from a clinical trial on smoking. [ABSTRACT FROM AUTHOR]
- Published
- 2021
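The angle (hyperspherical) parameterization mentioned in the abstract of result 4 can be illustrated with a small sketch. This is not the authors' code: the function names and the nested-list layout of `theta` are assumptions made for illustration, and it shows only the standard construction in which each row of the Cholesky factor is a unit vector written in spherical coordinates, so that any choice of angles in (0, π) yields a valid correlation matrix.

```python
import math

def cholesky_from_angles(theta):
    """Build a lower-triangular factor B from angles.

    theta[i] holds the i angles of row i (theta[0] is empty); this layout is
    a hypothetical convention chosen for the sketch.
    """
    n = len(theta)
    B = [[0.0] * n for _ in range(n)]
    for i in range(n):
        sin_prod = 1.0  # running product of sines of the earlier angles in row i
        for j in range(i):
            B[i][j] = math.cos(theta[i][j]) * sin_prod
            sin_prod *= math.sin(theta[i][j])
        B[i][i] = sin_prod  # makes the row have unit Euclidean norm
    return B

def correlation_from_angles(theta):
    """Return R = B B^T, which has a unit diagonal by construction."""
    B = cholesky_from_angles(theta)
    n = len(B)
    return [[sum(B[i][k] * B[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]
```

For example, `correlation_from_angles([[], [math.pi / 3], [math.pi / 2, math.pi / 4]])` gives a 3 × 3 matrix whose (2, 1) entry is cos(π/3). No positive-definiteness check is needed afterwards, which is the practical appeal of working with unconstrained angles.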
5. Efficient Bayesian Regularization for Graphical Model Selection.
- Author
- Kundu, Suprateek, Mallick, Bani K., and Baladandayuthapani, Veera
- Subjects
- BAYESIAN analysis, GRAPHICAL modeling (Statistics), MATHEMATICAL regularization, ANALYSIS of covariance, SIMULATION methods & models
- Abstract
There has been an intense development in the Bayesian graphical model literature over the past decade; however, most of the existing methods are restricted to moderate dimensions. We propose a novel graphical model selection approach for large dimensional settings where the dimension increases with the sample size, by decoupling model fitting and covariance selection. First, a full model based on a complete graph is fit under a novel class of mixtures of inverse-Wishart priors, which induce shrinkage on the precision matrix under an equivalence with Cholesky-based regularization, while enabling conjugate updates. Subsequently, a post-fitting model selection step uses penalized joint credible regions to perform model selection. This allows our methods to be computationally feasible for large dimensional settings using a combination of straightforward Gibbs samplers and efficient post-fitting inferences. Theoretical guarantees in terms of selection consistency are also established. Simulations show that the proposed approach compares favorably with competing methods, both in terms of accuracy metrics and computation times. We apply this approach to a cancer genomics data example. [ABSTRACT FROM AUTHOR]
- Published
- 2019
6. Hierarchical Bayesian models for predicting spatially correlated curves.
- Author
- Song, Joon Jin and Mallick, Bani
- Subjects
- *STATISTICAL research, *WAVELETS (Mathematics), *COEFFICIENTS (Statistics), *WAVELET transforms, *BAYESIAN analysis
- Abstract
Functional data analysis has emerged as a new area of statistical research with a wide range of applications. In this paper, we propose novel models based on wavelets for spatially correlated functional data. These models enable one to regularize curves observed over space and predict curves at unobserved sites. We compare the performance of these Bayesian models with several priors on the wavelet coefficients using the posterior predictive criterion. The proposed models are illustrated in the analysis of porosity data. [ABSTRACT FROM AUTHOR]
- Published
- 2019
7. A Powerful Bayesian Test for Equality of Means in High Dimensions.
- Author
- Zoh, Roger S., Sarkar, Abhra, Mallick, Bani K., and Carroll, Raymond J.
- Subjects
- BAYESIAN analysis, ANALYSIS of means, DIMENSIONS, COVARIANCE matrices, DATA analysis, STATISTICAL hypothesis testing, RANDOM projection method
- Abstract
We develop a Bayes factor-based testing procedure for comparing two population means in high-dimensional settings. In "large-p-small-n" settings, Bayes factors based on proper priors require eliciting a large and complex p × p covariance matrix, whereas Bayes factors based on Jeffreys' prior suffer the same impediment as the classical Hotelling T² test statistic as they involve inversion of ill-formed sample covariance matrices. To circumvent this limitation, we propose that the Bayes factor be based on lower dimensional random projections of the high-dimensional data vectors. We choose the prior under the alternative to maximize the power of the test for a fixed threshold level, yielding a restricted most powerful Bayesian test (RMPBT). The final test statistic is based on the ensemble of Bayes factors corresponding to multiple replications of randomly projected data. We show that the test is unbiased and, under mild conditions, is also locally consistent. We demonstrate the efficacy of the approach through simulated and real data examples. Supplementary materials for this article are available online. [ABSTRACT FROM AUTHOR]
- Published
- 2018
8. Bayesian variable selection with graphical structure learning: Applications in integrative genomics.
- Author
- Kundu, Suprateek, Cheng, Yichen, Shin, Minsuk, Manyam, Ganiraju, Mallick, Bani K., and Baladandayuthapani, Veerabhadran
- Subjects
- EPIGENOMICS, CANCER invasiveness, CANCER treatment, MULTICOLLINEARITY, BAYESIAN analysis
- Abstract
Significant advances in biotechnology have allowed for simultaneous measurement of molecular data across multiple genomic, epigenomic and transcriptomic levels from a single tumor/patient sample. This has motivated systematic data-driven approaches to integrate multi-dimensional structured datasets, since cancer development and progression are driven by numerous co-ordinated molecular alterations and the interactions between them. We propose a novel multi-scale Bayesian approach that combines integrative graphical structure learning from multiple sources of data with a variable selection framework to determine the key genomic drivers of cancer progression. The integrative structure learning is first accomplished through novel joint graphical models for heterogeneous (mixed scale) data, allowing for flexible and interpretable incorporation of prior existing knowledge. This subsequently informs a variable selection step to identify groups of co-ordinated molecular features within and across platforms associated with clinical outcomes of cancer progression, while according appropriate adjustments for multicollinearity and multiplicities. We evaluate our methods through rigorous simulations to establish superiority over existing methods that do not take the network and/or prior information into account. Our methods are motivated by and applied to a glioblastoma multiforme (GBM) dataset from The Cancer Genome Atlas to predict patient survival times integrating gene expression, copy number and methylation data. We find a high concordance between our selected prognostic gene network modules and known associations with GBM. In addition, our model discovers several novel cross-platform network interactions (both cis and trans acting) between gene expression, copy number variation associated gene dosing and epigenetic regulation through promoter methylation, some with known implications in the etiology of GBM. Our framework provides a useful tool for biomedical researchers, since clinical prediction using multi-platform genomic information is an important step towards personalized treatment of many cancers. [ABSTRACT FROM AUTHOR]
- Published
- 2018
9. Two-Stage Metropolis-Hastings for Tall Data.
- Author
- Payne, Richard D. and Mallick, Bani K.
- Subjects
- *DATA analysis, *MARKOV chain Monte Carlo, *LOGISTIC model (Demography), *BAYESIAN analysis, *LOGISTIC regression analysis
- Abstract
This paper discusses the challenges presented by tall data problems associated with Bayesian classification (specifically binary classification) and the existing methods to handle them. Current methods include parallelizing the likelihood, subsampling, and consensus Monte Carlo. A new method based on the two-stage Metropolis-Hastings algorithm is also proposed. The purpose of this algorithm is to reduce the exact likelihood computational cost in the tall data situation. In the first stage, a new proposal is tested by the approximate likelihood based model. The full likelihood based posterior computation will be conducted only if the proposal passes the first stage screening. Furthermore, this method can be adopted into the consensus Monte Carlo framework. The two-stage method is applied to logistic regression, hierarchical logistic regression, and Bayesian multivariate adaptive regression splines. [ABSTRACT FROM AUTHOR]
- Published
- 2018
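The two-stage screening idea described in the abstract of result 9 can be sketched as follows. This is a generic illustration, not the authors' implementation: the function name, the random-walk proposal, and the second-stage correction ratio (the usual delayed-acceptance form, which keeps the exact posterior as the target) are assumptions, since the abstract does not spell out these details.

```python
import math
import random

def two_stage_mh(log_post, log_post_approx, x0, n_iter, step=1.0, seed=0):
    """Random-walk Metropolis-Hastings with a cheap first-stage screen.

    log_post        : exact (expensive) log posterior, up to a constant
    log_post_approx : cheap approximation used only for screening
    Returns the chain and the number of exact evaluations performed.
    """
    rng = random.Random(seed)
    x = x0
    lp, lpa = log_post(x), log_post_approx(x)
    full_evals = 1
    samples = []
    for _ in range(n_iter):
        y = x + rng.gauss(0.0, step)  # symmetric proposal
        lpa_y = log_post_approx(y)
        # Stage 1: screen the proposal with the approximation only.
        # 1.0 - rng.random() lies in (0, 1], so the log is always defined.
        if math.log(1.0 - rng.random()) < lpa_y - lpa:
            # Stage 2: evaluate the exact posterior and correct for stage 1.
            lp_y = log_post(y)
            full_evals += 1
            if math.log(1.0 - rng.random()) < (lp_y - lp) - (lpa_y - lpa):
                x, lp, lpa = y, lp_y, lpa_y
        samples.append(x)
    return samples, full_evals
```

With, say, a standard normal target `lambda x: -0.5 * x * x` and a deliberately crude approximation `lambda x: -0.5 * (x / 1.2) ** 2`, the chain still targets the exact posterior, while the exact density is evaluated only for proposals that survive the first stage.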
10. Bayesian Semiparametric Multivariate Density Deconvolution.
- Author
- Sarkar, Abhra, Pati, Debdeep, Chakraborty, Antik, Mallick, Bani K., and Carroll, Raymond J.
- Subjects
- BAYESIAN analysis, MULTIVARIATE analysis, DECONVOLUTION (Mathematics), MEASUREMENT errors, FINITE mixture models (Statistics)
- Abstract
We consider the problem of multivariate density deconvolution when interest lies in estimating the distribution of a vector valued random variable X but precise measurements on X are not available, observations being contaminated by measurement errors U. The existing sparse literature on the problem assumes the density of the measurement errors to be completely known. We propose robust Bayesian semiparametric multivariate deconvolution approaches when the measurement error density of U is not known but replicated proxies are available for at least some individuals. Additionally, we allow the variability of U to depend on the associated unobserved values of X through unknown relationships, which also automatically includes the case of multivariate multiplicative measurement errors. Basic properties of finite mixture models, multivariate normal kernels, and exchangeable priors are exploited in novel ways to meet modeling and computational challenges. Theoretical results showing the flexibility of the proposed methods in capturing a wide variety of data-generating processes are provided. We illustrate the efficiency of the proposed methods in recovering the density of X through simulation experiments. The methodology is applied to estimate the joint consumption pattern of different dietary components from contaminated 24 h recalls. Supplementary materials for this article are available online. [ABSTRACT FROM AUTHOR]
- Published
- 2018
11. Bayesian semiparametric regression in the presence of conditionally heteroscedastic measurement and regression errors.
- Author
- Sarkar, Abhra, Mallick, Bani K., and Carroll, Raymond J.
- Subjects
- *BAYESIAN analysis, *HETEROSCEDASTICITY, *ERROR analysis in mathematics, *COMPUTER simulation, *EPIDEMIOLOGY
- Abstract
We consider the problem of robust estimation of the regression relationship between a response and a covariate based on a sample in which precise measurements on the covariate are not available but error-prone surrogates for the unobserved covariate are available for each sampled unit. Existing methods often make restrictive and unrealistic assumptions about the density of the covariate and the densities of the regression and the measurement errors, for example, normality and, for the latter two, also homoscedasticity and thus independence from the covariate. In this article we describe Bayesian semiparametric methodology based on mixtures of B-splines and mixtures induced by Dirichlet processes that relaxes these restrictive assumptions. In particular, our models for the aforementioned densities adapt to asymmetry, heavy tails and multimodality. The models for the densities of regression and measurement errors also accommodate conditional heteroscedasticity. In simulation experiments, our method vastly outperforms existing methods. We apply our method to data from nutritional epidemiology. [ABSTRACT FROM AUTHOR]
- Published
- 2014
12. Bayesian Semiparametric Density Deconvolution in the Presence of Conditionally Heteroscedastic Measurement Errors.
- Author
- Sarkar, Abhra, Mallick, Bani K., Staudenmayer, John, Pati, Debdeep, and Carroll, Raymond J.
- Subjects
- *RANDOM variables, *BAYESIAN analysis, *DECONVOLUTION (Mathematics), *HETEROSCEDASTICITY, *ERROR analysis in mathematics, *ELECTRONIC information resources, *DIRICHLET problem
- Abstract
We consider the problem of estimating the density of a random variable when precise measurements on the variable are not available, but replicated proxies contaminated with measurement error are available for sufficiently many subjects. Under the assumption of additive measurement errors this reduces to a problem of deconvolution of densities. Deconvolution methods often make restrictive and unrealistic assumptions about the density of interest and the distribution of measurement errors, for example, normality and homoscedasticity and thus independence from the variable of interest. This article relaxes these assumptions and introduces novel Bayesian semiparametric methodology based on Dirichlet process mixture models for robust deconvolution of densities in the presence of conditionally heteroscedastic measurement errors. In particular, the models can adapt to asymmetry, heavy tails, and multimodality. In simulation experiments, we show that our methods vastly outperform a recent Bayesian approach based on estimating the densities via mixtures of splines. We apply our methods to data from nutritional epidemiology. Even in the special case when the measurement errors are homoscedastic, our methodology is novel and dominates other methods that have been proposed previously. Additional simulation results, instructions on getting access to the dataset and R programs implementing our methods are included as part of online supplementary materials. [ABSTRACT FROM PUBLISHER]
- Published
- 2014
13. Bayesian Uncertainty Quantification for Subsurface Inversion Using a Multiscale Hierarchical Model.
- Author
- Mondal, Anirban, Mallick, Bani, Efendiev, Yalchin, and Datta-Gupta, Akhil
- Subjects
- *BAYESIAN analysis, *UNCERTAINTY, *INVERSION (Geophysics), *PREDICATE calculus, *RANDOM fields
- Abstract
We consider a Bayesian approach to nonlinear inverse problems in which the unknown quantity is a random field (spatial or temporal). The Bayesian approach contains a natural mechanism for regularization in the form of prior information, can incorporate information from heterogeneous sources and provide a quantitative assessment of uncertainty in the inverse solution. The Bayesian setting casts the inverse solution as a posterior probability distribution over the model parameters. The Karhunen-Loève expansion is used for dimension reduction of the random field. Furthermore, we use a hierarchical Bayes model to inject multiscale data in the modeling framework. In this Bayesian framework, we show that this inverse problem is well-posed by proving that the posterior measure is Lipschitz continuous with respect to the data in total variation norm. Computational challenges in this construction arise from the need for repeated evaluations of the forward model (e.g., in the context of MCMC) and are compounded by high dimensionality of the posterior. We develop a two-stage reversible jump MCMC that can screen out bad proposals in the first, inexpensive stage. Numerical results are presented by analyzing simulated as well as real data from a hydrocarbon reservoir. This article has supplementary material available online. [ABSTRACT FROM PUBLISHER]
- Published
- 2014
14. Bayesian hierarchical structured variable selection methods with application to molecular inversion probe studies in breast cancer.
- Author
- Zhang, Lin, Baladandayuthapani, Veerabhadran, Mallick, Bani K., Manyam, Ganiraju C., Thompson, Patricia A., Bondy, Melissa L., and Do, Kim‐Anh
- Subjects
- BREAST cancer, BAYESIAN analysis, GENOMICS, GENETIC markers, CHROMOSOMES, GENES
- Abstract
The analysis of genomic alterations that may occur in nature when segments of chromosomes are copied (known as copy number alterations) has been a focus of research to identify genetic markers of cancer. One high throughput technique that has recently been adopted is the use of molecular inversion probes to measure probe copy number changes. The resulting data consist of high dimensional copy number profiles that can be used to ascertain probe-specific copy number alterations in correlative studies with patient outcomes to guide risk stratification and future treatment. We propose a novel Bayesian variable selection method, the hierarchical structured variable selection method, which accounts for the natural gene and probe-within-gene architecture to identify important genes and probes associated with clinically relevant outcomes. We propose the hierarchical structured variable selection model for grouped variable selection, where simultaneous selection of both groups and within-group variables is of interest. The hierarchical structured variable selection model utilizes a discrete mixture prior distribution for group selection and group-specific Bayesian lasso hierarchies for variable selection within groups. We provide methods for accounting for serial correlations within groups that incorporate Bayesian fused lasso methods for within-group selection. Through simulations we establish that our method results in lower model errors than other methods when a natural grouping structure exists. We apply our method to a molecular inversion probe study of breast cancer and show that it identifies genes and probes that are significantly associated with clinically relevant subtypes of breast cancer. [ABSTRACT FROM AUTHOR]
- Published
- 2014
15. Adaptive Bayesian Nonstationary Modeling for Large Spatial Datasets Using Covariance Approximations.
- Author
- Konomi, Bledar A., Sang, Huiyan, and Mallick, Bani K.
- Subjects
- ADAPTIVE computing systems, BAYESIAN analysis, COMPUTER simulation, DATA analysis, ANALYSIS of covariance, APPROXIMATION theory, GAUSSIAN processes
- Abstract
Gaussian process models have been widely used in spatial statistics but face tremendous modeling and computational challenges for very large nonstationary spatial datasets. To address these challenges, we develop a Bayesian modeling approach using a nonstationary covariance function constructed based on adaptively selected partitions. The partitioned nonstationary class allows one to knit together local covariance parameters into a valid global nonstationary covariance for prediction, where the local covariance parameters are allowed to be estimated within each partition to reduce computational cost. To further facilitate the computations in local covariance estimation and global prediction, we use the full-scale covariance approximation (FSA) approach for the Bayesian inference of our model. One of our contributions is to model the partitions stochastically by embedding a modified treed partitioning process into the hierarchical models that leads to automated partitioning and substantial computational benefits. We illustrate the utility of our method with simulation studies and the global Total Ozone Mapping Spectrometer (TOMS) data. Supplementary materials for this article are available online. [ABSTRACT FROM AUTHOR]
- Published
- 2014
16. Bayesian sparse graphical models and their mixtures.
- Author
- Talluri, Rajesh, Baladandayuthapani, Veerabhadran, and Mallick, Bani K.
- Subjects
- BAYESIAN analysis, GAUSSIAN processes, PARSIMONIOUS models, MATRICES (Mathematics), GRAPHICAL modeling (Statistics), FINITE mixture models (Statistics)
- Abstract
We propose Bayesian methods for Gaussian graphical models that lead to sparse and adaptively shrunk estimators of the precision (inverse covariance) matrix. Our methods are based on lasso-type regularization priors leading to parsimonious parameterization of the precision matrix, which is essential in several applications involving learning relationships among the variables. In this context, we introduce a novel type of selection prior that develops a sparse structure on the precision matrix by making most of the elements exactly zero, in addition to ensuring positive definiteness, thus conducting model selection and estimation simultaneously. More importantly, we extend these methods to analyse clustered data using finite mixtures of Gaussian graphical models and infinite mixtures of Gaussian graphical models. We discuss appropriate posterior simulation schemes to implement posterior inference in the proposed models, including the evaluation of normalizing constants that are functions of parameters of interest, which result from the restriction of positive definiteness on the correlation matrix. We evaluate the operating characteristics of our method via several simulations and demonstrate the application to real-data examples in genomics. [ABSTRACT FROM AUTHOR]
- Published
- 2014
17. Joint High-Dimensional Bayesian Variable and Covariance Selection with an Application to eQTL Analysis.
- Author
- Bhadra, Anindya and Mallick, Bani K.
- Subjects
- *BAYESIAN analysis, *HIGH-dimensional model representation, *COVARIANCE matrices, *SINGLE nucleotide polymorphisms, *LOCUS (Genetics)
- Abstract
We describe a Bayesian technique to (a) perform a sparse joint selection of significant predictor variables and significant inverse covariance matrix elements of the response variables in a high-dimensional linear Gaussian sparse seemingly unrelated regression (SSUR) setting and (b) perform an association analysis between the high-dimensional sets of predictors and responses in such a setting. To search the high-dimensional model space, where both the number of predictors and the number of possibly correlated responses can be larger than the sample size, we demonstrate that a marginalization-based collapsed Gibbs sampler, in combination with spike and slab type of priors, offers a computationally feasible and efficient solution. As an example, we apply our method to an expression quantitative trait loci (eQTL) analysis on publicly available single nucleotide polymorphism (SNP) and gene expression data for humans where the primary interest lies in finding the significant associations between the sets of SNPs and possibly correlated genetic transcripts. Our method also allows for inference on the sparse interaction network of the transcripts (response variables) after accounting for the effect of the SNPs (predictor variables). We exploit properties of Gaussian graphical models to make statements concerning conditional independence of the responses. Our method compares favorably to existing Bayesian approaches developed for this purpose. [ABSTRACT FROM AUTHOR]
- Published
- 2013
18. Bayesian Nonparametric Regression Analysis of Data with Random Effects Covariates from Longitudinal Measurements.
- Author
- Ryu, Duchwan, Li, Erning, and Mallick, Bani K.
- Subjects
- LINEAR statistical models, ANALYSIS of covariance, RANDOM data (Statistics), BAYESIAN analysis, SPLINE theory, MONTE Carlo method, MARKOV processes, SIMULATION methods & models
- Abstract
We consider nonparametric regression analysis in a generalized linear model (GLM) framework for data with covariates that are the subject-specific random effects of longitudinal measurements. The usual assumption that the effects of the longitudinal covariate processes are linear in the GLM may be unrealistic and if this happens it can cast doubt on the inference of observed covariate effects. Allowing the regression functions to be unknown, we propose to apply Bayesian nonparametric methods including cubic smoothing splines or P-splines for the possible nonlinearity and use an additive model in this complex setting. To improve computational efficiency, we propose the use of data-augmentation schemes. The approach allows flexible covariance structures for the random effects and within-subject measurement errors of the longitudinal processes. The posterior model space is explored through a Markov chain Monte Carlo (MCMC) sampler. The proposed methods are illustrated and compared to other approaches, the 'naive' approach and the regression calibration, via simulations and by an application that investigates the relationship between obesity in adulthood and childhood growth curves. [ABSTRACT FROM AUTHOR]
- Published
- 2011
19. Bayesian Modeling of MPSS Data: Gene Expression Analysis of Bovine Salmonella Infection.
- Author
- Dhavala, Soma S., Datta, Sujay, Mallick, Bani K., Carroll, Raymond J., Khare, Sangeeta, Lawhon, Sara D., and Adams, L. Garry
- Subjects
- BAYESIAN analysis, FOODBORNE diseases, SALMONELLA detection, DIRICHLET problem, POISSON'S equation, DIAGNOSIS
- Abstract
Massively Parallel Signature Sequencing (MPSS) is a high-throughput, counting-based technology available for gene expression profiling. It produces output that is similar to Serial Analysis of Gene Expression and is ideal for building complex relational databases for gene expression. Our goal is to compare the in vivo global gene expression profiles of tissues infected with different strains of Salmonella obtained using the MPSS technology. In this article, we develop an exact ANOVA type model for this count data using a zero-inflated Poisson distribution, different from existing methods that assume continuous densities. We adopt two Bayesian hierarchical models, one parametric and the other semiparametric with a Dirichlet process prior that has the ability to 'borrow strength' across related signatures, where a signature is a specific arrangement of the nucleotides, usually 16-21 base pairs long. We utilize the discreteness of the Dirichlet process prior to cluster signatures that exhibit similar differential expression profiles. Tests for differential expression are carried out using nonparametric approaches, while controlling the false discovery rate. We identify several differentially expressed genes that have important biological significance and conclude with a summary of the biological discoveries. This article has supplementary materials online. [ABSTRACT FROM AUTHOR]
- Published
- 2010
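The zero-inflated Poisson distribution named in the abstract of result 19 has a simple closed form, shown here as a minimal sketch. The function name and the parameter names `lam` (Poisson rate) and `pi_zero` (probability of a structural zero) are illustrative assumptions; the hierarchical ANOVA-type model built on top of it in the article is not reproduced.

```python
import math

def zip_pmf(k, lam, pi_zero):
    """Zero-inflated Poisson probability of observing the count k.

    With probability pi_zero the count is a structural zero; otherwise it is
    drawn from a Poisson distribution with rate lam.
    """
    if k < 0:
        return 0.0
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    extra_zero = pi_zero if k == 0 else 0.0
    return extra_zero + (1.0 - pi_zero) * poisson
```

The extra mass at zero is what lets the model fit count data with more zeros than a plain Poisson distribution allows; the mean of the distribution is `(1 - pi_zero) * lam`.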
20. Semiparametric Bayesian Analysis of Nutritional Epidemiology Data in the Presence of Measurement Error.
- Author
- Sinha, Samiran, Mallick, Bani K., Kipnis, Victor, and Carroll, Raymond J.
- Subjects
- *BAYESIAN analysis, *MEASUREMENT errors, *DIRICHLET problem, *NONPARAMETRIC statistics, *NUTRITION, *EPIDEMIOLOGY, *MATHEMATICAL variables, *LINEAR statistical models
- Abstract
We propose a semiparametric Bayesian method for handling measurement error in nutritional epidemiological data. Our goal is to estimate nonparametrically the form of association between a disease and exposure variable while the true values of the exposure are never observed. Motivated by nutritional epidemiological data, we consider the setting where a surrogate covariate is recorded in the primary data, and a calibration data set contains information on the surrogate variable and repeated measurements of an unbiased instrumental variable of the true exposure. We develop a flexible Bayesian method where not only is the relationship between the disease and exposure variable treated semiparametrically, but also the relationship between the surrogate and the true exposure is modeled semiparametrically. The two nonparametric functions are modeled simultaneously via B-splines. In addition, we model the distribution of the exposure variable as a Dirichlet process mixture of normal distributions, thus making its modeling essentially nonparametric and placing this work into the context of functional measurement error modeling. We apply our method to the NIH-AARP Diet and Health Study and examine its performance in a simulation study. [ABSTRACT FROM AUTHOR]
- Published
- 2010
21. Why do we observe misclassification errors smaller than the Bayes error?
- Author
- Fu, Wenjiang J., Mallick, Bani, and Carroll, Raymond J.
- Subjects
- *DISCRIMINANT analysis, *MONTE Carlo method, *BAYESIAN analysis, *STATISTICAL sampling, *STOCHASTIC processes
- Abstract
In simulation studies for discriminant analysis, misclassification errors are often computed using the Monte Carlo method, by testing a classifier on large samples generated from known populations. Although large samples are expected to behave closely to the underlying distributions, they may not do so in a small interval or region, and thus may lead to unexpected results. We demonstrate with an example that the LDA misclassification error computed via the Monte Carlo method may often be smaller than the Bayes error. We give a rigorous explanation and recommend a method to properly compute misclassification errors. [ABSTRACT FROM AUTHOR]
- Published
- 2009
22. Bayesian Hierarchical Spatially Correlated Functional Data Analysis with Application to Colon Carcinogenesis.
- Author
- Baladandayuthapani, Veerabhadran, Mallick, Bani K., Hong, Mee Young, Lupton, Joanne R., Turner, Nancy D., and Carroll, Raymond J.
- Subjects
- *BAYESIAN analysis, *STATISTICAL decision making, *DATA analysis, *DATA corruption, *COLON (Anatomy)
- Abstract
In this article, we present new methods to analyze data from an experiment using rodent models to investigate the role of p27, an important cell-cycle mediator, in early colon carcinogenesis. The responses modeled here are essentially functions nested within a two-stage hierarchy. Standard functional data analysis literature focuses on a single stage of hierarchy and conditionally independent functions with near white noise. However, in our experiment, there is substantial biological motivation for the existence of spatial correlation among the functions, which arise from the locations of biological structures called colonic crypts: this possible functional correlation is a phenomenon we term crypt signaling. Thus, as a point of general methodology, we require an analysis that allows for functions to be correlated at the deepest level of the hierarchy. Our approach is fully Bayesian and uses Markov chain Monte Carlo methods for inference and estimation. Analysis of this data set gives new insights into the structure of p27 expression in early colon carcinogenesis and suggests the existence of significant crypt signaling. Our methodology uses regression splines, and because of the hierarchical nature of the data, dimension reduction of the covariance matrix of the spline coefficients is important: we suggest simple methods for overcoming this problem. [ABSTRACT FROM AUTHOR]
- Published
- 2008
23. Bayesian Curve Classification Using Wavelets.
- Author
-
Xiaohui Wang, Ray, Shubhankar, and Mallick, Bani K.
- Subjects
STATISTICS ,BAYESIAN analysis ,PROBABILITY theory ,CURVE fitting ,MARKOV processes ,MONTE Carlo method ,WAVELETS (Mathematics) - Abstract
We propose classification models for binary and multicategory data where the predictor is a random function. We use Bayesian modeling with wavelet basis functions that have nice approximation properties over a large class of functional spaces and can accommodate a wide variety of functional forms observed in real-life applications. We develop a unified hierarchical model to encompass both the adaptive wavelet-based function estimation model and the logistic classification model. We couple these two models together so that they borrow strength from each other in a unified hierarchical framework. The use of Gibbs sampling with conjugate priors for posterior inference makes the method computationally feasible. We compare the performance of the proposed model with other classification methods, such as the existing naive plug-in methods, by analyzing simulated and real data sets. [ABSTRACT FROM AUTHOR]
- Published
- 2007
- Full Text
- View/download PDF
24. Longitudinal Studies With Outcome-Dependent Follow-up: Models and Bayesian Regression.
- Author
-
Ryu, Duchwan, Sinha, Debajyoti, Mallick, Bani, Lipsitz, Stuart R., and Lipshultz, Steven E.
- Subjects
STATISTICS ,BAYESIAN analysis ,STATISTICAL decision making ,PROBABILITY theory ,MULTIVARIATE analysis ,REGRESSION analysis ,STATISTICAL smoothing ,SMOOTHING (Numerical analysis) - Abstract
We propose Bayesian parametric and semiparametric partially linear regression methods to analyze outcome-dependent follow-up data when the random time of a follow-up measurement of an individual depends on the history of both observed longitudinal outcomes and previous measurement times. We begin with the investigation of the simplifying assumptions of Lipsitz, Fitzmaurice, Ibrahim, Gelber, and Lipshultz, and present a new model for analyzing such data by allowing subject-specific correlations for the longitudinal response and by introducing a subject-specific latent variable to accommodate the association between the longitudinal measurements and the follow-up times. An extensive simulation study shows that our Bayesian partially linear regression method facilitates accurate estimation of the true regression line and the regression parameters. We illustrate our new methodology using data from a longitudinal observational study. [ABSTRACT FROM AUTHOR]
- Published
- 2007
- Full Text
- View/download PDF
25. Functional clustering by Bayesian wavelet methods.
- Author
-
Ray, Shubhankar and Mallick, Bani
- Subjects
CLUSTER theory (Nuclear physics) ,BAYESIAN analysis ,WAVELETS (Mathematics) ,DIRICHLET principle ,GIBBS' equation - Abstract
We propose a nonparametric Bayes wavelet model for clustering of functional data. The wavelet-based methodology is aimed at the resolution of generic global and local features during clustering and is suitable for clustering high dimensional data. Based on the Dirichlet process, the nonparametric Bayes model extends the scope of traditional Bayes wavelet methods to functional clustering and allows the elicitation of prior belief about the regularity of the functions and the number of clusters by suitably mixing the Dirichlet processes. Posterior inference is carried out by Gibbs sampling with conjugate priors, which makes the computation straightforward. We use simulated as well as real data sets to illustrate the suitability of the approach over other alternatives. [ABSTRACT FROM AUTHOR]
- Published
- 2006
- Full Text
- View/download PDF
26. Analyzing Nonstationary Spatial Data Using Piecewise Gaussian Processes.
- Author
-
Kim, Hyoung-Moon, Mallick, Bani K., and Holmes, C. C.
- Subjects
- *
GEOLOGICAL statistics , *MINE valuation -- Statistical methods , *GAUSSIAN processes , *DISTRIBUTION (Probability theory) , *BAYESIAN analysis , *KRIGING , *OIL fields - Abstract
In many problems in geostatistics the response variable of interest is strongly related to the underlying geology of the spatial location. In these situations there is often little correlation in the responses found in different rock strata, so the underlying covariance structure shows sharp changes at the boundaries of the rock types. Conventional stationary and nonstationary spatial methods are inappropriate, because they typically assume that the covariance between points is a smooth function of distance. In this article we propose a generic method for the analysis of spatial data with sharp changes in the underlying covariance structure. Our method works by automatically decomposing the spatial domain into disjoint regions within which the process is assumed to be stationary, but the data are assumed independent across regions. Uncertainty in the number of disjoint regions, their shapes, and the model within regions is dealt with in a fully Bayesian fashion. We illustrate our approach on a previously unpublished dataset relating to soil permeability of the Schneider Buda oil field in Wood County, Texas. [ABSTRACT FROM AUTHOR]
- Published
- 2005
- Full Text
- View/download PDF
27. Spatially Adaptive Bayesian Penalized Regression Splines (P-splines).
- Author
-
Baladandayuthapani, Veerabhadran, Mallick, Bani K., and Carroll, Raymond J.
- Subjects
- *
BAYESIAN analysis , *REGRESSION analysis , *MONTE Carlo method , *MARKOV processes , *PROBABILITY theory , *STATISTICAL correlation - Abstract
In this article we study penalized regression splines (P-splines), which are low-order basis splines with a penalty to avoid undersmoothing. Such P-splines are typically not spatially adaptive, and hence can have trouble when functions are varying rapidly. Our approach is to model the penalty parameter inherent in the P-spline method as a heteroscedastic regression function. We develop a full Bayesian hierarchical structure to do this and use Markov chain Monte Carlo techniques for drawing random samples from the posterior for inference. The advantage of using a Bayesian approach to P-splines is that it allows for simultaneous estimation of the smooth functions and the underlying penalty curve in addition to providing uncertainty intervals of the estimated curve. The Bayesian credible intervals obtained for the estimated curve are shown to have pointwise coverage probabilities close to nominal. The method is extended to additive models with simultaneous spline-based penalty functions for the unknown functions. In simulations, the approach achieves very competitive performance with the current best frequentist P-spline method in terms of frequentist mean squared error and coverage probabilities of the credible intervals, and performs better than some of the other Bayesian methods. [ABSTRACT FROM AUTHOR]
- Published
- 2005
- Full Text
- View/download PDF
28. Bayesian classification of tumours by using gene expression data.
- Author
-
Mallick, Bani K., Ghosh, Debashis, and Ghosh, Malay
- Subjects
BAYESIAN analysis ,STATISTICAL decision making ,TUMORS ,GENE expression ,MARKOV processes ,MONTE Carlo method - Abstract
Precise classification of tumours is critical for the diagnosis and treatment of cancer. Diagnostic pathology has traditionally relied on macroscopic and microscopic histology and tumour morphology as the basis for the classification of tumours. Current classification frameworks, however, cannot discriminate between tumours with similar histopathologic features, which vary in clinical course and in response to treatment. In recent years, there has been a move towards the use of complementary deoxyribonucleic acid microarrays for the classification of tumours. These high throughput assays provide relative messenger ribonucleic acid expression measurements simultaneously for thousands of genes. A key statistical task is to perform classification via different expression patterns. Gene expression profiles may offer more information than classical morphology and may provide an alternative to classical tumour diagnosis schemes. The paper considers several Bayesian classification methods based on reproducing kernel Hilbert spaces for the analysis of microarray data. We consider the logistic likelihood as well as likelihoods related to support vector machine models. It is shown through simulation and examples that support vector machine models with multiple shrinkage parameters produce fewer misclassification errors than several existing classical methods as well as Bayesian methods based on the logistic likelihood or those involving only one shrinkage parameter. [ABSTRACT FROM AUTHOR]
- Published
- 2005
- Full Text
- View/download PDF
29. A Bayesian prediction using the skew Gaussian distribution
- Author
-
Kim, Hyoung-Moon and Mallick, Bani K.
- Subjects
- *
GAUSSIAN distribution , *MONTE Carlo method , *BAYESIAN analysis , *RAINFALL probabilities - Abstract
A model based on the skew Gaussian distribution is presented to handle skewed spatial data. It extends the results of popular Gaussian process models. Markov chain Monte Carlo techniques are used to generate samples from the posterior distributions of the parameters. Finally, this model is applied in the spatial prediction of weekly rainfall. Cross-validation shows that the predictive performance of our model compares favorably with several kriging variants. [Copyright Elsevier]
- Published
- 2004
- Full Text
- View/download PDF
30. A Bayesian semiparametric transformation model incorporating frailties
- Author
-
Mallick, Bani K. and Walker, Stephen
- Subjects
- *
BAYESIAN analysis , *STATISTICS , *REGRESSION analysis - Abstract
We describe a Bayesian semiparametric (failure time) transformation model for which an unknown monotone transformation of failure times is assumed linearly dependent on observed covariates with an unspecified error distribution. The two unknowns, the monotone transformation and the error distribution, are assigned prior distributions with large supports. Our class of regression models includes the proportional hazards, accelerated failure time, and frailty models. Numerical examples are presented. [Copyright Elsevier]
- Published
- 2003
- Full Text
- View/download PDF
31. Bayesian Wavelet Networks for Nonparametric Regression.
- Author
-
Holmes, Chris C. and Mallick, Bani K.
- Subjects
- *
BAYESIAN analysis , *ARTIFICIAL neural networks , *WAVELETS (Mathematics) , *NONPARAMETRIC statistics , *MARKOV processes - Abstract
Presents information on a study which analyzed the performance of proposed radial wavelet networks as methods for nonparametric regression. Bayesian approach to wavelet network construction; Description of the reversible jump Markov chain Monte Carlo algorithm; Test results; Conclusions.
- Published
- 2000
- Full Text
- View/download PDF
32. Combining information from several experiments with nonparametric priors.
- Author
-
MALLICK, BANI K. and WALKER, STEPHEN G.
- Subjects
- *
ARITHMETIC mean , *NONPARAMETRIC estimation , *MATHEMATICAL statistics , *BAYESIAN analysis , *STATISTICS - Abstract
This paper considers combining information from several experiments when the experiments can be summarised via a parameter value. The structure of this set of parameters, in terms of independence, exchangeability, partial exchangeability, etc., is assumed to be unknown and a finite number of possible structures are entertained, each with an associated prior weight representing the degree of belief in that structure. Crucial is the criterion by which these structures are selected. The final inference for the parameter values is taken to be the average, with respect to the posterior weights, of the values obtained from each structure. This is performed within a Bayesian nonparametric framework where the form of the prior distribution for the parameters is unrestricted. Therefore we do not assume that the distributions associated with a partial structure are from the same family. Different types of experiment suggest different types of distributions of parameters associated with each type of experiment. [ABSTRACT FROM AUTHOR]
- Published
- 1997
- Full Text
- View/download PDF
33. A note on the scale parameter of the dirichlet process.
- Author
-
Walker, Stephen G. and Mallick, Bani K.
- Subjects
- *
DIRICHLET forms , *DIRICHLET problem , *ESTIMATION theory , *STATISTICAL bootstrapping , *BAYESIAN analysis - Abstract
This article provides an interpretation of the scale parameter of a Dirichlet process when the aim is to estimate a linear functional of an unknown probability distribution. We provide the exact first and second posterior moments for such functionals under both informative and noninformative prior specifications. The noninformative case gives a normal approximation to the Bayesian bootstrap. [ABSTRACT FROM AUTHOR]
- Published
- 1997
- Full Text
- View/download PDF
34. Modeling Expert Opinion Arising as a Partial Probabilistic Specification.
- Author
-
Gelfand, Alan E., Mallick, Bani K., and Dey, Dipak K.
- Subjects
- *
STATISTICAL decision making , *DECISION making , *DISTRIBUTION (Probability theory) , *RANDOM variables , *BAYESIAN analysis , *ESTIMATION theory - Abstract
Expert opinion is often sought with regard to unknowns in a decision-making setting. For a univariate unknown, θ, our presumption is that such opinion is elicited as a partial probabilistic specification in the form of either probability assignments regarding the chance of θ falling in a fixed set of disjoint exhaustive intervals or selected quantiles for θ. Treating such specification as "data," our focus is on the development of suitable probability densities for these data given the true θ. In particular, we advocate a rich class of densities created by transformation of random mixtures of beta distributions. These densities become likelihoods when viewed as a function of θ given the data. We presume that a decision-maker (here a so-called supra Bayesian) presides over the opinion collection, offering his or her assessment as well. All of this opinion is synthesized using Bayes's theorem, resulting in the posterior distribution as the pooling mechanism. The models are applied to opinion collected regarding points per game for participants in the 1991 National Basketball Association championship basketball series. [ABSTRACT FROM AUTHOR]
- Published
- 1995
- Full Text
- View/download PDF
35. Bayesian nonlinear regression for large p small n problems
- Author
-
Chakraborty, Sounak, Ghosh, Malay, and Mallick, Bani K.
- Subjects
- *
BAYESIAN analysis , *NONLINEAR theories , *REGRESSION analysis , *SUPPORT vector machines , *ALGORITHMS , *MULTIVARIATE analysis , *MARKOV processes , *MONTE Carlo method , *NEAR infrared spectroscopy - Abstract
Statistical modeling and inference problems with sample sizes substantially smaller than the number of available covariates are challenging. This is known as the large p small n problem. Furthermore, the problem is more complicated when we have multiple correlated responses. We develop multivariate nonlinear regression models in this setup for accurate prediction. In this paper, we introduce a full Bayesian support vector regression model with Vapnik’s ε-insensitive loss function, based on reproducing kernel Hilbert spaces (RKHS) under the multivariate correlated response setup. This provides a full probabilistic description of the support vector machine (SVM) rather than an algorithm for fitting purposes. We have also introduced a multivariate version of the relevance vector machine (RVM). Instead of the original treatment of the RVM relying on the use of type II maximum likelihood estimates of the hyper-parameters, we put a prior on the hyper-parameters and use a Markov chain Monte Carlo technique for computation. We have also proposed an empirical Bayes method for our RVM and SVM. Our methods are illustrated with a prediction problem in near-infrared (NIR) spectroscopy. A simulation study is also undertaken to check the prediction accuracy of our models. [Copyright Elsevier]
- Published
- 2012
- Full Text
- View/download PDF
36. Bayesian and variational Bayesian approaches for flows in heterogeneous random media.
- Author
-
Yang, Keren, Guha, Nilabja, Efendiev, Yalchin, and Mallick, Bani K.
- Subjects
- *
BAYESIAN analysis , *SIMULATION methods & models , *MARKOV chain Monte Carlo , *FINITE difference method , *MATHEMATICAL decomposition , *INFERENTIAL statistics - Abstract
In this paper, we study porous media flows in heterogeneous stochastic media. We propose an efficient forward simulation technique that is tailored for variational Bayesian inversion. As a starting point, the proposed forward simulation technique decomposes the solution into the sum of separable functions (with respect to randomness and the space), where each term is calculated based on a variational approach. This is similar to Proper Generalized Decomposition (PGD). Next, we apply a multiscale technique to solve for each term (as in [1]) and, further, decompose the random function into 1D fields. As a result, our proposed method provides an approximation hierarchy for the solution as we increase the number of terms in the expansion and, also, increase the spatial resolution of each term. We use the hierarchical solution distributions in a variational Bayesian approximation to perform uncertainty quantification in the inverse problem. We conduct a detailed numerical study to explore the performance of the proposed uncertainty quantification technique and show the theoretical posterior concentration. [ABSTRACT FROM AUTHOR]
- Published
- 2017
- Full Text
- View/download PDF
37. A variational Bayesian approach for inverse problems with skew-t error distributions.
- Author
-
Guha, Nilabja, Wu, Xiaoqing, Efendiev, Yalchin, Jin, Bangti, and Mallick, Bani K.
- Subjects
- *
BAYESIAN analysis , *INVERSE problems , *SKEWNESS (Probability theory) , *DISTRIBUTION (Probability theory) , *MATHEMATICAL regularization , *ALGORITHMS - Abstract
In this work, we develop a novel robust Bayesian approach to inverse problems with data errors following a skew-t distribution. A hierarchical Bayesian model is developed in the inverse problem setup. The Bayesian approach contains a natural mechanism for regularization in the form of a prior distribution, and a LASSO-type prior distribution is used to strongly induce sparseness. We propose a variational-type algorithm by minimizing the Kullback–Leibler divergence between the true posterior distribution and a separable approximation. The proposed method is illustrated on several two-dimensional linear and nonlinear inverse problems, e.g., the Cauchy problem and the permeability estimation problem. [ABSTRACT FROM AUTHOR]
- Published
- 2015
- Full Text
- View/download PDF
38. Hyperspectral remote sensing of plant biochemistry using Bayesian model averaging with variable and band selection
- Author
-
Zhao, Kaiguang, Valle, Denis, Popescu, Sorin, Zhang, Xuesong, and Mallick, Bani
- Subjects
- *
HYPERSPECTRAL imaging systems , *REMOTE-sensing images , *BOTANICAL chemistry , *BAYESIAN analysis , *CHLOROPHYLL , *CAROTENOIDS , *LEAST squares , *MARKOV chain Monte Carlo - Abstract
Model specification remains challenging in spectroscopy of plant biochemistry, as exemplified by the availability of various spectral indices or band combinations for estimating the same biochemical. This lack of consensus in model choice across applications argues for a paradigm shift in hyperspectral methods to address model uncertainty and misspecification. We demonstrated one such method using Bayesian model averaging (BMA), which performs variable/band selection and quantifies the relative merits of many candidate models to synthesize a weighted average model with improved predictive performance. The utility of BMA was examined using a portfolio of 27 foliage spectral–chemical datasets representing over 80 species across the globe to estimate multiple biochemical properties, including nitrogen, hydrogen, carbon, cellulose, lignin, chlorophyll (a or b), carotenoid, polar and nonpolar extractives, leaf mass per area, and equivalent water thickness. We also compared BMA with partial least squares (PLS) and stepwise multiple regression (SMR). Results showed that all the biochemicals except carotenoid were accurately estimated from hyperspectral data with R² values > 0.80. Compared to PLS and SMR, BMA substantially reduced overfitting and enhanced model generalization; BMA also yielded error estimation better indicative of true uncertainties in predictions, when evaluated using a statistic called “prediction interval coverage probability”. The relative band importance, which was quantified by band selection probability, differed markedly between BMA and SMR, cautioning against the use of SMR for band selection. Computationally, the model calibration with datasets of moderate sizes (>100) was faster for BMA via a hybrid reversible-jump Monte Carlo Markov Chain sampler than for PLS via literal optimization of a cross-validation criterion.
Our BMA scheme also provides a generic hierarchical Bayesian framework to assimilate prior knowledge of diverse forms, as illustrated by its use to account for nonlinearity in spectral–chemical relationships. We emphasize that BMA is a competitive, paradigm-shifting alternative to conventional statistical methods and it will find wide use as the virtue of Bayesian inference is increasingly appreciated by the remote sensing community. [Copyright Elsevier]
- Published
- 2013
- Full Text
- View/download PDF