Author: "Anru Zhang" / Topic: applied mathematics - Searchworks@Jio Institute Digital Library Search Results

Your search for "Anru Zhang" returned 11 results

1. High-dimensional log-error-in-variable regression with applications to microbial compositional data analysis

Author: Anru Zhang, Yuchen Zhou, and Pixu Shi
Subjects: FOS: Computer and information sciences, Statistics and Probability, Matching (statistics), Applied Mathematics, General Mathematics, Mathematics - Statistics Theory, Regression analysis, Statistics Theory (math.ST), Statistics - Applications, Quantitative Biology::Genomics, Agricultural and Biological Sciences (miscellaneous), Regression, Methodology (stat.ME), Overdispersion, Statistics, Covariate, FOS: Mathematics, Applications (stat.AP), Imputation (statistics), Statistics, Probability and Uncertainty, General Agricultural and Biological Sciences, Compositional data, Statistics - Methodology, Randomness, Mathematics
Abstract: In microbiome and genomic studies, the regression of compositional data has been a crucial tool for identifying microbial taxa or genes that are associated with clinical phenotypes. To account for the variation in sequencing depth, the classic log-contrast model is often used where read counts are normalized into compositions. However, zero read counts and the randomness in covariates remain critical issues. We introduce a surprisingly simple, interpretable and efficient method for the estimation of compositional data regression through the lens of a novel high-dimensional log-error-in-variable regression model. The proposed method provides corrections on sequencing data with possible overdispersion and simultaneously avoids any subjective imputation of zero read counts. We provide theoretical justifications with matching upper and lower bounds for the estimation error. The merit of the procedure is illustrated through real data analysis and simulation studies.
Published: 2021
Full Text: View/download PDF

2. Multisample estimation of bacterial composition matrices in metagenomics data

Author: Hongzhe Li, Anru Zhang, and Yuanpei Cao
Subjects: 0301 basic medicine, Statistics and Probability, Applied Mathematics, General Mathematics, Computational biology, Bacterial composition, 01 natural sciences, Agricultural and Biological Sciences (miscellaneous), 010104 statistics & probability, 03 medical and health sciences, 030104 developmental biology, Metagenomics, 0101 mathematics, Statistics, Probability and Uncertainty, General Agricultural and Biological Sciences, Mathematics
Abstract: Metagenomics sequencing is routinely applied to quantify bacterial abundances in microbiome studies, where bacterial composition is estimated based on the sequencing read counts. Due to limited sequencing depth and DNA dropouts, many rare bacterial taxa might not be captured in the final sequencing reads, which results in many zero counts. Naive composition estimation using count normalization leads to many zero proportions, which tend to result in inaccurate estimates of bacterial abundance and diversity. This paper takes a multisample approach to estimation of bacterial abundances in order to borrow information across samples and across species. Empirical results from real datasets suggest that the composition matrix over multiple samples is approximately low rank, which motivates a regularized maximum likelihood estimation with a nuclear norm penalty. An efficient optimization algorithm using the generalized accelerated proximal gradient and Euclidean projection onto simplex space is developed. Theoretical upper bounds and the minimax lower bounds of the estimation errors, measured by the Kullback–Leibler divergence and the Frobenius norm, are established. Simulation studies demonstrate that the proposed estimator outperforms the naive estimators. The method is applied to an analysis of a human gut microbiome dataset.
Published: 2019
Full Text: View/download PDF
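
Note on entry 2: the abstract names Euclidean projection onto the simplex as one step of its optimization algorithm. As a rough illustration only, the sketch below shows the widely used sort-based projection of a vector onto the probability simplex. It is not taken from the paper, and the function name `project_onto_simplex` is hypothetical.

```python
def project_onto_simplex(v):
    """Euclidean projection of v onto {x : x_i >= 0, sum(x) = 1}.

    Sort-based method: find the shift theta such that clipping
    (v - theta) at zero yields entries that sum to one.
    """
    u = sorted(v, reverse=True)      # entries in decreasing order
    cumsum = 0.0
    theta = 0.0
    for j, uj in enumerate(u, start=1):
        cumsum += uj
        t = (cumsum - 1.0) / j       # candidate shift using the top j entries
        if uj - t > 0:               # holds for a prefix of j; keep the last
            theta = t
    return [max(x - theta, 0.0) for x in v]


# A row of naive proportions that has drifted off the simplex
print(project_onto_simplex([2.0, 0.0]))
print(project_onto_simplex([0.5, 0.5, 0.5]))
```

A vector already on the simplex is returned unchanged up to floating-point error, which gives a quick sanity check.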

3. Optimal Sparse Singular Value Decomposition for High-Dimensional High-Order Data

Author: Rungang Han and Anru Zhang
Subjects: FOS: Computer and information sciences, Statistics and Probability, Dimensionality reduction, 05 social sciences, Structure (category theory), Machine Learning (stat.ML), Mathematics - Statistics Theory, Statistics Theory (math.ST), High dimensional, 01 natural sciences, Article, Methodology (stat.ME), 010104 statistics & probability, Statistics - Machine Learning, Tensor (intrinsic definition), 0502 economics and business, Singular value decomposition, FOS: Mathematics, Applied mathematics, 0101 mathematics, Statistics, Probability and Uncertainty, High order, Statistics - Methodology, 050205 econometrics, Mathematics
Abstract: In this article, we consider the sparse tensor singular value decomposition, which aims for dimension reduction on high-dimensional high-order data with certain sparsity structure. A method named Sparse Tensor Alternating Thresholding for Singular Value Decomposition (STAT-SVD) is proposed. The proposed procedure features a novel double projection & thresholding scheme, which provides a sharp criterion for thresholding in each iteration. Compared with the regular tensor SVD model, STAT-SVD permits more robust estimation under weaker assumptions. Both the upper and lower bounds for estimation accuracy are developed. The proposed procedure is shown to be minimax rate-optimal in a general class of situations. Simulation studies show that STAT-SVD performs well under a variety of configurations. We also illustrate the merits of the proposed procedure on a longitudinal tensor dataset on European country mortality rates. Comment: 73 pages
Published: 2019
Full Text: View/download PDF

4. Sparse and Low-rank Tensor Estimation via Cubic Sketchings

Author: Botao Hao, Anru Zhang, and Guang Cheng
Subjects: FOS: Computer and information sciences, Noise measurement, Rank (linear algebra), MathematicsofComputing_NUMERICALANALYSIS, Machine Learning (stat.ML), Mathematics - Statistics Theory, 020206 networking & telecommunications, Statistics Theory (math.ST), 02 engineering and technology, Library and Information Sciences, Article, Computer Science Applications, Matrix decomposition, Statistics - Machine Learning, Linear regression, FOS: Mathematics, 0202 electrical engineering, electronic engineering, information engineering, Applied mathematics, Tensor decomposition, Tensor, Gradient descent, Information Systems, Mathematics, Sparse matrix
Abstract: In this paper, we propose a general framework for sparse and low-rank tensor estimation from cubic sketchings. A two-stage non-convex implementation is developed based on sparse tensor decomposition and thresholded gradient descent, which ensures exact recovery in the noiseless case and stable recovery in the noisy case with high probability. The non-asymptotic analysis sheds light on an interplay between optimization error and statistical error. The proposed procedure is shown to be rate-optimal under certain conditions. As a technical by-product, novel high-order concentration inequalities are derived for studying high-moment sub-Gaussian tensors. An interesting tensor formulation illustrates the potential application to high-order interaction pursuit in high-dimensional linear regression. Comment: Accepted at IEEE Transactions on Information Theory
Published: 2021

5. Spectral State Compression of Markov Processes

Author: Anru Zhang and Mengdi Wang
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Rank (linear algebra), Computer science, Maximum likelihood, Lumpability, Markov process, Machine Learning (stat.ML), 02 engineering and technology, Library and Information Sciences, Markov model, Upper and lower bounds, Article, Matrix decomposition, Machine Learning (cs.LG), Methodology (stat.ME), Statistics - Machine Learning, 0202 electrical engineering, electronic engineering, information engineering, Applied mathematics, State space, Statistics - Methodology, Markov chain, Stochastic matrix, 020206 networking & telecommunications, Minimax, Computer Science Applications, Information Systems
Abstract: Model reduction of Markov processes is a basic problem in modeling state-transition systems. Motivated by the state aggregation approach rooted in control theory, we study the statistical state compression of a discrete-state Markov chain from empirical trajectories. Through the lens of spectral decomposition, we study the rank and features of Markov processes, as well as properties like representability, aggregability, and lumpability. We develop spectral methods for estimating the transition matrix of a low-rank Markov model, estimating the leading subspace spanned by Markov features, and recovering latent structures like state aggregation and lumpable partition of the state space. We prove statistical upper bounds for the estimation errors and nearly matching minimax lower bounds. Numerical studies are performed on synthetic data and a dataset of New York City taxi trips. Comment: to appear in IEEE Transactions on Information Theory
Published: 2021

6. Sequential rerandomization

Author: Quan Zhou, Philip A Ernst, Kari Lock Morgan, Donald B Rubin, and Anru Zhang
Subjects: FOS: Computer and information sciences, Statistics and Probability, Applied Mathematics, General Mathematics, Miscellanea, Statistics - Applications, 01 natural sciences, Agricultural and Biological Sciences (miscellaneous), 010104 statistics & probability, 03 medical and health sciences, 0302 clinical medicine, 62K99, Applications (stat.AP), 0101 mathematics, Statistics, Probability and Uncertainty, General Agricultural and Biological Sciences, 030217 neurology & neurosurgery
Abstract: The seminal work of Morgan and Rubin (2012) considers rerandomization for all the units at one time. In practice, however, experimenters may have to rerandomize units sequentially. For example, a clinician studying a rare disease may be unable to wait to perform an experiment until all the experimental units are recruited. Our work offers a mathematical framework for sequential rerandomization designs, where the experimental units are enrolled in groups. We formulate an adaptive rerandomization procedure for balancing treatment/control assignments over some continuous or binary covariates, using Mahalanobis distance as the imbalance measure. We prove in our key result, Theorem 3, that given the same number of rerandomizations (in expected value), under certain mild assumptions, sequential rerandomization achieves better covariate balance than rerandomization at one time. Comment: 23 pages
Published: 2018
Full Text: View/download PDF

7. On the non‐asymptotic and sharp lower tail bounds of random variables

Author: Anru Zhang and Yuchen Zhou
Subjects: Statistics and Probability, Matching (graph theory), Binomial (polynomial), Applied mathematics, Statistics, Probability and Uncertainty, Concentration inequality, Poisson distribution, Extreme value theory, Upper and lower bounds, Random variable, Mathematics
Abstract: The non-asymptotic tail bounds of random variables play crucial roles in probability, statistics, and machine learning. Despite much success in developing upper bounds on tail probabilities in the literature, lower bounds on tail probabilities are relatively scarce. In this paper, we introduce systematic and user-friendly schemes for developing non-asymptotic lower bounds of tail probabilities. In addition, we develop sharp lower tail bounds for the sum of independent sub-Gaussian and sub-exponential random variables, which match the classic Hoeffding-type and Bernstein-type concentration inequalities, respectively. We also provide non-asymptotic matching upper and lower tail bounds for a suite of distributions, including gamma, beta, (regular, weighted, and noncentral) chi-square, binomial, Poisson, Irwin-Hall, etc. We apply the result to establish the matching upper and lower bounds for extreme value expectation of the sum of independent sub-Gaussian and sub-exponential random variables. A statistical application of signal identification from sparse heterogeneous mixtures is finally considered.
Published: 2020
Full Text: View/download PDF

8. ISLET: Fast and Optimal Low-rank Tensor Regression via Importance Sketching

Author: Garvesh Raskutti, Anru Zhang, Yuetian Luo, and Ming Yuan
Subjects: FOS: Computer and information sciences, endocrine system, Computer Science - Machine Learning, endocrine system diseases, Rank (linear algebra), Machine Learning (stat.ML), Mathematics - Statistics Theory, Statistics Theory (math.ST), behavioral disciplines and activities, Quantitative Biology::Cell Behavior, Machine Learning (cs.LG), Methodology (stat.ME), Statistics - Machine Learning, FOS: Mathematics, Applied mathematics, Tensor, Mathematics - Numerical Analysis, Statistics - Methodology, Mathematics, geography, Quantitative Biology::Neurons and Cognition, Dimensionality reduction, Numerical Analysis (math.NA), Islet, humanities, Regression, nervous system
Abstract: In this paper, we develop a novel procedure for low-rank tensor regression, namely Importance Sketching Low-rank Estimation for Tensors (ISLET). The central idea behind ISLET is importance sketching, i.e., carefully designed sketches based on both the responses and low-dimensional structure of the parameter of interest. We show that the proposed method is sharply minimax optimal in terms of the mean-squared error under low-rank Tucker assumptions and under randomized Gaussian ensemble design. In addition, if a tensor is low-rank with group sparsity, our procedure also achieves minimax optimality. Further, we show through numerical study that ISLET achieves comparable or better mean-squared error performance to existing state-of-the-art methods while having substantial storage and run-time advantages including capabilities for parallel and distributed computing. In particular, our procedure performs reliable estimation with tensors of dimension $p = O(10^8)$ and is one or two orders of magnitude faster than baseline methods.
Published: 2019

9. Semi-supervised inference: General theory and estimation of means

Author: T. Tony Cai, Anru Zhang, and Lawrence D. Brown
Subjects: FOS: Computer and information sciences, Statistics and Probability, Asymptotic distribution, Inference, semi-supervised inference, Mathematics - Statistics Theory, Machine Learning (stat.ML), Sample (statistics), Statistics Theory (math.ST), 01 natural sciences, Methodology (stat.ME), 010104 statistics & probability, 62J05, 62G08, Statistics - Machine Learning, Covariate, FOS: Mathematics, Applied mathematics, 0101 mathematics, Finite set, Statistics - Methodology, Mathematics, Confidence interval, Nonparametric statistics, Estimator, estimation of mean, efficiency, Statistics, Probability and Uncertainty, 62F12, limiting distribution, 62F10
Abstract: We propose a general semi-supervised inference framework focused on the estimation of the population mean. As usual in semi-supervised settings, there exists an unlabeled sample of covariate vectors and a labeled sample consisting of covariate vectors along with real-valued responses (“labels”). Otherwise, the formulation is “assumption-lean” in that no major conditions are imposed on the statistical or functional form of the data. We consider both the ideal semi-supervised setting where infinitely many unlabeled samples are available, as well as the ordinary semi-supervised setting in which only a finite number of unlabeled samples is available. Estimators are proposed along with corresponding confidence intervals for the population mean. Theoretical analyses of both the asymptotic distribution and $\ell_{2}$-risk for the proposed procedures are given. Surprisingly, the proposed estimators, based on a simple form of the least squares method, outperform the ordinary sample mean. The simple, transparent form of the estimator lends confidence to the perception that its asymptotic improvement over the ordinary sample mean also nearly holds even for moderate size samples. The method is further extended to a nonparametric setting, in which the oracle rate can be achieved asymptotically. The proposed estimators are further illustrated by simulation studies and a real data example involving estimation of the homeless population.
Published: 2019
Full Text: View/download PDF

10. Sharp RIP bound for sparse signal and low-rank matrix recovery

Author: T. Tony Cai and Anru Zhang
Subjects: Rank (linear algebra), Applied Mathematics, 020206 networking & telecommunications, Low-rank approximation, 010103 numerical & computational mathematics, 02 engineering and technology, 01 natural sciences, Signal, Restricted isometry property, Combinatorics, Linear map, Matrix (mathematics), Compressed sensing, 0202 electrical engineering, electronic engineering, information engineering, Minification, 0101 mathematics, Mathematics
Abstract: This paper establishes a sharp condition on the restricted isometry property (RIP) for both the sparse signal recovery and low-rank matrix recovery. It is shown that if the measurement matrix $A$ satisfies the RIP condition $\delta_k^A < 1/3$, then all $k$-sparse signals $\beta$ can be recovered exactly via the constrained $\ell_1$ minimization based on $y = A\beta$. Similarly, if the linear map $\mathcal{M}$ satisfies the RIP condition $\delta_r^{\mathcal{M}} < 1/3$, then all matrices $X$ of rank at most $r$ can be recovered exactly via the constrained nuclear norm minimization based on $b = \mathcal{M}(X)$. Furthermore, in both cases it is not possible to do so in general when the condition does not hold. In addition, noisy cases are considered and oracle inequalities are given under the sharp RIP condition.
Published: 2013
Full Text: View/download PDF

11. Minimax Rate-optimal Estimation of High-dimensional Covariance Matrices with Incomplete Data

Author: Anru Zhang and T. Tony Cai
Subjects: 0301 basic medicine, Statistics and Probability, FOS: Computer and information sciences, MathematicsofComputing_NUMERICALANALYSIS, Matrix norm, Mathematics - Statistics Theory, Statistics Theory (math.ST), 01 natural sciences, Article, Methodology (stat.ME), 010104 statistics & probability, 03 medical and health sciences, Statistical inference, Range (statistics), FOS: Mathematics, Statistics::Methodology, Applied mathematics, 0101 mathematics, Statistics - Methodology, Mathematics, Numerical Analysis, Optimal estimation, Estimator, Covariance, Minimax, Missing data, 030104 developmental biology, Statistics, Probability and Uncertainty
Abstract: Missing data occur frequently in a wide range of applications. In this paper, we consider estimation of high-dimensional covariance matrices in the presence of missing observations under a general missing completely at random model in the sense that the missingness is not dependent on the values of the data. Based on incomplete data, estimators for bandable and sparse covariance matrices are proposed and their theoretical and numerical properties are investigated. Minimax rates of convergence are established under the spectral norm loss and the proposed estimators are shown to be rate-optimal under mild regularity conditions. Simulation studies demonstrate that the estimators perform well numerically. The methods are also illustrated through an application to data from four ovarian cancer studies. The key technical tools developed in this paper are of independent interest and potentially useful for a range of related problems in high-dimensional statistical inference with missing data.
Published: 2016
Full Text: View/download PDF
