Extracting Robust and Accurate Features via a Robust Information Bottleneck
- Authors
- Ankit Pensia, Varun Jog, and Po-Ling Loh
- Subjects
- Computer Science - Machine Learning (cs.LG); Computer Science - Information Theory (cs.IT); Statistics - Machine Learning (stat.ML); FOS: Computer and information sciences; Information bottleneck method; Mutual information; Fisher information; Feature extraction; Supervised learning; Artificial neural network; Gaussian; Stochastic gradient descent; Algorithm
- Abstract
We propose a novel strategy for extracting features in supervised learning that can be used to construct a classifier that is more robust to small perturbations in the input space. Our method builds upon the idea of the information bottleneck by introducing an additional penalty term that encourages the Fisher information of the extracted features, parametrized by the inputs, to be small. By tuning the regularization parameter, we can explicitly trade off the opposing desiderata of robustness and accuracy when constructing a classifier. We derive the optimal solution to the robust information bottleneck when the inputs and outputs are jointly Gaussian, proving that the optimally robust features are also jointly Gaussian in that setting. Furthermore, we propose a method for optimizing a variational bound on the robust information bottleneck objective in general settings using stochastic gradient descent, which may be implemented efficiently in neural networks. Our experimental results for synthetic and real data sets show that the proposed feature extraction method indeed produces classifiers with increased robustness to perturbations.
- Comments
- A version of this paper was submitted to the IEEE Journal on Selected Areas in Information Theory (JSAIT).
- Published
- 2020
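- Notes
The abstract describes an information bottleneck objective augmented with a Fisher-information penalty. Reconstructing the shape of the objective from the abstract alone (the precise weighting and the definition of the penalty term follow the paper, so this rendering is an assumption), the extracted features T are chosen to trade off compression, accuracy, and robustness roughly as I(X;T) − β·I(T;Y) + γ·Φ(T|X), where Φ(T|X) denotes the Fisher information of the features with respect to the input X, and tuning γ trades robustness against accuracy.

Below is a minimal PyTorch sketch of a variational loss of this kind, assuming a fixed-variance Gaussian encoder T = μ(x) + σε, for which Φ(T|X) reduces to ‖∇ₓμ(x)‖²_F / σ². All module names, architectures, and weights here are illustrative assumptions, not the authors' implementation.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class GaussianEncoder(nn.Module):
    """Stochastic encoder T = mu(x) + sigma * eps with a fixed noise level sigma."""
    def __init__(self, in_dim, feat_dim, sigma=0.1):
        super().__init__()
        # Illustrative architecture (assumption); any differentiable mean network works.
        self.net = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(),
                                 nn.Linear(128, feat_dim))
        self.sigma = sigma

    def forward(self, x):
        mu = self.net(x)
        t = mu + self.sigma * torch.randn_like(mu)  # reparameterized sample of T
        return t, mu

def fisher_penalty(mu, x, sigma):
    # Hutchinson estimator of ||J_mu(x)||_F^2 / sigma^2, which equals the Fisher
    # information of T w.r.t. x for a fixed-variance Gaussian encoder,
    # using E_v ||J^T v||^2 = ||J||_F^2 for v ~ N(0, I).
    v = torch.randn_like(mu)
    (grad_x,) = torch.autograd.grad((mu * v).sum(), x, create_graph=True)
    return grad_x.pow(2).flatten(1).sum(1) / sigma ** 2

def robust_ib_loss(encoder, classifier, x, y, beta=1e-3, gamma=1e-2):
    x = x.detach().clone().requires_grad_(True)   # enable gradients w.r.t. the input
    t, mu = encoder(x)
    ce = F.cross_entropy(classifier(t), y)        # accuracy term (surrogate for I(T;Y))
    d, s2 = mu.shape[1], encoder.sigma ** 2
    # KL(N(mu, s2*I) || N(0, I)): variational surrogate for the compression term I(X;T).
    kl = 0.5 * (mu.pow(2).sum(1) + d * (s2 - 1.0 - math.log(s2)))
    fi = fisher_penalty(mu, x, encoder.sigma)     # robustness penalty Phi(T|X)
    return ce + beta * kl.mean() + gamma * fi.mean()
```

With `gamma` set to zero, this reduces to a standard variational information bottleneck loss; increasing `gamma` penalizes the sensitivity of the features to input perturbations, matching the robustness/accuracy trade-off described in the abstract.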