Author: "Richtarik, Peter" / Database: OAIster - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Richtarik, Peter"' showing total 22 results

1. MicroAdam: Accurate Adaptive Optimization with Low Space Overhead and Provable Convergence

Author: Modoranu, Ionut-Vlad, Safaryan, Mher, Malinovsky, Grigory, Kurtic, Eldar, Robert, Thomas, Richtarik, Peter, and Alistarh, Dan
Abstract: We propose a new variant of the Adam optimizer [Kingma and Ba, 2014] called MICROADAM that specifically minimizes memory overheads, while maintaining theoretical convergence guarantees. We achieve this by compressing the gradient information before it is fed into the optimizer state, thereby reducing its memory footprint significantly. We control the resulting compression error via a novel instance of the classical error feedback mechanism from distributed optimization [Seide et al., 2014, Alistarh et al., 2018, Karimireddy et al., 2019] in which the error correction information is itself compressed to allow for practical memory gains. We prove that the resulting approach maintains theoretical convergence guarantees competitive to those of AMSGrad, while providing good practical performance. Specifically, we show that MICROADAM can be implemented efficiently on GPUs: on both million-scale (BERT) and billion-scale (LLaMA) models, MicroAdam provides practical convergence competitive to that of the uncompressed Adam baseline, with lower memory usage and similar running time. Our code is available at https://github.com/IST-DASLab/MicroAdam.
Published: 2024

2. PV-Tuning: Beyond Straight-Through Estimation for Extreme LLM Compression

Author: Malinovskii, Vladimir, Mazur, Denis, Ilin, Ivan, Kuznedelev, Denis, Burlachenko, Konstantin, Yi, Kai, Alistarh, Dan, and Richtarik, Peter
Abstract: There has been significant interest in "extreme" compression of large language models (LLMs), i.e., to 1-2 bits per parameter, which allows such models to be executed efficiently on resource-constrained devices. Existing work focused on improved one-shot quantization techniques and weight representations; yet, purely post-training approaches are reaching diminishing returns in terms of the accuracy-vs-bit-width trade-off. State-of-the-art quantization methods such as QuIP# and AQLM include fine-tuning (part of) the compressed parameters over a limited amount of calibration data; however, such fine-tuning techniques over compressed weights often make exclusive use of straight-through estimators (STE), whose performance is not well-understood in this setting. In this work, we question the use of STE for extreme LLM compression, showing that it can be sub-optimal, and perform a systematic study of quantization-aware fine-tuning strategies for LLMs. We propose PV-Tuning - a representation-agnostic framework that generalizes and improves upon existing fine-tuning strategies, and provides convergence guarantees in restricted cases. On the practical side, when used for 1-2 bit vector quantization, PV-Tuning outperforms prior techniques for highly-performant models such as Llama and Mistral. Using PV-Tuning, we achieve the first Pareto-optimal quantization for Llama 2 family models at 2 bits per parameter.
Comment: Preprint
Published: 2024

3. Federated Learning is Better with Non-Homomorphic Encryption

Author: Burlachenko, Konstantin, Alrowithi, Abdulmajeed, Albalawi, Fahad Ali, and Richtarik, Peter
Abstract: Traditional AI methodologies necessitate centralized data collection, which becomes impractical when facing problems with network communication, data privacy, or storage capacity. Federated Learning (FL) offers a paradigm that empowers distributed AI model training without collecting raw data. There are different choices for providing privacy during FL training. One of the popular methodologies is employing Homomorphic Encryption (HE) - a breakthrough in privacy-preserving computation from Cryptography. However, these methods have a price in the form of extra computation and memory footprint. To resolve these issues, we propose an innovative framework that synergizes permutation-based compressors with Classical Cryptography, even though employing Classical Cryptography was assumed to be impossible in the past in the context of FL. Our framework offers a way to replace HE with cheaper Classical Cryptography primitives which provide security for the training process. It fosters asynchronous communication and provides flexible deployment options in various communication topologies.
Comment: 56 pages, 10 figures. Accepted for presentation and proceedings at the 4th ACM International Workshop on Distributed Machine Learning
Published: 2023

4. Adaptive Compression for Communication-Efficient Distributed Training

Author: Makarenko, Maksim, Gasanov, Elnur, Islamov, Rustem, Sadiev, Abdurakhmon, and Richtarik, Peter
Abstract: We propose Adaptive Compressed Gradient Descent (AdaCGD) - a novel optimization algorithm for communication-efficient training of supervised machine learning models with adaptive compression level. Our approach is inspired by the recently proposed three point compressor (3PC) framework of Richtarik et al. (2022), which includes error feedback (EF21), lazily aggregated gradient (LAG), and their combination as special cases, and offers the current state-of-the-art rates for these methods under weak assumptions. While the above mechanisms offer a fixed compression level, or adapt between two extremes only, our proposal is to perform a much finer adaptation. In particular, we allow the user to choose any number of arbitrarily chosen contractive compression mechanisms, such as Top-K sparsification with a user-defined selection of sparsification levels K, or quantization with a user-defined selection of quantization levels, or their combination. AdaCGD chooses the appropriate compressor and compression level adaptively during the optimization process. Besides i) proposing a theoretically-grounded multi-adaptive communication compression mechanism, we further ii) extend the 3PC framework to bidirectional compression, i.e., we allow the server to compress as well, and iii) provide sharp convergence bounds in the strongly convex, convex and nonconvex settings. The convex regime results are new even for several key special cases of our general mechanism, including 3PC and EF21. In all regimes, our rates are superior compared to all existing adaptive compression methods.
Published: 2022
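
Illustration (not part of the catalog record): the abstract above names Top-K sparsification as one example of a contractive compressor. The sketch below is a minimal, assumed implementation of plain Top-K only; it is not AdaCGD, and the helper names `top_k` and `squared_norm` are hypothetical. It relies on the standard property that Top-K on a d-dimensional vector satisfies ||C(x) - x||^2 <= (1 - K/d) ||x||^2.

```python
def top_k(vector, k):
    """Keep the k largest-magnitude entries of `vector`; zero out the rest."""
    if not 0 <= k <= len(vector):
        raise ValueError("k must be between 0 and len(vector)")
    # Indices sorted by decreasing absolute value; sorted() is stable,
    # so ties are broken in favour of the earlier position.
    order = sorted(range(len(vector)), key=lambda i: -abs(vector[i]))
    keep = set(order[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(vector)]


def squared_norm(vector):
    """Squared Euclidean norm of a list of floats."""
    return sum(v * v for v in vector)


x = [0.5, -3.0, 1.5, 0.1, -2.0]
compressed = top_k(x, 2)  # only the two largest-magnitude entries survive
error = squared_norm([a - b for a, b in zip(x, compressed)])
bound = (1 - 2 / len(x)) * squared_norm(x)  # contraction bound with K = 2, d = 5
```

A smaller K sends fewer entries but loosens the bound, which is the trade-off an adaptive scheme would navigate.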

5. Convergence of Stein Variational Gradient Descent under a Weaker Smoothness Condition

Author: Sun, Lukang, Karagulyan, Avetik, and Richtarik, Peter
Abstract: Stein Variational Gradient Descent (SVGD) is an important alternative to the Langevin-type algorithms for sampling from probability distributions of the form $\pi(x) \propto \exp(-V(x))$. In the existing theory of Langevin-type algorithms and SVGD, the potential function $V$ is often assumed to be $L$-smooth. However, this restrictive condition excludes a large class of potential functions such as polynomials of degree greater than $2$. Our paper studies the convergence of the SVGD algorithm for distributions with $(L_0,L_1)$-smooth potentials. This relaxed smoothness assumption was introduced by Zhang et al. [2019a] for the analysis of gradient clipping algorithms. With the help of trajectory-independent auxiliary conditions, we provide a descent lemma establishing that the algorithm decreases the $\mathrm{KL}$ divergence at each iteration and prove a complexity bound for SVGD in the population limit in terms of the Stein Fisher information.
Published: 2022

6. A Field Guide to Federated Optimization

Author: Wang, Jianyu, Charles, Zachary, Xu, Zheng, Joshi, Gauri, McMahan, H. Brendan, Arcas, Blaise Aguera y, Al-Shedivat, Maruan, Andrew, Galen, Avestimehr, Salman, Daly, Katharine, Data, Deepesh, Diggavi, Suhas, Eichner, Hubert, Gadhikar, Advait, Garrett, Zachary, Girgis, Antonious M., Hanzely, Filip, Hard, Andrew, He, Chaoyang, Horvath, Samuel, Huo, Zhouyuan, Ingerman, Alex, Jaggi, Martin, Javidi, Tara, Kairouz, Peter, Kale, Satyen, Karimireddy, Sai Praneeth, Konecny, Jakub, Koyejo, Sanmi, Li, Tian, Liu, Luyang, Mohri, Mehryar, Qi, Hang, Reddi, Sashank J., Richtarik, Peter, Singhal, Karan, Smith, Virginia, Soltanolkotabi, Mahdi, Song, Weikang, Suresh, Ananda Theertha, Stich, Sebastian U., Talwalkar, Ameet, Wang, Hongyi, Woodworth, Blake, Wu, Shanshan, Yu, Felix X., Yuan, Honglin, Zaheer, Manzil, Zhang, Mi, Zhang, Tong, Zheng, Chunxiang, Zhu, Chen, and Zhu, Wennan
Abstract: Federated learning and analytics are a distributed approach for collaboratively learning models (or statistics) from decentralized data, motivated by and designed for privacy protection. The distributed learning process can be formulated as solving federated optimization problems, which emphasize communication efficiency, data heterogeneity, compatibility with privacy and system requirements, and other constraints that are not primary considerations in other problem settings. This paper provides recommendations and guidelines on formulating, designing, evaluating and analyzing federated optimization algorithms through concrete examples and practical implementation, with a focus on conducting effective simulations to infer real-world performance. The goal of this work is not to survey the current literature, but to inspire researchers and practitioners to design federated learning algorithms that can be used in various practical applications.
Published: 2021

7. Adaptive Learning of the Optimal Batch Size of SGD

Author: Alfarra, Motasem, Hanzely, Slavomir, Albasyoni, Alyazeed, Ghanem, Bernard, and Richtarik, Peter
Abstract: Recent advances in the theoretical understanding of SGD led to a formula for the optimal batch size minimizing the number of effective data passes, i.e., the number of iterations times the batch size. However, this formula is of no practical value as it depends on the knowledge of the variance of the stochastic gradients evaluated at the optimum. In this paper we design a practical SGD method capable of learning the optimal batch size adaptively throughout its iterations for strongly convex and smooth functions. Our method does this provably, and in our experiments with synthetic and real data robustly exhibits nearly optimal behaviour; that is, it works as if the optimal batch size were known a priori. Further, we generalize our method to several new batch strategies not considered in the literature before, including a sampling suitable for distributed implementations.
Comment: Accepted to the 12th Annual Workshop on Optimization for Machine Learning (OPT2020)
Published: 2020

8. Variance Reduced Coordinate Descent with Acceleration: New Method With a Surprising Application to Finite-Sum Problems

Author: Hanzely, Filip, Kovalev, Dmitry, and Richtarik, Peter
Abstract: We propose an accelerated version of stochastic variance reduced coordinate descent, ASVRCD. Like other variance reduced coordinate descent methods such as SEGA or SVRCD, our method can deal with problems that include a non-separable and non-smooth regularizer, while accessing only a random block of partial derivatives in each iteration. However, ASVRCD incorporates Nesterov's momentum, which offers favorable iteration complexity guarantees over both SEGA and SVRCD. As a by-product of our theory, we show that a variant of Allen-Zhu (2017) is a specific case of ASVRCD, recovering the optimal oracle complexity for the finite sum objective.
Comment: 30 pages, 8 figures
Published: 2020

9. A Linearly Convergent Algorithm for Decentralized Optimization: Sending Less Bits for Free!

Author: Kovalev, Dmitry, Koloskova, Anastasia, Jaggi, Martin, Richtarik, Peter, and Stich, Sebastian U.
Abstract: Decentralized optimization methods enable on-device training of machine learning models without a central coordinator. In many scenarios communication between devices is energy demanding and time consuming and forms the bottleneck of the entire system. We propose a new randomized first-order method which tackles the communication bottleneck by applying randomized compression operators to the communicated messages. By combining our scheme with a new variance reduction technique that progressively throughout the iterations reduces the adverse effect of the injected quantization noise, we obtain the first scheme that converges linearly on strongly convex decentralized problems while using compressed communication only. We prove that our method can solve the problems without any increase in the number of communications compared to the baseline which does not perform any communication compression while still allowing for a significant compression factor which depends on the conditioning of the problem and the topology of the network. Our key theoretical findings are supported by numerical experiments.
Published: 2020

10. Optimal Client Sampling for Federated Learning

Author: Chen, Wenlin, Horvath, Samuel, and Richtarik, Peter
Abstract: It is well understood that client-master communication can be a primary bottleneck in Federated Learning. In this work, we address this issue with a novel client subsampling scheme, where we restrict the number of clients allowed to communicate their updates back to the master node. In each communication round, all participating clients compute their updates, but only the ones with "important" updates communicate back to the master. We show that importance can be measured using only the norm of the update and give a formula for optimal client participation. This formula minimizes the distance between the full update, where all clients participate, and our limited update, where the number of participating clients is restricted. In addition, we provide a simple algorithm that approximates the optimal formula for client participation, which only requires secure aggregation and thus does not compromise client privacy. We show both theoretically and empirically that for Distributed SGD (DSGD) and Federated Averaging (FedAvg), the performance of our approach can be close to full participation and superior to the baseline where participating clients are sampled uniformly. Moreover, our approach is orthogonal to and compatible with existing methods for reducing communication overhead, such as local methods and communication compression methods.
Comment: Published in Transactions on Machine Learning Research, code available: https://github.com/SamuelHorvath/FL-optimal-client-sampling
Published: 2020

11. Variance-Reduced Methods for Machine Learning

Author: Gower, Robert M., Schmidt, Mark, Bach, Francis, and Richtarik, Peter
Abstract: Stochastic optimization lies at the heart of machine learning, and its cornerstone is stochastic gradient descent (SGD), a method introduced over 60 years ago. The last 8 years have seen an exciting new development: variance reduction (VR) for stochastic optimization methods. These VR methods excel in settings where more than one pass through the training data is allowed, achieving a faster convergence than SGD in theory as well as practice. These speedups underline the surge of interest in VR methods and the fast-growing body of work on this topic. This review covers the key principles and main developments behind VR methods for optimization with finite data sets and is aimed at non-expert readers. We focus mainly on the convex setting, and leave pointers to readers interested in extensions for minimizing non-convex functions.
Comment: 16 pages, 7 figures, 1 table
Published: 2020

12. Natural Compression for Distributed Deep Learning

Author: Horvath, Samuel, Ho, Chen-Yu, Horvath, Ludovit, Sahu, Atal Narayan, Canini, Marco, and Richtarik, Peter
Abstract: Modern deep learning models are often trained in parallel over a collection of distributed machines to reduce training time. In such settings, communication of model updates among machines becomes a significant performance bottleneck and various lossy update compression techniques have been proposed to alleviate this problem. In this work, we introduce a new, simple yet theoretically and practically effective compression technique: natural compression (NC). Our technique is applied individually to all entries of the to-be-compressed update vector and works by randomized rounding to the nearest (negative or positive) power of two, which can be computed in a "natural" way by ignoring the mantissa. We show that compared to no compression, NC increases the second moment of the compressed vector by not more than the tiny factor $\frac{9}{8}$, which means that the effect of NC on the convergence speed of popular training algorithms, such as distributed SGD, is negligible. However, the communications savings enabled by NC are substantial, leading to $3$-$4\times$ improvement in overall theoretical running time. For applications requiring more aggressive compression, we generalize NC to natural dithering, which we prove is exponentially better than the common random dithering technique. Our compression operators can be used on their own or in combination with existing operators for a more aggressive combined effect and offer new state-of-the-art both in theory and practice.
Comment: Proceedings of 3rd Annual Conference on Mathematical and Scientific Machine Learning (MSML 2022)
Published: 2019
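
Illustration (not part of the catalog record): the abstract above describes natural compression as randomized rounding of each entry to a neighbouring power of two. The sketch below is one assumed reading of that description for finite Python floats, not the authors' implementation; the function name `natural_compress` is hypothetical. The rounding-up probability is chosen so that the result is unbiased, i.e. its expectation equals the input.

```python
import math
import random


def natural_compress(x, rng):
    """Randomly round a finite float to a bracketing power of two, without bias."""
    if x == 0.0:
        return 0.0
    # abs(x) = mantissa * 2**exponent with 0.5 <= mantissa < 1.
    mantissa, exponent = math.frexp(abs(x))
    low = math.ldexp(1.0, exponent - 1)   # largest power of two <= abs(x)
    high = math.ldexp(1.0, exponent)      # next power of two above `low`
    # Probability of rounding up, (abs(x) - low) / low, makes the result unbiased.
    p_up = 2.0 * mantissa - 1.0
    magnitude = high if rng.random() < p_up else low
    return math.copysign(magnitude, x)


rng = random.Random(0)                    # fixed seed, for reproducibility only
samples = [natural_compress(3.0, rng) for _ in range(20000)]
sample_mean = sum(samples) / len(samples)  # should be close to 3.0
```

Exact powers of two are returned unchanged because the rounding-up probability is zero for them; only the sign and exponent of each output would need to be transmitted.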

13. SGD: General Analysis and Improved Rates

Author: Gower, Robert Mansel, Loizou, Nicolas, Qian, Xun, Sailanbayev, Alibek, Shulgin, Egor, and Richtarik, Peter
Abstract: We propose a general yet simple theorem describing the convergence of SGD under the arbitrary sampling paradigm. Our theorem describes the convergence of an infinite array of variants of SGD, each of which is associated with a specific probability law governing the data selection rule used to form mini-batches. This is the first time such an analysis is performed, and most of our variants of SGD were never explicitly considered in the literature before. Our analysis relies on the recently introduced notion of expected smoothness and does not rely on a uniform bound on the variance of the stochastic gradients. By specializing our theorem to different mini-batching strategies, such as sampling with replacement and independent sampling, we derive exact expressions for the stepsize as a function of the mini-batch size. With this we can also determine the mini-batch size that optimizes the total complexity, and show explicitly that as the variance of the stochastic gradient evaluated at the minimum grows, so does the optimal mini-batch size. For zero variance, the optimal mini-batch size is one. Moreover, we prove insightful stepsize-switching rules which describe when one should switch from a constant to a decreasing stepsize regime.
Comment: 23 pages, 6 figures
Published: 2019

14. Don't Jump Through Hoops and Remove Those Loops: SVRG and Katyusha are Better Without the Outer Loop

Author: Kovalev, Dmitry, Horvath, Samuel, and Richtarik, Peter
Abstract: The stochastic variance-reduced gradient method (SVRG) and its accelerated variant (Katyusha) have attracted enormous attention in the machine learning community in the last few years due to their superior theoretical properties and empirical behaviour on training supervised machine learning models via the empirical risk minimization paradigm. A key structural element in both of these methods is the inclusion of an outer loop at the beginning of which a full pass over the training data is made in order to compute the exact gradient, which is then used to construct a variance-reduced estimator of the gradient. In this work we design loopless variants of both of these methods. In particular, we remove the outer loop and replace its function by a coin flip performed in each iteration designed to trigger, with a small probability, the computation of the gradient. We prove that the new methods enjoy the same superior theoretical convergence properties as the original methods. However, we demonstrate through numerical experiments that our methods have substantially superior practical behavior.
Comment: 14 pages, 2 algorithms, 9 lemmas, 2 theorems, 4 figures
Published: 2019
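
Illustration (not part of the catalog record): the abstract above describes replacing SVRG's outer loop with a per-iteration coin flip that occasionally recomputes the full gradient. The sketch below shows that idea on a toy scalar least-squares problem. It is an assumed minimal version, not the authors' code: the function name `loopless_svrg`, the data, the step size and the probability `p` are all illustrative choices.

```python
import random


def loopless_svrg(a, b, step, p, iterations, seed=0):
    """Minimise (1/n) * sum_i 0.5 * (a[i] * x - b[i])**2 over a scalar x."""
    rng = random.Random(seed)
    n = len(a)

    def grad_i(i, x):
        # Gradient of the i-th term 0.5 * (a[i] * x - b[i])**2.
        return a[i] * (a[i] * x - b[i])

    def full_grad(x):
        return sum(grad_i(i, x) for i in range(n)) / n

    x = 0.0
    w = x                # reference point
    g_w = full_grad(w)   # exact gradient at the reference point
    for _ in range(iterations):
        i = rng.randrange(n)
        # Variance-reduced gradient estimate, as in SVRG.
        g = grad_i(i, x) - grad_i(i, w) + g_w
        x -= step * g
        # Coin flip replaces the outer loop: refresh the reference rarely.
        if rng.random() < p:
            w = x
            g_w = full_grad(w)
    return x


a = [1.0, 2.0, 3.0, 4.0]
b = [1.0, 4.5, 5.5, 8.5]
x_hat = loopless_svrg(a, b, step=1.0 / 96.0, p=0.25, iterations=5000)
# Closed-form least-squares solution, used only to check the sketch.
x_star = sum(ai * bi for ai, bi in zip(a, b)) / sum(ai * ai for ai in a)
```

The step size here is 1/(6 L) with L = max a[i]^2 = 16, and p = 1/n, so a full gradient is computed about once per n iterations on average.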

15. SEGA: Variance Reduction via Gradient Sketching

Author: Hanzely, Filip, Mishchenko, Konstantin, and Richtarik, Peter
Abstract: We propose a randomized first order optimization method, SEGA (SkEtched GrAdient method), which progressively throughout its iterations builds a variance-reduced estimate of the gradient from random linear measurements (sketches) of the gradient obtained from an oracle. In each iteration, SEGA updates the current estimate of the gradient through a sketch-and-project operation using the information provided by the latest sketch, and this is subsequently used to compute an unbiased estimate of the true gradient through a random relaxation procedure. This unbiased estimate is then used to perform a gradient step. Unlike standard subspace descent methods, such as coordinate descent, SEGA can be used for optimization problems with a non-separable proximal term. We provide a general convergence analysis and prove linear convergence for strongly convex objectives. In the special case of coordinate sketches, SEGA can be enhanced with various techniques such as importance sampling, minibatching and acceleration, and its rate is up to a small constant factor identical to the best-known rate of coordinate descent.
Comment: Accepted to the NIPS conference
Published: 2018

16. Weighted Low-Rank Approximation of Matrices and Background Modeling

Author: Dutta, Aritra, Li, Xin, and Richtarik, Peter
Abstract: We primarily study a special weighted low-rank approximation of matrices and then apply it to solve the background modeling problem. We propose two algorithms for this purpose: one operates in the batch mode on the entire data, and the other operates in the batch-incremental mode on the data, naturally captures more background variations, and is computationally more effective. Moreover, we propose a robust technique that learns the background frame indices from the data and does not require any training frames. We demonstrate through extensive experiments that by inserting a simple weight in the Frobenius norm, it can be made robust to the outliers similar to the $\ell_1$ norm. Our methods match or outperform several state-of-the-art online and batch background modeling methods in virtually all quantitative and qualitative measures.
Comment: arXiv admin note: text overlap with arXiv:1707.00281
Published: 2018

17. Online and Batch Supervised Background Estimation via L1 Regression

Author: Dutta, Aritra, and Richtarik, Peter
Abstract: We propose a surprisingly simple model for supervised video background estimation. Our model is based on $\ell_1$ regression. As existing methods for $\ell_1$ regression do not scale to high-resolution videos, we propose several simple and scalable methods for solving the problem, including iteratively reweighted least squares, a homotopy method, and stochastic gradient descent. We show through extensive experiments that our model and methods match or outperform the state-of-the-art online and batch methods in virtually all quantitative and qualitative measures.
Published: 2017

18. Stochastic Dual Ascent for Solving Linear Systems

Author: Gower, Robert Mansel, and Richtarik, Peter
Abstract: We develop a new randomized iterative algorithm, stochastic dual ascent (SDA), for finding the projection of a given vector onto the solution space of a linear system. The method is dual in nature, with the dual being a non-strongly concave quadratic maximization problem without constraints. In each iteration of SDA, a dual variable is updated by a carefully chosen point in a subspace spanned by the columns of a random matrix drawn independently from a fixed distribution. The distribution plays the role of a parameter of the method. Our complexity results hold for a wide family of distributions of random matrices, which opens the possibility to fine-tune the stochasticity of the method to particular applications. We prove that primal iterates associated with the dual process converge to the projection exponentially fast in expectation, and give a formula and an insightful lower bound for the convergence rate. We also prove that the same rate applies to dual function values, primal function values and the duality gap. Unlike traditional iterative methods, SDA converges under no additional assumptions on the system (e.g., rank, diagonal dominance) beyond consistency. In fact, our lower bound improves as the rank of the system matrix drops. Many existing randomized methods for linear systems arise as special cases of SDA, including randomized Kaczmarz, randomized Newton, randomized coordinate descent, Gaussian descent, and their variants. In special cases where our method specializes to a known algorithm, we either recover the best known rates, or improve upon them. Finally, we show that the framework can be applied to the distributed average consensus problem to obtain an array of new algorithms. The randomized gossip algorithm arises as a special case.
Comment: This is a slightly refreshed version of the paper originally submitted on Dec 21, 2015. We have added a numerical experiment involving randomized Kaczmarz for rank-deficient systems, added a few relevant references, and corrected a few typos. Stats: 29 pages, 2 algorithms, 1 figure
Published: 2015
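
Illustration (not part of the catalog record): the abstract above lists randomized Kaczmarz among the special cases of SDA. The sketch below implements only the classical randomized Kaczmarz iteration for a consistent linear system, with rows sampled in proportion to their squared norms. It is not the general SDA method, and the function name `randomized_kaczmarz` and the example data are hypothetical. It assumes that no row of the matrix is entirely zero.

```python
import random


def randomized_kaczmarz(A, b, iterations, seed=0):
    """Approximately solve the consistent system A x = b, starting from x = 0."""
    rng = random.Random(seed)
    n_cols = len(A[0])
    # Squared row norms serve as sampling weights (assumed nonzero).
    row_norms_sq = [sum(v * v for v in row) for row in A]
    x = [0.0] * n_cols
    for _ in range(iterations):
        i = rng.choices(range(len(A)), weights=row_norms_sq)[0]
        row = A[i]
        residual = b[i] - sum(r * xj for r, xj in zip(row, x))
        # Project x onto the hyperplane {z : row . z = b[i]}.
        scale = residual / row_norms_sq[i]
        x = [xj + scale * r for xj, r in zip(x, row)]
    return x


A = [[2.0, 1.0], [1.0, 3.0], [1.0, 1.0]]
b = [4.0, 7.0, 3.0]   # consistent by construction: A applied to [1, 2]
solution = randomized_kaczmarz(A, b, iterations=1000)
```

Each step touches a single row of the system, which is what makes the method attractive when the matrix is too large to process as a whole.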

19. Matrix Completion under Interval Uncertainty

Author: Marecek, Jakub, Richtarik, Peter, and Takac, Martin
Abstract: Matrix completion under interval uncertainty can be cast as matrix completion with element-wise box constraints. We present an efficient alternating-direction parallel coordinate-descent method for the problem. We show that the method outperforms any other known method on a benchmark in image in-painting in terms of signal-to-noise ratio, and that it provides high-quality solutions for an instance of collaborative filtering with 100,198,805 recommendations within 5 minutes.
Published: 2014

20. Separable Approximations and Decomposition Methods for the Augmented Lagrangian

Author: Tappenden, Rachael, Richtarik, Peter, and Buke, Burak
Abstract: In this paper we study decomposition methods based on separable approximations for minimizing the augmented Lagrangian. In particular, we study and compare the Diagonal Quadratic Approximation Method (DQAM) of Mulvey and Ruszczyński and the Parallel Coordinate Descent Method (PCDM) of Richtárik and Takáč. We show that the two methods are equivalent for feasibility problems up to the selection of a single step-size parameter. Furthermore, we prove an improved complexity bound for PCDM under strong convexity, and show that this bound is at least $8(L'/\bar{L})(\omega-1)^2$ times better than the best known bound for DQAM, where $\omega$ is the degree of partial separability and $L'$ and $\bar{L}$ are the maximum and average of the block Lipschitz constants of the gradient of the quadratic penalty appearing in the augmented Lagrangian.
Comment: 28 pages, 6 algorithms, 2 figures
Published: 2013

21. Generalized power method for sparse principal component analysis

Author: Université de Liège - Department of Electrical Engineering and Computer Science, UCL - EUEN/CORE - Center for operations research and econometrics, UCL - SSH/IMAQ - Institut multidisciplinaire pour la modélisation et l'analyse quantitative, UCL - SST/ICTM/INMA - Pôle en ingénierie mathématique, Journée, Michel, Nesterov, Yurii, Richtarik, Peter, and Sepulchre, Rodolphe
Abstract: In this paper we develop a new approach to sparse principal component analysis (sparse PCA). We propose two single-unit and two block optimization formulations of the sparse PCA problem, aimed at extracting a single sparse dominant principal component of a data matrix, or more components at once, respectively. While the initial formulations involve nonconvex functions, and are therefore computationally intractable, we rewrite them into the form of an optimization program involving maximization of a convex function on a compact set. The dimension of the search space is decreased enormously if the data matrix has many more columns (variables) than rows. We then propose and analyze a simple gradient method suited for the task. It appears that our algorithm has best convergence properties in the case when either the objective function or the feasible set are strongly convex, which is the case with our single-unit formulations and can be enforced in the block case. Finally, we demonstrate numerically on a set of random and gene expression test problems that our approach outperforms existing algorithms both in quality of the obtained solution and in computational speed.
Published: 2010

22. Generalized power method for sparse principal component analysis

Author: UCL - EUEN/CORE - Center for operations research and econometrics, Journée, Michel, Nesterov, Yurii, Richtarik, Peter, and Sepulchre, Rodolphe
Abstract: In this paper we develop a new approach to sparse principal component analysis (sparse PCA). We propose two single-unit and two block optimization formulations of the sparse PCA problem, aimed at extracting a single sparse dominant principal component of a data matrix, or more components at once, respectively. While the initial formulations involve nonconvex functions, and are therefore computationally intractable, we rewrite them into the form of an optimization program involving maximization of a convex function on a compact set. The dimension of the search space is decreased enormously if the data matrix has many more columns (variables) than rows. We then propose and analyze a simple gradient method suited for the task. It appears that our algorithm has best convergence properties in the case when either the objective function or the feasible set are strongly convex, which is the case with our single-unit formulations and can be enforced in the block case. Finally, we demonstrate numerically on a set of random and gene expression test problems that our approach outperforms existing algorithms both in quality of the obtained solution and in computational speed.
Published: 2008
