118 results for "So, Anthony Man-Cho"
Search Results
2. LogSpecT: Feasible Graph Learning Model from Stationary Signals with Recovery Guarantees
- Author
Liu, Shangyuan, Zhu, Linglingzhi, and So, Anthony Man-Cho
- Subjects
Signal Processing (eess.SP), Machine Learning (cs.LG), Machine Learning (stat.ML), Optimization and Control (math.OC)
- Abstract
Graph learning from signals is a core task in Graph Signal Processing (GSP). One of the most commonly used models to learn graphs from stationary signals is SpecT. However, its practical formulation rSpecT is known to be sensitive to hyperparameter selection and, even worse, to suffer from infeasibility. In this paper, we give the first condition that guarantees the infeasibility of rSpecT and design a novel model (LogSpecT) and its practical formulation (rLogSpecT) to overcome this issue. Contrary to rSpecT, the novel practical model rLogSpecT is always feasible. Furthermore, we provide recovery guarantees of rLogSpecT, which are derived from modern optimization tools related to epi-convergence. These tools could be of independent interest and significant for various learning problems. To demonstrate the advantages of rLogSpecT in practice, a highly efficient algorithm based on the linearized alternating direction method of multipliers (L-ADMM) is proposed. The subproblems of L-ADMM admit closed-form solutions and the convergence is guaranteed. Extensive numerical results on both synthetic and real networks corroborate the stability and superiority of our proposed methods, underscoring their potential for various graph learning applications.
- Published
- 2023
- Full Text
- View/download PDF
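The LogSpecT abstract notes that every L-ADMM subproblem admits a closed-form solution. As a hedged illustration of that closed-form-subproblem pattern (a plain ADMM for the lasso, not the rLogSpecT model; all names below are ours):

```python
import numpy as np

def soft_threshold(v, tau):
    """Closed-form proximal operator of tau * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def admm_lasso(A, b, lam, rho=1.0, iters=500):
    """ADMM for min 0.5*||Ax - b||^2 + lam*||z||_1 s.t. x = z.
    Both subproblems are closed-form, mirroring (in a much simpler
    setting) the L-ADMM structure described in the abstract."""
    n = A.shape[1]
    x = np.zeros(n); z = np.zeros(n); u = np.zeros(n)
    M = np.linalg.inv(A.T @ A + rho * np.eye(n))  # factor once
    Atb = A.T @ b
    for _ in range(iters):
        x = M @ (Atb + rho * (z - u))       # x-update: ridge solve
        z = soft_threshold(x + u, lam / rho)  # z-update: l1 prox
        u = u + x - z                       # dual update
    return z
```

With `A = I`, the iterates converge to `soft_threshold(b, lam)`, which makes the closed-form structure easy to verify.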
3. A Convergent Single-Loop Algorithm for Relaxation of Gromov-Wasserstein in Graph Data
- Author
Li, Jiajin, Tang, Jianheng, Kong, Lemin, Liu, Huikang, Li, Jia, So, Anthony Man-Cho, and Blanchet, Jose
- Subjects
Computational Geometry (cs.CG), Machine Learning (cs.LG), Optimization and Control (math.OC)
- Abstract
In this work, we present the Bregman Alternating Projected Gradient (BAPG) method, a single-loop algorithm that offers an approximate solution to the Gromov-Wasserstein (GW) distance. We introduce a novel relaxation technique that balances accuracy and computational efficiency, albeit with some compromises in the feasibility of the coupling map. Our analysis is based on the observation that the GW problem satisfies the Luo-Tseng error bound condition, which relates to estimating the distance of a point to the critical point set of the GW problem based on the optimality residual. This observation allows us to provide an approximation bound for the distance between the fixed-point set of BAPG and the critical point set of GW. Moreover, under a mild technical assumption, we can show that BAPG converges to its fixed point set. The effectiveness of BAPG has been validated through comprehensive numerical experiments in graph alignment and partition tasks, where it outperforms existing methods in terms of both solution quality and wall-clock time.
Comment: Accepted by ICLR 2023
- Published
- 2023
- Full Text
- View/download PDF
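BAPG's core primitive is a Bregman projected-gradient step. The toy sketch below applies that primitive with the KL divergence to a linear cost over the probability simplex, where the Bregman step reduces to exponentiated-gradient updates; this illustrates the primitive only and is not the paper's GW solver:

```python
import numpy as np

def bregman_projected_gradient(c, eta=0.5, iters=300):
    """Minimize the linear cost <c, x> over the probability simplex
    using KL (Bregman) projected-gradient steps, i.e. mirror descent.
    A toy instance of the Bregman-step primitive behind BAPG."""
    x = np.full(len(c), 1.0 / len(c))  # uniform starting point
    for _ in range(iters):
        x = x * np.exp(-eta * c)       # Bregman (KL) gradient step
        x = x / x.sum()                # KL projection onto the simplex
    return x
```

For a linear cost the iterates concentrate on the cheapest coordinate, which gives a quick sanity check of the update.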
4. Decoder-Only or Encoder-Decoder? Interpreting Language Model as a Regularized Encoder-Decoder
- Author
Fu, Zihao, Lam, Wai, Yu, Qian, So, Anthony Man-Cho, Hu, Shengding, Liu, Zhiyuan, and Collier, Nigel
- Subjects
Computation and Language (cs.CL), Artificial Intelligence (cs.AI), Machine Learning (cs.LG)
- Abstract
The sequence-to-sequence (seq2seq) task aims at generating the target sequence based on the given input source sequence. Traditionally, most seq2seq tasks are tackled by the Encoder-Decoder framework, which requires an encoder to encode the source sequence and a decoder to generate the target text. Recently, a number of new approaches have emerged that apply decoder-only language models directly to the seq2seq task. Despite the significant advancements in applying language models to the seq2seq task, there is still a lack of thorough analysis of the effectiveness of the decoder-only language model architecture. This paper aims to address this gap by conducting a detailed comparison between the encoder-decoder architecture and the decoder-only language model framework through the analysis of a regularized encoder-decoder structure. This structure is designed to replicate all behaviors of the classical decoder-only language model but has an encoder and a decoder, making it easier to compare with the classical encoder-decoder structure. Based on the analysis, we unveil the attention degeneration problem in the language model: as the number of generation steps grows, less and less attention is focused on the source sequence. To give a quantitative understanding of this problem, we conduct a theoretical sensitivity analysis of the attention output with respect to the source input. Grounded on our analysis, we propose a novel partial attention language model to solve the attention degeneration problem. Experimental results on machine translation, summarization, and data-to-text generation tasks support our analysis and demonstrate the effectiveness of our proposed model.
- Published
- 2023
- Full Text
- View/download PDF
5. ReSync: Riemannian Subgradient-based Robust Rotation Synchronization
- Author
Liu, Huikang, Li, Xiao, and So, Anthony Man-Cho
- Subjects
Machine Learning (cs.LG), Optimization and Control (math.OC)
- Abstract
This work presents ReSync, a Riemannian subgradient-based algorithm for solving the robust rotation synchronization problem, which arises in various engineering applications. ReSync solves a least-unsquared minimization formulation over the rotation group, which is nonsmooth and nonconvex, and aims at recovering the underlying rotations directly. We provide strong theoretical guarantees for ReSync under the random corruption setting. Specifically, we first show that the initialization procedure of ReSync yields a proper initial point that lies in a local region around the ground-truth rotations. We next establish the weak sharpness property of the aforementioned formulation and then utilize this property to derive the local linear convergence of ReSync to the ground-truth rotations. By combining these guarantees, we conclude that ReSync converges linearly to the ground-truth rotations under appropriate conditions. Experimental results demonstrate the effectiveness of ReSync.
Comment: 24 pages, 3 figures
- Published
- 2023
- Full Text
- View/download PDF
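A standard building block in rotation synchronization pipelines like the one above is rounding a noisy matrix to the nearest rotation. The sketch below shows that SVD-based projection onto SO(d); it is an illustrative component, not the ReSync algorithm itself:

```python
import numpy as np

def project_to_rotation(M):
    """Project a square matrix onto the rotation group SO(d) via SVD.
    The sign of the last singular direction is flipped if necessary to
    enforce det(R) = +1 (a rotation rather than a reflection)."""
    U, _, Vt = np.linalg.svd(M)
    d = M.shape[0]
    D = np.diag([1.0] * (d - 1) + [np.linalg.det(U @ Vt)])
    return U @ D @ Vt
```

The output is always orthogonal with determinant +1, which is easy to check numerically.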
6. Rotation Group Synchronization via Quotient Manifold
- Author
Zhu, Linglingzhi, Li, Chong, and So, Anthony Man-Cho
- Subjects
Signal Processing (eess.SP), Optimization and Control (math.OC)
- Abstract
Rotation group $\mathcal{SO}(d)$ synchronization is an important inverse problem that has attracted intense attention from numerous application fields such as graph realization, computer vision, and robotics. In this paper, we focus on the least-squares estimator of rotation group synchronization with general additive noise models, which is a nonconvex optimization problem with manifold constraints. Unlike phase/orthogonal group synchronization, there are few provable approaches for solving rotation group synchronization. First, we derive improved estimation results for the least-squares/spectral estimator, illustrating the tightness of the existing relaxation methods, which solve rotation group synchronization through the optimum of the relaxed orthogonal group version, under a near-optimal noise level for exact recovery. Moreover, departing from the standard approach of utilizing the geometry of the ambient Euclidean space, we adopt an intrinsic Riemannian approach to study orthogonal/rotation group synchronization. Benefiting from a quotient geometric view, we prove a positive definiteness condition on the quotient Riemannian Hessian around the optimum of the orthogonal group synchronization problem, and consequently establish the Riemannian local error bound property, which is used to analyze the convergence rates of various Riemannian algorithms. As a simple and feasible method, we study the sequential convergence guarantee of the (quotient) Riemannian gradient method for solving the orthogonal/rotation group synchronization problem and derive its global linear convergence rate to the optimum with spectral initialization. All results are deterministic without any probabilistic model.
- Published
- 2023
- Full Text
- View/download PDF
7. Adaptive coordinate sampling for stochastic primal–dual optimization
- Author
Huikang Liu, Xiaolu Wang, and Anthony Man-Cho So
- Subjects
Mathematical optimization, Sampling (statistics), Management Science and Operations Research, Primal-dual methods
- Published
- 2021
8. A Theoretical Analysis of the Repetition Problem in Text Generation
- Author
Fu, Zihao, Lam, Wai, So, Anthony Man-Cho, and Shi, Bei
- Subjects
Computation and Language (cs.CL)
- Abstract
Text generation tasks, including translation, summarization, and language modeling, have seen rapid growth in recent years. Despite the remarkable achievements, the repetition problem has been observed in nearly all text generation models, extensively undermining generation performance. To solve the repetition problem, many methods have been proposed, but there is no existing theoretical analysis of why this problem happens and how it is resolved. In this paper, we propose a new framework for the theoretical analysis of the repetition problem. We first define the Average Repetition Probability (ARP) to characterize the repetition problem quantitatively. Then, we conduct an extensive analysis of the Markov generation model and derive several upper bounds of the average repetition probability with intuitive understanding. We show that most of the existing methods are essentially minimizing the upper bounds explicitly or implicitly. Grounded on our theory, we show that the repetition problem is, unfortunately, caused by the traits of our language itself. One major reason is that there exist too many words predicting the same word as the subsequent word with high probability. Consequently, it is easy to go back to that word and form repetitions; we dub this the high inflow problem. Furthermore, we derive a concentration bound of the average repetition probability for a general generation model. Finally, based on the theoretical upper bounds, we propose a novel rebalanced encoding approach to alleviate the high inflow problem. The experimental results show that our theoretical framework is applicable to general generation models and our proposed rebalanced encoding approach alleviates the repetition problem significantly. The source code of this paper can be obtained from https://github.com/fuzihaofzh/repetition-problem-nlg.
Comment: AAAI 21 Paper with Appendix
- Published
- 2021
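The "high inflow" intuition in the abstract above, that loops w -> v -> w form easily, can be probed with a toy quantity: the probability that a Markov text generator immediately returns to the previous word. This is only a crude proxy we define for illustration, not the paper's exact ARP definition:

```python
import numpy as np

def two_step_return_probability(P, pi=None):
    """Probability that a Markov generator returns to the previous word
    in two steps (w -> v -> w). P is a row-stochastic transition matrix
    and pi an initial distribution (uniform by default). This is a toy
    proxy for repetition, not the paper's ARP."""
    n = P.shape[0]
    if pi is None:
        pi = np.full(n, 1.0 / n)
    # sum_w pi_w * sum_v P[w, v] * P[v, w] = sum_w pi_w * (P @ P)[w, w]
    return float(pi @ np.diag(P @ P))
```

A two-word vocabulary that always swaps words returns with probability 1, while a uniform transition matrix returns with probability 1/2.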
9. Voting-Based Multiagent Reinforcement Learning for Intelligent IoT
- Author
Mengdi Wang, Anthony Man-Cho So, Zengde Deng, Shuguang Cui, Yue Xu, and Wenjun Xu
- Subjects
Reinforcement learning, Voting, Linear programming, Optimization problem, Rate of convergence, Task analysis, Artificial intelligence, Computer Networks and Communications
- Abstract
The recent success of single-agent reinforcement learning (RL) in Internet of Things (IoT) systems motivates the study of multiagent RL (MARL), which is more challenging but more useful in large-scale IoT. In this article, we consider a voting-based MARL problem, in which the agents vote to make group decisions and the goal is to maximize the globally averaged returns. To this end, we formulate the MARL problem based on the linear programming form of the policy optimization problem and propose a primal–dual algorithm to obtain the optimal solution. We also propose a voting mechanism through which the distributed learning achieves the same sublinear convergence rate as centralized learning. In other words, the distributed decision making does not slow down the process of achieving global consensus on optimality. Finally, we verify the convergence of our proposed algorithm with numerical simulations and conduct case studies in practical multiagent IoT systems.
- Published
- 2021
10. Manifold Proximal Point Algorithms for Dual Principal Component Pursuit and Orthogonal Dictionary Learning
- Author
Zengde Deng, Anthony Man-Cho So, Shixiang Chen, and Shiqian Ma
- Subjects
Signal Processing (eess.SP), Computer Vision and Pattern Recognition (cs.CV), Machine Learning (cs.LG), Machine Learning (stat.ML), Optimization and Control (math.OC), Stiefel manifold, Principal component pursuit, Augmented Lagrangian method, Dictionary learning, Proximal point, Rate of convergence
- Abstract
We consider the problem of maximizing the $\ell_1$ norm of a linear map over the sphere, which arises in various machine learning applications such as orthogonal dictionary learning (ODL) and robust subspace recovery (RSR). The problem is numerically challenging due to its nonsmooth objective and nonconvex constraint, and its algorithmic aspects have not been well explored. In this paper, we show how the manifold structure of the sphere can be exploited to design fast algorithms for tackling this problem. Specifically, our contribution is threefold. First, we present a manifold proximal point algorithm (ManPPA) for the problem and show that it converges at a sublinear rate. Furthermore, we show that ManPPA can achieve a quadratic convergence rate when applied to the ODL and RSR problems. Second, we propose a stochastic variant of ManPPA called StManPPA, which is well suited for large-scale computation, and establish its sublinear convergence rate. Both ManPPA and StManPPA have provably faster convergence rates than existing subgradient-type methods. Third, using ManPPA as a building block, we propose a new approach to solving a matrix analog of the problem, in which the sphere is replaced by the Stiefel manifold. The results from our extensive numerical experiments on the ODL and RSR problems demonstrate the efficiency and efficacy of our proposed methods.
Comment: Accepted in IEEE Transactions on Signal Processing
- Published
- 2021
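The subgradient-type baseline that ManPPA is compared against can be sketched compactly for the $\ell_1$-over-sphere problem: take a Euclidean subgradient step, then retract back to the sphere by normalization. This is a simplified illustration with diminishing steps, not ManPPA itself:

```python
import numpy as np

def l1_sphere_ascent(A, iters=200, step=0.1, seed=0):
    """Maximize ||A.T @ q||_1 over the unit sphere with projected
    (Riemannian) subgradient steps. A simple subgradient-type method
    for the problem the abstract describes; illustrative only."""
    rng = np.random.default_rng(seed)
    q = rng.standard_normal(A.shape[0])
    q /= np.linalg.norm(q)
    for k in range(1, iters + 1):
        g = A @ np.sign(A.T @ q)   # Euclidean subgradient of the l1 objective
        q = q + (step / k) * g     # ascent step with diminishing stepsize
        q /= np.linalg.norm(q)     # retract back onto the sphere
    return q
```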
11. A Newton Tracking Algorithm With Exact Linear Convergence for Decentralized Consensus Optimization
- Author
Qing Ling, Anthony Man-Cho So, and Jiaojiao Zhang
- Subjects
Linear programming, Computation, Rate of convergence, Convex function, Graph (abstract data type)
- Abstract
This paper considers the problem of decentralized consensus optimization over a network, where each node holds a strongly convex and twice-differentiable local objective function. Our goal is to minimize the sum of the local objective functions and find the exact optimal solution using only local computation and neighboring communication. We propose a novel Newton tracking algorithm, which updates the local variable in each node along a local Newton direction modified with neighboring and historical information. We investigate the connections between the proposed Newton tracking algorithm and several existing methods, including gradient tracking and primal-dual methods. We prove that the proposed algorithm converges to the exact optimal solution at a linear rate. Furthermore, when the iterate is close to the optimal solution, we show that the proposed algorithm requires $O(\max \lbrace \kappa _f \sqrt{\kappa _g} + \kappa _f^2, \frac{\kappa _g^{3/2}}{\kappa _f} + \kappa _f\sqrt{\kappa _g} \rbrace \log {\frac{1}{\Delta }})$ iterations to find a $\Delta$ -optimal solution, where $\kappa _f$ and $\kappa _g$ are condition numbers of the objective function and the graph, respectively. Our numerical results demonstrate the efficacy of Newton tracking and validate the theoretical findings.
- Published
- 2021
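The abstract relates Newton tracking to first-order gradient tracking. That first-order scheme can be sketched on a scalar toy problem where each node holds a quadratic objective, so the global optimum is the mean of the local targets. This is a generic gradient-tracking sketch under our own toy setup, not the paper's Newton tracking method:

```python
import numpy as np

def gradient_tracking(b, W, alpha=0.05, iters=2000):
    """Decentralized gradient tracking for min_x sum_i 0.5*(x - b[i])^2,
    whose optimum is mean(b). W is a doubly stochastic mixing matrix.
    Each node mixes with neighbors and tracks the average gradient via
    dynamic average consensus."""
    n = len(b)
    x = np.zeros(n)            # local decision variables (one per node)
    g = x - b                  # local gradients at current iterates
    y = g.copy()               # gradient-tracking variables
    for _ in range(iters):
        x_new = W @ x - alpha * y
        g_new = x_new - b
        y = W @ y + g_new - g  # consensus update on tracked gradients
        x, g = x_new, g_new
    return x
```

On a 3-node network with a doubly stochastic W, all nodes converge to the average of the local targets.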
12. A Penalty Alternating Direction Method of Multipliers for Convex Composite Optimization Over Decentralized Networks
- Author
Qing Ling, Huikang Liu, Anthony Man-Cho So, and Jiaojiao Zhang
- Subjects
Mathematical optimization, Composite optimization, Convex function, Gradient descent, Decentralized network
- Abstract
Consider the problem of minimizing a sum of convex composite functions over a decentralized network, where each agent in the network holds a private function consisting of a smooth part and a nonsmooth part, and it can only exchange information with its neighbors during the optimization process. One approach to tackling this problem is to study its penalized approximation. Although such an approximation becomes more accurate as the penalty parameter becomes smaller, it is well known that the penalized objective will also become more ill-conditioned, thereby causing the popular proximal gradient descent method to slow down substantially. To break this accuracy-speed tradeoff, we propose to solve the penalized approximation with the alternating direction method of multipliers (ADMM). We also exploit the composite structure of the private functions by linearizing the smooth parts and handling the nonsmooth parts with proximal operators, which allows us to further reduce the computational costs. The proposed penalty ADMM (abbreviated as PAD) is proven to be sublinearly convergent when the private functions are convex, and linearly convergent when in addition the smooth parts are strongly convex. We present numerical results to corroborate the theoretical analyses and to further demonstrate the advantages of PAD over existing state-of-the-art algorithms such as DL-ADMM, PG-EXTRA, and NIDS.
- Published
- 2021
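The key composite trick in the abstract, linearize the smooth part and handle the nonsmooth part with a proximal operator, can be shown in isolation as a single prox-linear update. This is an illustrative one-dimensional sketch of that update, not the full PAD algorithm:

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1, handling the nonsmooth part."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def prox_linear_step(x, grad_smooth, step, lam):
    """One linearize-then-prox update: gradient step on the smooth part,
    followed by the prox of the nonsmooth part."""
    return soft_threshold(x - step * grad_smooth(x), step * lam)

# Demo: proximal-gradient iterations on 0.5*(x - 2)^2 + |x|; minimizer x = 1.
grad = lambda x: x - 2.0
x = 0.0
for _ in range(200):
    x = prox_linear_step(x, grad, 0.5, 1.0)
```

For this composite objective the optimality condition gives x = 1, which the iteration reaches as a fixed point.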
13. On the Effectiveness of Parameter-Efficient Fine-Tuning
- Author
Fu, Zihao, Yang, Haoran, So, Anthony Man-Cho, Lam, Wai, Bing, Lidong, and Collier, Nigel
- Subjects
Computation and Language (cs.CL), Artificial Intelligence (cs.AI), Machine Learning (cs.LG)
- Abstract
Fine-tuning pre-trained models has been ubiquitously proven to be effective in a wide range of NLP tasks. However, fine-tuning the whole model is parameter inefficient as it always yields an entirely new model for each task. Currently, many research works propose to fine-tune only a small portion of the parameters while keeping most of the parameters shared across different tasks. These methods achieve surprisingly good performance and are shown to be more stable than their corresponding fully fine-tuned counterparts. However, such methods are still not well understood. Some natural questions arise: How does the parameter sparsity lead to promising performance? Why is the model more stable than the fully fine-tuned models? How should the tunable parameters be chosen? In this paper, we first categorize the existing methods into random approaches, rule-based approaches, and projection-based approaches based on how they choose which parameters to tune. Then, we show that all of the methods are actually sparse fine-tuned models and conduct a novel theoretical analysis of them. We indicate that the sparsity is actually imposing a regularization on the original model by controlling the upper bound of the stability. Such stability leads to better generalization capability, which has been empirically observed in many recent research works. Despite the effectiveness of sparsity grounded by our theory, how to choose the tunable parameters remains an open problem. To better choose the tunable parameters, we propose a novel Second-order Approximation Method (SAM) which approximates the original problem with an analytically solvable optimization function. The tunable parameters are determined by directly optimizing the approximation function. The experimental results show that our proposed SAM model outperforms many strong baseline models and it also verifies our theoretical analysis.
- Published
- 2022
- Full Text
- View/download PDF
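The "sparse fine-tuning" view in the abstract, updating only a chosen subset of parameters while freezing the rest, can be sketched with a generic magnitude-based selection rule. This is our own illustrative selection heuristic, not the paper's SAM method:

```python
import numpy as np

def sparse_finetune_step(params, grads, k, lr=0.1):
    """Update only the k parameters with the largest gradient magnitude,
    keeping all other parameters frozen. A generic sparse fine-tuning
    update of the kind the paper analyzes (illustrative only)."""
    idx = np.argsort(-np.abs(grads))[:k]   # select top-k coordinates
    new_params = params.copy()
    new_params[idx] -= lr * grads[idx]     # tune only the selected entries
    return new_params
```

Exactly k coordinates change per step, which is the sparsity constraint the theoretical analysis exploits.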
14. No Dimension-Free Deterministic Algorithm Computes Approximate Stationarities of Lipschitzians
- Author
Tian, Lai and So, Anthony Man-Cho
- Subjects
Optimization and Control (math.OC)
- Abstract
We consider the computation of an approximately stationary point for a Lipschitz and semialgebraic function $f$ with a local oracle. If $f$ is smooth, simple deterministic methods have dimension-free finite oracle complexities. For the general Lipschitz setting, only recently, Zhang et al. [47] introduced a randomized algorithm that computes Goldstein's approximate stationarity [25] to arbitrary precision with a dimension-free polynomial oracle complexity. In this paper, we show that no deterministic algorithm can do the same. Even without the dimension-free requirement, we show that any deterministic method with a finite-time guarantee cannot be general zero-respecting, which rules out most of the oracle-based methods in smooth optimization and any trivial derandomization of Zhang et al. [47]. Our results reveal a fundamental hurdle of nonconvex nonsmooth problems in the modern large-scale setting and their infinite-dimensional extension.
Comment: 17 pages, submitted to SODA 2023
- Published
- 2022
- Full Text
- View/download PDF
15. Riemannian Natural Gradient Methods
- Author
Hu, Jiang, Ao, Ruicheng, So, Anthony Man-Cho, Yang, Minghan, and Wen, Zaiwen
- Subjects
Machine Learning (cs.LG), Optimization and Control (math.OC)
- Abstract
This paper studies large-scale optimization problems on Riemannian manifolds whose objective function is a finite sum of negative log-probability losses. Such problems arise in various machine learning and signal processing applications. By introducing the notion of Fisher information matrix in the manifold setting, we propose a novel Riemannian natural gradient method, which can be viewed as a natural extension of the natural gradient method from the Euclidean setting to the manifold setting. We establish the almost-sure global convergence of our proposed method under standard assumptions. Moreover, we show that if the loss function satisfies certain convexity and smoothness conditions and the input-output map satisfies a Riemannian Jacobian stability condition, then our proposed method enjoys a local linear -- or, under the Lipschitz continuity of the Riemannian Jacobian of the input-output map, even quadratic -- rate of convergence. We then prove that the Riemannian Jacobian stability condition will be satisfied by a two-layer fully connected neural network with batch normalization with high probability, provided that the width of the network is sufficiently large. This demonstrates the practical relevance of our convergence rate result. Numerical experiments on applications arising from machine learning demonstrate the advantages of the proposed method over state-of-the-art ones.
- Published
- 2022
- Full Text
- View/download PDF
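The natural gradient step that the abstract extends to manifolds can be sketched in its Euclidean special case. For a least-squares model, the Fisher information matrix coincides with the Gauss-Newton matrix X.T @ X, so one step preconditions the gradient by the (damped) Fisher inverse. This is an assumed Euclidean simplification (manifold = R^n), not the paper's Riemannian method:

```python
import numpy as np

def natural_gradient_step(X, y, w, damping=0.0):
    """One natural gradient step for the least-squares model y ~ X @ w.
    The Fisher matrix equals the Gauss-Newton matrix X.T @ X here, so the
    step is the gradient preconditioned by the (damped) Fisher inverse."""
    grad = X.T @ (X @ w - y)                     # Euclidean gradient
    fisher = X.T @ X + damping * np.eye(len(w))  # Fisher information matrix
    return w - np.linalg.solve(fisher, grad)     # preconditioned update
```

With no damping and a full-column-rank design matrix, a single step from any starting point lands exactly on the least-squares solution, illustrating why curvature-aware steps can converge quadratically.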
16. A Communication-Efficient Decentralized Newton's Method with Provably Faster Convergence
- Author
Liu, Huikang, Zhang, Jiaojiao, So, Anthony Man-Cho, and Ling, Qing
- Subjects
Optimization and Control (math.OC)
- Abstract
In this paper, we consider a strongly convex finite-sum minimization problem over a decentralized network and propose a communication-efficient decentralized Newton's method for solving it. We first apply dynamic average consensus (DAC) so that each node is able to use a local gradient approximation and a local Hessian approximation to track the global gradient and Hessian, respectively. Second, since exchanging Hessian approximations is far from communication-efficient, we require the nodes to exchange the compressed ones instead and then apply an error compensation mechanism to correct for the compression noise. Third, we introduce multi-step consensus for exchanging local variables and local gradient approximations to balance between computation and communication. To avoid each node transmitting the entire local Hessian approximation, we design a compression procedure with error compensation to estimate the global Hessian in a communication-efficient way. With novel analysis, we establish the globally linear (resp., asymptotically super-linear) convergence rate of the proposed method when m is constant (resp., tends to infinity), where m is the number of consensus inner steps. To the best of our knowledge, this is the first super-linear convergence result for a communication-efficient decentralized Newton's method. Moreover, the rate we establish is provably faster than those of first-order methods. Our numerical results on various applications corroborate the theoretical findings.
- Published
- 2022
- Full Text
- View/download PDF
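The error-compensation mechanism described in the abstract, compressing messages while carrying the compression residual forward, can be sketched generically. Here we use an aggressive top-1 compressor on plain vectors as a stand-in for the paper's compressed Hessian approximations; the function names are ours:

```python
import numpy as np

def top1_compress(v):
    """Keep only the largest-magnitude entry (an aggressive compressor)."""
    c = np.zeros_like(v)
    i = np.argmax(np.abs(v))
    c[i] = v[i]
    return c

def transmit_with_error_feedback(messages):
    """Send compressed vectors with error compensation: each round
    compresses (message + carried error) and carries the residual
    forward so no information is permanently lost."""
    e = np.zeros_like(messages[0])
    sent = []
    for v in messages:
        m = v + e             # add back previously dropped information
        c = top1_compress(m)  # communication-efficient message
        e = m - c             # residual carried to the next round
        sent.append(c)
    return sent, e
```

By a telescoping argument, the transmitted messages plus the final residual always sum exactly to the true messages, which is the bounded-error property the compensation mechanism provides.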
17. Variance-Reduced Stochastic Quasi-Newton Methods for Decentralized Learning: Part II
- Author
Zhang, Jiaojiao, Liu, Huikang, So, Anthony Man-Cho, and Ling, Qing
- Subjects
Optimization and Control (math.OC)
- Abstract
In Part I of this work, we have proposed a general framework of decentralized stochastic quasi-Newton methods, which converge linearly to the optimal solution under the assumption that the local Hessian inverse approximations have bounded positive eigenvalues. In Part II, we specify two fully decentralized stochastic quasi-Newton methods, damped regularized limited-memory DFP (Davidon-Fletcher-Powell) and damped limited-memory BFGS (Broyden-Fletcher-Goldfarb-Shanno), to locally construct such Hessian inverse approximations without extra sampling or communication. Both of the methods use a fixed moving window of $M$ past local gradient approximations and local decision variables to adaptively construct positive definite Hessian inverse approximations with bounded eigenvalues, satisfying the assumption in Part I for the linear convergence. For the proposed damped regularized limited-memory DFP, a regularization term is added to improve the performance. For the proposed damped limited-memory BFGS, a two-loop recursion is applied, leading to low storage and computation complexity. Numerical experiments demonstrate that the proposed quasi-Newton methods are much faster than the existing decentralized stochastic first-order algorithms.
- Published
- 2022
- Full Text
- View/download PDF
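The two-loop recursion mentioned in the abstract is the standard limited-memory BFGS device for applying a Hessian-inverse approximation with low storage. Below is the textbook undamped version (the paper's variant adds damping), with memory stored as (s, y) pairs, oldest first:

```python
import numpy as np

def lbfgs_two_loop(grad, memory):
    """L-BFGS two-loop recursion: apply the Hessian-inverse approximation
    built from stored (s, y) pairs (oldest first) to a gradient vector,
    at O(M n) cost and O(M n) storage."""
    q = grad.copy()
    alphas = []
    for s, y in reversed(memory):      # first loop: newest to oldest
        rho = 1.0 / (y @ s)
        a = rho * (s @ q)
        q -= a * y
        alphas.append((a, rho))
    if memory:
        s, y = memory[-1]
        q *= (s @ y) / (y @ y)         # initial Hessian scaling
    for (s, y), (a, rho) in zip(memory, reversed(alphas)):
        b = rho * (y @ q)              # second loop: oldest to newest
        q += (a - b) * s
    return q
```

With an empty memory the recursion returns the gradient unchanged, and with a single pair taken from a quadratic with identity Hessian (s = y) it also returns the gradient exactly, two quick correctness checks.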
18. Fast and Provably Convergent Algorithms for Gromov-Wasserstein in Graph Data
- Author
Li, Jiajin, Tang, Jianheng, Kong, Lemin, Liu, Huikang, Li, Jia, So, Anthony Man-Cho, and Blanchet, Jose
- Subjects
Machine Learning (cs.LG)
- Abstract
In this paper, we study the design and analysis of a class of efficient algorithms for computing the Gromov-Wasserstein (GW) distance tailored to large-scale graph learning tasks. Armed with the Luo-Tseng error bound condition~\citep{luo1992error}, the two proposed algorithms, Bregman Alternating Projected Gradient (BAPG) and hybrid Bregman Proximal Gradient (hBPG), enjoy convergence guarantees. Based on task-specific properties, our analysis further provides novel theoretical insights into how to select the best-fit method. As a result, we are able to provide comprehensive experiments to validate the effectiveness of our methods on a host of tasks, including graph alignment, graph partition, and shape matching. In terms of both wall-clock time and modeling performance, the proposed methods achieve state-of-the-art results.
- Published
- 2022
- Full Text
- View/download PDF
19. Understanding Notions of Stationarity in Nonsmooth Optimization: A Guided Tour of Various Constructions of Subdifferential for Nonsmooth Functions
- Author
Jiajin Li, Anthony Man-Cho So, and Wing-Kin Ma
- Subjects
Mathematical optimization, Iterative method, Subderivative, Stationary point, Convex function
- Abstract
Many contemporary applications in signal processing and machine learning give rise to structured nonconvex nonsmooth optimization problems that can often be tackled by simple iterative methods quite effectively. One of the keys to understanding such a phenomenon (and, in fact, a very difficult conundrum even for experts) lies in the study of "stationary points" of the problem in question. Unlike smooth optimization, for which the definition of a stationary point is rather standard, there are myriad definitions of stationarity in nonsmooth optimization. In this article, we provide an introduction to different stationarity concepts for several important classes of nonconvex nonsmooth functions, discuss the geometric interpretations of these concepts, and further clarify their relationships. We then demonstrate the relevance of these constructions in some representative applications and indicate how they could affect the performance of iterative methods for addressing these applications.
- Published
- 2020
20. A Provably Convergent Projected Gradient-Type Algorithm for TDOA-Based Geolocation Under the Quasi-Parabolic Ionosphere Model
- Author
-
Sen Huang, Anthony Man-Cho So, Kehu Yang, and Yuen-Man Pun
- Subjects
Applied Mathematics ,Feasible region ,020206 networking & telecommunications ,02 engineering and technology ,Multilateration ,Critical point (mathematics) ,Nonlinear system ,Geolocation ,Iterated function ,Signal Processing ,Limit point ,0202 electrical engineering, electronic engineering, information engineering ,Electrical and Electronic Engineering ,Gradient descent ,Algorithm - Abstract
The problem of geolocating an unknown high-frequency emitter based on the quasi-parabolic ionosphere model with time-difference-of-arrival measurements of the refracted radio rays is of fundamental importance in various military and civilian applications. Such a problem admits a maximum-likelihood (ML) formulation, which is nonlinear and non-convex. By elucidating the geometry of the feasible set of the ML formulation, we develop a first-order algorithm, which we call Generalized Projected Gradient Descent, to solve it. We prove that every limit point of the iterates generated by our proposed algorithm is a critical point of the ML formulation. Simulation results show that our proposed algorithm can more reliably and accurately geolocate the emitter than a state-of-the-art method in various settings.
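The projected-gradient template underlying such an algorithm fits in a few lines. The sketch below substitutes the unit ball for the paper's feasible set (which is defined by the quasi-parabolic ionosphere model); the objective, step size, and iteration count are likewise illustrative choices, not the paper's.

```python
import numpy as np

# Generic projected gradient descent: take a gradient step, then
# project back onto the feasible set. Here the feasible set is the
# unit ball and the objective is ||x - c||^2, both purely for
# illustration; the step size 0.4 is also an arbitrary choice.
def project_ball(x):
    n = np.linalg.norm(x)
    return x if n <= 1.0 else x / n

c = np.array([3.0, 4.0])
x = np.zeros(2)
for _ in range(100):
    x = project_ball(x - 0.4 * 2.0 * (x - c))  # step, then project

# minimizer of ||x - c||^2 over the unit ball is c / ||c|| = [0.6, 0.8]
```

Every limit point of such an iteration is a fixed point of the step-then-project map, which is the critical-point notion the abstract refers to.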
- Published
- 2020
21. Nonconvex Robust Low-Rank Matrix Recovery
- Author
-
Zhihui Zhu, René Vidal, Xiao Li, and Anthony Man-Cho So
- Subjects
FOS: Computer and information sciences ,021103 operations research ,Computer Science - Information Theory ,Information Theory (cs.IT) ,0211 other engineering and technologies ,Low-rank approximation ,010103 numerical & computational mathematics ,02 engineering and technology ,01 natural sciences ,Theoretical Computer Science ,Matrix (mathematics) ,Outlier ,Applied mathematics ,0101 mathematics ,Subgradient method ,Software ,Mathematics - Abstract
In this paper we study the problem of recovering a low-rank matrix from a number of random linear measurements that are corrupted by outliers taking arbitrary values. We consider a nonsmooth nonconvex formulation of the problem, in which we explicitly enforce the low-rank property of the solution by using a factored representation of the matrix variable and employ an $\ell_1$-loss function to robustify the solution against outliers. We show that even when a constant fraction (which can be up to almost half) of the measurements are arbitrarily corrupted, as long as certain measurement operators arising from the measurement model satisfy the so-called $\ell_1/\ell_2$-restricted isometry property, the ground-truth matrix can be exactly recovered from any global minimum of the resulting optimization problem. Furthermore, we show that the objective function of the optimization problem is sharp and weakly convex. Consequently, a subgradient method (SubGM) with geometrically diminishing step sizes will converge linearly to the ground-truth matrix when suitably initialized. We demonstrate the efficacy of the SubGM for the nonconvex robust low-rank matrix recovery problem with various numerical experiments.
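The "sharpness plus geometrically diminishing step sizes gives linear convergence" mechanism can be seen on a one-dimensional caricature. The sharp function, initial step size, and decay rate below are illustrative choices, not the paper's setting.

```python
import numpy as np

# Subgradient method with geometrically diminishing steps on the sharp
# function f(x) = |x - 3|. Sharpness plus geometric step decay drives
# the iterates to the minimizer at a linear rate, mirroring the SubGM
# behavior described in the abstract. All constants are illustrative.
def subgm(x0, lam0=1.0, q=0.8, iters=60):
    x = x0
    for k in range(iters):
        g = np.sign(x - 3.0)            # a subgradient of |x - 3|
        x = x - lam0 * (q ** k) * g     # geometrically shrinking step
    return x

x_hat = subgm(x0=5.0)                   # ends up very close to 3
```

The step sizes sum to a finite budget, so the decay rate has to be matched to the initial distance; the paper's initialization condition plays this role in the matrix setting.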
- Published
- 2020
22. An Efficient Alternating Direction Method for Graph Learning from Smooth Signals
- Author
-
Anthony Man-Cho So, Haoyu Lei, Xiaolu Wang, and Chaorui Yao
- Subjects
Optimization problem ,Computer science ,Matrix norm ,020206 networking & telecommunications ,Topology (electrical circuits) ,010103 numerical & computational mathematics ,02 engineering and technology ,01 natural sciences ,Rate of convergence ,Convergence (routing) ,0202 electrical engineering, electronic engineering, information engineering ,Topological graph theory ,0101 mathematics ,Convex function ,Algorithm ,Connectivity - Abstract
We consider the problem of identifying the graph topology from a set of smooth graph signals. A well-known approach to this problem is minimizing the Dirichlet energy accompanied with some Frobenius norm regularization. Recent works have incorporated the logarithmic barrier on the node degrees to improve the overall graph connectivity without compromising graph sparsity, which is shown to be quite effective in enhancing the quality of the learned graphs. Although a primal-dual algorithm has been proposed in the literature to solve this type of graph learning formulation, it lacks a rigorous convergence analysis and appears to have a slow empirical performance. In this paper, we cast the graph learning formulation as a nonsmooth, strictly convex optimization problem and develop an efficient alternating direction method of multipliers to solve it. We show that our algorithm converges to the global minimum with arbitrary initialization. We conduct extensive experiments on various synthetic and real-world graphs, the results of which show that our method exhibits sharp linear convergence and is substantially faster than the commonly adopted primal-dual method.
- Published
- 2021
23. Nonconvex Optimization for Signal Processing and Machine Learning [From the Guest Editors]
- Author
-
Gesualdo Scutari, Anthony Man-Cho So, Wing-Kin Ma, and Prateek Jain
- Subjects
Signal processing ,Optimization algorithm ,business.industry ,Computer science ,Applied Mathematics ,Regular polygon ,Machine learning ,computer.software_genre ,Software deployment ,Signal Processing ,Convex optimization ,Special section ,Artificial intelligence ,Electrical and Electronic Engineering ,Focus (optics) ,business ,Convex function ,computer - Abstract
The articles in this special section focus on nonconvex optimization for signal processing and machine learning. Optimization is now widely recognized as an indispensable tool in signal processing (SP) and machine learning (ML). Indeed, many of the advances in these fields rely crucially on the formulation of suitable optimization models and deployment of efficient numerical optimization algorithms. In the early 2000s, there was a heavy focus on the use of convex optimization techniques to tackle SP and ML applications. This is largely due to the fact that convex optimization problems often possess favorable theoretical and computational properties and that many problems of practical interest have been shown to admit convex formulations or good convex approximations.
- Published
- 2020
24. Another Look at Anonymous Communication
- Author
-
Russell W. F. Lai, Sherman S. M. Chow, Anthony Man-Cho So, and Kam-Fung Cheung
- Subjects
Scheme (programming language) ,021110 strategic, defence & security studies ,Hierarchy ,Syntax (programming languages) ,business.industry ,Computer science ,0211 other engineering and technologies ,02 engineering and technology ,Cryptographic protocol ,Encryption ,Computer security ,computer.software_genre ,Public-key cryptography ,Electrical and Electronic Engineering ,business ,Communications protocol ,computer ,Anonymity ,computer.programming_language - Abstract
Anonymous communication is desirable for personal, financial, and political reasons. Despite the abundance of frameworks and constructions, anonymity definitions are usually either not well defined or too complicated to use. In between are ad-hoc definitions for specific protocols which sometimes only provide weakened anonymity guarantees. This paper addresses this situation from the perspectives of syntax, security definition, and construction. We propose simple yet expressive syntax and security definition for anonymous communication. Our syntax covers protocols with different operational characteristics. We give a hierarchy of anonymity definitions, starting from the strongest possible to several relaxations. We also propose a modular construction from any key-private public-key encryption scheme, and a new primitive—oblivious forwarding protocols, of which we give two constructions. The first is a generic construction from any random walk over graphs, while the second is optimized for the probability of successful delivery, with experimental validation for our optimization. Anonymity is guaranteed even when the adversary can observe and control all traffic in the network and corrupt most nodes, in contrast to some efficient yet not-so-anonymous protocols. We hope this work suggests an easier way to design and analyze efficient anonymous communication protocols in the future.
- Published
- 2019
25. On the Quadratic Convergence of the Cubic Regularization Method under a Local Error Bound Condition
- Author
-
Man-Chung Yue, Anthony Man-Cho So, and Zirui Zhou
- Subjects
021103 operations research ,0211 other engineering and technologies ,010103 numerical & computational mathematics ,02 engineering and technology ,01 natural sciences ,Regularization (mathematics) ,Theoretical Computer Science ,symbols.namesake ,Rate of convergence ,Optimization and Control (math.OC) ,FOS: Mathematics ,symbols ,Applied mathematics ,0101 mathematics ,Phase retrieval ,Mathematics - Optimization and Control ,Newton's method ,Software ,Mathematics - Abstract
In this paper we consider the cubic regularization (CR) method for minimizing a twice continuously differentiable function. While the CR method is widely recognized as a globally convergent variant of Newton's method with superior iteration complexity, existing results on its local quadratic convergence require a stringent non-degeneracy condition. We prove that under a local error bound (EB) condition, which is a much weaker requirement than the existing non-degeneracy condition, the sequence of iterates generated by the CR method converges at least Q-quadratically to a second-order critical point. This indicates that adding a cubic regularization not only equips Newton's method with remarkable global convergence properties but also enables it to converge quadratically even in the presence of degenerate solutions. As a byproduct, we show that without assuming convexity, the proposed EB condition is equivalent to a quadratic growth condition, which could be of independent interest. To demonstrate the usefulness and relevance of our convergence analysis, we focus on two concrete nonconvex optimization problems that arise in phase retrieval and low-rank matrix recovery, respectively, and prove that with overwhelming probability, the sequence of iterates generated by the CR method for solving these two problems converges at least Q-quadratically to a global minimizer. We also present numerical results of the CR method when applied to solve these two problems to support and complement our theoretical development.
- Published
- 2019
26. Technical Elements of Machine Learning for Intellectual Property Law
- Author
-
Anthony Man-Cho So
- Abstract
Recent advances in artificial intelligence (AI) technologies have transformed our lives in profound ways. Indeed, AI has not only enabled machines to see (e.g., face recognition), hear (e.g., music retrieval), speak (e.g., speech synthesis), and read (e.g., text processing), but also, so it seems, given machines the ability to think (e.g., board game-playing) and create (e.g., artwork generation). This chapter introduces the key technical elements of machine learning (ML), which is a rapidly growing sub-field in AI and drives many of the aforementioned applications. The goal is to elucidate the ways human efforts are involved in the development of ML solutions, so as to facilitate legal discussions on intellectual property issues.
- Published
- 2021
27. Low-Cost Lipschitz-Independent Adaptive Importance Sampling of Stochastic Gradients
- Author
-
Xiaolu Wang, Huikang Liu, Anthony Man-Cho So, and Jiajin Li
- Subjects
Mathematical optimization ,021103 operations research ,Uniform distribution (continuous) ,Adaptive sampling ,Artificial neural network ,Computer science ,business.industry ,0211 other engineering and technologies ,02 engineering and technology ,010501 environmental sciences ,Lipschitz continuity ,01 natural sciences ,Stochastic gradient descent ,Sampling distribution ,Convergence (routing) ,Artificial intelligence ,business ,Importance sampling ,0105 earth and related environmental sciences - Abstract
Stochastic gradient descent (SGD) usually samples training data based on the uniform distribution, which may not be a good choice because of the high variance of its stochastic gradient. Thus, importance sampling methods are considered in the literature to improve the performance. Most previous work on SGD-based methods with importance sampling requires the knowledge of Lipschitz constants of all component gradients, which are in general difficult to estimate. In this paper, we study an adaptive importance sampling method for common SGD-based methods by exploiting the local first-order information without knowing any Lipschitz constants. In particular, we periodically change the sampling distribution by utilizing only the gradient norms in the past few iterations. We prove that our adaptive importance sampling non-asymptotically reduces the variance of the stochastic gradients in SGD, and thus better convergence bounds than those for vanilla SGD can be obtained. We extend this sampling method to several other widely used stochastic gradient algorithms including SGD with momentum and ADAM. Experiments on common convex learning problems and deep neural networks illustrate notably enhanced performance using the adaptive sampling strategy.
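The core device, sampling in proportion to recent gradient norms while keeping the estimator unbiased through reweighting, can be sketched in a few lines. The mixing weight, the stand-in per-sample gradients, and the random seed below are illustrative, not the paper's scheme.

```python
import numpy as np

# Gradient-norm-based importance sampling for SGD. Sampling
# probabilities are proportional to per-sample gradient norms (mixed
# with the uniform distribution for stability), and the sampled
# gradient is reweighted by 1 / (n * p_i) so that the estimator
# remains unbiased. All constants here are illustrative.
rng = np.random.default_rng(0)
grads = rng.normal(size=(5, 3))          # stand-in per-sample gradients
norms = np.linalg.norm(grads, axis=1)
theta = 0.1                              # uniform-mixing weight
p = (1.0 - theta) * norms / norms.sum() + theta / len(norms)

i = rng.choice(len(norms), p=p)
g_est = grads[i] / (len(norms) * p[i])   # unbiased gradient estimate
```

Unbiasedness is an identity: averaging `grads[i] / (n * p[i])` under `p` recovers the full-batch mean gradient regardless of how `p` is chosen, which is why the scheme is free to adapt `p` to reduce variance.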
- Published
- 2021
28. Probabilistic Simplex Component Analysis
- Author
-
YUENING LI, Nicholas Sidiropoulos, Ruiyuan Wu, Wing-Kin Ma, and Anthony Man-Cho So
- Subjects
Signal Processing (eess.SP) ,FOS: Computer and information sciences ,Statistics - Machine Learning ,Signal Processing ,FOS: Electrical engineering, electronic engineering, information engineering ,Machine Learning (stat.ML) ,Electrical Engineering and Systems Science - Signal Processing ,Electrical and Electronic Engineering - Abstract
This study presents PRISM, a probabilistic simplex component analysis approach to identifying the vertices of a data-circumscribing simplex from data. The problem has a rich variety of applications, the most notable being hyperspectral unmixing in remote sensing and non-negative matrix factorization in machine learning. PRISM uses a simple probabilistic model, namely, uniform simplex data distribution and additive Gaussian noise, and it carries out inference by maximum likelihood. The inference model is sound in the sense that the vertices are provably identifiable under some assumptions, and it suggests that PRISM can be effective in combating noise when the number of data points is large. PRISM has strong, but hidden, relationships with simplex volume minimization, a powerful geometric approach for the same problem. We study these fundamental aspects, and we also consider algorithmic schemes based on importance sampling and variational inference. In particular, the variational inference scheme is shown to resemble a matrix factorization problem with a special regularizer, which draws an interesting connection to the matrix factorization approach. Numerical results are provided to demonstrate the potential of PRISM.
- Published
- 2021
- Full Text
- View/download PDF
29. Quartic Perturbation-based Outage-constrained Robust Design in Two-hop One-way Relay Networks
- Author
-
Wu, Sissi Xiaoxiao, Ni, Sherry Xue-Ying, Li, Jiaying, and So, Anthony Man-Cho
- Subjects
FOS: Computer and information sciences ,Computer Science - Information Theory ,Information Theory (cs.IT) ,ComputerSystemsOrganization_COMPUTER-COMMUNICATIONNETWORKS ,Data_CODINGANDINFORMATIONTHEORY ,Computer Science::Information Theory - Abstract
In this work, we study a classic robust design problem in a two-hop one-way relay system. We are particularly interested in the scenario where channel uncertainty exists in both the transmitter-to-relay and relay-to-receiver links. By considering a design that minimizes the average amplify-and-forward power budget at the relay side while satisfying SNR outage requirements, an outage-constrained robust design problem involving quartic perturbations is formulated to guarantee the robustness during transmission. This problem is in general difficult as it involves constraints on the tail probability of a high-order polynomial. Herein, we resort to moment inequality and Bernstein-type inequality to tackle this problem, which provide convex restrictions, or safe approximations, of the original design. We also analyze the relative tightness of the two safe approximations for a quadratic perturbation-based outage constrained problem. Our analysis shows that the Bernstein-type inequality approach is less conservative than the moment inequality approach when the outage rate is within some prescribed regime. To the best of our knowledge, this is the first provable tightness result for these two safe approximations. Our numerical simulations verify the superiority of the robust design and corroborate the tightness results.
- Published
- 2021
- Full Text
- View/download PDF
30. Linear Convergence of a Proximal Alternating Minimization Method with Extrapolation for $\ell_1$-Norm Principal Component Analysis
- Author
-
Wang, Peng, Liu, Huikang, and So, Anthony Man-Cho
- Subjects
Optimization and Control (math.OC) ,FOS: Mathematics ,Mathematics - Optimization and Control - Abstract
A popular robust alternative of the classic principal component analysis (PCA) is the $\ell_1$-norm PCA (L1-PCA), which aims to find a subspace that captures the most variation in a dataset as measured by the $\ell_1$-norm. L1-PCA has shown great promise in alleviating the effect of outliers in data analytic applications. However, it gives rise to a challenging non-smooth non-convex optimization problem, for which existing algorithms are either not scalable or lack strong theoretical guarantees on their convergence behavior. In this paper, we propose a proximal alternating minimization method with extrapolation (PAMe) for solving a two-block reformulation of the L1-PCA problem. We then show that for both the L1-PCA problem and its two-block reformulation, the Kurdyka-Łojasiewicz exponent at any of the limiting critical points is $1/2$. This allows us to establish the linear convergence of the sequence of iterates generated by PAMe and to determine the criticality of the limit of the sequence with respect to both the L1-PCA problem and its two-block reformulation. To complement our theoretical development, we show via numerical experiments on both synthetic and real-world datasets that PAMe is competitive with a host of existing methods. Our results not only significantly advance the convergence theory of iterative methods for L1-PCA but also demonstrate the potential of our proposed method in applications., Comment: A preliminary version of this work has appeared in the Proceedings of the 2019 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2019)
- Published
- 2021
- Full Text
- View/download PDF
31. Dynamic Regret Bound for Moving Target Tracking Based on Online Time-of-Arrival Measurements
- Author
-
Anthony Man-Cho So and Yuen-Man Pun
- Subjects
Computer science ,Maximum likelihood ,010401 analytical chemistry ,020206 networking & telecommunications ,Regret ,02 engineering and technology ,01 natural sciences ,Convexity ,0104 chemical sciences ,Time of arrival ,0202 electrical engineering, electronic engineering, information engineering ,Trajectory ,Online algorithm ,Gradient descent ,Convex function ,Algorithm - Abstract
The use of online algorithms to track a moving target is gaining attention in the control community owing to its computational simplicity and speed. In this work, we study the dynamic regret of online gradient descent (OGD) for tackling a time-of-arrival (TOA)-based least-squares formulation of the tracking problem. Since the formulation is non-convex, most existing dynamic regret analyses cannot be applied to it directly. To circumvent this difficulty, we proceed in two steps. First, we show that under standard assumptions on the TOA measurement noise, the loss function at each time step will, with high probability, be locally strongly convex at that time step. Moreover, we give an explicit estimate of the size of the strong convexity region. To the best of our knowledge, this result is new and can be of independent interest. Second, we show that under the aforementioned assumptions on the TOA measurement noise and mild assumptions on the target trajectory, the location estimate of the target at each time step will lie in the strong convexity region of the loss function at the next time step with high probability. This allows us to exploit existing analysis for online strongly convex optimization to give the first dynamic regret bound of OGD for the TOA-based target tracking problem. Simulation results are presented to illustrate our theoretical findings.
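The OGD scheme itself is simple: one gradient step on the current time step's least-squares loss per round of measurements. The anchor positions, target path, step size, and the noiseless measurements in the sketch below are all illustrative.

```python
import numpy as np

# Online gradient descent for TOA-style target tracking: at each time
# step, take one gradient step on the current least-squares loss
# 0.5 * sum_i (||x - a_i|| - d_i(t))^2, where a_i are known anchors
# and d_i(t) are the current range measurements. Anchors, the target
# path, and the step size are illustrative; measurements are noiseless.
anchors = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0]])

def grad(x, dists):
    diff = x - anchors
    r = np.linalg.norm(diff, axis=1)
    return (((r - dists) / r)[:, None] * diff).sum(axis=0)

x = np.array([5.0, 5.0])                        # initial estimate
eta = 0.2                                       # OGD step size
for t in range(200):
    target = np.array([4.0 + 0.005 * t, 6.0])   # slowly drifting target
    dists = np.linalg.norm(anchors - target, axis=1)
    x = x - eta * grad(x, dists)                # one OGD step per round
```

When the target drifts slowly relative to the step size, the estimate stays inside the locally strongly convex region of each successive loss, which is exactly the mechanism the dynamic regret analysis exploits.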
- Published
- 2020
32. Modelling, Simulation and Optimisation of Train Traffic with Passenger Movement (Fast Track)
- Author
-
Yeung Yam, Mallarajapattana Janardana Venkatarangan, and Anthony Man-Cho So
- Subjects
Computer science ,Movement (music) ,Modeling and Simulation ,Fast track ,Software ,Simulation - Published
- 2020
33. A Penalty Alternating Direction Method of Multipliers for Decentralized Composite Optimization
- Author
-
Qing Ling, Jiaojiao Zhang, and Anthony Man-Cho So
- Subjects
0209 industrial biotechnology ,Mathematical optimization ,Computer science ,Composite optimization ,Process (computing) ,Regular polygon ,020206 networking & telecommunications ,02 engineering and technology ,Function (mathematics) ,020901 industrial engineering & automation ,Approximation error ,0202 electrical engineering, electronic engineering, information engineering ,Penalty method ,Gradient descent ,Convex function - Abstract
This paper proposes a penalty alternating direction method of multipliers (ADMM) to minimize the summation of convex composite functions over a decentralized network. Each agent in the network holds a private function consisting of a smooth part and a nonsmooth part, and can only exchange information with its neighbors during the optimization process. We consider a penalized approximation of the decentralized optimization problem; but unlike the existing penalty methods, here the penalty parameter can be very small such that the approximation error is negligible. On the other hand, the small penalty parameter makes the penalized objective ill-conditioned, such that the popular proximal gradient descent method has to use a small step size, and is hence slow. To address this issue, we propose to solve the penalized formulation with ADMM. We further utilize the composite structures of the private functions through linearizing the smooth parts so as to reduce computational costs, and handling the nonsmooth parts with proximal operators. The proposed penalty ADMM (abbreviated as PAD) is provably convergent when the private functions are convex, and linearly convergent when the smooth parts are further strongly convex. Numerical experiments corroborate the theoretical analyses, and demonstrate the advantages of PAD over existing state-of-the-art algorithms, such as DL-ADMM, PG-EXTRA and NIDS.
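The "linearize the smooth part, handle the nonsmooth part with its proximal operator" device reduces, in the simplest single-agent case, to the proximal gradient iteration. The sketch below illustrates that building block on a lasso-type objective; the problem data, regularization weight, and iteration count are illustrative and the decentralized penalty-ADMM machinery is omitted.

```python
import numpy as np

# Proximal gradient iteration: linearize the smooth part
# f(x) = 0.5 * ||Ax - b||^2 at the current point, then apply the
# proximal operator of the nonsmooth part lam * ||x||_1
# (soft-thresholding). This is the centralized analogue of the
# smooth/nonsmooth splitting described in the abstract; the data and
# constants below are illustrative.
rng = np.random.default_rng(0)
A = rng.normal(size=(20, 5))
b = rng.normal(size=20)
lam = 0.5
alpha = 1.0 / np.linalg.norm(A, 2) ** 2          # step size 1/L

x = np.zeros(5)
for _ in range(500):
    z = x - alpha * A.T @ (A @ x - b)            # gradient step (smooth part)
    x = np.sign(z) * np.maximum(np.abs(z) - alpha * lam, 0.0)  # prox step
```

At convergence the iterate is a fixed point of the gradient-then-prox map, which characterizes the minimizers of the composite objective.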
- Published
- 2020
34. An Efficient Augmented Lagrangian-Based Method for Linear Equality-Constrained Lasso
- Author
-
Zengde Deng, Man-Chung Yue, and Anthony Man-Cho So
- Subjects
Mathematical optimization ,021103 operations research ,Augmented Lagrangian method ,Computer science ,MathematicsofComputing_NUMERICALANALYSIS ,0211 other engineering and technologies ,Duality (optimization) ,Feature selection ,02 engineering and technology ,01 natural sciences ,Regularization (mathematics) ,Convexity ,010104 statistics & probability ,Lasso (statistics) ,Linear regression ,Key (cryptography) ,0101 mathematics - Abstract
Variable selection is one of the most important tasks in statistics and machine learning. To incorporate more prior information about the regression coefficients, various constrained Lasso models have been proposed in the literature. Compared with the classic (unconstrained) Lasso model, the algorithmic aspects of constrained Lasso models are much less explored. In this paper, we demonstrate how the recently developed semismooth Newton-based augmented Lagrangian framework can be extended to solve a linear equality-constrained Lasso model. A key technical challenge that is not present in prior works is the lack of strong convexity in our dual problem, which we overcome by adopting a regularization strategy. We show that under mild assumptions, our proposed method will converge superlinearly. Moreover, extensive numerical experiments on both synthetic and real-world data show that our method can be substantially faster than existing first-order methods while achieving a better solution accuracy.
- Published
- 2020
35. A Fast Proximal Point Algorithm for Generalized Graph Laplacian Learning
- Author
-
Anthony Man-Cho So and Zengde Deng
- Subjects
Proximal point ,Linear programming ,Computer science ,0202 electrical engineering, electronic engineering, information engineering ,Graph (abstract data type) ,020206 networking & telecommunications ,010103 numerical & computational mathematics ,02 engineering and technology ,0101 mathematics ,Laplacian matrix ,01 natural sciences ,Algorithm ,Graph - Abstract
Graph learning is one of the most important tasks in machine learning, statistics and signal processing. In this paper, we focus on the problem of learning the generalized graph Laplacian (GGL) and propose an efficient algorithm to solve it. We first fully exploit the sparsity structure hidden in the objective function by utilizing soft-thresholding technique to transform the GGL problem into an equivalent problem. Moreover, we propose a fast proximal point algorithm (PPA) to solve the transformed GGL problem and establish the linear convergence rate of our algorithm. Extensive numerical experiments on both synthetic data and real data demonstrate that the soft-thresholding technique accelerates our PPA method and PPA can outperform the current state-of-the-art method in terms of speed.
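The soft-thresholding operation invoked in the abstract is the entrywise proximal map of a weighted l1-norm; the threshold value and test vector below are illustrative.

```python
import numpy as np

# Entrywise soft-thresholding: the proximal operator of lam * ||x||_1.
# Entries with magnitude below lam are zeroed out, which is how the
# sparsity structure mentioned in the abstract is exposed. The
# threshold and input values are illustrative.
def soft_threshold(x, lam):
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

z = soft_threshold(np.array([-2.0, -0.3, 0.0, 0.5, 3.0]), 1.0)
# z = [-1.0, 0.0, 0.0, 0.0, 2.0]
```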
- Published
- 2020
36. Understanding Notions of Stationarity in Non-Smooth Optimization
- Author
-
Li, Jiajin, So, Anthony Man-Cho, and Ma, Wing-Kin
- Subjects
Signal Processing (eess.SP) ,FOS: Computer and information sciences ,Computer Science - Machine Learning ,Statistics - Machine Learning ,Optimization and Control (math.OC) ,FOS: Mathematics ,FOS: Electrical engineering, electronic engineering, information engineering ,Machine Learning (stat.ML) ,Electrical Engineering and Systems Science - Signal Processing ,Mathematics - Optimization and Control ,Machine Learning (cs.LG) - Abstract
Many contemporary applications in signal processing and machine learning give rise to structured non-convex non-smooth optimization problems that can often be tackled by simple iterative methods quite effectively. One of the keys to understanding such a phenomenon (and, in fact, one of the very difficult conundrums even for experts) lies in the study of "stationary points" of the problem in question. Unlike smooth optimization, for which the definition of a stationary point is rather standard, there is a myriad of definitions of stationarity in non-smooth optimization. In this article, we give an introduction to different stationarity concepts for several important classes of non-convex non-smooth functions, discuss their geometric interpretations, and further clarify the relationships among these different concepts. We then demonstrate the relevance of these constructions in some representative applications and how they could affect the performance of iterative methods for tackling these applications., Comment: Accepted for publication in IEEE Signal Processing Magazine, 2020
- Published
- 2020
- Full Text
- View/download PDF
37. Boosting First-Order Methods by Shifting Objective: New Schemes with Faster Worst-Case Rates
- Author
-
Zhou, Kaiwen, So, Anthony Man-Cho, and Cheng, James
- Subjects
FOS: Computer and information sciences ,Computer Science - Machine Learning ,Statistics - Machine Learning ,Machine Learning (stat.ML) ,Machine Learning (cs.LG) - Abstract
We propose a new methodology to design first-order methods for unconstrained strongly convex problems. Specifically, instead of tackling the original objective directly, we construct a shifted objective function that has the same minimizer as the original objective and encodes both the smoothness and strong convexity of the original objective in an interpolation condition. We then propose an algorithmic template for tackling the shifted objective, which can exploit such a condition. Following this template, we derive several new accelerated schemes for problems that are equipped with various first-order oracles and show that the interpolation condition allows us to vastly simplify and tighten the analysis of the derived methods. In particular, all the derived methods have faster worst-case convergence rates than their existing counterparts. Experiments on machine learning tasks are conducted to evaluate the new methods., Comment: NeurIPS 2020, 29 pages, 7 figures
- Published
- 2020
- Full Text
- View/download PDF
38. A Unified Approach to Synchronization Problems over Subgroups of the Orthogonal Group
- Author
-
Liu, Huikang, Yue, Man-Chung, and So, Anthony Man-Cho
- Subjects
Signal Processing (eess.SP) ,FOS: Computer and information sciences ,Computer Science - Machine Learning ,History ,Polymers and Plastics ,Optimization and Control (math.OC) ,FOS: Mathematics ,FOS: Electrical engineering, electronic engineering, information engineering ,Electrical Engineering and Systems Science - Signal Processing ,Business and International Management ,Mathematics - Optimization and Control ,Industrial and Manufacturing Engineering ,Machine Learning (cs.LG) - Abstract
The problem of synchronization over a group $\mathcal{G}$ aims to estimate a collection of group elements $G^*_1, \dots, G^*_n \in \mathcal{G}$ based on noisy observations of a subset of all pairwise ratios of the form $G^*_i {G^*_j}^{-1}$. Such a problem has gained much attention recently and finds many applications across a wide range of scientific and engineering areas. In this paper, we consider the class of synchronization problems in which the group is a closed subgroup of the orthogonal group. This class covers many group synchronization problems that arise in practice. Our contribution is fivefold. First, we propose a unified approach for solving this class of group synchronization problems, which consists of a suitable initialization step and an iterative refinement step based on the generalized power method, and show that it enjoys a strong theoretical guarantee on the estimation error under certain assumptions on the group, measurement graph, noise, and initialization. Second, we formulate two geometric conditions that are required by our approach and show that they hold for various practically relevant subgroups of the orthogonal group. The conditions are closely related to the error-bound geometry of the subgroup -- an important notion in optimization. Third, we verify the assumptions on the measurement graph and noise for standard random graph and random matrix models. Fourth, based on the classic notion of metric entropy, we develop and analyze a novel spectral-type estimator. Finally, we show via extensive numerical experiments that our proposed non-convex approach outperforms existing approaches in terms of computational speed, scalability, and/or estimation error.
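For the simplest subgroup covered by this class, Z2 = {+1, -1}, one iteration of a generalized power method is just "multiply by the observation matrix, then project entrywise back onto the group". The noiseless observation matrix, ground truth, and initialization below are illustrative.

```python
import numpy as np

# Generalized power method (GPM) for Z2 synchronization, the simplest
# instance of synchronization over a subgroup of the orthogonal group.
# With noiseless pairwise observations C = z z^T, each iteration
# multiplies the current estimate by C and projects entrywise onto
# {+1, -1}. Ground truth and initialization are illustrative.
z = np.array([1.0, -1.0, 1.0, 1.0, -1.0])   # ground-truth group elements
C = np.outer(z, z)                           # pairwise products z_i * z_j
x = np.ones(5)                               # crude initialization
for _ in range(10):
    x = np.sign(C @ x)                       # GPM step: multiply, project
```

In this noiseless toy the iteration recovers `z` exactly (up to the unavoidable global sign ambiguity); the paper's analysis quantifies how much noise and how sparse a measurement graph this refinement step tolerates.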
- Published
- 2020
- Full Text
- View/download PDF
39. Technical Elements of Machine Learning for Intellectual Property Law
- Author
-
Anthony Man-Cho So
- Subjects
Text processing ,Computer science ,business.industry ,Key (cryptography) ,Speech synthesis ,Artificial intelligence ,Intellectual property ,computer.software_genre ,business ,Machine learning ,computer ,Facial recognition system - Abstract
Recent advances in artificial intelligence (AI) technologies have transformed our lives in profound ways. Indeed, AI has not only enabled machines to see (e.g., face recognition), hear (e.g., music retrieval), speak (e.g., speech synthesis), and read (e.g., text processing), but also, so it seems, given machines the ability to think (e.g., board game-playing) and create (e.g., artwork generation). This chapter introduces the key technical elements of machine learning (ML), which is a rapidly growing sub-field in AI and drives many of the aforementioned applications. The goal is to elucidate the ways human efforts are involved in the development of ML solutions, so as to facilitate legal discussions on intellectual property issues.
- Published
- 2020
40. Quadratic optimization with orthogonality constraint: explicit Łojasiewicz exponent and linear convergence of retraction-based line-search and stochastic variance-reduced gradient methods
- Author
-
Huikang Liu, Anthony Man-Cho So, and Weijie Wu
- Subjects
Operations research ,Line search ,Optimization problem ,Iterative method ,General Mathematics ,Numerical analysis ,Stiefel manifold ,Rate of convergence ,Exponent ,Applied mathematics ,Quadratic programming ,Software ,Mathematics - Abstract
The problem of optimizing a quadratic form over an orthogonality constraint (QP-OC for short) is one of the most fundamental matrix optimization problems and arises in many applications. In this paper, we characterize the growth behavior of the objective function around the critical points of the QP-OC problem and demonstrate how such characterization can be used to obtain strong convergence rate results for iterative methods that exploit the manifold structure of the orthogonality constraint (i.e., the Stiefel manifold) to find a critical point of the problem. Specifically, our primary contribution is to show that the Łojasiewicz exponent at any critical point of the QP-OC problem is 1/2. Such a result is significant, as it expands the currently very limited repertoire of optimization problems for which the Łojasiewicz exponent is explicitly known. Moreover, it allows us to show, in a unified manner and for the first time, that a large family of retraction-based line-search methods will converge linearly to a critical point of the QP-OC problem. Then, as our secondary contribution, we propose a stochastic variance-reduced gradient (SVRG) method called Stiefel-SVRG for solving the QP-OC problem and present a novel Łojasiewicz inequality-based linear convergence analysis of the method. An important feature of Stiefel-SVRG is that it allows for general retractions and does not require the computation of any vector transport on the Stiefel manifold. As such, it is computationally more advantageous than other recently proposed SVRG-type algorithms for manifold optimization.
- Published
- 2018
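The linear convergence of retraction-based line-search methods can be seen in the simplest QP-OC instance: maximizing $x^\top A x$ over the unit sphere (the Stiefel manifold with one column). The sketch below uses a fixed step size and the normalization retraction; the matrix, step-size rule, and iteration budget are illustrative assumptions, not the paper's algorithm:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 30
A = rng.standard_normal((n, n))
A = (A + A.T) / 2                     # symmetric matrix defining the quadratic form

x = rng.standard_normal(n)
x /= np.linalg.norm(x)
step = 1.0 / (2.0 * np.abs(np.linalg.eigvalsh(A)).max())   # conservative fixed step
for _ in range(1000):
    g = 2.0 * A @ x                   # Euclidean gradient of x^T A x
    rg = g - (x @ g) * x              # project onto the tangent space of the sphere
    x = x + step * rg
    x /= np.linalg.norm(x)            # retraction: renormalize back onto the sphere

lam_max = np.linalg.eigvalsh(A)[-1]   # optimal value of this QP-OC instance
val = x @ A @ x
```

For a generic symmetric matrix the iterates converge linearly to the leading eigenvector, so `val` approaches the largest eigenvalue.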
41. A family of inexact SQA methods for non-smooth convex minimization with provable convergence guarantees based on the Luo–Tseng error bound property
- Author
-
Man-Chung Yue, Anthony Man-Cho So, and Zirui Zhou
- Subjects
Sequence ,Operations research ,General Mathematics ,Numerical Analysis (math.NA) ,Local convergence ,Quadratic equation ,Rate of convergence ,Optimization and Control (math.OC) ,Iterated function ,Convex optimization ,Convergence (routing) ,FOS: Mathematics ,Applied mathematics ,Mathematics - Numerical Analysis ,Convex function ,Mathematics - Optimization and Control ,Software ,Mathematics - Abstract
We propose a new family of inexact sequential quadratic approximation (SQA) methods, which we call the inexact regularized proximal Newton (IRPN) method, for minimizing the sum of two closed proper convex functions, one of which is smooth and the other is possibly non-smooth. Our proposed method features strong convergence guarantees even when applied to problems with degenerate solutions while allowing the inner minimization to be solved inexactly. Specifically, we prove that when the problem possesses the so-called Luo–Tseng error bound (EB) property, IRPN converges globally to an optimal solution, and the local convergence rate of the sequence of iterates generated by IRPN is linear, superlinear, or even quadratic, depending on the choice of parameters of the algorithm. Prior to this work, this EB property had been extensively used to establish the linear convergence of various first-order methods. However, to the best of our knowledge, this work is the first to use the Luo–Tseng EB property to establish the superlinear convergence of SQA-type methods for non-smooth convex minimization. As a consequence of our result, IRPN is capable of solving regularized regression or classification problems in the high-dimensional setting with provable convergence guarantees. We compare our proposed IRPN with several empirically efficient algorithms by applying them to the $\ell_1$-regularized logistic regression problem. Experimental results show the competitiveness of our proposed method.
- Published
- 2018
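IRPN itself is a second-order method; as a self-contained illustration of the problem class it targets, the sketch below solves a synthetic $\ell_1$-regularized logistic regression instance with a plain proximal gradient method (a first-order baseline, not the IRPN algorithm). The data, penalty weight, and step-size rule are all assumptions made for the example:

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (elementwise soft thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

rng = np.random.default_rng(2)
m, n, lam = 200, 50, 0.01
X = rng.standard_normal((m, n))
w_true = np.zeros(n)
w_true[:5] = rng.standard_normal(5)               # sparse ground truth
y = np.where(X @ w_true + 0.1 * rng.standard_normal(m) > 0, 1.0, -1.0)

def obj(w):
    """Objective: average logistic loss plus the l1 penalty."""
    return np.mean(np.log1p(np.exp(-y * (X @ w)))) + lam * np.abs(w).sum()

# Proximal gradient with step 1/L, where L bounds the smooth part's curvature
L = np.linalg.norm(X, 2) ** 2 / (4 * m)
w = np.zeros(n)
for _ in range(2000):
    z = y * (X @ w)
    grad = X.T @ (-y / (1.0 + np.exp(z))) / m     # gradient of the logistic loss
    w = soft_threshold(w - grad / L, lam / L)
```

With the 1/L step size each iteration is a descent step, so the objective at `w` is strictly below its value at the all-zeros starting point.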
42. Distributionally Robust Collaborative Beamforming in D2D Relay Networks With Interference Constraints
- Author
-
Xiaoxia Huang, Anthony Man-Cho So, Shimin Gong, and Sissi Xiaoxiao Wu
- Subjects
Beamforming ,Mathematical optimization ,Computer science ,Iterative method ,Gaussian ,Relay ,Robustness (computer science) ,Computer Science::Networking and Internet Architecture ,Wireless ,Electrical and Electronic Engineering ,Computer Science::Information Theory ,Applied Mathematics ,Computer Science Applications ,Communication channel - Abstract
In this paper, we consider a device-to-device (D2D) network underlying a cellular system wherein the densely deployed D2D user devices can act as wireless relays for a distant transceiver pair. We aim to devise a beamforming strategy for the relays that maximizes the data rate of the distant transceiver while satisfying interference constraints at the cellular receivers. Towards that end, we first formulate a beamforming problem whose solution is robust against the channel uncertainties in the relay-destination hop. Motivated by practical observations, we assume that the random channels in this hop follow unimodal distributions and propose a novel unimodal distributionally robust model to capture the channel uncertainties. Then, we extend the formulation so that it can also guard against the channel uncertainty in the source-relay hop under the worst case robust model. The resulting robust beamforming problem is generally non-convex and intractable. Therefore, we design an iterative algorithm, which is based on solving semidefinite programs, to find an approximate solution to it. Simulation results show that under mild conditions, our robust model significantly improves the throughput of D2D relay transmissions when compared with the conventional robust models that merely rely on the channels’ moment information. It also outperforms the Bernstein-type inequality-based convex approximation, which assumes that the channel follows a Gaussian distribution.
- Published
- 2017
43. Unraveling the Rank-One Solution Mystery of Robust MISO Downlink Transmit Optimization: A Verifiable Sufficient Condition via a New Duality Result
- Author
-
Jiaxian Pan, Wing-Kin Ma, Anthony Man-Cho So, and Tsung-Hui Chang
- Subjects
FOS: Computer and information sciences ,Beamforming ,Mathematical optimization ,Operations research ,Optimization problem ,Information Theory (cs.IT) ,Computer Science - Information Theory ,Duality (optimization) ,Minimax ,Precoding ,Channel state information ,Bounded function ,Signal Processing ,Verifiable secret sharing ,Electrical and Electronic Engineering ,Computer Science::Information Theory ,Mathematics - Abstract
This paper concentrates on a robust transmit optimization problem for the multiuser multi-input single-output (MISO) downlink scenario and under inaccurate channel state information (CSI). This robust problem deals with a general-rank transmit covariance design, and it follows a safe rate-constrained formulation under spherically bounded CSI uncertainties. Curiously, simulation results in previous works suggested that the robust problem admits rank-one optimal transmit covariances in most cases. Such a numerical finding is appealing because transmission with rank-one covariances can be easily realized by single-stream transmit beamforming. This gives rise to a fundamentally important question, namely, whether we can theoretically identify conditions under which the robust problem admits a rank-one solution. In this paper, we identify one such condition. Simply speaking, we show that the robust problem is guaranteed to admit a rank-one solution if the CSI uncertainties are not too large and the multiuser channel is not too poorly conditioned. To establish the aforementioned condition, we develop a novel duality framework, through which an intimate relationship between the robust problem and a related maximin problem is revealed. Our condition involves only a simple expression with respect to the multiuser channel and other system parameters. In particular, unlike other sufficient rank-one conditions that have appeared in the literature, ours is verifiable. The application of our analysis framework to several other CSI uncertainty models is also discussed., Comment: To appear in IEEE Trans. Signal Process.; this version combines the main manuscript and its accompanying supplementary report into a single article
- Published
- 2017
44. Globally Convergent Accelerated Proximal Alternating Maximization Method for L1-Principal Component Analysis
- Author
-
Peng Wang, Anthony Man-Cho So, and Huikang Liu
- Subjects
Operations research ,Computer science ,Principal component analysis ,Extrapolation ,Maximization ,Algorithm ,Matrix decomposition - Abstract
In this paper, we consider the $\ell_1$-PCA problem in the large-scale data sample scenario, which has extensive applications in science and engineering. Previous algorithms for the problem are either not scalable or lack good convergence guarantees. Our contribution is threefold. First, we develop a novel accelerated version of the proximal alternating maximization method to solve the $\ell_1$-PCA problem. Second, by exploiting the Kurdyka-Łojasiewicz property of the problem, we show that our proposed method enjoys global convergence to a critical point, which improves upon the existing convergence guarantees of other first-order methods for the $\ell_1$-PCA problem. Third, we demonstrate via numerical experiments on both real-world and synthetic datasets that our proposed method is scalable and more efficient and accurate than other methods in the literature.
- Published
- 2019
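A minimal sketch of alternating maximization for an $\ell_1$-PCA-type objective is given below: it alternates between the sign variables and the direction variable, which is the classic unaccelerated fixed-point scheme, not the paper's accelerated proximal method. The synthetic data and the PCA warm start are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 100, 10
X = rng.standard_normal((m, n)) * np.linspace(3.0, 0.5, n)  # anisotropic samples in rows

# Maximize sum_i |x_i^T q| over unit vectors q by alternating between the
# sign variables s = sign(X q) and the direction q = X^T s / ||X^T s||.
q = np.linalg.svd(X, full_matrices=False)[2][0]   # ordinary PCA warm start
base = np.abs(X @ q).sum()                        # l1 variance at the start
for _ in range(50):
    s = np.sign(X @ q)
    s[s == 0] = 1.0
    v = X.T @ s
    q = v / np.linalg.norm(v)

l1_var = np.abs(X @ q).sum()    # the objective never decreases across iterations
```

Each update maximizes the bilinear surrogate $s^\top X q$ exactly in one block, so the $\ell_1$ variance is monotonically nondecreasing; this monotonicity is what the Kurdyka-Łojasiewicz analysis in the paper strengthens to global convergence for the accelerated variant.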
45. Fast First-order Methods for the Massive Robust Multicast Beamforming Problem with Interference Temperature Constraints
- Author
-
Peng Wang, Anthony Man-Cho So, and Huikang Liu
- Subjects
Beamforming ,Mathematical optimization ,Linear programming ,Multicast ,Computer science ,First order ,Interference (wave propagation) ,Relaxation (approximation) ,Subgradient method ,Descent (mathematics) - Abstract
In this paper, we consider the large-scale case of the robust beamforming problem with interference temperature constraints. The previously proposed semidefinite relaxation (SDR) method becomes impractical because of its high computational cost. Even the successive convex approximation (SCA) method, the current state of the art, cannot tackle this problem efficiently. We are thus motivated to design two efficient first-order methods, the multi-block alternating direction method of multipliers (ADMM) and linear programming-assisted subgradient descent (LPA-SD), to solve it. Numerical results demonstrate the potential of our proposed methods in terms of both computational efficiency and solution quality.
- Published
- 2019
46. Voting-Based Multi-Agent Reinforcement Learning for Intelligent IoT
- Author
-
Xu, Yue, Deng, Zengde, Wang, Mengdi, Xu, Wenjun, So, Anthony Man-Cho, and Cui, Shuguang
- Subjects
Computer Science::Multiagent Systems ,FOS: Computer and information sciences ,Computer Science - Machine Learning ,Artificial Intelligence (cs.AI) ,Computer Science - Artificial Intelligence ,Statistics - Machine Learning ,Machine Learning (stat.ML) ,Machine Learning (cs.LG) - Abstract
The recent success of single-agent reinforcement learning (RL) in Internet of things (IoT) systems motivates the study of multi-agent reinforcement learning (MARL), which is more challenging but more useful in large-scale IoT. In this paper, we consider a voting-based MARL problem, in which the agents vote to make group decisions and the goal is to maximize the globally averaged returns. To this end, we formulate the MARL problem based on the linear programming form of the policy optimization problem and propose a distributed primal-dual algorithm to obtain the optimal solution. We also propose a voting mechanism through which the distributed learning achieves the same sublinear convergence rate as centralized learning. In other words, the distributed decision making does not slow down the process of achieving global consensus on optimality. Lastly, we verify the convergence of our proposed algorithm with numerical simulations and conduct case studies in practical multi-agent IoT systems., Comment: Published at IEEE Internet of Things Journal
- Published
- 2019
- Full Text
- View/download PDF
47. A Robust Design for MISO Physical-Layer Multicasting Over Line-of-Sight Channels
- Author
-
Anthony Man-Cho So, Sissi Xiaoxiao Wu, and Man-Chung Yue
- Subjects
FOS: Computer and information sciences ,Beamforming ,Multicast ,Computer science ,Computer Science - Information Theory ,Information Theory (cs.IT) ,Applied Mathematics ,Probabilistic logic ,Transmitter power output ,Power budget ,Quadratic form ,Robustness (computer science) ,Signal Processing ,Taylor series ,Electrical and Electronic Engineering ,Algorithm ,Communication channel - Abstract
This paper studies a robust design problem for far-field line-of-sight (LOS) channels where phase errors are present. Compared with the commonly used additive error model, the phase error model is more suitable for capturing the uncertainty in an LOS channel, as the dominant source of uncertainty lies in the phase. We consider a multiple-input single-output (MISO) multicast scenario, in which our goal is to design a beamformer that minimizes the transmit power while satisfying probabilistic signal-to-noise ratio (SNR) constraints. The probabilistic constraints give rise to a new computational challenge, as they involve random trigonometric forms. In this work, we propose to first approximate the random trigonometric form by its second-order Taylor expansion and then tackle the resulting random quadratic form using a Bernstein-type inequality. The advantage of such an approach is that an approximately optimal beamformer can be obtained using the standard semidefinite relaxation technique. In the simulations, we first show that if a non-robust design (i.e., one that does not take phase errors into account) is used, then the whole system may collapse. We then show that our proposed method is less conservative than the existing robust design based on Gaussian approximation and thus requires a lower power budget., Comment: This manuscript is submitted for possible journal publication on 13-Nov-2015
- Published
- 2016
48. Robust Convex Approximation Methods for TDOA-Based Localization Under NLOS Conditions
- Author
-
Youming Li, Anthony Man-Cho So, and Gang Wang
- Subjects
Mathematical optimization ,Noise measurement ,Computational complexity theory ,Robust optimization ,Multilateration ,Non-line-of-sight propagation ,Robustness (computer science) ,Bounded function ,Signal Processing ,Measurement uncertainty ,Electrical and Electronic Engineering ,Mathematics - Abstract
In this paper, we develop a novel robust optimization approach to source localization using time-difference-of-arrival (TDOA) measurements that are collected under non-line-of-sight (NLOS) conditions. A key feature of our approach is that it does not require knowledge of the distribution or statistics of the NLOS errors, which are often difficult to obtain in practice. Instead, it only assumes that the NLOS errors have bounded supports. Based on this assumption, we formulate the TDOA-based source localization problem as a robust least squares (RLS) problem, in which a location estimate that is robust against the NLOS errors is sought. Since the RLS problem is non-convex, we propose two efficiently implementable convex relaxation-based approximation methods to tackle it. We then conduct a thorough theoretical analysis of the approximation quality and computational complexity of these two methods. In particular, we establish conditions under which they will yield a unique localization of the source. Simulation results on both synthetic and real data show that the performance of our approach under various NLOS settings is very stable and is significantly better than that of several existing non-robust approaches.
- Published
- 2016
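For contrast with the robust formulations described in the abstract, the sketch below implements a plain (non-robust) linearized least-squares TDOA estimator on synthetic 2-D data; it does not reproduce the paper's robust convex approximations, and the anchor layout and noise level are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(5)
anchors = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0], [5.0, -3.0]])
src = np.array([3.0, 4.0])                       # unknown source to recover

dist = np.linalg.norm(anchors - src, axis=1)
tdoa = dist[1:] - dist[0] + 0.01 * rng.standard_normal(len(anchors) - 1)

# Linearization: with rho = ||x - r_1||, each TDOA d_i1 yields the linear equation
#   2 (r_1 - r_i) . x - 2 d_i1 rho = ||r_1||^2 - ||r_i||^2 + d_i1^2
k = (anchors ** 2).sum(axis=1)
A = np.hstack([2.0 * (anchors[0] - anchors[1:]), -2.0 * tdoa[:, None]])
b = k[0] - k[1:] + tdoa ** 2
sol, *_ = np.linalg.lstsq(A, b, rcond=None)      # solve for (x, rho) jointly
est = sol[:2]                                    # estimated source position
```

Under NLOS errors this plain estimator degrades quickly, which is precisely the failure mode the robust least-squares formulation in the paper is designed to guard against.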
49. A perturbation inequality for concave functions of singular values and its applications in low-rank matrix recovery
- Author
-
Man-Chung Yue and Anthony Man-Cho So
- Subjects
Concave function ,Applied Mathematics ,Perturbation ,Low-rank approximation ,Combinatorics ,Singular value ,Matrix (mathematics) ,Compressed sensing ,Linear algebra ,Mathematics - Abstract
In this paper, we establish the following perturbation result concerning the singular values of a matrix: Let $A, B \in \mathbb{R}^{m \times n}$ be given matrices, and let $f : \mathbb{R}_+ \to \mathbb{R}_+$ be a concave function satisfying $f(0) = 0$. Then, we have $$\sum_{i=1}^{\min\{m,n\}} \left| f(\sigma_i(A)) - f(\sigma_i(B)) \right| \le \sum_{i=1}^{\min\{m,n\}} f(\sigma_i(A-B)),$$ where $\sigma_i(\cdot)$ denotes the $i$-th largest singular value of a matrix. This answers an open question that is of interest to both the compressive sensing and linear algebra communities. In particular, by taking $f(\cdot) = (\cdot)^p$ for any $p \in (0,1]$, we obtain a perturbation inequality for the so-called Schatten $p$-quasi-norm, which allows us to confirm the validity of a number of previously conjectured conditions for the recovery of low-rank matrices via the popular Schatten $p$-quasi-norm heuristic. We believe that our result will find further applications, especially in the study of low-rank matrix recovery.
- Published
- 2016
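The inequality is easy to check numerically. The sketch below draws random matrix pairs and verifies it for the concave function $f(t) = t^{1/2}$, i.e., the Schatten 1/2-quasi-norm case; the dimensions and sample count are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(4)
m, n, p = 6, 5, 0.5
f = lambda t: t ** p       # concave on the nonnegative reals with f(0) = 0

violations = 0
for _ in range(100):
    A = rng.standard_normal((m, n))
    B = rng.standard_normal((m, n))
    # Singular values are returned in descending order, matching sigma_i
    lhs = np.abs(f(np.linalg.svd(A, compute_uv=False)) -
                 f(np.linalg.svd(B, compute_uv=False))).sum()
    rhs = f(np.linalg.svd(A - B, compute_uv=False)).sum()
    if lhs > rhs + 1e-9:
        violations += 1
```

Since the inequality is a theorem for any concave $f$ with $f(0)=0$, the violation count stays at zero up to floating-point tolerance.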
50. Low-Rank Semidefinite Programming: Theory and Applications
- Author
-
Yinyu Ye, Anthony Man-Cho So, and Alex Lemon
- Published
- 2016