319 results for "Lazaric, Alessandro"
Search Results
52. Rotting bandits are not harder than stochastic ones
- Author
-
Seznec, Julien, Locatelli, Andrea, Carpentier, Alexandra, Lazaric, Alessandro, and Valko, Michal
- Subjects
Statistics - Machine Learning ,Computer Science - Machine Learning - Abstract
In stochastic multi-armed bandits, the reward distribution of each arm is assumed to be stationary. This assumption is often violated in practice (e.g., in recommendation systems), where the reward of an arm may change whenever it is selected, i.e., the rested bandit setting. In this paper, we consider the non-parametric rotting bandit setting, where rewards can only decrease. We introduce the filtering on expanding window average (FEWA) algorithm that constructs moving averages of increasing windows to identify arms that are more likely to return high rewards when pulled once more. We prove that for an unknown horizon $T$, and without any knowledge on the decreasing behavior of the $K$ arms, FEWA achieves a problem-dependent regret bound of $\widetilde{\mathcal{O}}(\log{(KT)})$, and a problem-independent one of $\widetilde{\mathcal{O}}(\sqrt{KT})$. Our result substantially improves over the algorithm of Levine et al. (2017), which suffers regret $\widetilde{\mathcal{O}}(K^{1/3}T^{2/3})$. FEWA also matches known bounds for the stochastic bandit setting, thus showing that the rotting bandits are not harder. Finally, we report simulations confirming the theoretical improvements of FEWA. (A short illustrative sketch of the FEWA selection rule follows this entry.)
- Published
- 2018
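The core idea of FEWA described in the abstract above, filtering arms by comparing moving averages over expanding windows, can be pictured with a minimal sketch. The confidence width, the window doubling, and the tie-breaking rule below are simplifying assumptions for illustration, not the paper's exact pseudocode.

```python
import numpy as np

def fewa_choose_arm(histories, t, alpha=4.0, delta=None):
    """One illustrative FEWA-style decision (a sketch, not the paper's exact rule).

    histories: list of 1-D reward sequences, histories[i] = rewards observed so far
               for arm i (most recent last), with rewards assumed to lie in [0, 1].
    Returns the index of the arm to pull at round t.
    """
    K = len(histories)
    if delta is None:
        delta = 1.0 / (t ** alpha)                     # assumed confidence schedule
    active = set(range(K))
    h = 1
    while True:
        # If some active arm has fewer than h samples, we cannot filter further: pull it.
        short = [i for i in active if len(histories[i]) < h]
        if short:
            return min(short, key=lambda i: len(histories[i]))
        # Average of the h most recent rewards of each active arm (expanding window).
        mu_h = {i: np.mean(histories[i][-h:]) for i in active}
        c_h = np.sqrt(2.0 * np.log(1.0 / delta) / h)   # confidence width (assumed form)
        best = max(mu_h.values())
        # Keep only arms that are still plausibly as good as the current leader.
        active = {i for i in active if mu_h[i] + 2.0 * c_h >= best}
        h *= 2                                         # expand the window (illustrative doubling)
```

At the start every arm has an empty history, so each one is pulled once; afterwards the observed reward is appended to the pulled arm's history and the routine is called again on the next round.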
53. Near Optimal Exploration-Exploitation in Non-Communicating Markov Decision Processes
- Author
-
Fruit, Ronan, Pirotta, Matteo, and Lazaric, Alessandro
- Subjects
Computer Science - Machine Learning ,Statistics - Machine Learning - Abstract
While designing the state space of an MDP, it is common to include states that are transient or not reachable by any policy (e.g., in mountain car, the product space of speed and position contains configurations that are not physically reachable). This leads to defining weakly-communicating or multi-chain MDPs. In this paper, we introduce TUCRL, the first algorithm able to perform efficient exploration-exploitation in any finite Markov Decision Process (MDP) without requiring any form of prior knowledge. In particular, for any MDP with $S^{\texttt{C}}$ communicating states, $A$ actions and $\Gamma^{\texttt{C}} \leq S^{\texttt{C}}$ possible communicating next states, we derive a $\widetilde{O}(D^{\texttt{C}} \sqrt{\Gamma^{\texttt{C}} S^{\texttt{C}} AT})$ regret bound, where $D^{\texttt{C}}$ is the diameter (i.e., the longest shortest path) of the communicating part of the MDP. This is in contrast with optimistic algorithms (e.g., UCRL, Optimistic PSRL) that suffer linear regret in weakly-communicating MDPs, as well as posterior sampling or regularised algorithms (e.g., REGAL), which require prior knowledge on the bias span of the optimal policy to bias the exploration to achieve sub-linear regret. We also prove that in weakly-communicating MDPs, no algorithm can ever achieve a logarithmic growth of the regret without first suffering a linear regret for a number of steps that is exponential in the parameters of the MDP. Finally, we report numerical simulations supporting our theoretical findings and showing how TUCRL overcomes the limitations of the state-of-the-art.
- Published
- 2018
54. Distributed Adaptive Sampling for Kernel Matrix Approximation
- Author
-
Calandriello, Daniele, Lazaric, Alessandro, and Valko, Michal
- Subjects
Statistics - Machine Learning ,Computer Science - Data Structures and Algorithms ,Computer Science - Learning - Abstract
Most kernel-based methods, such as kernel or Gaussian process regression, kernel PCA, ICA, or $k$-means clustering, do not scale to large datasets, because constructing and storing the kernel matrix $\mathbf{K}_n$ requires at least $\mathcal{O}(n^2)$ time and space for $n$ samples. Recent works show that sampling points with replacement according to their ridge leverage scores (RLS) generates small dictionaries of relevant points with strong spectral approximation guarantees for $\mathbf{K}_n$. The drawback of RLS-based methods is that computing exact RLS requires constructing and storing the whole kernel matrix. In this paper, we introduce SQUEAK, a new algorithm for kernel approximation based on RLS sampling that sequentially processes the dataset, storing a dictionary which creates accurate kernel matrix approximations with a number of points that only depends on the effective dimension $d_{eff}(\gamma)$ of the dataset. Moreover, since all the RLS estimations are efficiently performed using only the small dictionary, SQUEAK is the first RLS sampling algorithm that never constructs the whole matrix $\mathbf{K}_n$, runs in linear time $\widetilde{\mathcal{O}}(nd_{eff}(\gamma)^3)$ w.r.t. $n$, and requires only a single pass over the dataset. We also propose a parallel and distributed version of SQUEAK that linearly scales across multiple machines, achieving similar accuracy in as little as $\widetilde{\mathcal{O}}(\log(n)d_{eff}(\gamma)^3)$ time., Comment: Presented at AISTATS 2017. (A brief illustration of ridge leverage scores and the effective dimension follows this entry.)
- Published
- 2018
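As context for the entry above, the quantities SQUEAK is built around, ridge leverage scores and the effective dimension $d_{eff}(\gamma)$, can be computed exactly on a small problem. The sketch below constructs the full kernel matrix, which is precisely what SQUEAK avoids on large datasets; the Gaussian kernel, the oversampling factor, and the Bernoulli sampling step are illustrative assumptions.

```python
import numpy as np

def rbf_kernel(X, Y, sigma=1.0):
    # Gaussian kernel matrix between the rows of X and the rows of Y.
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def ridge_leverage_scores(K, gamma):
    # tau_i = [K (K + gamma I)^{-1}]_{ii}; their sum is the effective dimension d_eff(gamma).
    n = K.shape[0]
    return np.diag(K @ np.linalg.inv(K + gamma * np.eye(n)))

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
K = rbf_kernel(X, X)
gamma = 1.0

tau = ridge_leverage_scores(K, gamma)
d_eff = tau.sum()

# Sample points with probability proportional to (an overestimate of) their RLS
# to build a small dictionary of representative points.
q = 10.0                                  # oversampling factor (assumed)
p = np.minimum(1.0, q * tau)
dictionary = np.flatnonzero(rng.random(len(p)) < p)
print(f"d_eff(gamma) = {d_eff:.1f}, dictionary size = {len(dictionary)}")
```

SQUEAK's contribution is estimating these scores accurately from the current dictionary alone, so the $n \times n$ matrix above is never formed.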
55. Efficient Bias-Span-Constrained Exploration-Exploitation in Reinforcement Learning
- Author
-
Fruit, Ronan, Pirotta, Matteo, Lazaric, Alessandro, and Ortner, Ronald
- Subjects
Computer Science - Machine Learning ,Statistics - Machine Learning - Abstract
We introduce SCAL, an algorithm designed to perform efficient exploration-exploitation in any unknown weakly-communicating Markov decision process (MDP) for which an upper bound $c$ on the span of the optimal bias function is known. For an MDP with $S$ states, $A$ actions and $\Gamma \leq S$ possible next states, we prove a regret bound of $\widetilde{O}(c\sqrt{\Gamma SAT})$, which significantly improves over existing algorithms (e.g., UCRL and PSRL), whose regret scales linearly with the MDP diameter $D$. In fact, the optimal bias span is finite and often much smaller than $D$ (e.g., $D=\infty$ in non-communicating MDPs). A similar result was originally derived by Bartlett and Tewari (2009) for REGAL.C, for which no tractable algorithm is available. In this paper, we relax the optimization problem at the core of REGAL.C, we carefully analyze its properties, and we provide the first computationally efficient algorithm to solve it. Finally, we report numerical simulations supporting our theoretical findings and showing how SCAL significantly outperforms UCRL in MDPs with large diameter and small span.
- Published
- 2018
56. Second-Order Kernel Online Convex Optimization with Adaptive Sketching
- Author
-
Calandriello, Daniele, Lazaric, Alessandro, and Valko, Michal
- Subjects
Statistics - Machine Learning ,Computer Science - Learning - Abstract
Kernel online convex optimization (KOCO) is a framework combining the expressiveness of non-parametric kernel models with the regret guarantees of online learning. First-order KOCO methods such as functional gradient descent require only $\mathcal{O}(t)$ time and space per iteration, and, when the only information on the losses is their convexity, achieve a minimax optimal $\mathcal{O}(\sqrt{T})$ regret. Nonetheless, many common losses in kernel problems, such as squared loss, logistic loss, and squared hinge loss, possess stronger curvature that can be exploited. In this case, second-order KOCO methods achieve $\mathcal{O}(\log(\text{Det}(\boldsymbol{K})))$ regret, which we show scales as $\mathcal{O}(d_{\text{eff}}\log T)$, where $d_{\text{eff}}$ is the effective dimension of the problem and is usually much smaller than $\mathcal{O}(\sqrt{T})$. The main drawback of second-order methods is their much higher $\mathcal{O}(t^2)$ space and time complexity. In this paper, we introduce kernel online Newton step (KONS), a new second-order KOCO method that also achieves $\mathcal{O}(d_{\text{eff}}\log T)$ regret. To address the computational complexity of second-order methods, we introduce a new matrix sketching algorithm for the kernel matrix $\boldsymbol{K}_t$, and show that for a chosen parameter $\gamma \leq 1$ our Sketched-KONS reduces the space and time complexity by a factor of $\gamma^2$ to $\mathcal{O}(t^2\gamma^2)$ space and time per iteration, while incurring only $1/\gamma$ times more regret. (A finite-dimensional online Newton step, the primal analogue of the update KONS performs in the RKHS, is sketched after this entry.)
- Published
- 2017
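The second-order update at the heart of KONS can be pictured in its finite-dimensional (linear) form: an online Newton step that accumulates gradient outer products as a curvature estimate. This is a hedged stand-in on an explicit feature map, not the kernelized, sketched algorithm of the paper; the step size and regularizer are assumptions.

```python
import numpy as np

def online_newton_step(data, dim, eta=1.0, eps=1.0):
    """Plain online Newton step for the squared loss on an explicit feature map.

    A finite-dimensional stand-in for second-order KOCO: KONS performs this kind of
    update in the RKHS and sketches the kernel matrix so the per-step cost does not
    grow quadratically with time. Step size eta and regularizer eps are assumed.
    data: iterable of (feature_vector, target) pairs.
    """
    w = np.zeros(dim)
    A_inv = np.eye(dim) / eps            # inverse of the regularized curvature matrix
    cumulative_loss = 0.0
    for x, y in data:
        pred = w @ x
        cumulative_loss += 0.5 * (pred - y) ** 2
        g = (pred - y) * x               # gradient of the squared loss at w
        # Rank-one (Sherman-Morrison) update of A^{-1} with the new outer product g g^T.
        Ag = A_inv @ g
        A_inv -= np.outer(Ag, Ag) / (1.0 + g @ Ag)
        w = w - eta * (A_inv @ g)        # second-order (Newton-style) step
    return w, cumulative_loss
```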
57. Experimental results : Reinforcement Learning of POMDPs using Spectral Methods
- Author
-
Azizzadenesheli, Kamyar, Lazaric, Alessandro, and Anandkumar, Animashree
- Subjects
Computer Science - Artificial Intelligence ,Computer Science - Learning ,Statistics - Machine Learning - Abstract
We propose a new reinforcement learning algorithm for partially observable Markov decision processes (POMDP) based on spectral decomposition methods. While spectral methods have been previously employed for consistent learning of (passive) latent variable models such as hidden Markov models, POMDPs are more challenging since the learner interacts with the environment and possibly changes the future observations in the process. We devise a learning algorithm running through epochs; in each epoch, we employ spectral techniques to learn the POMDP parameters from a trajectory generated by a fixed policy. At the end of the epoch, an optimization oracle returns the optimal memoryless planning policy which maximizes the expected reward based on the estimated POMDP model. We prove an order-optimal regret bound with respect to the optimal memoryless policy and efficient scaling with respect to the dimensionality of observation and action spaces., Comment: 30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain
- Published
- 2017
58. Thompson Sampling for Linear-Quadratic Control Problems
- Author
-
Abeille, Marc and Lazaric, Alessandro
- Subjects
Statistics - Machine Learning - Abstract
We consider the exploration-exploitation tradeoff in linear quadratic (LQ) control problems, where the state dynamics is linear and the cost function is quadratic in states and controls. We analyze the regret of Thompson sampling (TS) (a.k.a. posterior-sampling for reinforcement learning) in the frequentist setting, i.e., when the parameters characterizing the LQ dynamics are fixed. Despite the empirical and theoretical success in a wide range of problems from multi-armed bandits to linear bandits, we show that when studying the frequentist regret of TS in control problems, we need to trade off the frequency of sampling optimistic parameters and the frequency of switches in the control policy. This results in an overall regret of $O(T^{2/3})$, which is significantly worse than the regret $O(\sqrt{T})$ achieved by the optimism-in-the-face-of-uncertainty algorithm in LQ control problems. (A minimal sketch of one TS control step follows this entry.)
- Published
- 2017
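One decision of the Thompson-sampling controller analyzed above can be sketched as: sample a plausible model from a Gaussian posterior over the dynamics, solve the corresponding Riccati equation, and play the resulting linear feedback. The Gaussian posterior form and the use of SciPy's Riccati solver are assumptions made for the illustration.

```python
import numpy as np
from scipy.linalg import solve_discrete_are

def ts_lq_step(x, theta_mean, theta_cov, Q, R, rng):
    """One decision of a Thompson-sampling LQ controller (illustrative sketch).

    theta_mean, theta_cov describe a Gaussian posterior over the vectorized dynamics
    [A | B] (assumed here to come from regularized least squares on past transitions).
    Q, R are the known quadratic cost matrices; x is the current state.
    """
    n, m = Q.shape[0], R.shape[0]
    # Sample a plausible model from the posterior (instead of an optimistic one).
    theta = rng.multivariate_normal(theta_mean, theta_cov).reshape(n, n + m)
    A_s, B_s = theta[:, :n], theta[:, n:]
    # Solve the discrete-time Riccati equation for the sampled model ...
    P = solve_discrete_are(A_s, B_s, Q, R)
    # ... and play the corresponding optimal linear feedback u = -K x.
    K = np.linalg.solve(R + B_s.T @ P @ B_s, B_s.T @ P @ A_s)
    return -K @ x
```

The trade-off described in the abstract shows up in how often this sampling step is repeated versus how long each sampled feedback gain is kept in place.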
59. Exploration--Exploitation in MDPs with Options
- Author
-
Fruit, Ronan and Lazaric, Alessandro
- Subjects
Computer Science - Learning ,Statistics - Machine Learning - Abstract
While a large body of empirical results shows that temporally-extended actions and options may significantly affect the learning performance of an agent, the theoretical understanding of how and when options can be beneficial in online reinforcement learning is relatively limited. In this paper, we derive upper and lower bounds on the regret of a variant of UCRL using options. While we first analyze the algorithm in the general case of semi-Markov decision processes (SMDPs), we show how these results can be translated to the specific case of MDPs with options and we illustrate simple scenarios in which the regret of learning with options can be \textit{provably} much smaller than the regret suffered when learning with primitive actions.
- Published
- 2017
60. Active Learning for Accurate Estimation of Linear Models
- Author
-
Riquelme, Carlos, Ghavamzadeh, Mohammad, and Lazaric, Alessandro
- Subjects
Statistics - Machine Learning ,Computer Science - Learning - Abstract
We explore the sequential decision making problem where the goal is to estimate uniformly well a number of linear models, given a shared budget of random contexts independently sampled from a known distribution. The decision maker must query one of the linear models for each incoming context, and receives an observation corrupted by noise levels that are unknown and depend on the model instance. We present Trace-UCB, an adaptive allocation algorithm that learns the noise levels while balancing contexts accordingly across the different linear functions, and derive simple-regret guarantees both in expectation and with high probability. Finally, we extend the algorithm and its guarantees to high-dimensional settings, where the number of linear models times the dimension of the contextual space is higher than the total budget of samples. Simulations with real data suggest that Trace-UCB is remarkably robust, outperforming a number of baselines even when its assumptions are violated., Comment: 37 pages, 8 figures, International Conference on Machine Learning, ICML 2017. (A simplified allocation rule in the spirit of Trace-UCB is sketched after this entry.)
- Published
- 2017
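The allocation principle behind the entry above, sample each linear instance roughly in proportion to an optimistic estimate of its noise variance, can be sketched in a few lines. The bonus shape and the "need" score below are assumptions; the actual Trace-UCB criterion also accounts for the geometry of the observed contexts (hence "Trace").

```python
import numpy as np

def variance_ucb_allocation(counts, var_hat, budget, d):
    """Decide which linear model receives the next context (illustrative rule only).

    counts[i]  : number of contexts already allocated to model i (assumed >= d + 2)
    var_hat[i] : current residual-variance estimate for model i
    The optimal static allocation is proportional to the noise variances, so the
    incoming context is sent to the instance whose optimistic variance estimate is
    largest relative to the samples it has already received.
    """
    counts = np.asarray(counts, dtype=float)
    var_hat = np.asarray(var_hat, dtype=float)
    bonus = np.sqrt(np.log(budget) / (counts - d))     # optimistic correction (assumed form)
    ucb = var_hat + bonus
    return int(np.argmax(ucb / counts))
```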
61. Linear Thompson Sampling Revisited
- Author
-
Abeille, Marc and Lazaric, Alessandro
- Subjects
Statistics - Machine Learning ,Computer Science - Machine Learning - Abstract
We derive an alternative proof for the regret of Thompson sampling (TS) in the stochastic linear bandit setting. While we obtain a regret bound of order $\widetilde{O}(d^{3/2}\sqrt{T})$ as in previous results, the proof sheds new light on the functioning of TS. We leverage the structure of the problem to show how the regret is related to the sensitivity (i.e., the gradient) of the objective function and how selecting optimal arms associated with \textit{optimistic} parameters controls it. Thus we show that TS can be seen as a generic randomized algorithm where the sampling distribution is designed to have a fixed probability of being optimistic, at the cost of an additional $\sqrt{d}$ regret factor compared to a UCB-like approach. Furthermore, we show that our proof can be readily applied to regularized linear optimization and generalized linear model problems. (A one-round sketch of linear TS follows this entry.)
- Published
- 2016
- Full Text
- View/download PDF
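The generic linear TS scheme discussed in the entry above can be sketched as one round of posterior sampling over the parameter followed by a greedy arm choice. The inflation factor v below, which plays the role of the oversampling the abstract attributes the extra $\sqrt{d}$ factor to, is a simplified, assumed choice.

```python
import numpy as np

def lin_ts_round(arms, V, b, t, delta=0.05, rng=None):
    """One round of linear Thompson sampling (illustrative version of the generic scheme).

    arms : (K, d) array of arm feature vectors
    V, b : regularized design matrix and response vector accumulated so far,
           V = lambda * I + sum_s x_s x_s^T and b = sum_s r_s x_s.
    """
    rng = rng or np.random.default_rng()
    d = arms.shape[1]
    V_inv = np.linalg.inv(V)
    theta_hat = V_inv @ b                       # regularized least-squares estimate
    # Oversampled posterior: the sqrt(d)-scaled covariance keeps a constant probability
    # of drawing an "optimistic" parameter, at the price of extra regret.
    v = np.sqrt(d * np.log(t / delta))
    theta_tilde = rng.multivariate_normal(theta_hat, v ** 2 * V_inv)
    return int(np.argmax(arms @ theta_tilde))

# After observing reward r for the chosen arm x: V += np.outer(x, x); b += r * x.
```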
62. Reinforcement Learning in Rich-Observation MDPs using Spectral Methods
- Author
-
Azizzadenesheli, Kamyar, Lazaric, Alessandro, and Anandkumar, Animashree
- Subjects
Computer Science - Artificial Intelligence ,Computer Science - Learning ,Statistics - Machine Learning - Abstract
Reinforcement learning (RL) in Markov decision processes (MDPs) with large state spaces is a challenging problem. The performance of standard RL algorithms degrades drastically with the dimensionality of state space. However, in practice, these large MDPs typically incorporate a latent or hidden low-dimensional structure. In this paper, we study the setting of rich-observation Markov decision processes (ROMDP), where there are a small number of hidden states which possess an injective mapping to the observation states. In other words, every observation state is generated through a single hidden state, and this mapping is unknown a priori. We introduce a spectral decomposition method that consistently learns this mapping, and more importantly, achieves it with low regret. The estimated mapping is integrated into an optimistic RL algorithm (UCRL), which operates on the estimated hidden space. We derive finite-time regret bounds for our algorithm with a weak dependence on the dimensionality of the observed space. In fact, our algorithm asymptotically achieves the same average regret as the oracle UCRL algorithm, which has the knowledge of the mapping from hidden to observed spaces. Thus, we derive an efficient spectral RL algorithm for ROMDPs.
- Published
- 2016
63. Analysis of Kelner and Levin graph sparsification algorithm for a streaming setting
- Author
-
Calandriello, Daniele, Lazaric, Alessandro, and Valko, Michal
- Subjects
Statistics - Machine Learning ,Computer Science - Data Structures and Algorithms ,Computer Science - Learning - Abstract
We derive a new proof to show that the incremental resparsification algorithm proposed by Kelner and Levin (2013) produces a spectral sparsifier with high probability. We rigorously take into account the dependencies across subsequent resparsifications using martingale inequalities, fixing a flaw in the original analysis.
- Published
- 2016
64. Open Problem: Approximate Planning of POMDPs in the class of Memoryless Policies
- Author
-
Azizzadenesheli, Kamyar, Lazaric, Alessandro, and Anandkumar, Animashree
- Subjects
Computer Science - Artificial Intelligence - Abstract
Planning plays an important role in a broad class of decision-theoretic problems. Planning has drawn much attention in recent work in the robotics and sequential decision making areas. Recently, Reinforcement Learning (RL), as an agent-environment interaction problem, has brought further attention to planning methods. Generally in RL, one can assume a generative model, e.g. graphical models, for the environment, and then the task for the RL agent is to learn the model parameters and find the optimal strategy based on these learnt parameters. Based on the environment's behavior, the agent can assume various types of generative models, e.g. a multi-armed bandit for a static environment, or a Markov Decision Process (MDP) for a dynamic environment. The advantage of these popular models is their simplicity, which results in tractable methods of learning the parameters and finding the optimal policy. The drawback of these models is again their simplicity: these models usually underfit and underestimate the actual environment behavior. For example, in robotics, the agent usually has noisy observations of the environment's inner state and an MDP is not a suitable model. More complex models like the Partially Observable Markov Decision Process (POMDP) can compensate for this drawback. Fitting this model to the environment, where the partial observation is given to the agent, generally gives dramatic performance improvement, sometimes unbounded improvement, compared to an MDP. In general, finding the optimal policy for the POMDP model is computationally intractable and fully non-convex, even for the class of memoryless policies. The open problem is to come up with a method to find an exact or an approximate optimal stochastic memoryless policy for POMDP models., Comment: arXiv admin note: substantial text overlap with arXiv:1602.07764
- Published
- 2016
65. Reinforcement Learning of POMDPs using Spectral Methods
- Author
-
Azizzadenesheli, Kamyar, Lazaric, Alessandro, and Anandkumar, Animashree
- Subjects
Computer Science - Artificial Intelligence ,Computer Science - Learning ,Computer Science - Numerical Analysis ,Mathematics - Optimization and Control ,Statistics - Machine Learning - Abstract
We propose a new reinforcement learning algorithm for partially observable Markov decision processes (POMDP) based on spectral decomposition methods. While spectral methods have been previously employed for consistent learning of (passive) latent variable models such as hidden Markov models, POMDPs are more challenging since the learner interacts with the environment and possibly changes the future observations in the process. We devise a learning algorithm running through episodes; in each episode, we employ spectral techniques to learn the POMDP parameters from a trajectory generated by a fixed policy. At the end of the episode, an optimization oracle returns the optimal memoryless planning policy which maximizes the expected reward based on the estimated POMDP model. We prove an order-optimal regret bound with respect to the optimal memoryless policy and efficient scaling with respect to the dimensionality of observation and action spaces.
- Published
- 2016
66. Incremental Spectral Sparsification for Large-Scale Graph-Based Semi-Supervised Learning
- Author
-
Calandriello, Daniele, Lazaric, Alessandro, Valko, Michal, and Koutis, Ioannis
- Subjects
Statistics - Machine Learning ,Computer Science - Learning - Abstract
While the harmonic function solution performs well in many semi-supervised learning (SSL) tasks, it is known to scale poorly with the number of samples. Recent successful and scalable methods, such as the eigenfunction method, focus on efficiently approximating the whole spectrum of the graph Laplacian constructed from the data. This is in contrast to various subsampling and quantization methods proposed in the past, which may fail to preserve the graph spectra. However, the impact of the approximation of the spectrum on the final generalization error is either unknown, or requires strong assumptions on the data. In this paper, we introduce Sparse-HFS, an efficient edge-sparsification algorithm for SSL. By constructing an edge-sparse and spectrally similar graph, we are able to leverage the approximation guarantees of spectral sparsification methods to bound the generalization error of Sparse-HFS. As a result, we obtain a theoretically-grounded approximation scheme for graph-based SSL that also empirically matches the performance of known large-scale methods. (The exact harmonic function solution that Sparse-HFS approximates is sketched after this entry.)
- Published
- 2016
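For reference, the exact harmonic function solution mentioned in the entry above reduces to solving one linear system in the Laplacian restricted to the unlabeled nodes. The sketch below shows that exact computation; Sparse-HFS runs the same formula on a spectrally-similar sparsified graph so the solve stays tractable at scale.

```python
import numpy as np

def harmonic_function_solution(W, labeled_idx, y_labeled):
    """Exact harmonic function solution (HFS) for graph-based SSL.

    W: (n, n) symmetric weight matrix of the graph; labeled_idx: indices of labeled
    nodes; y_labeled: their labels. Sparse-HFS applies the same formula, but on an
    edge-sparsified graph whose Laplacian is spectrally close to L, so the linear
    system below stays cheap even on very large graphs.
    """
    n = W.shape[0]
    L = np.diag(W.sum(axis=1)) - W                         # combinatorial Laplacian
    unlabeled_idx = np.setdiff1d(np.arange(n), labeled_idx)
    L_uu = L[np.ix_(unlabeled_idx, unlabeled_idx)]
    L_ul = L[np.ix_(unlabeled_idx, labeled_idx)]
    f = np.zeros(n)
    f[labeled_idx] = y_labeled
    # Harmonic solution: the unlabeled values solve L_uu f_u = -L_ul y_l.
    f[unlabeled_idx] = np.linalg.solve(L_uu, -L_ul @ np.asarray(y_labeled, dtype=float))
    return f
```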
67. Upper-Confidence-Bound Algorithms for Active Learning in Multi-Armed Bandits
- Author
-
Carpentier, Alexandra, Lazaric, Alessandro, Ghavamzadeh, Mohammad, Munos, Rémi, Auer, Peter, and Antos, András
- Subjects
Computer Science - Learning ,G.3 - Abstract
In this paper, we study the problem of estimating uniformly well the mean values of several distributions given a finite budget of samples. If the variance of the distributions were known, one could design an optimal sampling strategy by collecting a number of independent samples per distribution that is proportional to their variance. However, in the more realistic case where the distributions are not known in advance, one needs to design adaptive sampling strategies in order to select which distribution to sample from according to the previously observed samples. We describe two strategies based on pulling the distributions a number of times that is proportional to a high-probability upper-confidence-bound on their variance (built from previously observed samples) and report a finite-sample performance analysis on the excess estimation error compared to the optimal allocation. We show that the performance of these allocation strategies depends not only on the variances but also on the full shape of the distributions., Comment: 30 pages, 2 Postscript figures, uses elsarticle.cls, earlier, shorter version published in Proceedings of the 22nd International Conference on Algorithmic Learning Theory
- Published
- 2015
68. Best-Arm Identification in Linear Bandits
- Author
-
Soare, Marta, Lazaric, Alessandro, and Munos, Rémi
- Subjects
Computer Science - Learning - Abstract
We study the best-arm identification problem in linear bandits, where the rewards of the arms depend linearly on an unknown parameter $\theta^*$ and the objective is to return the arm with the largest reward. We characterize the complexity of the problem and introduce sample allocation strategies that pull arms to identify the best arm with a fixed confidence, while minimizing the sample budget. In particular, we show the importance of exploiting the global linear structure to improve the estimate of the reward of near-optimal arms. We analyze the proposed strategies and compare their empirical performance. Finally, as a by-product of our analysis, we point out the connection to the $G$-optimality criterion used in optimal experimental design., Comment: In Advances in Neural Information Processing Systems 27 (NIPS), 2014. (A greedy $G$-optimal-style allocation is sketched after this entry.)
- Published
- 2014
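The connection to $G$-optimality mentioned above can be made concrete with a greedy static allocation: repeatedly pull the arm whose direction is currently least explored under the design matrix. This is a simple heuristic in the spirit of the $G$-allocation discussed in the paper, not the paper's full adaptive strategy; the regularization constant is an assumption.

```python
import numpy as np

def g_allocation(arms, budget, reg=1e-6):
    """Greedy G-optimal-style static allocation for best-arm identification (a sketch).

    At each step, pull the arm maximizing x^T A^{-1} x, i.e. the direction where the
    least-squares estimate of theta* is currently most uncertain, which greedily
    shrinks the worst-case uncertainty of the estimated rewards across arms.
    arms: (K, d) feature matrix. Returns the number of pulls per arm.
    """
    K, d = arms.shape
    A = reg * np.eye(d)
    counts = np.zeros(K, dtype=int)
    for _ in range(budget):
        A_inv = np.linalg.inv(A)
        scores = np.einsum("kd,de,ke->k", arms, A_inv, arms)   # x^T A^{-1} x per arm
        k = int(np.argmax(scores))
        A += np.outer(arms[k], arms[k])
        counts[k] += 1
    return counts
```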
69. Truthful Learning Mechanisms for Multi-Slot Sponsored Search Auctions with Externalities
- Author
-
Gatti, Nicola, Lazaric, Alessandro, Rocco, Marco, and Trovò, Francesco
- Subjects
Computer Science - Computer Science and Game Theory - Abstract
Sponsored search auctions constitute one of the most successful applications of microeconomic mechanisms. In mechanism design, auctions are usually designed to incentivize advertisers to bid their truthful valuations and to assure both the advertisers and the auctioneer a non-negative utility. Nonetheless, in sponsored search auctions, the click-through-rates (CTRs) of the advertisers are often unknown to the auctioneer and thus standard truthful mechanisms cannot be directly applied and must be paired with an effective learning algorithm for the estimation of the CTRs. This introduces the critical problem of designing a learning mechanism able to estimate the CTRs at the same time as implementing a truthful mechanism with a revenue loss as small as possible compared to an optimal mechanism designed with the true CTRs. Previous work showed that, when dominant-strategy truthfulness is adopted, in single-slot auctions the problem can be solved using suitable exploration-exploitation mechanisms able to achieve a per-step regret (over the auctioneer's revenue) of order $O(T^{-1/3})$ (where $T$ is the number of times the auction is repeated). It is also known that, when truthfulness in expectation is adopted, a per-step regret (over the social welfare) of order $O(T^{-1/2})$ can be obtained. In this paper we extend the results known in the literature to the case of multi-slot auctions. In this case, a model of the user is needed to characterize how the advertisers' valuations change over the slots. We adopt the cascade model, the most popular user model in the literature on sponsored search auctions. We prove a number of novel upper bounds and lower bounds both on the auctioneer's revenue loss and social welfare w.r.t. the VCG auction and we report numerical simulations investigating the accuracy of the bounds in predicting the dependency of the regret on the auction parameters.
- Published
- 2014
70. Online Stochastic Optimization under Correlated Bandit Feedback
- Author
-
Azar, Mohammad Gheshlaghi, Lazaric, Alessandro, and Brunskill, Emma
- Subjects
Statistics - Machine Learning ,Computer Science - Learning ,Computer Science - Systems and Control - Abstract
In this paper we consider the problem of online stochastic optimization of a locally smooth function under bandit feedback. We introduce the high-confidence tree (HCT) algorithm, a novel any-time $\mathcal{X}$-armed bandit algorithm, and derive regret bounds matching the performance of the existing state-of-the-art in terms of dependency on the number of steps and the smoothness factor. The main advantage of HCT is that it handles the challenging case of correlated rewards, whereas existing methods require that the reward-generating process of each arm is an independent and identically distributed (i.i.d.) random process. HCT also improves on the state-of-the-art in terms of its memory requirement as well as requiring a weaker smoothness assumption on the mean-reward function compared to previous anytime algorithms. Finally, we discuss how HCT can be applied to the problem of policy search in reinforcement learning and we report preliminary empirical results.
- Published
- 2014
71. Sequential Transfer in Multi-armed Bandit with Finite Set of Models
- Author
-
Azar, Mohammad Gheshlaghi, Lazaric, Alessandro, and Brunskill, Emma
- Subjects
Statistics - Machine Learning ,Computer Science - Learning - Abstract
Learning from prior tasks and transferring that experience to improve future performance is critical for building lifelong learning agents. Although results in supervised and reinforcement learning show that transfer may significantly improve the learning performance, most of the literature on transfer is focused on batch learning tasks. In this paper we study the problem of \textit{sequential transfer in online learning}, notably in the multi-armed bandit framework, where the objective is to minimize the cumulative regret over a sequence of tasks by incrementally transferring knowledge from prior tasks. We introduce a novel bandit algorithm based on a method-of-moments approach for the estimation of the possible tasks and derive regret bounds for it.
- Published
- 2013
72. Regret Bounds for Reinforcement Learning with Policy Advice
- Author
-
Azar, Mohammad Gheshlaghi, Lazaric, Alessandro, and Brunskill, Emma
- Subjects
Statistics - Machine Learning ,Computer Science - Learning - Abstract
In some reinforcement learning problems an agent may be provided with a set of input policies, perhaps learned from prior experience or provided by advisors. We present a reinforcement learning with policy advice (RLPA) algorithm which leverages this input set and learns to use the best policy in the set for the reinforcement learning task at hand. We prove that RLPA has a sub-linear regret of $\widetilde{O}(\sqrt{T})$ relative to the best input policy, and that both this regret and its computational complexity are independent of the size of the state and action space. Our empirical simulations support our theoretical analysis. This suggests RLPA may offer significant advantages in large domains where some prior good policies are provided.
- Published
- 2013
73. Risk-Aversion in Multi-armed Bandits
- Author
-
Sani, Amir, Lazaric, Alessandro, and Munos, Rémi
- Subjects
Computer Science - Learning - Abstract
Stochastic multi-armed bandits solve the Exploration-Exploitation dilemma and ultimately maximize the expected reward. Nonetheless, in many practical problems, maximizing the expected reward is not the most desirable objective. In this paper, we introduce a novel setting based on the principle of risk-aversion where the objective is to compete against the arm with the best risk-return trade-off. This setting proves to be intrinsically more difficult than the standard multi-armed bandit setting due in part to an exploration risk which introduces a regret associated with the variability of an algorithm. Using variance as a measure of risk, we introduce two new algorithms, investigate their theoretical guarantees, and report preliminary empirical results., Comment: (2012). (A mean-variance index in the spirit of this setting is sketched after this entry.)
- Published
- 2013
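Using variance as the measure of risk, as the entry above does, arms can be ranked by a mean-variance index with an exploration bonus. The index and bonus below are illustrative assumptions, not the paper's exact algorithms.

```python
import numpy as np

def mean_variance_choose(rewards_per_arm, t, rho=1.0):
    """Pick an arm by a mean-variance index (a sketch in the spirit of the setting).

    Each arm is scored by MV = variance - rho * mean (lower is better for a
    risk-averse player); an exploration bonus favors under-sampled arms. The exact
    index and bonus of the paper's algorithms may differ from the assumed form below.
    """
    scores = []
    for i, hist in enumerate(rewards_per_arm):
        n = len(hist)
        if n < 2:
            return i                                   # make sure every arm is tried first
        mv = np.var(hist) - rho * np.mean(hist)
        bonus = (5.0 + rho) * np.sqrt(np.log(t) / n)   # assumed confidence bonus
        scores.append(mv - bonus)
    return int(np.argmin(scores))
```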
74. A Dantzig Selector Approach to Temporal Difference Learning
- Author
-
Geist, Matthieu, Scherrer, Bruno, Lazaric, Alessandro, and Ghavamzadeh, Mohammad
- Subjects
Computer Science - Learning ,Statistics - Machine Learning - Abstract
LSTD is a popular algorithm for value function approximation. Whenever the number of features is larger than the number of samples, it must be paired with some form of regularization. In particular, L1-regularization methods tend to perform feature selection by promoting sparsity, and thus, are well-suited for high-dimensional problems. However, since LSTD is not a simple regression algorithm, but solves a fixed-point problem, its integration with L1-regularization is not straightforward and might come with some drawbacks (e.g., the P-matrix assumption for LASSO-TD). In this paper, we introduce a novel algorithm obtained by integrating LSTD with the Dantzig Selector. We investigate the performance of the proposed algorithm and its relationship with the existing regularized approaches, and show how it addresses some of their drawbacks., Comment: Appears in Proceedings of the 29th International Conference on Machine Learning (ICML 2012). (An LP formulation of a Dantzig-selector-style TD estimator is sketched after this entry.)
- Published
- 2012
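Combining the Dantzig Selector with the LSTD fixed-point problem, as the entry above does, leads naturally to a linear program: minimize the l1-norm of the parameter subject to a bound on the correlations of the TD residual with the features. The formulation and constants below are an illustrative reconstruction, not necessarily the paper's exact estimator.

```python
import numpy as np
from scipy.optimize import linprog

def dantzig_td(phi, phi_next, rewards, gamma=0.95, lam=0.1):
    """Dantzig-selector-style TD fit, written as a linear program (illustrative sketch).

    Solves  min ||theta||_1  s.t.  || Phi^T (R + gamma*Phi' theta - Phi theta) ||_inf <= lam,
    i.e. the sparsest parameter whose TD fixed-point residual correlations are small.
    phi, phi_next: (n, d) feature matrices at states and next states; rewards: (n,).
    """
    A = phi.T @ (phi - gamma * phi_next)          # (d, d)
    b = phi.T @ rewards                           # (d,)
    d = A.shape[0]
    # Split theta = u - v with u, v >= 0 and minimize sum(u) + sum(v) = ||theta||_1.
    c = np.ones(2 * d)
    A_ub = np.block([[A, -A], [-A, A]])           # encodes  -lam <= A theta - b <= lam
    b_ub = np.concatenate([b + lam, lam - b])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(0, None), method="highs")
    u, v = res.x[:d], res.x[d:]
    return u - v
```

Here lam controls the trade-off: larger values give sparser but more biased solutions, while too small a value can make the program infeasible.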
75. Transfer from Multiple MDPs
- Author
-
Lazaric, Alessandro and Restelli, Marcello
- Subjects
Computer Science - Artificial Intelligence ,Computer Science - Learning - Abstract
Transfer reinforcement learning (RL) methods leverage the experience collected on a set of source tasks to speed up RL algorithms. A simple and effective approach is to transfer samples from source tasks and include them in the training set used to solve a given target task. In this paper, we investigate the theoretical properties of this transfer method and we introduce novel algorithms adapting the transfer process on the basis of the similarity between source and target tasks. Finally, we report illustrative experimental results in a continuous chain problem., Comment: 2011
- Published
- 2011
76. Truthful learning mechanisms for multi-slot sponsored search auctions with externalities
- Author
-
Gatti, Nicola, Lazaric, Alessandro, Rocco, Marco, and Trovò, Francesco
- Published
- 2015
- Full Text
- View/download PDF
77. Transfer in Reinforcement Learning: A Framework and a Survey
- Author
-
Lazaric, Alessandro, Wiering, Marco, editor, and van Otterlo, Martijn, editor
- Published
- 2012
- Full Text
- View/download PDF
78. Least-Squares Methods for Policy Iteration
- Author
-
Buşoniu, Lucian, Lazaric, Alessandro, Ghavamzadeh, Mohammad, Munos, Rémi, Babuška, Robert, De Schutter, Bart, Wiering, Marco, editor, and van Otterlo, Martijn, editor
- Published
- 2012
- Full Text
- View/download PDF
79. Regularized Least Squares Temporal Difference Learning with Nested ℓ2 and ℓ1 Penalization
- Author
-
Hoffman, Matthew W., Lazaric, Alessandro, Ghavamzadeh, Mohammad, Munos, Rémi, Hutchison, David, editor, Kanade, Takeo, editor, Kittler, Josef, editor, Kleinberg, Jon M., editor, Mattern, Friedemann, editor, Mitchell, John C., editor, Naor, Moni, editor, Nierstrasz, Oscar, editor, Pandu Rangan, C., editor, Steffen, Bernhard, editor, Sudan, Madhu, editor, Terzopoulos, Demetri, editor, Tygar, Doug, editor, Vardi, Moshe Y., editor, Weikum, Gerhard, editor, Goebel, Randy, editor, Siekmann, Jörg, editor, Wahlster, Wolfgang, editor, Sanner, Scott, editor, and Hutter, Marcus, editor
- Published
- 2012
- Full Text
- View/download PDF
80. Upper-Confidence-Bound Algorithms for Active Learning in Multi-armed Bandits
- Author
-
Carpentier, Alexandra, Lazaric, Alessandro, Ghavamzadeh, Mohammad, Munos, Rémi, Auer, Peter, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Nierstrasz, Oscar, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Sudan, Madhu, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Vardi, Moshe Y., Series editor, Weikum, Gerhard, Series editor, Goebel, Randy, editor, Siekmann, Jörg, editor, Wahlster, Wolfgang, editor, Kivinen, Jyrki, editor, Szepesvári, Csaba, editor, Ukkonen, Esko, editor, and Zeugmann, Thomas, editor
- Published
- 2011
- Full Text
- View/download PDF
81. Towards Automated Bargaining in Electronic Markets: A Partially Two-Sided Competition Model
- Author
-
Gatti, Nicola, Lazaric, Alessandro, Restelli, Marcello, van der Aalst, Will, editor, Mylopoulos, John, editor, Sadeh, Norman M., editor, Shaw, Michael J., editor, Szyperski, Clemens, editor, Ketter, Wolfgang, editor, La Poutré, Han, editor, Sadeh, Norman, editor, Shehory, Onn, editor, and Walsh, William, editor
- Published
- 2010
- Full Text
- View/download PDF
82. Batch Reinforcement Learning for Controlling a Mobile Wheeled Pendulum Robot
- Author
-
Bonarini, Andrea, Caccia, Claudio, Lazaric, Alessandro, Restelli, Marcello, and Bramer, Max, editor
- Published
- 2008
- Full Text
- View/download PDF
83. Bifurcation Analysis of Reinforcement Learning Agents in the Selten’s Horse Game
- Author
-
Lazaric, Alessandro, Munoz de Cote, Enrique, Dercole, Fabio, Restelli, Marcello, Carbonell, Jaime G., editor, Siekmann, Jörg, editor, Tuyls, Karl, editor, Nowe, Ann, editor, Guessoum, Zahia, editor, and Kudenko, Daniel, editor
- Published
- 2008
- Full Text
- View/download PDF
84. Reinforcement Learning in Complex Environments Through Multiple Adaptive Partitions
- Author
-
Bonarini, Andrea, Lazaric, Alessandro, Restelli, Marcello, Carbonell, Jaime G., editor, Siekmann, Jörg, editor, Basili, Roberto, editor, and Pazienza, Maria Teresa, editor
- Published
- 2007
- Full Text
- View/download PDF
85. Incremental Skill Acquisition for Self-motivated Learning Animats
- Author
-
Bonarini, Andrea, Lazaric, Alessandro, Restelli, Marcello, Hutchison, David, editor, Kanade, Takeo, editor, Kittler, Josef, editor, Kleinberg, Jon M., editor, Mattern, Friedemann, editor, Mitchell, John C., editor, Naor, Moni, editor, Nierstrasz, Oscar, editor, Pandu Rangan, C., editor, Steffen, Bernhard, editor, Sudan, Madhu, editor, Terzopoulos, Demetri, editor, Tygar, Dough, editor, Vardi, Moshe Y., editor, Weikum, Gerhard, editor, Carbonell, Jaime G., editor, Siekmann, Jörg, editor, Nolfi, Stefano, editor, Baldassarre, Gianluca, editor, Calabretta, Raffaele, editor, Hallam, John C. T., editor, Marocco, Davide, editor, Meyer, Jean-Arcady, editor, Miglino, Orazio, editor, and Parisi, Domenico, editor
- Published
- 2006
- Full Text
- View/download PDF
86. Sketched Newton--Raphson
- Author
-
Yuan, Rui, primary, Lazaric, Alessandro, additional, and Gower, Robert M., additional
- Published
- 2022
- Full Text
- View/download PDF
87. Conservative and Greedy Approaches to Classification-Based Policy Iteration
- Author
-
Ghavamzadeh, Mohammad, primary and Lazaric, Alessandro, additional
- Published
- 2021
- Full Text
- View/download PDF
88. Reinforcement distribution in fuzzy Q-learning
- Author
-
Bonarini, Andrea, Lazaric, Alessandro, Montrone, Francesco, and Restelli, Marcello
- Published
- 2009
- Full Text
- View/download PDF
89. Sample complexity bounds for stochastic shortest path with a generative model
- Author
-
Tarbouriech, Jean, Pirotta, Matteo, Valko, Michal, and Lazaric, Alessandro
- Subjects
sample complexity ,stochastic shortest path ,Markov decision process ,Statistics - Machine Learning - Abstract
We consider the objective of computing an ε-optimal policy in a stochastic shortest path (SSP) setting, provided that we can access a generative sampling oracle. We propose two algorithms for this setting and derive PAC bounds on their sample complexity: one for the case of positive costs and the other for the case of non-negative costs under a restricted optimality criterion. While tight sample complexity bounds have been derived for finite-horizon and discounted MDPs, the SSP problem is a strict generalization of these settings and it poses additional technical challenges due to the fact that no specific time horizon is prescribed and policies may never terminate, i.e., we are possibly facing non-proper policies. As a consequence, we can neither directly apply existing techniques minimizing sample complexity nor rely on a regret-to-PAC conversion leveraging recent regret bounds for SSP. Our analysis instead combines SSP-specific tools and variance reduction techniques to obtain the first sample complexity bounds for this setting.
- Published
- 2021
90. Sketched Newton-Raphson
- Author
-
Yuan, Rui, Lazaric, Alessandro, and Gower, Robert M.
- Subjects
Nonlinear equations ,Newton-Raphson method ,Stochastic convergence - Abstract
We propose a new globally convergent stochastic second-order method. Our starting point is the development of a new sketched Newton-Raphson (SNR) method for solving large-scale nonlinear equations of the form $F(x) = 0$ with $F: \mathbb{R}^p \to \mathbb{R}^m$. We then show how to design several stochastic second-order optimization methods by rewriting the optimization problem of interest as a system of nonlinear equations and applying SNR. For instance, by applying SNR to find a stationary point of a generalized linear model, we derive completely new and scalable stochastic second-order methods. We show that the resulting method is very competitive as compared to state-of-the-art variance-reduced methods. Furthermore, using a variable splitting trick, we also show that the stochastic Newton method (SNM) is a special case of SNR and use this connection to establish the first global convergence theory of SNM. We establish the global convergence of SNR by showing that it is a variant of the online stochastic gradient descent (SGD) method, and then leveraging proof techniques of SGD. As a special case, our theory also provides a new global convergence theory for the original Newton-Raphson method under strictly weaker assumptions as compared to the classic monotone convergence theory. (A minimal sketch of the sketched update follows this entry.)
- Published
- 2022
- Full Text
- View/download PDF
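The sketched update described in the abstract above can be pictured as sketch-and-project applied to the Newton linearization: compress the system DF(x) d = -F(x) with a random matrix and take the least-norm solution of the compressed system. The Gaussian sketch, the step size, and the iteration count below are assumptions for illustration.

```python
import numpy as np

def sketched_newton_raphson(F, jac, x0, sketch_dim=1, steps=500, step_size=1.0, seed=0):
    """Sketched Newton-Raphson iteration (an illustrative reading of the SNR update).

    F : R^p -> R^m, jac(x) returns the (m, p) Jacobian of F at x.
    With sketch_dim < m each step only uses a random compression of the Newton system.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        Fx, J = F(x), jac(x)                              # residual and Jacobian
        S = rng.normal(size=(J.shape[0], sketch_dim))     # random sketching matrix
        SJ = S.T @ J                                      # sketched Jacobian
        # Least-norm solution of the sketched linearization  S^T J d = -S^T F(x).
        d = -SJ.T @ np.linalg.pinv(SJ @ SJ.T) @ (S.T @ Fx)
        x = x + step_size * d
    return x

# Tiny example: a 2-by-2 nonlinear system, one of whose solutions is (1, 1).
F = lambda x: np.array([x[0] ** 2 + x[1] - 2.0, x[0] + x[1] ** 2 - 2.0])
J = lambda x: np.array([[2.0 * x[0], 1.0], [1.0, 2.0 * x[1]]])
print(sketched_newton_raphson(F, J, x0=[2.0, 0.5], sketch_dim=2))  # full sketch: plain Newton
```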
91. No-regret exploration in goal-oriented reinforcement learning
- Author
-
Tarbouriech, Jean, Garcelon, Evrard, Valko, Michal, Pirotta, Matteo, and Lazaric, Alessandro
- Subjects
Computer Science - Machine Learning ,Statistics - Machine Learning - Abstract
Many popular reinforcement learning problems (e.g., navigation in a maze, some Atari games, mountain car) are instances of the episodic setting under its stochastic shortest path (SSP) formulation, where an agent has to achieve a goal state while minimizing the cumulative cost. Despite the popularity of this setting, the exploration-exploitation dilemma has been sparsely studied in general SSP problems, with most of the theoretical literature focusing on different problems (i.e., fixed-horizon and infinite-horizon) or making the restrictive loop-free SSP assumption (i.e., no state can be visited twice during an episode). In this paper, we study the general SSP problem with no assumption on its dynamics (some policies may actually never reach the goal). We introduce UC-SSP, the first no-regret algorithm in this setting, and prove a regret bound scaling as $\displaystyle \widetilde{\mathcal{O}}( D S \sqrt{ A D K})$ after $K$ episodes for any unknown SSP with $S$ states, $A$ actions, positive costs and SSP-diameter $D$, defined as the smallest expected hitting time from any starting state to the goal. We achieve this result by crafting a novel stopping rule, such that UC-SSP may interrupt the current policy if it is taking too long to achieve the goal and switch to alternative policies that are designed to rapidly terminate the episode.
- Published
- 2020
92. Reward-free exploration beyond finite-horizon
- Author
-
Tarbouriech, Jean, Pirotta, Matteo, Valko, Michal, and Lazaric, Alessandro
- Subjects
Statistics - Machine Learning - Abstract
We consider the reward-free exploration framework introduced by Jin et al. (2020), where an RL agent interacts with an unknown environment without any explicit reward function to maximize. The objective is to collect enough information during the exploration phase, so that a near-optimal policy can be immediately computed once any reward function is provided. In this paper, we move from the finite-horizon setting studied by Jin et al. (2020) to the more general setting of goal-conditioned RL, often referred to as stochastic shortest path (SSP). We first discuss the challenges specific to SSPs and then study two scenarios: 1) reward-free goal-free exploration in communicating MDPs, and 2) reward-free goal-free incremental exploration in non-communicating MDPs where the agent is provided with a reset action to an initial state. In both cases, we provide exploration algorithms and their sample-complexity bounds which we contrast with the existing guarantees in the finite-horizon case.
- Published
- 2020
93. Regret Bounds for Reinforcement Learning with Policy Advice
- Author
-
Azar, Mohammad Gheshlaghi, primary, Lazaric, Alessandro, additional, and Brunskill, Emma, additional
- Published
- 2013
- Full Text
- View/download PDF
94. Towards Automated Bargaining in Electronic Markets: A Partially Two-Sided Competition Model
- Author
-
Gatti, Nicola, primary, Lazaric, Alessandro, additional, and Restelli, Marcello, additional
- Published
- 2010
- Full Text
- View/download PDF
95. Improved Algorithms for Conservative Exploration in Bandits
- Author
-
Garcelon, Evrard, primary, Ghavamzadeh, Mohammad, additional, Lazaric, Alessandro, additional, and Pirotta, Matteo, additional
- Published
- 2020
- Full Text
- View/download PDF
96. Incremental Skill Acquisition for Self-motivated Learning Animats
- Author
-
Bonarini, Andrea, primary, Lazaric, Alessandro, additional, and Restelli, Marcello, additional
- Published
- 2006
- Full Text
- View/download PDF
97. Rotting bandits are not harder than stochastic ones
- Author
-
Seznec, Julien, Locatelli, Andrea, Carpentier, Alexandra, Lazaric, Alessandro, and Valko, Michal
- Subjects
Computer Science - Machine Learning ,Statistics - Machine Learning - Abstract
In bandits, arms' distributions are stationary. This is often violated in practice, where rewards change over time. In applications such as recommendation systems, online advertising, and crowdsourcing, the changes may be triggered by the pulls, so that the arms' rewards change as a function of the number of pulls. In this paper, we consider the specific case of non-parametric rotting bandits, where the expected reward of an arm may decrease every time it is pulled. We introduce the filtering on expanding window average (FEWA) algorithm that at each round constructs moving averages of increasing windows to identify arms that are more likely to return high rewards when pulled once more. We prove that, without any knowledge on the decreasing behavior of the arms, FEWA achieves anytime problem-dependent, $O(\log(KT))$, and problem-independent, $O(\sqrt{KT})$, regret bounds similar to those of near-optimal stochastic algorithms such as UCB1 of Auer et al. (2002a). This result substantially improves the prior result of Levine et al. (2017), which needed knowledge of the horizon and decay parameters to achieve a problem-independent bound of only $O(K^{1/3}T^{2/3})$. Finally, we report simulations confirming the theoretical improvements of FEWA.
- Published
- 2019
98. Fighting Boredom in Recommender Systems with Linear Reinforcement Learning
- Author
-
Warlop, Romain, Lazaric, Alessandro, and Mary, Jérémie
- Subjects
Computer Science - Artificial Intelligence - Abstract
A common assumption in recommender systems (RS) is the existence of a best fixed recommendation strategy. Such strategy may be simple and work at the item level (e.g., in multi-armed bandit it is assumed one best fixed arm/item exists) or implement more sophisticated RS (e.g., the objective of A/B testing is to find the best fixed RS and execute it thereafter). We argue that this assumption is rarely verified in practice, as the recommendation process itself may impact the user's preferences. For instance, a user may get bored by a strategy, while she may gain interest again if enough time has passed since the last time that strategy was used. In this case, a better approach consists in alternating different solutions at the right frequency to fully exploit their potential. In this paper, we first cast the problem as a Markov decision process, where the rewards are a linear function of the recent history of actions, and we show that a policy considering the long-term influence of the recommendations may outperform both fixed-action and contextual greedy policies. We then introduce an extension of the UCRL algorithm (LINUCRL) to effectively balance exploration and exploitation in an unknown environment, and we derive a regret bound that is independent of the number of states. Finally, we empirically validate the model assumptions and the algorithm in a number of realistic scenarios.
- Published
- 2018
99. Improved Regret Bounds for Thompson Sampling in Linear Quadratic Control Problems
- Author
-
Abeille, Marc and Lazaric, Alessandro
- Subjects
Statistics - Other Statistics - Abstract
Thompson sampling (TS) is an effective approach to trade off exploration and exploitation in reinforcement learning. Despite its empirical success and recent advances, its theoretical analysis is often limited to the Bayesian setting, finite state-action spaces, or finite-horizon problems. In this paper, we study an instance of TS in the challenging setting of infinite-horizon linear quadratic (LQ) control, which models problems with continuous state-action variables, linear dynamics, and quadratic cost. In particular, we analyze the regret in the frequentist sense (i.e., for a fixed unknown environment) in one-dimensional systems. We derive the first $O(\sqrt{T})$ frequentist regret bound for this problem, thus significantly improving the $O(T^{2/3})$ bound of Abeille & Lazaric (2017) and matching the frequentist performance derived by Abbasi-Yadkori & Szepesvári (2011) for an optimistic approach and the Bayesian result of Ouyang et al. (2017). We obtain this result by developing a novel bound on the regret due to policy switches, which holds for LQ systems of any dimensionality and allows updating the parameters and the policy at each step, thus overcoming previous limitations due to lazy updates. Finally, we report numerical simulations supporting the conjecture that our result extends to multi-dimensional systems.
- Published
- 2018
100. Improved large-scale graph learning through ridge spectral sparsification
- Author
-
Calandriello, Daniele, Koutis, Ioannis, Lazaric, Alessandro, and Valko, Michal
- Subjects
Statistics - Machine Learning ,Mathematics of Computing - Discrete Mathematics - Abstract
The representation and learning benefits of methods based on graph Laplacians, such as Laplacian smoothing or the harmonic function solution for semi-supervised learning (SSL), are empirically and theoretically well supported. Nonetheless, the exact versions of these methods scale poorly with the number of nodes $n$ of the graph. In this paper, we combine a spectral sparsification routine with Laplacian learning. Given a graph $G$ as input, our algorithm computes a sparsifier in a distributed way in $O(n \log^3 n)$ time, $O(m \log^3 n)$ work and $O(n \log n)$ memory, using only $\log n$ rounds of communication. Furthermore, motivated by the regularization often employed in learning algorithms, we show that constructing sparsifiers that preserve the spectrum of the Laplacian only up to the regularization level may drastically reduce the size of the final graph. By constructing a spectrally-similar graph, we are able to bound the error induced by the sparsification for a variety of downstream tasks (e.g., SSL). We empirically validate the theoretical guarantees on the Amazon co-purchase graph and compare to state-of-the-art heuristics.
- Published
- 2018