Author: "Mannor, Shie" - Searchworks@Jio Institute Digital Library Search Results

Your search for the keyword "Mannor, Shie" returned 1,003 results.

201. Clustered Bandits

Author: Bui, Loc, Johari, Ramesh, and Mannor, Shie
Subjects: Computer Science - Learning
Abstract: We consider a multi-armed bandit setting that is inspired by real-world applications in e-commerce. In our setting, there are a few types of users, each with a specific response to the different arms. When a user enters the system, his type is unknown to the decision maker. The decision maker can either treat each user separately, ignoring the previously observed users, or attempt to take advantage of knowing that only a few types exist and cluster the users according to their response to the arms. We devise algorithms that combine the usual exploration-exploitation tradeoff with clustering of users and demonstrate the value of clustering. In the process of developing algorithms for the clustered setting, we propose and analyze simple algorithms for the setup where a decision maker knows that a user belongs to one of a few types, but does not know which one. Comment: 19 pages, 2 figures, under review
Published: 2012

202. Lightning Does Not Strike Twice: Robust MDPs with Coupled Uncertainty

Author: Mannor, Shie, Mebel, Ofir, and Xu, Huan
Subjects: Computer Science - Learning, Computer Science - Computer Science and Game Theory, Computer Science - Systems and Control
Abstract: We consider Markov decision processes under parameter uncertainty. Previous studies all restrict attention to the case where the uncertainties in different states are uncoupled, which leads to conservative solutions. In contrast, we introduce an intuitive concept, termed "Lightning Does not Strike Twice," to model coupled uncertain parameters. Specifically, we require that the system can deviate from its nominal parameters only a bounded number of times. We give probabilistic guarantees indicating that this model represents real-life situations and devise tractable algorithms for computing optimal control policies using this concept. Comment: ICML 2012
Published: 2012

203. Decoupling Exploration and Exploitation in Multi-Armed Bandits

Author: Avner, Orly, Mannor, Shie, and Shamir, Ohad
Subjects: Computer Science - Learning
Abstract: We consider a multi-armed bandit problem where the decision maker can explore and exploit different arms at every round. The exploited arm adds to the decision maker's cumulative reward (without necessarily observing the reward) while the explored arm reveals its value. We devise algorithms for this setup and show that the dependence on the number of arms, k, can be much better than the standard square root of k dependence, depending on the behavior of the arms' reward sequences. For the important case of piecewise stationary stochastic bandits, we show a significant improvement over existing algorithms. Our algorithms are based on a non-uniform sampling policy, which we show is essential to the success of any algorithm in the adversarial setup. Finally, we present simulation results in a setting inspired by ultra-wideband channel selection, indicating the applicability of our algorithms. Comment: Full version of the paper presented at ICML 2012
Published: 2012

204. Relaxed Half-Stochastic Belief Propagation

Author: Leduc-Primeau, François, Hemati, Saied, Mannor, Shie, and Gross, Warren J.
Subjects: Computer Science - Hardware Architecture
Abstract: Low-density parity-check codes are attractive for high throughput applications because of their low decoding complexity per bit, but also because all the codeword bits can be decoded in parallel. However, achieving this in a circuit implementation is complicated by the number of wires required to exchange messages between processing nodes. Decoding algorithms that exchange binary messages are interesting for fully-parallel implementations because they can reduce the number and the length of the wires, and increase logic density. This paper introduces the Relaxed Half-Stochastic (RHS) decoding algorithm, a binary message belief propagation (BP) algorithm that achieves a coding gain comparable to the best known BP algorithms that use real-valued messages. We derive the RHS algorithm by starting from the well-known Sum-Product algorithm, and then derive a low-complexity version suitable for circuit implementation. We present extensive simulation results on two standardized codes having different rates and constructions, including low bit error rate results. These simulations show that RHS can be an advantageous replacement for the existing state-of-the-art decoding algorithms when targeting fully-parallel implementations.
Published: 2012

205. Go Viral, or Not: Rate-Optimal Control for Resource-Constrained Branching Processes

Author: Mannor, Shie and Xu, Kuang
Subjects: Mathematics - Optimization and Control, Mathematics - Probability
Abstract: We propose and analyze a new class of controlled multi-type branching processes with a per-step linear resource constraint, motivated by potential applications in viral marketing and cancer treatment. We show that the optimal exponential growth rate of the population can be achieved by maintaining a fixed proportion among the species, for both deterministic and stochastic branching processes. In the special case of a two-type population and with a symmetric reward structure, the optimal proportion is obtained in closed form. In addition to revealing structural properties of controlled branching processes, our results are intended to provide practitioners with an easy-to-interpret benchmark for best practices, if not exact policies. As a proof of concept, the methodology is applied to the linkage structure of the 2004 US Presidential Election blogosphere, where the optimal growth rate demonstrates sizable gains over a uniform selection strategy, and to a two-compartment cell-cycle kinetics model for cancer growth, with realistic parameters, where the robust estimate for minimal treatment intensity under a worst-case growth rate is noticeably more conservative compared to that obtained using more optimistic assumptions.
Published: 2012

206. Regulation, Volatility and Efficiency in Continuous-Time Markets

Author: Kizilkale, Arman C. and Mannor, Shie
Subjects: Computer Science - Systems and Control, Mathematics - Optimization and Control
Abstract: We analyze the efficiency of markets with friction, particularly power markets. We model the market as a dynamic system with $(d_t;\,t\geq 0)$ the demand process and $(s_t;\,t\geq 0)$ the supply process. Using stochastic differential equations to model the dynamics with friction, we investigate the efficiency of the market by solving the optimal control problem under an integrated expected undiscounted cost function. Then, we extend the setup to a game-theoretic model where multiple suppliers and consumers interact continuously by setting prices in a dynamic market with friction. We investigate the equilibrium, and analyze the efficiency of the market under an integrated expected social cost function. We provide an intriguing efficiency-volatility no-free-lunch trade-off theorem.
Published: 2011

207. Bandits with an Edge

Author: Di Castro, Dotan, Gentile, Claudio, and Mannor, Shie
Subjects: Computer Science - Learning
Abstract: We consider a bandit problem over a graph where the rewards are not directly observed. Instead, the decision maker can compare two nodes and receive (stochastic) information pertaining to the difference in their value. The graph structure describes the set of possible comparisons. Consequently, comparing two nodes that are relatively far apart requires estimating the difference between every pair of adjacent nodes on the path between them. We analyze this problem from the perspective of sample complexity: how many queries are needed to find an approximately optimal node with probability more than $1-\delta$ in the PAC setup? We show that the topology of the graph plays a crucial role in determining the sample complexity: graphs with a low diameter have a much better sample complexity.
Published: 2011

208. From Bandits to Experts: On the Value of Side-Observations

Author: Mannor, Shie and Shamir, Ohad
Subjects: Computer Science - Learning, Statistics - Machine Learning
Abstract: We consider an adversarial online learning setting where a decision maker can choose an action in every stage of the game. In addition to observing the reward of the chosen action, the decision maker gets side observations on the reward he would have obtained had he chosen some of the other actions. The observation structure is encoded as a graph, where node i is linked to node j if sampling i provides information on the reward of j. This setting naturally interpolates between the well-known "experts" setting, where the decision maker can view all rewards, and the multi-armed bandits setting, where the decision maker can only view the reward of the chosen action. We develop practical algorithms with provable regret guarantees, which depend on non-trivial graph-theoretic properties of the information feedback structure. We also provide partially matching lower bounds. Comment: Presented at the NIPS 2011 conference
Published: 2011

209. Robust approachability and regret minimization in games with partial monitoring

Author: Mannor, Shie, Perchet, Vianney, and Stoltz, Gilles
Subjects: Mathematics - Statistics Theory, Computer Science - Learning
Abstract: Approachability has become a standard tool in analyzing learning algorithms in the adversarial online learning setup. We develop a variant of approachability for games where the obtained reward is ambiguous: it belongs to a set rather than being a single vector. Using this variant we tackle the problem of approachability in games with partial monitoring and develop simple and efficient algorithms (i.e., with constant per-step complexity) for this setup. We finally consider external regret and internal regret in repeated games with partial monitoring and derive regret-minimizing strategies based on approachability theory.
Published: 2011

210. A Maximal Large Deviation Inequality for Sub-Gaussian Variables

Author: Di Castro, Dotan, Gentile, Claudio, and Mannor, Shie
Subjects: Computer Science - Learning
Abstract: In this short note we prove a maximal concentration lemma for sub-Gaussian random variables, stating that for independent sub-Gaussian random variables we have \[ P\left(\max_{1\le i\le N} S_{i} > \epsilon\right) \le \exp\left(-\frac{1}{N^2}\sum_{i=1}^{N}\frac{\epsilon^{2}}{2\sigma_{i}^{2}}\right), \] where $S_i$ is the sum of $i$ zero-mean independent sub-Gaussian random variables and $\sigma_i$ is the variance of the $i$th random variable. Comment: This paper has been withdrawn by the authors due to a crucial error in the last sentence of the proof of Theorem 1: "we can take the infimum of the r.h.s. over s, which yields (1)." This statement is only true if a single value of $s$ yields the supremum of $(\epsilon_i s - \rho_i(s))$ simultaneously for every $i$.
Published: 2011

211. Mean-Variance Optimization in Markov Decision Processes

Author: Mannor, Shie and Tsitsiklis, John
Subjects: Computer Science - Learning, Computer Science - Artificial Intelligence
Abstract: We consider finite-horizon Markov decision processes under performance measures that involve both the mean and the variance of the cumulative reward. We show that either randomized or history-based policies can improve performance. We prove that computing a policy that maximizes the mean reward under a variance constraint is NP-hard for some cases, and strongly NP-hard for others. We finally offer pseudopolynomial exact and approximation algorithms. Comment: A full version of an ICML 2011 paper
Published: 2011

212. The Sample Complexity of Dictionary Learning

Author: Vainsencher, Daniel, Mannor, Shie, and Bruckstein, Alfred M.
Subjects: Statistics - Machine Learning, Computer Science - Learning
Abstract: A large set of signals can sometimes be described sparsely using a dictionary; that is, every element can be represented as a linear combination of few elements from the dictionary. Algorithms for various signal processing applications, including classification, denoising and signal separation, learn a dictionary from a set of signals to be represented. Can we expect that the representation found by such a dictionary for a previously unseen example from the same source will have $L_2$ error of the same magnitude as those for the given examples? We assume signals are generated from a fixed distribution, and study this question from a statistical learning theory perspective. We develop generalization bounds on the quality of the learned dictionary for two types of constraints on the coefficient selection, as measured by the expected $L_2$ error in representation when the dictionary is used. For the case of $\ell_1$-regularized coefficient selection we provide a generalization bound of the order of $O(\sqrt{np\log(m\lambda)/m})$, where $n$ is the dimension, $p$ is the number of elements in the dictionary, $\lambda$ is a bound on the $\ell_1$ norm of the coefficient vector and $m$ is the number of samples, which complements existing results. For the case of representing a new signal as a combination of at most $k$ dictionary elements, we provide a bound of the order $O(\sqrt{np\log(mk)/m})$ under an assumption on the level of orthogonality of the dictionary (low Babel function). We further show that this assumption holds for most dictionaries in high dimensions in a strong probabilistic sense. Our results further yield fast rates of order $1/m$ as opposed to $1/\sqrt{m}$ using localized Rademacher complexity. We provide similar results in a general setting using kernels with weak smoothness requirements.
Published: 2010
Full Text: View/download PDF

213. Robustness and Generalization

Author: Xu, Huan and Mannor, Shie
Subjects: Computer Science - Learning
Abstract: We derive generalization bounds for learning algorithms based on their robustness: the property that if a testing sample is "similar" to a training sample, then the testing error is close to the training error. This provides a novel approach, different from the complexity or stability arguments, to study generalization of learning algorithms. We further show that a weak notion of robustness is both sufficient and necessary for generalizability, which implies that robustness is a fundamental property for learning algorithms to work.
Published: 2010

214. Adaptive Bases for Reinforcement Learning

Author: Di Castro, Dotan and Mannor, Shie
Subjects: Computer Science - Learning, Computer Science - Artificial Intelligence
Abstract: We consider the problem of reinforcement learning using function approximation, where the approximating basis can change dynamically while interacting with the environment. One motivation for such an approach is to maximize how well the value function approximation fits the problem at hand. Three error criteria are considered: approximation square error, Bellman residual, and projected Bellman residual. Algorithms under the actor-critic framework are presented and shown to converge. The advantage of such an adaptive basis is demonstrated in simulations.
Published: 2010

215. Learning from Multiple Outlooks

Author: Harel, Maayan and Mannor, Shie
Subjects: Computer Science - Learning
Abstract: We propose a novel problem formulation of learning a single task when the data are provided in different feature spaces. Each such space is called an outlook, and is assumed to contain both labeled and unlabeled data. The objective is to take advantage of the data from all the outlooks to better classify each of the outlooks. We devise an algorithm that computes optimal affine mappings from different outlooks to a target outlook by matching moments of the empirical distributions. We further derive a probabilistic interpretation of the resulting algorithm and a sample complexity bound indicating how many samples are needed to adequately find the mapping. We report the results of extensive experiments on activity recognition tasks that show the value of the proposed approach in boosting performance. Comment: with full proofs of theorems and all experiments
Published: 2010

216. Principal Component Analysis with Contaminated Data: The High Dimensional Case

Author: Xu, Huan, Caramanis, Constantine, and Mannor, Shie
Subjects: Statistics - Machine Learning, Computer Science - Learning, Statistics - Methodology
Abstract: We consider the dimensionality-reduction problem (finding a subspace approximation of observed data) for contaminated data in the high dimensional regime, where the number of observations is of the same magnitude as the number of variables of each observation, and the data set contains some (arbitrarily) corrupted observations. We propose a High-dimensional Robust Principal Component Analysis (HR-PCA) algorithm that is tractable, robust to contaminated points, and easily kernelizable. The resulting subspace has a bounded deviation from the desired one, achieves maximal robustness -- a breakdown point of 50% while all existing algorithms have a breakdown point of zero, and unlike ordinary PCA algorithms, achieves optimality in the limit case where the proportion of corrupted points goes to zero.
Published: 2010

217. A Geometric Proof of Calibration

Author: Mannor, Shie and Stoltz, Gilles
Subjects: Statistics - Machine Learning
Abstract: We provide yet another proof of the existence of calibrated forecasters; it has two merits. First, it is valid for an arbitrary finite number of outcomes. Second, it is short and simple, and it follows from a direct application of Blackwell's approachability theorem to a carefully chosen vector-valued payoff function and convex target set. Our proof captures the essence of existing proofs based on approachability (e.g., the proof by Foster (1999) for binary outcomes) and highlights the intrinsic connection between approachability and calibration.
Published: 2009

218. Robust Regression and Lasso

Author: Xu, Huan, Caramanis, Constantine, and Mannor, Shie
Subjects: Computer Science - Information Theory, Computer Science - Learning
Abstract: Lasso, or $\ell^1$-regularized least squares, has been explored extensively for its remarkable sparsity properties. It is shown in this paper that the solution to Lasso, in addition to its sparsity, has robustness properties: it is the solution to a robust optimization problem. This has two important consequences. First, robustness connects the regularizer to a physical property, namely, protection from noise. This allows a principled selection of the regularizer, and in particular, generalizations of Lasso that also yield convex optimization problems are obtained by considering different uncertainty sets. Second, robustness can itself be used as an avenue for exploring different properties of the solution. In particular, it is shown that robustness of the solution explains why the solution is sparse. The analysis as well as the specific results obtained differ from standard sparsity results, providing different geometric intuition. Furthermore, it is shown that the robust optimization formulation is related to kernel density estimation, and based on this approach, a proof that Lasso is consistent is given using robustness directly. Finally, a theorem stating that sparsity and algorithmic stability contradict each other, and hence that Lasso is not stable, is presented.
Published: 2008
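
As a rough illustration of the robustness claim in entry 218, the sketch below assumes one uncertainty set under which the equivalence is known to hold: each column $x_j$ of the design matrix may be perturbed by any $\delta_j$ with $\|\delta_j\|_2 \le c_j$. Under that assumption, for a fixed $\beta$ the worst-case residual has the closed form $\max_{\Delta}\|y-(X+\Delta)\beta\|_2 = \|y-X\beta\|_2 + \sum_j c_j|\beta_j|$, so minimizing the worst case over $\beta$ is an $\ell^1$-penalized (non-squared) least-squares problem. The code only checks this identity numerically on random data; every function name is hypothetical and none of it is taken from the paper.

```python
import math
import random

def norm(v):
    """Euclidean norm of a vector given as a list."""
    return math.sqrt(sum(x * x for x in v))

def sign(b):
    """Sign of b as -1, 0 or 1."""
    return (b > 0) - (b < 0)

def residual(y, cols, beta):
    """r = y - X beta, with X stored as a list of columns."""
    r = list(y)
    for col, b in zip(cols, beta):
        for i, x in enumerate(col):
            r[i] -= b * x
    return r

def robust_bound(y, cols, beta, c):
    """Closed form of the worst case over column perturbations with ||delta_j||_2 <= c_j."""
    return norm(residual(y, cols, beta)) + sum(cj * abs(bj) for cj, bj in zip(c, beta))

def worst_case_delta(y, cols, beta, c):
    """Perturbation attaining the bound: delta_j = -c_j * sign(beta_j) * r / ||r||."""
    r = residual(y, cols, beta)
    nr = norm(r)
    u = [x / nr for x in r] if nr > 0 else [1.0] + [0.0] * (len(r) - 1)
    return [[-cj * sign(bj) * ui for ui in u] for cj, bj in zip(c, beta)]

def perturbed_loss(y, cols, beta, delta):
    """||y - (X + Delta) beta||_2 for a column-wise perturbation Delta."""
    shifted = [[x + d for x, d in zip(col, dcol)] for col, dcol in zip(cols, delta)]
    return norm(residual(y, shifted, beta))

def random_feasible_delta(n, c, rng):
    """Random perturbation satisfying ||delta_j||_2 <= c_j for every column j."""
    delta = []
    for cj in c:
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        scale = cj * rng.random() / norm(v)
        delta.append([scale * x for x in v])
    return delta

rng = random.Random(0)
n = 5
cols = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(3)]
y = [rng.gauss(0.0, 1.0) for _ in range(n)]
beta = [0.7, -1.2, 0.0]
c = [0.5, 0.3, 0.8]

bound = robust_bound(y, cols, beta, c)
attained = perturbed_loss(y, cols, beta, worst_case_delta(y, cols, beta, c))
sampled = max(perturbed_loss(y, cols, beta, random_feasible_delta(n, c, rng)) for _ in range(1000))
print(f"bound={bound:.6f}  attained={attained:.6f}  best random={sampled:.6f}")
```

The bound follows from the triangle inequality, $\|r-\Delta\beta\|_2 \le \|r\|_2 + \sum_j \|\delta_j\|_2|\beta_j|$, and the constructed perturbation aligns every column term with the residual so the inequality becomes an equality; random feasible perturbations should never exceed it.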

219. Robustness and Regularization of Support Vector Machines

Author: Xu, Huan, Caramanis, Constantine, and Mannor, Shie
Subjects: Computer Science - Learning, Computer Science - Artificial Intelligence
Abstract: We consider regularized support vector machines (SVMs) and show that they are precisely equivalent to a new robust optimization formulation. We show that this equivalence of robust optimization and regularization has implications for both algorithms and analysis. In terms of algorithms, the equivalence suggests more general SVM-like algorithms for classification that explicitly build in protection against noise, and at the same time control overfitting. On the analysis front, the equivalence of robustness and regularization provides a robust optimization interpretation for the success of regularized SVMs. We use this new robustness interpretation of SVMs to give a new proof of consistency of (kernelized) SVMs, thus establishing robustness as the reason regularized SVMs generalize well.
Published: 2008

220. Strategies for prediction under imperfect monitoring

Author: Lugosi, Gabor, Mannor, Shie, and Stoltz, Gilles
Subjects: Mathematics - Statistics Theory, Computer Science - Learning, 91A20, 62L12, 68Q32
Abstract: We propose simple randomized strategies for sequential prediction under imperfect monitoring, that is, when the forecaster does not have access to the past outcomes but rather to a feedback signal. The proposed strategies are consistent in the sense that they achieve, asymptotically, the best possible average reward. It was Rustichini (1999) who first proved the existence of such consistent predictors. The forecasters presented here offer the first constructive proof of consistency. Moreover, the proposed algorithms are computationally efficient. We also establish upper bounds for the rates of convergence. In the case of deterministic feedback, these rates are optimal up to logarithmic terms. Comment: Journal version of a COLT conference paper
Published: 2007

221. k-Armed Bandit

Author: Mannor, Shie, Sammut, Claude, editor, and Webb, Geoffrey I., editor
Published: 2017
Full Text: View/download PDF

222. Non-parametric Online AUC Maximization

Author: Szörényi, Balázs, Cohen, Snir, Mannor, Shie, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Ceci, Michelangelo, editor, Hollmén, Jaakko, editor, Todorovski, Ljupčo, editor, Vens, Celine, editor, and Džeroski, Sašo, editor
Published: 2017
Full Text: View/download PDF

223. Efficiency Loss in a Network Resource Allocation Game: The Case of Elastic Supply

Author: Johari, Ramesh, Mannor, Shie, and Tsitsiklis, John N.
Subjects: Computer Science - Computer Science and Game Theory
Abstract: We consider a resource allocation problem where individual users wish to send data across a network to maximize their utility, and a cost is incurred at each link that depends on the total rate sent through the link. It is known that as long as users do not anticipate the effect of their actions on prices, a simple proportional pricing mechanism can maximize the sum of users' utilities minus the cost (called aggregate surplus). Continuing previous efforts to quantify the effects of selfish behavior in network pricing mechanisms, we consider the possibility that users anticipate the effect of their actions on link prices. Under the assumption that the links' marginal cost functions are convex, we establish existence of a Nash equilibrium. We show that the aggregate surplus at a Nash equilibrium is no worse than a factor of $4\sqrt{2} - 5$ times the optimal aggregate surplus; thus, the efficiency loss when users are selfish is no more than approximately 34%. Comment: Originally Laboratory for Information and Decision Systems (MIT) Publication 2605
Published: 2005
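
The "approximately 34%" figure in entry 223 follows directly from the stated factor: a Nash equilibrium retains at least $4\sqrt{2}-5$ of the optimal aggregate surplus, so the fraction lost is at most $1-(4\sqrt{2}-5) = 6-4\sqrt{2}$. A minimal arithmetic check (variable names are illustrative only):

```python
import math

# Worst-case ratio of Nash-equilibrium aggregate surplus to optimal aggregate surplus,
# as stated in the abstract.
ratio = 4 * math.sqrt(2) - 5

# Fraction of the optimal aggregate surplus that can be lost to selfish behavior.
efficiency_loss = 1 - ratio

print(f"ratio = {ratio:.4f}, efficiency loss = {efficiency_loss:.1%}")
# prints: ratio = 0.6569, efficiency loss = 34.3%
```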

224. CALM: Conditional Adversarial Latent Models for Directable Virtual Characters

Author: Tessler, Chen, Kasten, Yoni, Guo, Yunrong, Mannor, Shie, Chechik, Gal, and Peng, Xue Bin
Published: 2023
Full Text: View/download PDF

225. Planning and Learning with Adaptive Lookahead

Author: Rosenberg, Aviv, Hallak, Assaf, Mannor, Shie, Chechik, Gal, and Dalal, Gal
Published: 2023
Full Text: View/download PDF

226. Implementing Reinforcement Learning Datacenter Congestion Control in NVIDIA NICs

Author: Fuhrer, Benjamin, Shpigelman, Yuval, Tessler, Chen, Mannor, Shie, Chechik, Gal, Zahavi, Eitan, and Dalal, Gal
Published: 2023
Full Text: View/download PDF

227. INSIGHT: Dynamic Traffic Management Using Heterogeneous Urban Data

Author: Panagiotou, Nikolaos, Zygouras, Nikolas, Katakis, Ioannis, Gunopulos, Dimitrios, Zacheilas, Nikos, Boutsis, Ioannis, Kalogeraki, Vana, Lynch, Stephen, O’Brien, Brendan, Kinane, Dermot, Mareček, Jakub, Yu, Jia Yuan, Verago, Rudi, Daly, Elizabeth, Piatkowski, Nico, Liebig, Thomas, Bockermann, Christian, Morik, Katharina, Schnitzler, Francois, Weidlich, Matthias, Gal, Avigdor, Mannor, Shie, Stange, Hendrik, Halft, Werner, Andrienko, Gennady, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Berendt, Bettina, editor, Bringmann, Björn, editor, Fromont, Élisa, editor, Garriga, Gemma, editor, Miettinen, Pauli, editor, Tatti, Nikolaj, editor, and Tresp, Volker, editor
Published: 2016
Full Text: View/download PDF

228. Reinforcement Learning in Robust Markov Decision Processes

Author: Lim, Shiau Hong, Xu, Huan, and Mannor, Shie
Published: 2016
Full Text: View/download PDF

229. Robust MDPs with k-Rectangular Uncertainty

Author: Mannor, Shie, Mebel, Ofir, and Xu, Huan
Published: 2016
Full Text: View/download PDF

230. A General Framework for Bandit Problems Beyond Cumulative Objectives

Author: Cassel, Asaf, Mannor, Shie, and Zeevi, Assaf
Subjects: REINFORCEMENT learning, ROBBERS, SHARPE ratio, EUROPEAN communities, STATISTICAL decision making
Abstract: The stochastic multiarmed bandit (MAB) problem is a common model for sequential decision problems. In the standard setup, a decision maker has to choose at every instant between several competing arms; each of them provides a scalar random variable, referred to as a "reward." Nearly all research on this topic considers the total cumulative reward as the criterion of interest. This work focuses on other natural objectives that cannot be cast as a sum over rewards but rather, more involved functions of the reward stream. Unlike the case of cumulative criteria, in the problems we study here, the oracle policy, which knows the problem parameters a priori and is used to "center" the regret, is not trivial. We provide a systematic approach to such problems and derive general conditions under which the oracle policy is sufficiently tractable to facilitate the design of optimism-based (upper confidence bound) learning policies. These conditions elucidate an interesting interplay between the arm reward distributions and the performance metric. Our main findings are illustrated for several commonly used objectives, such as conditional value-at-risk, mean-variance trade-offs, Sharpe ratio, and more. Funding: This work was partially funded by the Israel Science Foundation [Contract 2199/20] and by the European Community's Seventh Framework Programme FP7/2007–2013 [Grant 306638 (Scaling Up Reinforcement Learning: Structure Learning, Skill Acquisition, and Reward Shaping)]. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF
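
For readers unfamiliar with the optimism-based (upper confidence bound) policies that entry 230 builds on, here is a minimal sketch of the classical UCB1 index for the standard cumulative-reward criterion. It is the textbook baseline, not the framework of the paper, which targets objectives such as conditional value-at-risk or mean-variance that UCB1 does not address. Bernoulli arms, the bonus $\sqrt{2\ln t / n_i}$, and the function name `ucb1` are assumptions made for illustration.

```python
import math
import random

def ucb1(means, horizon, seed=0):
    """Run UCB1 on Bernoulli arms with the given (hidden) means; return pull counts per arm."""
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k
    sums = [0.0] * k
    for t in range(1, horizon + 1):
        if t <= k:
            # Pull each arm once so every empirical mean is defined.
            arm = t - 1
        else:
            # Optimism: empirical mean plus an exploration bonus that shrinks as an arm is pulled.
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i] + math.sqrt(2 * math.log(t) / counts[i]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts

counts = ucb1([0.2, 0.5, 0.8], horizon=2000)
print(counts)
```

With a horizon of a few thousand rounds, the arm with the highest mean should receive most of the pulls. Extending this idea to non-cumulative criteria is exactly where the abstract notes that the oracle policy stops being trivial.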

231. Data-Driven Methods for Markov Decision Problems with Parameter Uncertainty

Author: Mannor, Shie and Xu, Huan
Published: 2019
Full Text: View/download PDF

232. Statistical Optimization in High Dimensions

Author: Xu, Huan, Caramanis, Constantine, and Mannor, Shie
Published: 2016

233. Sub-sampling for Multi-armed Bandits

Author: Baransi, Akram, Maillard, Odalric-Ambrym, Mannor, Shie, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Kobsa, Alfred, Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Nierstrasz, Oscar, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Goebel, Randy, Series editor, Tanaka, Yuzuru, Series editor, Wahlster, Wolfgang, Series editor, Siekmann, Jörg, Series editor, Calders, Toon, editor, Esposito, Floriana, editor, Hüllermeier, Eyke, editor, and Meo, Rosa, editor
Published: 2014
Full Text: View/download PDF

234. Heterogeneous Stream Processing and Crowdsourcing for Traffic Monitoring: Highlights

Author: Schnitzler, François, Artikis, Alexander, Weidlich, Matthias, Boutsis, Ioannis, Liebig, Thomas, Piatkowski, Nico, Bockermann, Christian, Morik, Katharina, Kalogeraki, Vana, Marecek, Jakub, Gal, Avigdor, Mannor, Shie, Kinane, Dermot, Gunopulos, Dimitrios, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Kobsa, Alfred, Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Nierstrasz, Oscar, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Goebel, Randy, Series editor, Tanaka, Yuzuru, Series editor, Wahlster, Wolfgang, Series editor, Siekmann, Jörg, Series editor, Calders, Toon, editor, Esposito, Floriana, editor, Hüllermeier, Eyke, editor, and Meo, Rosa, editor
Published: 2014
Full Text: View/download PDF

235. Policy Gradient for s-Rectangular Robust Markov Decision Processes

Author: Kumar, Navdeep, Derman, Esther, Geist, Matthieu, Levy, Kfir, and Mannor, Shie
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Artificial Intelligence (cs.AI), Computer Science - Artificial Intelligence, Machine Learning (cs.LG)
Abstract: We present a novel robust policy gradient method (RPG) for s-rectangular robust Markov Decision Processes (MDPs). We are the first to derive the adversarial kernel in closed form and demonstrate that it is a rank-one perturbation of the nominal kernel. This allows us to derive an RPG that is similar to the one used in non-robust MDPs, except with a robust Q-value function and an additional correction term. Both robust Q-values and correction terms are efficiently computable; thus the time complexity of our method matches that of non-robust MDPs, which is significantly faster than existing black-box methods.
Published: 2023
Full Text: View/download PDF

236. Robust Reinforcement Learning via Adversarial Kernel Approximation

Author: Wang, Kaixin, Gadot, Uri, Kumar, Navdeep, Levy, Kfir, and Mannor, Shie
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Machine Learning (cs.LG)
Abstract: Robust Markov Decision Processes (RMDPs) provide a framework for sequential decision-making that is robust to perturbations of the transition kernel. However, robust reinforcement learning (RL) approaches in RMDPs do not scale well to realistic online settings with high-dimensional domains. By characterizing the adversarial kernel in RMDPs, we propose a novel approach for online robust RL that approximates the adversarial kernel and uses a standard (non-robust) RL algorithm to learn a robust policy. Notably, our approach can be applied on top of any underlying RL algorithm, enabling easy scaling to high-dimensional domains. Experiments on classic control tasks, MinAtar, and the DeepMind Control Suite demonstrate the effectiveness and applicability of our method.
Published: 2023

237. Oracle-Based Robust Optimization via Online Learning

Author: Ben-Tal, Aharon, Hazan, Elad, Koren, Tomer, and Mannor, Shie
Published: 2015

238. Opportunistic Approachability and Generalized No-Regret Problems

Author: Bernstein, Andrey, Mannor, Shie, and Shimkin, Nahum
Published: 2014

239. Bayesian Reinforcement Learning

Author: Vlassis, Nikos, Ghavamzadeh, Mohammad, Mannor, Shie, Poupart, Pascal, Wiering, Marco, editor, and van Otterlo, Martijn, editor
Published: 2012

240. Activity Recognition with Mobile Phones

Author: Frank, Jordan, Mannor, Shie, Precup, Doina, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Nierstrasz, Oscar, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Sudan, Madhu, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Vardi, Moshe Y., Series editor, Weikum, Gerhard, Series editor, Goebel, Randy, editor, Siekmann, Jörg, editor, Wahlster, Wolfgang, editor, Gunopulos, Dimitrios, editor, Hofmann, Thomas, editor, Malerba, Donato, editor, and Vazirgiannis, Michalis, editor
Published: 2011

241. Local Two-Stage Myopic Dynamics for Network Formation Games

Author: Arcaute, Esteban, Johari, Ramesh, Mannor, Shie, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Nierstrasz, Oscar, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Sudan, Madhu, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Vardi, Moshe Y., Series editor, Weikum, Gerhard, Series editor, Papadimitriou, Christos, editor, and Zhang, Shuzhong, editor
Published: 2008

242. Efficient Reinforcement Learning in Parameterized Models: Discrete Parameter Case

Author: Dyagilev, Kirill, Mannor, Shie, Shimkin, Nahum, Carbonell, Jaime G., editor, Siekmann, Jörg, editor, Girgin, Sertan, editor, Loth, Manuel, editor, Munos, Rémi, editor, Preux, Philippe, editor, and Ryabko, Daniil, editor
Published: 2008

243. Regularized Fitted Q-Iteration: Application to Planning

Author: Farahmand, Amir massoud, Ghavamzadeh, Mohammad, Szepesvári, Csaba, Mannor, Shie, Carbonell, Jaime G., editor, Siekmann, Jörg, editor, Girgin, Sertan, editor, Loth, Manuel, editor, Munos, Rémi, editor, Preux, Philippe, editor, and Ryabko, Daniil, editor
Published: 2008

244. Markov Decision Processes with Arbitrary Reward Processes

Author: Yu, Jia Yuan, Mannor, Shie, Shimkin, Nahum, Carbonell, Jaime G., editor, Siekmann, Jörg, editor, Girgin, Sertan, editor, Loth, Manuel, editor, Munos, Rémi, editor, Preux, Philippe, editor, and Ryabko, Daniil, editor
Published: 2008

245. Lenient Regret for Multi-Armed Bandits

Author: Merlis, Nadav and Mannor, Shie
Subjects: FOS: Computer and information sciences, Computer Science::Machine Learning, Computer Science - Machine Learning, Statistics - Machine Learning, Machine Learning (stat.ML), General Medicine, Machine Learning (cs.LG)
Abstract: We consider the Multi-Armed Bandit (MAB) problem, where an agent sequentially chooses actions and observes rewards for the actions it took. While the majority of algorithms try to minimize the regret, i.e., the cumulative difference between the reward of the best action and that of the agent's action, this criterion might lead to undesirable results. For example, in large problems, or when the interaction with the environment is brief, finding an optimal arm is infeasible, and regret-minimizing algorithms tend to over-explore. To overcome this issue, algorithms for such settings should instead focus on playing near-optimal arms. To this end, we suggest a new, more lenient, regret criterion that ignores suboptimality gaps smaller than some $\epsilon$. We then present a variant of the Thompson Sampling (TS) algorithm, called $\epsilon$-TS, and prove its asymptotic optimality in terms of the lenient regret. Importantly, we show that when the mean of the optimal arm is high enough, the lenient regret of $\epsilon$-TS is bounded by a constant. Finally, we show that $\epsilon$-TS can be applied to improve the performance when the agent knows a lower bound on the suboptimality gaps., Comment: Accepted to AAAI2021
Published: 2021
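
To make the difference between the two criteria concrete, here is a minimal sketch that scores a fixed sequence of pulls under the usual regret and under one simple lenient variant, which counts a pull's suboptimality gap only when that gap is at least ε. The indicator form and the function names are assumptions for illustration; the paper's exact family of lenient criteria may differ, and ε-TS itself is not shown.

```python
from typing import Sequence


def standard_regret(means: Sequence[float], pulls: Sequence[int]) -> float:
    """Sum of suboptimality gaps (best mean minus the pulled arm's mean)."""
    best = max(means)
    return sum(best - means[a] for a in pulls)


def lenient_regret(means: Sequence[float], pulls: Sequence[int], eps: float) -> float:
    """Same sum, but gaps smaller than eps are ignored (counted as zero).

    Hypothetical indicator form: gap * 1[gap >= eps].
    """
    best = max(means)
    total = 0.0
    for a in pulls:
        gap = best - means[a]
        if gap >= eps:
            total += gap
    return total


means = [0.9, 0.85, 0.5]
pulls = [1, 1, 1, 2, 0]  # three near-optimal pulls, one bad pull, one optimal pull
print(standard_regret(means, pulls))          # about 0.55
print(lenient_regret(means, pulls, eps=0.1))  # about 0.4
```

The three pulls of arm 1, whose gap is 0.05, add to the standard regret but cost nothing under the lenient criterion with ε = 0.1, so near-optimal play is not penalized.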

246. Reinforcement Learning with Trajectory Feedback

Author: Efroni, Yonathan, Merlis, Nadav, and Mannor, Shie
Subjects: FOS: Computer and information sciences, Computer Science::Machine Learning, Computer Science - Machine Learning, Statistics - Machine Learning, Machine Learning (stat.ML), General Medicine, Machine Learning (cs.LG)
Abstract: The standard feedback model of reinforcement learning requires revealing the reward of every visited state-action pair. However, in practice, it is often the case that such frequent feedback is not available. In this work, we take a first step towards relaxing this assumption and require a weaker form of feedback, which we refer to as trajectory feedback. Instead of observing the reward obtained after every action, we assume we only receive a score that represents the quality of the whole trajectory observed by the agent, namely, the sum of all rewards obtained over this trajectory. We extend reinforcement learning algorithms to this setting, based on least-squares estimation of the unknown reward, for both the known and unknown transition model cases, and study the performance of these algorithms by analyzing their regret. For cases where the transition model is unknown, we offer a hybrid optimistic-Thompson Sampling approach that results in a tractable algorithm., Comment: AAAI2021
Published: 2021
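
The least-squares step can be illustrated on a tabular toy problem. This is a minimal sketch, assuming each trajectory is summarized by its visit-count vector over state-action pairs and that the score is exactly the sum of rewards (no noise); the function names and the small ridge term `reg` are hypothetical choices for illustration, and the optimistic and Thompson Sampling parts of the paper's algorithms are not shown.

```python
from typing import List, Sequence


def solve_linear(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    # Augmented matrix [A | b]; rows are copied so the caller's data is untouched.
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col and M[r][col] != 0.0:
                factor = M[r][col] / M[col][col]
                for c in range(col, n + 1):
                    M[r][c] -= factor * M[col][c]
    return [M[i][n] / M[i][i] for i in range(n)]


def estimate_rewards(
    visit_counts: Sequence[Sequence[float]],
    scores: Sequence[float],
    reg: float = 1e-6,
) -> List[float]:
    """Ridge least-squares estimate of per-(state, action) rewards.

    visit_counts[k][i]: how often trajectory k visited state-action pair i.
    scores[k]: the single trajectory-level score, i.e. the sum of its rewards.
    """
    d = len(visit_counts[0])
    # Gram matrix sum_k n_k n_k^T plus a small ridge term reg * I (hypothetical).
    A = [[reg if i == j else 0.0 for j in range(d)] for i in range(d)]
    b = [0.0] * d
    for n_k, y_k in zip(visit_counts, scores):
        for i in range(d):
            b[i] += n_k[i] * y_k
            for j in range(d):
                A[i][j] += n_k[i] * n_k[j]
    return solve_linear(A, b)
```

For example, with true rewards (1.0, 0.0, 0.5) over three state-action pairs and visit counts (2, 0, 1), (0, 1, 2), (1, 1, 0), (1, 0, 0), the trajectory scores are 2.5, 1.0, 1.0 and 1.0, and `estimate_rewards` returns values very close to the true rewards because these visit-count vectors span all three state-action pairs. Pairs that are never visited stay unidentified, which is one reason exploration matters in this setting.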

247. Non-parametric Online AUC Maximization

Author: Szörényi, Balázs, Cohen, Snir, and Mannor, Shie
Published: 2017

248. Reinforcement Learning-Based Load Shared Sequential Routing

Author: Heidari, Fariba, Mannor, Shie, Mason, Lorne G., Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Nierstrasz, Oscar, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Sudan, Madhu, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Vardi, Moshe Y., Series editor, Weikum, Gerhard, Series editor, Akyildiz, Ian F., editor, Sivakumar, Raghupathy, editor, Ekici, Eylem, editor, Oliveira, Jaudelice Cavalcante de, editor, and McNair, Janise, editor
Published: 2007

249. Network Formation: Bilateral Contracting and Myopic Dynamics

Author: Arcaute, Esteban, Johari, Ramesh, Mannor, Shie, Hutchison, David, editor, Kanade, Takeo, editor, Kittler, Josef, editor, Kleinberg, Jon M., editor, Mattern, Friedemann, editor, Mitchell, John C., editor, Naor, Moni, editor, Nierstrasz, Oscar, editor, Pandu Rangan, C., editor, Steffen, Bernhard, editor, Sudan, Madhu, editor, Terzopoulos, Demetri, editor, Tygar, Doug, editor, Vardi, Moshe Y., editor, Weikum, Gerhard, editor, Deng, Xiaotie, editor, and Graham, Fan Chung, editor
Published: 2007

250. Online Learning with Variable Stage Duration

Author: Mannor, Shie, Shimkin, Nahum, Hutchison, David, editor, Kanade, Takeo, editor, Kittler, Josef, editor, Kleinberg, Jon M., editor, Mattern, Friedemann, editor, Mitchell, John C., editor, Naor, Moni, editor, Nierstrasz, Oscar, editor, Pandu Rangan, C., editor, Steffen, Bernhard, editor, Sudan, Madhu, editor, Terzopoulos, Demetri, editor, Tygar, Doug, editor, Vardi, Moshe Y., editor, Weikum, Gerhard, editor, Carbonell, Jaime G., editor, Siekmann, Jörg, editor, Lugosi, Gábor, editor, and Simon, Hans Ulrich, editor
Published: 2006
