Author: "Restelli, Marcello" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Restelli, Marcello"' showing total 359 results

51. Policy Optimization as Online Learning with Mediator Feedback

Author: Metelli, Alberto Maria, Papini, Matteo, D'Oro, Pierluca, and Restelli, Marcello
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Statistics - Machine Learning
Abstract: Policy Optimization (PO) is a widely used approach to address continuous control tasks. In this paper, we introduce the notion of mediator feedback that frames PO as an online learning problem over the policy space. The additional available information, compared to the standard bandit feedback, allows reusing samples generated by one policy to estimate the performance of other policies. Based on this observation, we propose an algorithm, RANDomized-exploration policy Optimization via Multiple Importance Sampling with Truncation (RANDOMIST), for regret minimization in PO, that employs a randomized exploration strategy, differently from the existing optimistic approaches. When the policy space is finite, we show that under certain circumstances, it is possible to achieve constant regret, while always enjoying logarithmic regret. We also derive problem-dependent regret lower bounds. Then, we extend RANDOMIST to compact policy spaces. Finally, we provide numerical simulations on finite and compact policy spaces, in comparison with PO and bandit baselines.
Published: 2020

52. Option Hedging with Risk Averse Reinforcement Learning

Author: Vittori, Edoardo, Trapletti, Michele, and Restelli, Marcello
Subjects: Quantitative Finance - Trading and Market Microstructure, Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: In this paper we show how risk-averse reinforcement learning can be used to hedge options. We apply a state-of-the-art risk-averse algorithm, Trust Region Volatility Optimization (TRVO), to a vanilla option hedging environment, considering realistic factors such as discrete time and transaction costs. Realism makes the problem twofold: the agent must both minimize volatility and contain transaction costs, these tasks usually being in competition. We use the algorithm to train a sheaf of agents, each characterized by a different risk aversion, so as to be able to span an efficient frontier on the volatility-p&l space. The results show that the derived hedging strategy not only outperforms the Black & Scholes delta hedge, but is also extremely robust and flexible, as it can efficiently hedge options with different characteristics and work on markets with different behaviors than what was used in training., Comment: Published to ICAIF2020
Published: 2020

53. An Asymptotically Optimal Primal-Dual Incremental Algorithm for Contextual Linear Bandits

Author: Tirinzoni, Andrea, Pirotta, Matteo, Restelli, Marcello, and Lazaric, Alessandro
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: In the contextual linear bandit setting, algorithms built on the optimism principle fail to exploit the structure of the problem and have been shown to be asymptotically suboptimal. In this paper, we follow recent approaches of deriving asymptotically optimal algorithms from problem-dependent regret lower bounds and we introduce a novel algorithm improving over the state-of-the-art along multiple dimensions. We build on a reformulation of the lower bound, where context distribution and exploration policy are decoupled, and we obtain an algorithm robust to unbalanced context distributions. Then, using an incremental primal-dual approach to solve the Lagrangian relaxation of the lower bound, we obtain a scalable and computationally efficient algorithm. Finally, we remove forced exploration and build on confidence intervals of the optimization problem to encourage a minimum level of exploration that is better adapted to the problem structure. We demonstrate the asymptotic optimality of our algorithm, while providing both problem-dependent and worst-case finite-time regret guarantees. Our bounds scale with the logarithm of the number of arms, thus avoiding the linear dependence common in all related prior works. Notably, we establish minimax optimality for any learning horizon in the special case of non-contextual linear bandits. Finally, we verify that our algorithm obtains better empirical performance than state-of-the-art baselines., Comment: To appear at NeurIPS 2020. V2: clarified dependencies in the worst-case regret bound
Published: 2020

54. Inverse Reinforcement Learning from a Gradient-based Learner

Author: Ramponi, Giorgia, Drappo, Gianluca, and Restelli, Marcello
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: Inverse Reinforcement Learning addresses the problem of inferring an expert's reward function from demonstrations. However, in many applications, we not only have access to the expert's near-optimal behavior, but we also observe part of her learning process. In this paper, we propose a new algorithm for this setting, in which the goal is to recover the reward function being optimized by an agent, given a sequence of policies produced during learning. Our approach is based on the assumption that the observed agent is updating her policy parameters along the gradient direction. Then we extend our method to deal with the more realistic scenario where we only have access to a dataset of learning trajectories. For both settings, we provide theoretical insights into our algorithms' performance. Finally, we evaluate the approach in a simulated GridWorld environment and on the MuJoCo environments, comparing it with the state-of-the-art baseline.
Published: 2020

55. Newton Optimization on Helmholtz Decomposition for Continuous Games

Author: Ramponi, Giorgia and Restelli, Marcello
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: Many learning problems involve multiple agents optimizing different interactive functions. In these problems, the standard policy gradient algorithms fail due to the non-stationarity of the setting and the different interests of each agent. In fact, algorithms must take into account the complex dynamics of these systems to guarantee rapid convergence towards a (local) Nash equilibrium. In this paper, we propose NOHD (Newton Optimization on Helmholtz Decomposition), a Newton-like algorithm for multi-agent learning problems based on the decomposition of the dynamics of the system in its irrotational (Potential) and solenoidal (Hamiltonian) component. This method ensures quadratic convergence in purely irrotational systems and pure solenoidal systems. Furthermore, we show that NOHD is attracted to stable fixed points in general multi-agent systems and repelled by strict saddle ones. Finally, we empirically compare the NOHD's performance with that of state-of-the-art algorithms on some bimatrix games and in a continuous Gridworld environment., Comment: In 35th AAAI Conference on Artificial Intelligence (AAAI 2021)
Published: 2020

56. Task-Agnostic Exploration via Policy Gradient of a Non-Parametric State Entropy Estimate

Author: Mutti, Mirco, Pratissoli, Lorenzo, and Restelli, Marcello
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: In a reward-free environment, what is a suitable intrinsic objective for an agent to pursue so that it can learn an optimal task-agnostic exploration policy? In this paper, we argue that the entropy of the state distribution induced by finite-horizon trajectories is a sensible target. In particular, we present a novel and practical policy-search algorithm, Maximum Entropy POLicy optimization (MEPOL), to learn a policy that maximizes a non-parametric, k-nearest neighbors estimate of the state distribution entropy. In contrast to known methods, MEPOL is completely model-free as it requires neither to estimate the state distribution of any policy nor to model transition dynamics. Then, we empirically show that MEPOL allows learning a maximum-entropy exploration policy in high-dimensional, continuous-control domains, and how this policy facilitates learning a variety of meaningful reward-based tasks downstream., Comment: In 35th AAAI Conference on Artificial Intelligence (AAAI 2021)
Published: 2020

57. Sequential Transfer in Reinforcement Learning with a Generative Model

Author: Tirinzoni, Andrea, Poiani, Riccardo, and Restelli, Marcello
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: We are interested in how to design reinforcement learning agents that provably reduce the sample complexity for learning new tasks by transferring knowledge from previously-solved ones. The availability of solutions to related problems poses a fundamental trade-off: whether to seek policies that are expected to achieve high (yet sub-optimal) performance in the new task immediately or whether to seek information to quickly identify an optimal solution, potentially at the cost of poor initial behavior. In this work, we focus on the second objective when the agent has access to a generative model of state-action pairs. First, given a set of solved tasks containing an approximation of the target one, we design an algorithm that quickly identifies an accurate solution by seeking the state-action pairs that are most informative for this purpose. We derive PAC bounds on its sample complexity which clearly demonstrate the benefits of using this kind of prior knowledge. Then, we show how to learn these approximate tasks sequentially by reducing our transfer setting to a hidden Markov model and employing spectral methods to recover its parameters. Finally, we empirically verify our theoretical findings in simple simulated domains., Comment: ICML 2020
Published: 2020

58. Time-Variant Variational Transfer for Value Functions

Author: Canonaco, Giuseppe, Soprani, Andrea, Roveri, Manuel, and Restelli, Marcello
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: In most of the transfer learning approaches to reinforcement learning (RL) the distribution over the tasks is assumed to be stationary. Therefore, the target and source tasks are i.i.d. samples of the same distribution. In the context of this work, we consider the problem of transferring value functions through a variational method when the distribution that generates the tasks is time-variant, proposing a solution that leverages this temporal structure inherent in the task generating process. Furthermore, by means of a finite-sample analysis, the previously mentioned solution is theoretically compared to its time-invariant version. Finally, we will provide an experimental evaluation of the proposed technique with three distinct temporal dynamics in three different RL environments.
Published: 2020

59. A Novel Confidence-Based Algorithm for Structured Bandits

Author: Tirinzoni, Andrea, Lazaric, Alessandro, and Restelli, Marcello
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: We study finite-armed stochastic bandits where the rewards of each arm might be correlated to those of other arms. We introduce a novel phased algorithm that exploits the given structure to build confidence sets over the parameters of the true bandit problem and rapidly discard all sub-optimal arms. In particular, unlike standard bandit algorithms with no structure, we show that the number of times a suboptimal arm is selected may actually be reduced thanks to the information collected by pulling other arms. Furthermore, we show that, in some structures, the regret of an anytime extension of our algorithm is uniformly bounded over time. For these constant-regret structures, we also derive a matching lower bound. Finally, we demonstrate numerically that our approach better exploits certain structures than existing methods., Comment: AISTATS 2020
Published: 2020

60. Online Joint Bid/Daily Budget Optimization of Internet Advertising Campaigns

Author: Nuara, Alessandro, Trovò, Francesco, Gatti, Nicola, and Restelli, Marcello
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: Pay-per-click advertising includes various formats (e.g., search, contextual, social) with a total investment of more than 200 billion USD per year worldwide. An advertiser is given a daily budget to allocate over several, even thousands, campaigns, mainly distinguishing for the ad, target, or channel. Furthermore, publishers choose the ads to display and how to allocate them employing auctioning mechanisms, in which every day the advertisers set for each campaign a bid corresponding to the maximum amount of money per click they are willing to pay and the fraction of the daily budget to invest. In this paper, we study the problem of automating the online joint bid/daily budget optimization of pay-per-click advertising campaigns over multiple channels. We formulate our problem as a combinatorial semi-bandit problem, which requires solving a special case of the Multiple-Choice Knapsack problem every day. Furthermore, for every campaign, we capture the dependency of the number of clicks on the bid and daily budget by Gaussian Processes, thus requiring mild assumptions on the regularity of these functions. We design four algorithms and show that they suffer from a regret that is upper bounded with high probability as O(√T), where T is the time horizon of the learning process. We experimentally evaluate our algorithms with synthetic settings generated from real data from Yahoo!, and we present the results of the adoption of our algorithms in a real-world application with a daily average spend of 1,000 Euros for more than one year.
Published: 2020

61. Control Frequency Adaptation via Action Persistence in Batch Reinforcement Learning

Author: Metelli, Alberto Maria, Mazzolini, Flavio, Bisi, Lorenzo, Sabbioni, Luca, and Restelli, Marcello
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: The choice of the control frequency of a system has a relevant impact on the ability of reinforcement learning algorithms to learn a highly performing policy. In this paper, we introduce the notion of action persistence that consists in the repetition of an action for a fixed number of decision steps, having the effect of modifying the control frequency. We start analyzing how action persistence affects the performance of the optimal policy, and then we present a novel algorithm, Persistent Fitted Q-Iteration (PFQI), that extends FQI, with the goal of learning the optimal value function at a given persistence. After having provided a theoretical study of PFQI and a heuristic approach to identify the optimal persistence, we present an experimental campaign on benchmark domains to show the advantages of action persistence and proving the effectiveness of our persistence selection method.
Published: 2020

62. MushroomRL: Simplifying Reinforcement Learning Research

Author: D'Eramo, Carlo, Tateo, Davide, Bonarini, Andrea, Restelli, Marcello, and Peters, Jan
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: MushroomRL is an open-source Python library developed to simplify the process of implementing and running Reinforcement Learning (RL) experiments. Compared to other available libraries, MushroomRL has been created with the purpose of providing a comprehensive and flexible framework to minimize the effort in implementing and testing novel RL methodologies. Indeed, the architecture of MushroomRL is built in such a way that every component of an RL problem is already provided, and most of the time users can only focus on the implementation of their own algorithms and experiments. The result is a library from which RL researchers can significantly benefit in the critical phase of the empirical analysis of their works. MushroomRL stable code, tutorials and documentation can be found at https://github.com/MushroomRL/mushroom-rl., Comment: Under revision to JMLR
Published: 2020

63. Risk-Averse Trust Region Optimization for Reward-Volatility Reduction

Author: Bisi, Lorenzo, Sabbioni, Luca, Vittori, Edoardo, Papini, Matteo, and Restelli, Marcello
Subjects: Computer Science - Machine Learning, Mathematics - Optimization and Control, Statistics - Machine Learning
Abstract: In real-world decision-making problems, for instance in the fields of finance, robotics or autonomous driving, keeping uncertainty under control is as important as maximizing expected returns. Risk aversion has been addressed in the reinforcement learning literature through risk measures related to the variance of returns. However, in many cases, the risk is measured not only on a long-term perspective, but also on the step-wise rewards (e.g., in trading, to ensure the stability of the investment bank, it is essential to monitor the risk of portfolio positions on a daily basis). In this paper, we define a novel measure of risk, which we call reward volatility, consisting of the variance of the rewards under the state-occupancy measure. We show that the reward volatility bounds the return variance so that reducing the former also constrains the latter. We derive a policy gradient theorem with a new objective function that exploits the mean-volatility relationship, and develop an actor-only algorithm. Furthermore, thanks to the linearity of the Bellman equations defined under the new objective function, it is possible to adapt the well-known policy gradient algorithms with monotonic improvement guarantees such as TRPO in a risk-averse manner. Finally, we test the proposed approach in two simulated financial environments.
Published: 2019

64. ARLO: A framework for Automated Reinforcement Learning

Author: Mussi, Marco, Lombarda, Davide, Metelli, Alberto Maria, Trovò, Francesco, and Restelli, Marcello
Published: 2023

65. Stepsize Learning for Policy Gradient Methods in Contextual Markov Decision Processes

Author: Sabbioni, Luca, Corda, Francesco, and Restelli, Marcello
Published: 2023

66. Smoothing policies and safe policy gradients

Author: Papini, Matteo, Pirotta, Matteo, and Restelli, Marcello
Published: 2022

67. Policy Space Identification in Configurable Environments

Author: Metelli, Alberto Maria, Manneschi, Guglielmo, and Restelli, Marcello
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Statistics - Machine Learning
Abstract: We study the problem of identifying the policy space of a learning agent, having access to a set of demonstrations generated by its optimal policy. We introduce an approach based on statistical testing to identify the set of policy parameters the agent can control, within a larger parametric policy space. After presenting two identification rules (combinatorial and simplified), applicable under different assumptions on the policy space, we provide a probabilistic analysis of the simplified one in the case of linear policies belonging to the exponential family. To improve the performance of our identification rules, we frame the problem in the recently introduced framework of the Configurable Markov Decision Processes, exploiting the opportunity of configuring the environment to induce the agent revealing which parameters it can control. Finally, we provide an empirical evaluation, on both discrete and continuous domains, to prove the effectiveness of our identification rules.
Published: 2019

68. Gradient-Aware Model-based Policy Search

Author: D'Oro, Pierluca, Metelli, Alberto Maria, Tirinzoni, Andrea, Papini, Matteo, and Restelli, Marcello
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Statistics - Machine Learning
Abstract: Traditional model-based reinforcement learning approaches learn a model of the environment dynamics without explicitly considering how it will be used by the agent. In the presence of misspecified model classes, this can lead to poor estimates, as some relevant available information is ignored. In this paper, we introduce a novel model-based policy search approach that exploits the knowledge of the current agent policy to learn an approximate transition model, focusing on the portions of the environment that are most relevant for policy improvement. We leverage a weighting scheme, derived from the minimization of the error on the model-based policy gradient estimator, in order to define a suitable objective function that is optimized for learning the approximate transition model. Then, we integrate this procedure into a batch policy improvement algorithm, named Gradient-Aware Model-based Policy Search (GAMPS), which iteratively learns a transition model and uses it, together with the collected trajectories, to compute the new policy parameters. Finally, we empirically validate GAMPS on benchmark domains analyzing and discussing its properties.
Published: 2019

69. Feature Selection via Mutual Information: New Theoretical Insights

Author: Beraha, Mario, Metelli, Alberto Maria, Papini, Matteo, Tirinzoni, Andrea, and Restelli, Marcello
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: Mutual information has been successfully adopted in filter feature-selection methods to assess both the relevancy of a subset of features in predicting the target variable and the redundancy with respect to other variables. However, existing algorithms are mostly heuristic and do not offer any guarantee on the proposed solution. In this paper, we provide novel theoretical results showing that conditional mutual information naturally arises when bounding the ideal regression/classification errors achieved by different subsets of features. Leveraging on these insights, we propose a novel stopping condition for backward and forward greedy methods which ensures that the ideal prediction error using the selected feature subset remains bounded by a user-specified threshold. We provide numerical simulations to support our theoretical claims and compare to common heuristic methods., Comment: Accepted for presentation at the International Joint Conference on Neural Networks (IJCNN) 2019
Published: 2019

70. An Intrinsically-Motivated Approach for Learning Highly Exploring and Fast Mixing Policies

Author: Mutti, Mirco and Restelli, Marcello
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: What is a good exploration strategy for an agent that interacts with an environment in the absence of external rewards? Ideally, we would like to get a policy driving towards a uniform state-action visitation (highly exploring) in a minimum number of steps (fast mixing), in order to ease efficient learning of any goal-conditioned policy later on. Unfortunately, it is remarkably arduous to directly learn an optimal policy of this nature. In this paper, we propose a novel surrogate objective for learning highly exploring and fast mixing policies, which focuses on maximizing a lower bound to the entropy of the steady-state distribution induced by the policy. In particular, we introduce three novel lower bounds, that lead to as many optimization problems, that trade off the theoretical guarantees with computational complexity. Then, we present a model-based reinforcement learning algorithm, IDE³AL, to learn an optimal policy according to the introduced objective. Finally, we provide an empirical evaluation of this algorithm on a set of hard-exploration tasks., Comment: In 34th AAAI Conference on Artificial Intelligence (AAAI 2020)
Published: 2019

71. Smoothing Policies and Safe Policy Gradients

Author: Papini, Matteo, Pirotta, Matteo, and Restelli, Marcello
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: Policy Gradient (PG) algorithms are among the best candidates for the much-anticipated applications of reinforcement learning to real-world control tasks, such as robotics. However, the trial-and-error nature of these methods poses safety issues whenever the learning process itself must be performed on a physical system or involves any form of human-computer interaction. In this paper, we address a specific safety formulation, where both goals and dangers are encoded in a scalar reward signal and the learning agent is constrained to never worsen its performance, measured as the expected sum of rewards. By studying actor-only policy gradient from a stochastic optimization perspective, we establish improvement guarantees for a wide class of parametric policies, generalizing existing results on Gaussian policies. This, together with novel upper bounds on the variance of policy gradient estimators, allows us to identify meta-parameter schedules that guarantee monotonic improvement with high probability. The two key meta-parameters are the step size of the parameter updates and the batch size of the gradient estimates. Through a joint, adaptive selection of these meta-parameters, we obtain a policy gradient algorithm with monotonic improvement guarantees.
Published: 2019

72. Coherent Transport of Quantum States by Deep Reinforcement Learning

Author: Porotti, Riccardo, Tamascelli, Dario, Restelli, Marcello, and Prati, Enrico
Subjects: Quantum Physics
Abstract: Some problems in physics can be handled only after a suitable ansatz solution has been guessed. Such a method is therefore resistant to generalization, resulting in limited scope. The coherent transport by adiabatic passage of a quantum state through an array of semiconductor quantum dots provides a par excellence example of such an approach, where it is necessary to introduce the so-called counter-intuitive control gate ansatz pulse sequence. Instead, deep reinforcement learning has proven able to solve very complex sequential decision-making problems involving competition between short-term and long-term rewards, despite a lack of prior knowledge. We show that in the above problem deep reinforcement learning discovers control sequences outperforming the ansatz counter-intuitive sequence. Even more interesting, it discovers novel strategies when realistic disturbances affect the ideal system, with better speed and fidelity when energy detuning between the ground states of quantum dots or dephasing are added to the master equation, also mitigating the effects of losses. This method enables online update of realistic systems as the policy convergence is boosted by exploiting the prior knowledge when available. Deep reinforcement learning proves effective to control dynamics of quantum states, and more generally it applies whenever an ansatz solution is unknown or insufficient to effectively treat the problem., Comment: 5 figures
Published: 2019

73. Risk-averse optimization of reward-based coherent risk measures

Author: Bonetti, Massimiliano, Bisi, Lorenzo, and Restelli, Marcello
Published: 2023

74. Policy space identification in configurable environments

Author: Metelli, Alberto Maria, Manneschi, Guglielmo, and Restelli, Marcello
Published: 2022

75. Policy Optimization via Importance Sampling

Author: Metelli, Alberto Maria, Papini, Matteo, Faccio, Francesco, and Restelli, Marcello
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Statistics - Machine Learning
Abstract: Policy optimization is an effective reinforcement learning approach to solve continuous control tasks. Recent achievements have shown that alternating online and offline optimization is a successful choice for efficient trajectory reuse. However, deciding when to stop optimizing and collect new trajectories is non-trivial, as it requires to account for the variance of the objective function estimate. In this paper, we propose a novel, model-free, policy search algorithm, POIS, applicable in both action-based and parameter-based settings. We first derive a high-confidence bound for importance sampling estimation; then we define a surrogate objective function, which is optimized offline whenever a new batch of trajectories is collected. Finally, the algorithm is tested on a selection of continuous control tasks, with both linear and deep policies, and compared with state-of-the-art policy optimization methods.
Published: 2018

76. Stochastic Variance-Reduced Policy Gradient

Author: Papini, Matteo, Binaghi, Damiano, Canonaco, Giuseppe, Pirotta, Matteo, and Restelli, Marcello
Subjects: Computer Science - Learning, Statistics - Machine Learning
Abstract: In this paper, we propose a novel reinforcement-learning algorithm consisting in a stochastic variance-reduced version of policy gradient for solving Markov Decision Processes (MDPs). Stochastic variance-reduced gradient (SVRG) methods have proven to be very successful in supervised learning. However, their adaptation to policy gradient is not straightforward and needs to account for I) a non-concave objective function; II) approximations in the full gradient computation; and III) a non-stationary sampling process. The result is SVRPG, a stochastic variance-reduced policy gradient algorithm that leverages on importance weights to preserve the unbiasedness of the gradient estimate. Under standard assumptions on the MDP, we provide convergence guarantees for SVRPG with a convergence rate that is linear under increasing batch sizes. Finally, we suggest practical variants of SVRPG, and we empirically evaluate them on continuous MDPs.
Published: 2018

77. Configurable Markov Decision Processes

Author: Metelli, Alberto Maria, Mutti, Mirco, and Restelli, Marcello
Subjects: Computer Science - Artificial Intelligence
Abstract: In many real-world problems, there is the possibility to configure, to a limited extent, some environmental parameters to improve the performance of a learning agent. In this paper, we propose a novel framework, Configurable Markov Decision Processes (Conf-MDPs), to model this new type of interaction with the environment. Furthermore, we provide a new learning algorithm, Safe Policy-Model Iteration (SPMI), to jointly and adaptively optimize the policy and the environment configuration. After having introduced our approach and derived some theoretical results, we present the experimental evaluation in two explicative problems to show the benefits of the environment configurability on the performance of the learned policy.
Published: 2018

78. Importance Weighted Transfer of Samples in Reinforcement Learning

Author: Tirinzoni, Andrea, Sessa, Andrea, Pirotta, Matteo, and Restelli, Marcello
Subjects: Computer Science - Learning, Statistics - Machine Learning
Abstract: We consider the transfer of experience samples (i.e., tuples < s, a, s', r >) in reinforcement learning (RL), collected from a set of source tasks to improve the learning process in a given target task. Most of the related approaches focus on selecting the most relevant source samples for solving the target task, but then all the transferred samples are used without considering anymore the discrepancies between the task models. In this paper, we propose a model-based technique that automatically estimates the relevance (importance weight) of each source sample for solving the target task. In the proposed approach, all the samples are transferred and used by a batch RL algorithm to solve the target task, but their contribution to the learning process is proportional to their importance weight. By extending the results for importance weighting provided in supervised learning literature, we develop a finite-sample analysis of the proposed batch RL algorithm. Furthermore, we empirically compare the proposed algorithm to state-of-the-art approaches, showing that it achieves better learning performance and is very robust to negative transfer, even when some source tasks are significantly different from the target task., Comment: Accepted at ICML 2018
Published: 2018

79. An online state of health estimation method for lithium-ion batteries based on time partitioning and data-driven model identification

Author: Mussi, Marco, Pellegrino, Luigi, Restelli, Marcello, and Trovò, Francesco
Published: 2022
Full Text: View/download PDF

80. Risk-averse policy optimization via risk-neutral policy optimization

Author: Bisi, Lorenzo, Santambrogio, Davide, Sandrelli, Federico, Tirinzoni, Andrea, Ziebart, Brian D., and Restelli, Marcello
Published: 2022
Full Text: View/download PDF

81. Cost-Sensitive Approach to Batch Size Adaptation for Gradient Descent

Author: Pirotta, Matteo and Restelli, Marcello
Subjects: Computer Science - Learning, Statistics - Machine Learning
Abstract: In this paper, we propose a novel approach to automatically determine the batch size in stochastic gradient descent methods. The choice of the batch size induces a trade-off between the accuracy of the gradient estimate and the cost in terms of samples of each update. We propose to determine the batch size by optimizing the ratio between a lower bound to a linear or quadratic Taylor approximation of the expected improvement and the number of samples used to estimate the gradient. The performance of the proposed approach is empirically compared with related methods on popular classification tasks. The work was presented at the NIPS workshop on Optimizing the Optimizers, Barcelona, Spain, 2016. Comment: Presented at the NIPS workshop on Optimizing the Optimizers, Barcelona, Spain, 2016
Published: 2017

82. Online joint bid/daily budget optimization of Internet advertising campaigns

Author: Nuara, Alessandro, Trovò, Francesco, Gatti, Nicola, and Restelli, Marcello
Published: 2022
Full Text: View/download PDF

83. Conservative Online Convex Optimization

Author: Bernasconi de Luca, Martino, Vittori, Edoardo, Trovò, Francesco, Restelli, Marcello, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Oliver, Nuria, editor, Pérez-Cruz, Fernando, editor, Kramer, Stefan, editor, Read, Jesse, editor, and Lozano, Jose A., editor
Published: 2021
Full Text: View/download PDF

84. Exploiting History Data for Nonstationary Multi-armed Bandit

Author: Re, Gerlando, Chiusano, Fabio, Trovò, Francesco, Carrera, Diego, Boracchi, Giacomo, Restelli, Marcello, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Oliver, Nuria, editor, Pérez-Cruz, Fernando, editor, Kramer, Stefan, editor, Read, Jesse, editor, and Lozano, Jose A., editor
Published: 2021
Full Text: View/download PDF

85. Unimodal Thompson Sampling for Graph-Structured Arms

Author: Paladino, Stefano, Trovò, Francesco, Restelli, Marcello, and Gatti, Nicola
Subjects: Computer Science - Learning, Statistics - Machine Learning
Abstract: We study, to the best of our knowledge, the first Bayesian algorithm for unimodal Multi-Armed Bandit (MAB) problems with graph structure. In this setting, each arm corresponds to a node of a graph and each edge provides a relationship, unknown to the learner, between two nodes in terms of expected reward. Furthermore, for any node of the graph there is a path leading to the unique node providing the maximum expected reward, along which the expected reward is monotonically increasing. Previous results on this setting describe the behavior of frequentist MAB algorithms. In our paper, we design a Thompson Sampling-based algorithm whose asymptotic pseudo-regret matches the lower bound for the considered setting. We show that, as happens in a wide range of scenarios, Bayesian MAB algorithms dramatically outperform frequentist ones. In particular, we provide a thorough experimental evaluation of the performance of our algorithm and of state-of-the-art algorithms as the properties of the graph vary.
Published: 2016

86. A voltage dynamic-based state of charge estimation method for batteries storage systems

Author: Mussi, Marco, Pellegrino, Luigi, Restelli, Marcello, and Trovò, Francesco
Published: 2021
Full Text: View/download PDF

87. Dealing with multiple experts and non-stationarity in inverse reinforcement learning: an application to real-life problems

Author: Likmeta, Amarildo, Metelli, Alberto Maria, Ramponi, Giorgia, Tirinzoni, Andrea, Giuliani, Matteo, and Restelli, Marcello
Published: 2021
Full Text: View/download PDF

88. A practical guide to multi-objective reinforcement learning and planning

Author: Hayes, Conor F., Rădulescu, Roxana, Bargiacchi, Eugenio, Källström, Johan, Macfarlane, Matthew, Reymond, Mathieu, Verstraeten, Timothy, Zintgraf, Luisa M., Dazeley, Richard, Heintz, Fredrik, Howley, Enda, Irissappane, Athirai A., Mannion, Patrick, Nowé, Ann, Ramos, Gabriel, Restelli, Marcello, Vamplew, Peter, and Roijers, Diederik M.
Published: 2022
Full Text: View/download PDF

89. Multi-objective Reinforcement Learning with Continuous Pareto Frontier Approximation Supplementary Material

Author: Pirotta, Matteo, Parisi, Simone, and Restelli, Marcello
Subjects: Computer Science - Artificial Intelligence, Computer Science - Learning
Abstract: This document contains supplementary material for the paper "Multi-objective Reinforcement Learning with Continuous Pareto Frontier Approximation", published at the Twenty-Ninth AAAI Conference on Artificial Intelligence (AAAI-15). The paper is about learning a continuous approximation of the Pareto frontier in Multi-Objective Markov Decision Problems (MOMDPs). We propose a policy-based approach that exploits gradient information to generate solutions close to the Pareto ones. Unlike previous policy-gradient multi-objective algorithms, where n optimization routines are used to obtain n solutions, our approach performs a single gradient-ascent run that at each step generates an improved continuous approximation of the Pareto frontier. The idea is to exploit a gradient-based approach to optimize the parameters of a function that defines a manifold in the policy parameter space so that the corresponding image in the objective space gets as close as possible to the Pareto frontier. Besides deriving how to compute and estimate such a gradient, we will also discuss the non-trivial issue of defining a metric to assess the quality of the candidate Pareto frontiers. Finally, the properties of the proposed approach are empirically evaluated on two interesting MOMDPs. Comment: AAAI-15 Supplement. Updated upon acceptance at the Twenty-Ninth AAAI Conference on Artificial Intelligence (AAAI-15)
Published: 2014

90. Conservative Online Convex Optimization

Author: Bernasconi de Luca, Martino, primary, Vittori, Edoardo, additional, Trovò, Francesco, additional, and Restelli, Marcello, additional
Published: 2021
Full Text: View/download PDF

91. Exploiting History Data for Nonstationary Multi-armed Bandit

Author: Re, Gerlando, primary, Chiusano, Fabio, additional, Trovò, Francesco, additional, Carrera, Diego, additional, Boracchi, Giacomo, additional, and Restelli, Marcello, additional
Published: 2021
Full Text: View/download PDF

92. Efficient evolutionary dynamics with extensive-form games

Author: Gatti, Nicola, Panozzo, Fabio, and Restelli, Marcello
Subjects: Computer Science - Computer Science and Game Theory
Abstract: Evolutionary game theory combines game theory and dynamical systems and is customarily adopted to describe evolutionary dynamics in multi-agent systems. In particular, it has been proven to be a successful tool to describe multi-agent learning dynamics. To the best of our knowledge, we provide in this paper the first replicator dynamics applicable to the sequence form of an extensive-form game, allowing an exponential reduction of time and space w.r.t. the currently adopted replicator dynamics for normal form. Furthermore, our replicator dynamics is realization equivalent to the standard replicator dynamics for normal form. We prove our results for both discrete-time and continuous-time cases. Finally, we extend standard tools to study the stability of a strategy profile to our replicator dynamics.
Published: 2013

93. Transfer from Multiple MDPs

Author: Lazaric, Alessandro and Restelli, Marcello
Subjects: Computer Science - Artificial Intelligence, Computer Science - Learning
Abstract: Transfer reinforcement learning (RL) methods leverage the experience collected on a set of source tasks to speed up RL algorithms. A simple and effective approach is to transfer samples from source tasks and include them in the training set used to solve a given target task. In this paper, we investigate the theoretical properties of this transfer method and we introduce novel algorithms adapting the transfer process on the basis of the similarity between source and target tasks. Finally, we report illustrative experimental results in a continuous chain problem. Comment: 2011
Published: 2011

94. Improving multi-armed bandit algorithms in online pricing settings

Author: Trovò, Francesco, Paladino, Stefano, Restelli, Marcello, and Gatti, Nicola
Published: 2018
Full Text: View/download PDF

95. Dynamic Pricing with Volume Discounts in Online Settings

Author: Mussi, Marco, primary, Genalti, Gianmarco, additional, Nuara, Alessandro, additional, Trovò, Francesco, additional, Restelli, Marcello, additional, and Gatti, Nicola, additional
Published: 2023
Full Text: View/download PDF

96. Tight Performance Guarantees of Imitator Policies with Continuous Actions

Author: Maran, Davide, primary, Metelli, Alberto Maria, additional, and Restelli, Marcello, additional
Published: 2023
Full Text: View/download PDF

97. Provably Efficient Causal Model-Based Reinforcement Learning for Systematic Generalization

Author: Mutti, Mirco, primary, De Santi, Riccardo, additional, Rossi, Emanuele, additional, Calderon, Juan Felipe, additional, Bronstein, Michael, additional, and Restelli, Marcello, additional
Published: 2023
Full Text: View/download PDF

98. Simultaneously Updating All Persistence Values in Reinforcement Learning

Author: Sabbioni, Luca, primary, Al Daire, Luca, additional, Bisi, Lorenzo, additional, Metelli, Alberto Maria, additional, and Restelli, Marcello, additional
Published: 2023
Full Text: View/download PDF

99. Wasserstein Actor-Critic: Directed Exploration via Optimism for Continuous-Actions Control

Author: Likmeta, Amarildo, primary, Sacco, Matteo, additional, Metelli, Alberto Maria, additional, and Restelli, Marcello, additional
Published: 2023
Full Text: View/download PDF

100. The EU-funded I3LUNG Project: Integrative Science, Intelligent Data Platform for Individualized LUNG Cancer Care With Immunotherapy

Author: Prelaj, Arsela, primary, Ganzinelli, Monica, additional, Trovò, Francesco, additional, Roisman, Laila C., additional, Pedrocchi, Alessandra Laura Giulia, additional, Kosta, Sokol, additional, Restelli, Marcello, additional, Ambrosini, Emilia, additional, Broggini, Massimo, additional, Pravettoni, Gabriella, additional, Monzani, Dario, additional, Nuara, Alessandro, additional, Amat, Ramon, additional, Spathas, Nikos, additional, Willis, Michael, additional, Pearson, Alexander, additional, Dolezal, James, additional, Mazzeo, Laura, additional, Sangaletti, Sabina, additional, Correa, Ana Maria, additional, Aguaron, Alfonso, additional, Watermann, Iris, additional, Popa, Crina, additional, Raimondi, Giulia, additional, Triulzi, Tiziana, additional, Steurer, Stefan, additional, Lo Russo, Giuseppe, additional, Linardou, Helena, additional, Peled, Nir, additional, Felip, Enriqueta, additional, Reck, Martin, additional, and Garassino, Marina Chiara, additional
Published: 2023
Full Text: View/download PDF
