137 results for '"policy search"'
Search Results
2. Evolutionary Action Selection for Gradient-Based Policy Learning
- Author
-
Ma, Yan, Liu, Tianxing, Wei, Bingsheng, Liu, Yi, Xu, Kang, and Li, Wei
- Published
- 2023
3. An Extensive Application of Model Predictive Control Combined with Policy Search to Multi-agent Agile UAV Flight
- Author
-
Xu, Huaxing, Yang, Chengwei, Li, Juan, Liu, Chang, and Yang, Yu
- Published
- 2023
4. Geometric Reinforcement Learning for Robotic Manipulation
- Author
-
Naseem Alhousani, Matteo Saveriano, Ibrahim Sevinc, Talha Abdulkuddus, Hatice Kose, and Fares J. Abu-Dakka
- Subjects
Learning on manifolds, policy optimization, policy search, geometric reinforcement learning, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
Reinforcement learning (RL) is a popular technique that allows an agent to learn by trial and error while interacting with a dynamic environment. Traditional RL approaches have been successful in learning and predicting Euclidean robotic manipulation skills such as positions, velocities, and forces. However, in robotics it is common to encounter non-Euclidean data such as orientation or stiffness, and failing to account for their geometric nature can negatively impact learning accuracy and performance. In this paper, to address this challenge, we propose a novel framework for RL that leverages Riemannian geometry, which we call Geometric Reinforcement Learning ($\mathcal{G}$-RL), to enable agents to learn robotic manipulation skills with non-Euclidean data. Specifically, $\mathcal{G}$-RL utilizes the tangent space in two ways: a tangent space for parameterization and a local tangent space for mapping to a non-Euclidean manifold. The policy is learned in the parameterization tangent space, which remains constant throughout training. The policy is then transferred to the local tangent space via parallel transport and projected onto the non-Euclidean manifold. The local tangent space changes over time to remain within the neighborhood of the current manifold point, reducing the approximation error. Therefore, by introducing a geometrically grounded pre- and post-processing step into the traditional RL pipeline, our $\mathcal{G}$-RL framework enables several model-free algorithms designed for Euclidean space to learn from non-Euclidean data without modification. Experimental results, obtained both in simulation and on a real robot, support our hypothesis that $\mathcal{G}$-RL is more accurate and converges to a better solution than approaches that treat non-Euclidean data as if it were Euclidean.
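The tangent-space machinery this abstract describes has closed forms on simple manifolds. The sketch below is ours, not the paper's code: it illustrates the exponential map and parallel transport on the unit sphere, with arbitrary base points and an arbitrary tangent update standing in for a Euclidean RL step.

```python
# Illustrative sketch (not the paper's code): tangent-space policy updates on the
# unit sphere, using the closed-form exponential map and parallel transport.
import numpy as np

def exp_map(p, v):
    """Map tangent vector v at base point p onto the sphere."""
    n = np.linalg.norm(v)
    if n < 1e-12:
        return p
    return np.cos(n) * p + np.sin(n) * (v / n)

def log_map(p, q):
    """Tangent vector at p pointing along the geodesic to q."""
    w = q - np.dot(p, q) * p                  # project q onto the tangent space at p
    nw = np.linalg.norm(w)
    theta = np.arccos(np.clip(np.dot(p, q), -1.0, 1.0))
    return np.zeros(3) if nw < 1e-12 else theta * w / nw

def parallel_transport(p, q, w):
    """Transport tangent vector w from p to q along the connecting geodesic."""
    u = log_map(p, q)
    theta = np.linalg.norm(u)
    if theta < 1e-12:
        return w
    e = u / theta
    w_par = np.dot(w, e)
    return (w - w_par * e) + w_par * (np.cos(theta) * e - np.sin(theta) * p)

# A Euclidean update computed in the fixed parameterization tangent space...
p0 = np.array([0.0, 0.0, 1.0])                # fixed base point
update = np.array([0.3, -0.1, 0.0])           # tangent update from any Euclidean step
# ...is transported to the local tangent space at the current manifold point and
# projected back onto the manifold, as the abstract describes.
p_curr = exp_map(p0, np.array([0.5, 0.2, 0.0]))
p_next = exp_map(p_curr, parallel_transport(p0, p_curr, update))
print(p_next, np.linalg.norm(p_next))         # stays on the unit sphere
```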
- Published
- 2023
5. Designing Lookahead Policies for Sequential Decision Problems in Transportation and Logistics
- Author
-
Warren B. Powell
- Subjects
Direct lookahead approximations, model predictive control, parametric cost function approximation, policy search, reinforcement learning, sequential decisions, Transportation engineering, TA1001-1280, Transportation and communications, HE1-9990
- Abstract
There is a wide range of sequential decision problems in transportation and logistics that require dealing with uncertainty. There are four classes of policies that we can draw on for different types of decisions, but many problems in transportation and logistics will ultimately require some form of direct lookahead policy (DLA), where we optimize decisions over some horizon in order to make a decision now. The most common strategy is to use a deterministic lookahead (think Google Maps), but what if you want to handle uncertainty? In this paper, we identify two major strategies for designing practical, implementable lookahead policies that handle uncertainty in fundamentally different ways. The first is a suitably parameterized deterministic lookahead, where the parameterization is tuned in a stochastic simulator. The second uses an approximate stochastic lookahead, where we identify six classes of approximations, one of which involves designing a “policy-within-a-policy,” for which we turn to all four classes of policies. We claim that our approximate lookahead model spans all the classical stochastic optimization tools for lookahead policies while opening up pathways for new policies. We also argue that the parameterized deterministic lookahead is a powerful new idea that, for some problems, can outperform the more familiar stochastic lookahead policies.
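As a toy illustration of the first strategy (a parameterized deterministic lookahead tuned in a stochastic simulator), the sketch below tunes the safety-buffer parameter of a deterministic order-up-to rule on an invented inventory problem. The demand model, horizon, and cost coefficients are our assumptions, not from the paper.

```python
# Illustrative sketch of a parameterized deterministic lookahead (a parametric CFA):
# a deterministic order-up-to rule with buffer theta, tuned in a stochastic simulator.
import numpy as np

rng = np.random.default_rng(0)
MEAN_DEMAND, HOLD_COST, SHORT_COST = 10.0, 1.0, 9.0   # invented problem data

def policy(inventory, theta):
    """Deterministic lookahead: order up to forecast demand plus buffer theta."""
    return max(0.0, MEAN_DEMAND + theta - inventory)

def simulate(theta, horizon=200):
    """Stochastic simulator returning the average cost of the parameterized policy."""
    inv, cost = 0.0, 0.0
    for _ in range(horizon):
        inv += policy(inv, theta)
        inv -= rng.poisson(MEAN_DEMAND)                # random demand realization
        cost += HOLD_COST * max(inv, 0.0) + SHORT_COST * max(-inv, 0.0)
        inv = max(inv, 0.0)                            # lost sales
    return cost / horizon

# Tune the lookahead parameter in simulation (grid search as the simplest tuner).
thetas = np.linspace(0.0, 10.0, 21)
costs = [np.mean([simulate(t) for _ in range(20)]) for t in thetas]
print("best buffer theta:", thetas[int(np.argmin(costs))])
```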
- Published
- 2022
6. A Comparison of Policy Search in Joint Space and Cartesian Space for Refinement of Skills
- Author
-
Fabisch, Alexander
- Published
- 2020
7. A real-world application of Markov chain Monte Carlo method for Bayesian trajectory control of a robotic manipulator.
- Author
-
Tavakol Aghaei, Vahid, Ağababaoğlu, Arda, Yıldırım, Sinan, and Onat, Ahmet
- Subjects
MARKOV chain Monte Carlo, ROBOTIC trajectory control, MARKOV processes, MONTE Carlo method, SPACE robotics, REINFORCEMENT learning, REWARD (Psychology)
- Abstract
Reinforcement learning methods are being applied to control problems in the robotics domain. These algorithms are well suited to the large-scale continuous state spaces encountered in robotics. Although policy search methods based on stochastic gradient optimization have become successful candidates for coping with challenging robotics and control problems in recent years, they may become unstable when abrupt variations occur in gradient computations, and they may end up in locally optimal solutions. To avoid these disadvantages, a Markov chain Monte Carlo (MCMC) algorithm for policy learning in the RL setting is proposed. The policy space is explored in a non-contiguous manner such that higher-reward regions have a higher probability of being visited. The proposed algorithm is applied in a risk-sensitive setting where the reward structure is multiplicative. Our method has the advantages of being model-free and gradient-free, as well as being suitable for real-world implementation. The merits of the proposed algorithm are shown with experimental evaluations on a 2-degree-of-freedom robot arm. The experiments demonstrate that it can perform a thorough policy space search while maintaining adequate control performance, and can learn a complex trajectory control task within a small number of iterations. • Policy search is formulated in a Bayesian framework where the return is the likelihood function. • The likelihood function is combined with an uninformative prior to establish a posterior. • MCMC is used to explore the high-reward regions of the policy space. • A parameterized control policy is learned by an MCMC-based RL algorithm. • The proposed method is a gradient-free and model-free algorithm. [ABSTRACT FROM AUTHOR]
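The core loop this abstract describes, sampling policy parameters with MCMC so that high-reward regions are visited more often, can be sketched in a few lines. The scalar plant, the random-walk proposal, and the use of the (noisy) return as a log-likelihood surrogate below are illustrative assumptions, not the paper's setup.

```python
# Illustrative Metropolis sketch of MCMC policy search: policy parameters are
# sampled so that higher-return regions of parameter space are visited more often.
import numpy as np

rng = np.random.default_rng(1)

def episode_return(theta):
    """Toy stand-in for a rollout: linear policy u = -theta*x on a scalar plant."""
    x, ret = 1.0, 0.0
    for _ in range(30):
        u = -theta * x
        x = 0.9 * x + 0.5 * u + 0.05 * rng.standard_normal()
        ret += -x**2 - 0.1 * u**2
    return ret

theta, samples = 0.0, []
log_like = episode_return(theta)        # return plays the role of a log-likelihood
for _ in range(2000):
    prop = theta + 0.2 * rng.standard_normal()        # random-walk proposal
    prop_like = episode_return(prop)
    if np.log(rng.uniform()) < prop_like - log_like:  # Metropolis accept/reject
        theta, log_like = prop, prop_like
    samples.append(theta)

print("posterior mean gain:", np.mean(samples[500:]))
```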
- Published
- 2022
8. Policy manifold generation for multi-task multi-objective optimization of energy flexible machining systems.
- Author
-
Qinge, Xiao, Niu, Ben, and Ying, Chen
- Subjects
MACHINING, GAUSSIAN distribution, LEAN management, MAP design, REINFORCEMENT learning, MARKOV processes, PROGRESSIVE collapse, FLEXIBLE manufacturing systems
- Abstract
Contemporary organizations recognize the importance of lean and green production in realizing ecological and economic benefits. Compared with existing optimization methods, multi-task multi-objective reinforcement learning (MT-MORL) offers an attractive means of addressing the dynamic, multi-target process-optimization problems associated with Energy-Flexible Machining (EFM). Despite recent advances in reinforcement learning, an accurate representation of the Pareto frontier remains a major challenge. This article presents a generative manifold-based policy-search method to approximate the continuously distributed Pareto frontier for EFM optimization. To this end, multi-pass operations are formulated as part of a multi-policy Markov decision process in which the machining configurations undergo dynamic changes. Because a traditional Gaussian distribution cannot accurately fit complex upper-level policies, a multi-layered generator is designed to map the high-dimensional policy manifold from a simple Gaussian distribution without complex calculations. Additionally, a hybrid multi-task training approach is proposed to handle the mode collapse and large task differences observed while improving generalization performance. Extensive computational testing and comparisons against existing baseline methods demonstrate the improved Pareto frontier quality and computational efficiency of the proposed algorithm. [ABSTRACT FROM AUTHOR]
- Published
- 2022
9. Chance constrained policy optimization for process control and optimization.
- Author
-
Petsagkourakis, Panagiotis, Sandoval, Ilya Orson, Bradford, Eric, Galvanin, Federico, Zhang, Dongda, and Rio-Chanona, Ehecatl Antonio del
- Subjects
CONSTRAINED optimization, PROCESS optimization, REINFORCEMENT learning, CHEMICAL process control, CUMULATIVE distribution function, MANUFACTURING processes, MARKOV processes
- Abstract
Chemical process optimization and control are affected by (1) plant-model mismatch, (2) process disturbances, and (3) constraints for safe operation. Reinforcement learning by policy optimization is a natural way to address this, given its ability to handle stochasticity and plant-model mismatch and to directly account for the effect of future uncertainty and its feedback in a proper closed-loop manner, all without the need for an inner optimization loop. One of the main reasons why reinforcement learning has not been considered for industrial processes (or almost any engineering application) is that it lacks a framework for dealing with safety-critical constraints. Existing algorithms for policy optimization use difficult-to-tune penalty parameters, fail to reliably satisfy state constraints, or offer guarantees only in expectation. We propose a chance constrained policy optimization (CCPO) algorithm which guarantees the satisfaction of joint chance constraints with high probability, which is crucial for safety-critical tasks. This is achieved by introducing constraint tightening (backoffs), computed simultaneously with the feedback policy. Backoffs are adjusted with Bayesian optimization using the empirical cumulative distribution function of the probabilistic constraints, and are therefore self-tuned. This results in a general methodology that can be imbued into present policy optimization algorithms to enable them to satisfy joint chance constraints with high probability. We present case studies that analyse the performance of the proposed approach. • A chance constrained policy optimization technique has been constructed. • Bayesian optimization and reinforcement learning have been combined to produce a safe policy. • Case studies with parametric and structural uncertainty have been considered. • The policy is optimized offline and can be deployed online. [ABSTRACT FROM AUTHOR]
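A minimal sketch of the backoff idea, under an invented scalar system: tighten the constraint, simulate the closed loop, and adjust the backoff from the empirical distribution of the worst-case constraint value until the target satisfaction probability holds. The paper adjusts backoffs with Bayesian optimization; the simple increment rule below is a stand-in.

```python
# Illustrative sketch of self-tuned constraint backoffs from the empirical CDF of
# rollout constraint values, in the spirit of chance constrained policy optimization.
import numpy as np

rng = np.random.default_rng(2)
LIMIT, TARGET_PROB = 1.0, 0.95

def rollout_worst_state(backoff, n_steps=50):
    """Closed-loop rollout of a noisy scalar system whose (toy) feedback policy
    regulates the state toward the tightened limit LIMIT - backoff."""
    x, worst = 0.0, -np.inf
    for _ in range(n_steps):
        u = 0.5 * ((LIMIT - backoff) - x)
        x = 0.8 * x + u + 0.1 * rng.standard_normal()
        worst = max(worst, x)
    return worst

backoff, sat_prob = 0.0, 0.0
for _ in range(40):
    worst = np.array([rollout_worst_state(backoff) for _ in range(200)])
    sat_prob = np.mean(worst <= LIMIT)      # empirical joint-satisfaction probability
    if sat_prob < TARGET_PROB:
        backoff += 0.05                     # tighten further
    else:
        backoff = max(backoff - 0.01, 0.0)  # relax slightly

print(f"tuned backoff: {backoff:.2f}, empirical satisfaction: {sat_prob:.2f}")
```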
- Published
- 2022
10. Employing reinforcement learning to enhance particle swarm optimization methods.
- Author
-
Wu, Di and Wang, G. Gary
- Subjects
PARTICLE swarm optimization, REINFORCEMENT learning, NUMERICAL functions, PETRI nets, GAUSSIAN distribution, RANDOM numbers, LEARNING strategies, STOCHASTIC convergence
- Abstract
Particle swarm optimization (PSO) is a well-known optimization algorithm that shows good performance in solving different optimization problems. However, PSO usually suffers from slow convergence. In this article, a reinforcement learning strategy is developed to improve the convergence of PSO by replacing the uniformly distributed random number in the updating function with a random number generated from a selected normal distribution. In the proposed method, the mean and standard deviation of the normal distribution are estimated from the current state of each individual through a policy net. The historical behaviour of the swarm is used to update the policy net and guide the selection of the parameters of the normal distribution. The proposed method is integrated into the original PSO and into a state-of-the-art PSO, the self-adaptive dynamic multi-swarm PSO (sDMS-PSO), and tested on numerical functions and engineering problems. The test results show that the convergence rate of PSO methods can be improved with the proposed reinforcement learning strategy. [ABSTRACT FROM AUTHOR]
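The proposed modification is essentially a one-line change to the PSO velocity update: the uniform random factors become normal draws. In the sketch below, a fixed mean and standard deviation stand in for the state-conditioned policy net of the paper; the objective and hyperparameters are our assumptions.

```python
# Illustrative PSO sketch where the usual U(0,1) factors in the velocity update are
# replaced by normal draws (fixed parameters here stand in for the policy net).
import numpy as np

rng = np.random.default_rng(3)
W, C1, C2, MU, SIGMA = 0.7, 1.5, 1.5, 0.5, 0.2

def sphere(x):                        # toy objective: minimize the sum of squares
    return np.sum(x * x, axis=1)

n, dim = 30, 5
x = rng.uniform(-5, 5, (n, dim))
v = np.zeros((n, dim))
pbest, pbest_f = x.copy(), sphere(x)
gbest = pbest[np.argmin(pbest_f)]

for _ in range(200):
    r1 = rng.normal(MU, SIGMA, (n, dim))   # plain PSO uses rng.uniform(0, 1, ...)
    r2 = rng.normal(MU, SIGMA, (n, dim))
    v = W * v + C1 * r1 * (pbest - x) + C2 * r2 * (gbest - x)
    x = x + v
    f = sphere(x)
    improved = f < pbest_f
    pbest[improved], pbest_f[improved] = x[improved], f[improved]
    gbest = pbest[np.argmin(pbest_f)]

print("best value found:", pbest_f.min())
```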
- Published
- 2022
11. Learning Agents with Prioritization and Parameter Noise in Continuous State and Action Space
- Author
-
Mangannavar, Rajesh and Srinivasaraghavan, Gopalakrishnan
- Published
- 2019
12. Training and Evaluation of Deep Policies Using Reinforcement Learning and Generative Models.
- Author
-
Ghadirzadeh, Ali, Poklukar, Petra, Arndt, Karol, Finn, Chelsea, Kyrki, Ville, Kragic, Danica, and Björkman, Mårten
- Subjects
LATENT variables, PHYSICAL training & conditioning
- Abstract
We present a data-efficient framework for solving sequential decision-making problems that exploits the combination of reinforcement learning (RL) and latent variable generative models. The framework, called GenRL, trains deep policies by introducing an action latent variable such that the feed-forward policy search can be divided into two parts: (i) training a sub-policy that outputs a distribution over the action latent variable given a state of the system, and (ii) unsupervised training of a generative model that outputs a sequence of motor actions conditioned on the latent action variable. GenRL enables safe exploration and alleviates the data-inefficiency problem as it exploits prior knowledge about valid sequences of motor actions. Moreover, we provide a set of measures for the evaluation of generative models such that we are able to predict the performance of the RL policy training prior to the actual training on a physical robot. We experimentally determine the characteristics of generative models that have the most influence on the performance of the final policy training on two robotics tasks: shooting a hockey puck and throwing a basketball. Furthermore, we empirically demonstrate that GenRL is the only method which can safely and efficiently solve the robotics tasks compared to two state-of-the-art RL methods. [ABSTRACT FROM AUTHOR]
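The two-part factorization of GenRL can be sketched schematically: a state-conditioned sub-policy over an action latent variable, followed by a generative decoder that emits a motor-action sequence. The linear maps and dimensions below are placeholders for the paper's trained networks, chosen only to make the data flow concrete.

```python
# Schematic sketch of the GenRL factorization: state -> latent distribution ->
# decoded motor-action sequence. The linear maps stand in for trained networks.
import numpy as np

rng = np.random.default_rng(4)
STATE_DIM, LATENT_DIM, SEQ_LEN, ACT_DIM = 6, 2, 10, 3

W_mu = rng.standard_normal((LATENT_DIM, STATE_DIM)) * 0.1           # sub-policy mean
log_std = np.full(LATENT_DIM, -1.0)                                 # sub-policy std
W_dec = rng.standard_normal((SEQ_LEN * ACT_DIM, LATENT_DIM)) * 0.1  # decoder

def sub_policy(state):
    """Sample the action latent variable, conditioned on the state."""
    mu = W_mu @ state
    return mu + np.exp(log_std) * rng.standard_normal(LATENT_DIM)

def decode(z):
    """Generative model: latent -> full sequence of motor actions."""
    return (W_dec @ z).reshape(SEQ_LEN, ACT_DIM)

state = rng.standard_normal(STATE_DIM)
actions = decode(sub_policy(state))     # one sampled motor trajectory
print(actions.shape)                    # (10, 3): SEQ_LEN steps of ACT_DIM actions
```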
- Published
- 2022
13. Variational policy search using sparse Gaussian process priors for learning multimodal optimal actions.
- Author
-
Sasaki, Hikaru and Matsubara, Takamitsu
- Subjects
GAUSSIAN processes, MULTIMODAL user interfaces, KRIGING, PRIOR learning, REWARD (Psychology), LEARNING, REINFORCEMENT learning
- Abstract
Policy search reinforcement learning has been drawing much attention as a method for learning robot control policies. In particular, policy search using non-parametric policies such as Gaussian process regression can learn optimal actions with high-dimensional and redundant sensors as input. However, previous methods implicitly assume that the optimal action is unique for each state. This assumption can severely limit practical applications such as robot manipulation, since it is difficult to design a reward function under which a complex task admits only one optimal action. Previous methods may suffer critical performance deterioration because typical non-parametric policies are unimodal and therefore cannot capture multiple optimal actions. We propose novel approaches to non-parametric policy search with multiple optimal actions, and offer two different algorithms commonly based on a sparse Gaussian process prior and variational Bayesian inference. The key ideas are: (1) multimodality for capturing multiple optimal actions, and (2) mode-seeking for capturing one optimal action while ignoring the others. First, we propose a multimodal sparse Gaussian process policy search that uses multiple overlapping GPs as a prior. Second, we propose a mode-seeking sparse Gaussian process policy search that uses the Student-t distribution as a likelihood function. The effectiveness of these algorithms is demonstrated through applications to object manipulation tasks with multiple optimal actions in simulation. [ABSTRACT FROM AUTHOR]
- Published
- 2021
14. Accelerating Robot Trajectory Learning for Stochastic Tasks
- Author
-
Josip Vidakovic, Bojan Jerbic, Bojan Sekoranja, Marko Svaco, and Filip Suligoj
- Subjects
Learning from demonstration, policy search, robot task learning, robot trajectory, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
Learning from demonstration (LfD) provides ways to transfer knowledge and skills from humans to robots. Models based solely on learning from demonstration often have very good generalization capabilities but are not completely accurate when adapting to new scenarios. This happens especially when learning stochastic tasks, because of the correspondence problem and unmodeled physical properties of tasks. On the other hand, reinforcement learning (RL) methods such as policy search have the capability to refine an initial skill through exploration, where the learning process is often highly dependent on the initialization strategy and is efficient at finding only local solutions. These two approaches are, therefore, frequently combined. In this paper, we show how the iterative learning of tasks can be accelerated by an LfD method based on the extraction of via-points. The paper provides an evaluation of the approach on two different primitive motion tasks.
- Published
- 2020
15. Generalizing Regrasping with Supervised Policy Learning
- Author
-
Chebotar, Yevgen, Hausman, Karol, Kroemer, Oliver, Sukhatme, Gaurav S., and Schaal, Stefan
- Published
- 2017
16. A Linear Online Guided Policy Search Algorithm
- Author
-
Sun, Biao, Xiong, Fangzhou, Liu, Zhiyong, Yang, Xu, and Qiao, Hong
- Published
- 2017
17. Task Feasibility Maximization Using Model-Free Policy Search and Model-Based Whole-Body Control
- Author
-
Ryan Lober, Olivier Sigaud, and Vincent Padois
- Subjects
humanoids, reinforcement learning, policy search, whole-body control, iCub humanoid robot, Mechanical engineering and machinery, TJ1-1570, Electronic computers. Computer science, QA75.5-76.95
- Abstract
Producing feasible motions for highly redundant robots, such as humanoids, is a complicated and high-dimensional problem. Model-based whole-body control of such robots can generate complex dynamic behaviors through the simultaneous execution of multiple tasks. Unfortunately, tasks are generally planned without close consideration of the underlying controller being used, or of the other tasks being executed, and are often infeasible when executed on the robot. Consequently, there is no guarantee that the motion will be accomplished. In this work, we develop a proof-of-concept optimization loop which automatically improves task feasibility using model-free policy search in conjunction with model-based whole-body control. This combination allows solving problems that would otherwise be intractable using only one or the other. Through experiments on both the simulated and the real iCub humanoid robot, we show that by optimizing task feasibility, initially infeasible complex dynamic motions can be realized, specifically a sit-to-stand transition. These experiments can be viewed in the accompanying Video S1.
- Published
- 2020
18. A Markov chain Monte Carlo algorithm for Bayesian policy search
- Author
-
Vahid Tavakol Aghaei, Ahmet Onat, and Sinan Yıldırım
- Subjects
Reinforcement learning, Markov chain Monte Carlo, particle filtering, risk sensitive reward, policy search, control, Control engineering systems. Automatic machinery (General), TJ212-225, Systems engineering, TA168
- Abstract
Policy search algorithms have facilitated the application of Reinforcement Learning (RL) to dynamic systems such as the control of robots. Many policy search algorithms are based on the policy gradient and may thus suffer from slow convergence or convergence to local optima. In this paper, we take a Bayesian approach to policy search under the RL paradigm for the problem of controlling a discrete-time Markov decision process with continuous state and action spaces and a multiplicative reward structure. For this purpose, we assume a prior over the policy parameters and aim for the ‘posterior’ distribution in which the ‘likelihood’ is the expected reward. We propose a Markov chain Monte Carlo algorithm as a method for generating samples of the policy parameters from this posterior. The proposed algorithm is compared with certain well-known policy gradient-based RL methods and exhibits better performance in terms of time response and convergence rate when applied to a nonlinear model of the Cart-Pole benchmark.
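In symbols, the construction this abstract describes targets an unnormalized posterior over the policy parameters $\theta$ in which the expected reward plays the role of the likelihood. With prior $p(\theta)$ and expected reward $J(\theta)$, the MCMC sampler draws from

$$\pi(\theta) \;\propto\; J(\theta)\, p(\theta),$$

so parameter regions with higher expected reward receive proportionally more posterior mass. (The notation here is ours, chosen to match the abstract's description; the paper's exact formulation may differ.)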
- Published
- 2018
19. Neuroevolutionary diversity policy search for multi-objective reinforcement learning.
- Author
-
Zhou, Dan, Du, Jiqing, and Arai, Sachiyo
- Subjects
DEEP reinforcement learning, REINFORCEMENT learning, DIVERSITY & inclusion policies, PARETO optimum
- Abstract
Sequential decision-making requires balancing multiple conflicting objectives through multi-objective reinforcement learning (MORL). Moreover, decision-makers desire dense solutions that satisfy their requirements and consider the trade-offs between different objectives (Pareto optimal solutions). Most deep reinforcement learning methods focus on single-objective problems or solve multi-objective problems using simple linear combinations, which may oversimplify the underlying problem and lead to suboptimal results. This study proposes a neuroevolutionary diversity policy search approach to address MORL problems. It employs neural networks, each equipped with a buffer for storing recent experiences, representing individuals in a population. The non-dominated sorting method and diversity distance metric are employed in the evolutionary process to select high-quality solutions as teachers. The teachers use gradient-based genetic operators to guide the population to produce high-quality offspring, thereby achieving dense Pareto optimal solutions. Furthermore, we introduce three MORL benchmarks with distinct characteristics: (1) a continuous deep sea treasure with convex and nonconvex Pareto fronts; (2) a multi-objective mountain car with sparse rewards and a discontinuous Pareto front; and (3) a multi-objective HalfCheetah with high-dimensional action-state spaces. The experimental results on the three MORL benchmarks demonstrate the superiority of the proposed algorithm in obtaining dense and high-quality Pareto optimal solutions. [ABSTRACT FROM AUTHOR]
- Published
- 2024
20. Assisted teleoperation in changing environments with a mixture of virtual guides.
- Author
-
Ewerton, Marco, Arenz, Oleg, and Peters, Jan
- Subjects
VIRTUAL reality, REMOTE control, GAUSSIAN mixture models, HAPTIC devices
- Abstract
Haptic guidance is a powerful technique to combine the strengths of humans and autonomous systems for teleoperation. The autonomous system can provide haptic cues to enable the operator to perform precise movements; the operator can interfere with the plan of the autonomous system leveraging his/her superior cognitive capabilities. However, providing haptic cues such that the individual strengths are not impaired is challenging because low forces provide little guidance, whereas strong forces can hinder the operator in realizing his/her plan. Based on variational inference, we learn a Gaussian mixture model (GMM) over trajectories to accomplish a given task. The learned GMM is used to construct a potential field which determines the haptic cues. The potential field smoothly changes during teleoperation based on our updated belief over the plans and their respective phases. Furthermore, new plans are learned online when the operator does not follow any of the proposed plans or after changes in the environment. User studies confirm that our framework helps users perform teleoperation tasks more accurately than without haptic cues and, in some cases, faster. Moreover, we demonstrate the use of our framework to help a subject teleoperate a 7 DoF manipulator in a pick-and-place task. [ABSTRACT FROM AUTHOR]
- Published
- 2020
21. Bootstrap Aggregation and Cross‐Validation Methods to Reduce Overfitting in Reservoir Control Policy Search.
- Author
-
Brodeur, Zachary P., Herman, Jonathan D., and Steinschneider, Scott
- Subjects
RESERVOIRS, DROUGHT management, FLOOD risk, WATER supply, MACHINE learning, OVERHEAD costs, PALEOHYDROLOGY
- Abstract
Policy search methods provide a heuristic mapping between observations and decisions and have been widely used in reservoir control studies. However, recent studies have observed a tendency for policy search methods to overfit to the hydrologic data used in training, particularly the sequence of flood and drought events. This technical note develops an extension of bootstrap aggregation (bagging) and cross-validation techniques, inspired by the machine learning literature, to improve reservoir control policy performance on out-of-sample hydrological sequences. We explore these methods using a case study of Folsom Reservoir, California, using control policies structured as binary trees and streamflow resampling based on the paleo-inflow record. Results show that calibration-validation strategies for policy selection, coupled with certain ensemble aggregation methods, can improve out-of-sample performance in water supply and flood risk objectives over baseline performance given fixed computational costs. Our findings highlight the potential to improve policy search methodologies by leveraging these well-established model training strategies from machine learning. Key Points: • We apply machine learning techniques of bootstrap aggregation (bagging) and cross-validation to improve reservoir control policy search. • Block bootstrapping of historic hydrology based on paleo-inflows can efficiently generate calibration-validation-testing data. • Policy selection according to validation performance on bootstrapped data leads to the greatest improvement in out-of-sample performance. [ABSTRACT FROM AUTHOR]
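A generic sketch of the scheme described above, with synthetic data and a deliberately trivial policy class: block-bootstrap the inflow record into calibration and validation sequences, shortlist policies on calibration data, and select by validation performance. The inflow distribution, cost terms, and candidate policies are all invented for illustration.

```python
# Illustrative sketch: block-bootstrap an inflow record into calibration and
# validation sequences, then select the reservoir policy by validation performance.
import numpy as np

rng = np.random.default_rng(5)
inflows = rng.gamma(2.0, 50.0, 1000)            # stand-in for a paleo-inflow record

def block_bootstrap(series, n_blocks=20, block_len=50):
    """Resample contiguous blocks to preserve hydrologic persistence."""
    starts = rng.integers(0, len(series) - block_len, n_blocks)
    return np.concatenate([series[s:s + block_len] for s in starts])

def policy_cost(release, inflow_seq, capacity=500.0):
    """Toy objective penalizing spills (flood risk) and empty storage (supply)."""
    storage, cost = capacity / 2.0, 0.0
    for q in inflow_seq:
        storage += q - release
        cost += 5.0 * max(storage - capacity, 0.0) + 2.0 * max(-storage, 0.0)
        storage = float(np.clip(storage, 0.0, capacity))
    return cost / len(inflow_seq)

candidates = np.linspace(60.0, 140.0, 9)        # candidate fixed-release policies
calib, valid = block_bootstrap(inflows), block_bootstrap(inflows)
shortlist = sorted(candidates, key=lambda r: policy_cost(r, calib))[:3]
best = min(shortlist, key=lambda r: policy_cost(r, valid))
print("selected release policy:", best)
```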
- Published
- 2020
22. Trust-Region Variational Inference with Gaussian Mixture Models.
- Author
-
Arenz, Oleg, Mingjun Zhong, and Neumann, Gerhard
- Subjects
GAUSSIAN mixture models, MARKOV chain Monte Carlo, INTERFERENCE channels (Telecommunications)
- Abstract
Many methods for machine learning rely on approximate inference from intractable probability distributions. Variational inference approximates such distributions by tractable models that can be subsequently used for approximate inference. Learning sufficiently accurate approximations requires a rich model family and careful exploration of the relevant modes of the target distribution. We propose a method for learning accurate GMM approximations of intractable probability distributions based on insights from policy search by using information-geometric trust regions for principled exploration. For efficient improvement of the GMM approximation, we derive a lower bound on the corresponding optimization objective enabling us to update the components independently. Our use of the lower bound ensures convergence to a stationary point of the original objective. The number of components is adapted online by adding new components in promising regions and by deleting components with negligible weight. We demonstrate on several domains that we can learn approximations of complex, multimodal distributions with a quality that is unmet by previous variational inference methods, and that the GMM approximation can be used for drawing samples that are on par with samples created by state-of-the-art MCMC samplers while requiring up to three orders of magnitude less computational resources. [ABSTRACT FROM AUTHOR]
- Published
- 2020
23. Numerical Quadrature for Probabilistic Policy Search.
- Author
-
Vinogradska, Julia, Bischoff, Bastian, Achterhold, Jan, Koller, Torsten, and Peters, Jan
- Subjects
TRAJECTORY optimization, CONTROL theory (Engineering), BENCHMARK problems (Computer science), REINFORCEMENT learning, GAUSSIAN processes
- Abstract
Learning control policies has become an appealing alternative to the derivation of control laws based on classic control theory. Model-based approaches have demonstrated outstanding data efficiency, especially when combined with probabilistic models to eliminate model bias. However, a major difficulty for these methods is that multi-step-ahead predictions typically become intractable for larger planning horizons and can only be poorly approximated. In this paper, we propose the use of numerical quadrature to overcome this drawback and provide significantly more accurate multi-step-ahead predictions. As a result, our approach increases data efficiency and enhances the quality of learned policies. Furthermore, policy learning is not restricted to optimizing locally around one trajectory, as numerical quadrature provides a principled approach to extend optimization to all trajectories starting in a specified starting state region. Thus, manual effort, such as choosing informative starting points for simultaneous policy optimization, is significantly decreased. Moreover, learning is highly robust to the choice of initial policy and, thus, interaction time with the system is minimized. Empirical evaluations on simulated benchmark problems show the efficiency of the proposed approach and support our theoretical results. [ABSTRACT FROM AUTHOR]
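One-dimensional Gauss-Hermite quadrature illustrates the idea: a Gaussian state belief is propagated through nonlinear closed-loop dynamics by evaluating the dynamics at quadrature nodes rather than by Monte Carlo sampling. The dynamics, policy, and noise level below are invented for the sketch.

```python
# Illustrative sketch: propagating a Gaussian state belief through nonlinear
# closed-loop dynamics with Gauss-Hermite quadrature instead of Monte Carlo.
import numpy as np

def closed_loop(x):
    """Toy nonlinear dynamics under a fixed policy u = -0.8*x."""
    u = -0.8 * x
    return np.sin(x) + 0.5 * x + u

nodes, w = np.polynomial.hermite_e.hermegauss(9)   # probabilists' Gauss-Hermite
w = w / np.sqrt(2.0 * np.pi)                       # weights for a standard normal

mu, var = 1.0, 0.25                                # initial state belief N(mu, var)
for step in range(5):
    xs = mu + np.sqrt(var) * nodes                 # quadrature nodes for N(mu, var)
    fx = closed_loop(xs)
    new_mu = np.sum(w * fx)                        # E[f(x)] by quadrature
    new_var = np.sum(w * (fx - new_mu) ** 2) + 0.01  # Var[f(x)] + process noise
    mu, var = new_mu, new_var
    print(f"step {step + 1}: mean {mu:.3f}, var {var:.3f}")
```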
- Published
- 2020
24. Reinforcement learning of motor skills using Policy Search and human corrective advice.
- Author
-
Celemin, Carlos, Maeda, Guilherme, Ruiz-del-Solar, Javier, Peters, Jan, and Kober, Jens
- Subjects
REINFORCEMENT learning, MOTOR learning, INTERACTIVE learning, LEARNING, MACHINE learning, MOTOR ability
- Abstract
Robot learning problems are limited by physical constraints, which make learning successful policies for complex motor skills on real systems infeasible. Some reinforcement learning methods, like Policy Search, offer stable convergence toward locally optimal solutions, whereas interactive machine learning or learning-from-demonstration methods allow fast transfer of human knowledge to the agents. However, most methods require expert demonstrations. In this work, we propose the use of human corrective advice in the action domain for learning motor trajectories. Additionally, we combine this human feedback with reward functions in a Policy Search learning scheme. The use of both sources of information speeds up the learning process, since the intuitive knowledge of the human teacher can be easily transferred to the agent, while the Policy Search method with the cost/reward function supervises the process and reduces the influence of occasional wrong human corrections. This interactive approach has been validated for learning movement primitives with simulated arms with several degrees of freedom in reaching via-point movements, and also using real robots in such tasks as "writing characters" and the ball-in-a-cup game. Compared with standard reinforcement learning without human advice, the results show that the proposed method not only converges to higher rewards when learning movement primitives, but also that the learning is sped up by a factor of 4–40, depending on the task. [ABSTRACT FROM AUTHOR]
- Published
- 2019
25. A covariance matrix adaptation evolution strategy in reproducing kernel Hilbert space.
- Author
-
Dang, Viet-Hung, Vien, Ngo Anh, and Chung, TaeChoong
- Abstract
The covariance matrix adaptation evolution strategy (CMA-ES) is an efficient derivative-free optimization algorithm. It optimizes a black-box objective function over a well-defined parameter space in which feature functions are often defined manually. Therefore, the performance of those techniques strongly depends on the quality of the chosen features or the underlying parametric function space. Hence, enabling CMA-ES to optimize over a more complex and general function class has long been desired. In this paper, we consider modeling the input spaces in black-box optimization non-parametrically in reproducing kernel Hilbert spaces (RKHS). This modeling leads to a functional optimization problem whose domain is an RKHS function space that enables optimization in a very rich function class. We propose CMA-ES-RKHS, a generalized CMA-ES framework that is able to carry out black-box functional optimization in RKHS. A search distribution on non-parametric function spaces, represented as a Gaussian process, is adapted by updating both its mean function and covariance operator. Adaptive and sparse representations of the mean function and the covariance operator can be retained for efficient computation in the updates and evaluations of CMA-ES-RKHS by resorting to sparsification. We also show how to apply our new black-box framework to search for an optimal policy in reinforcement learning, in which policies are represented as functions in an RKHS. CMA-ES-RKHS is evaluated on two functional optimization problems and two benchmark reinforcement learning domains. [ABSTRACT FROM AUTHOR]
- Published
- 2019
26. Distributed policy search reinforcement learning for job-shop scheduling tasks.
- Author
-
Gabel, Thomas and Riedmiller, Martin
- Subjects
PRODUCTION scheduling, MATHEMATICAL models, ADVANCED planning & scheduling, PRODUCTION control, PROCESS control systems, PRODUCTION planning, PRODUCTION management (Manufacturing)
- Abstract
We interpret job-shop scheduling problems as sequential decision problems that are handled by independent learning agents. These agents act completely decoupled from one another and employ probabilistic dispatching policies for which we propose a compact representation using a small set of real-valued parameters. During ongoing learning, the agents adapt these parameters using policy gradient reinforcement learning, with the aim of improving the performance of the joint policy measured in terms of a standard scheduling objective function. Moreover, we suggest a lightweight communication mechanism that enhances the agents' capabilities beyond purely reactive job dispatching. We evaluate the effectiveness of our learning approach using various deterministic as well as stochastic job-shop scheduling benchmark problems, demonstrating that the utilisation of policy gradient methods can be effective and beneficial for scheduling problems. [ABSTRACT FROM AUTHOR]
- Published
- 2012
27. Recent advances in path integral control for trajectory optimization: An overview in theoretical and algorithmic perspectives.
- Author
-
Kazim, Muhammad, Hong, JunGee, Kim, Min-Gyeom, and Kim, Kwang-Ki K.
- Subjects
TRAJECTORY optimization, PATH integrals
- Abstract
This paper presents a tutorial overview of path integral (PI) approaches for stochastic optimal control and trajectory optimization. We concisely summarize the theoretical development of path integral control for computing a solution to stochastic optimal control, and provide algorithmic descriptions of the cross-entropy (CE) method, an open-loop controller using the receding horizon scheme known as the model predictive path integral (MPPI), and a parameterized state feedback controller based on path integral control theory. We discuss policy search methods based on path integral control, efficient and stable sampling strategies, extensions to multi-agent decision-making, and MPPI for trajectory optimization on manifolds. For tutorial demonstrations, some PI-based controllers are implemented in Python, MATLAB, and ROS2/Gazebo simulations for trajectory optimization. The simulation frameworks and source codes are publicly available on the GitHub page. [ABSTRACT FROM AUTHOR]
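Of the algorithms surveyed, MPPI condenses especially well: sample perturbed control sequences, weight the rollouts by the exponentiated negative cost, and average the perturbations in a receding-horizon loop. The point-mass dynamics and cost below are our own toy setup, not the paper's ROS2/Gazebo examples.

```python
# Minimal MPPI sketch (toy setup): sample control perturbations, weight rollouts by
# exponentiated negative cost, and average, receding-horizon style.
import numpy as np

rng = np.random.default_rng(6)
H, K, LAM, SIGMA, DT = 20, 256, 1.0, 0.5, 0.1   # horizon, samples, temperature, noise

def rollout_cost(x0, v0, controls):
    """Cost of driving a 1-D double integrator toward the origin."""
    x, v, cost = x0, v0, 0.0
    for u in controls:
        v += DT * u
        x += DT * v
        cost += x**2 + 0.1 * v**2 + 0.01 * u**2
    return cost

u_nom, x, v = np.zeros(H), 3.0, 0.0
for _ in range(50):
    eps = SIGMA * rng.standard_normal((K, H))
    costs = np.array([rollout_cost(x, v, u_nom + e) for e in eps])
    w = np.exp(-(costs - costs.min()) / LAM)     # path-integral weights
    w /= w.sum()
    u_nom = u_nom + w @ eps                      # weighted perturbation average
    v += DT * u_nom[0]                           # apply only the first control
    x += DT * v
    u_nom = np.roll(u_nom, -1)
    u_nom[-1] = 0.0
print(f"final state: x={x:.3f}, v={v:.3f}")
```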
- Published
- 2024
28. Solving an energy resource management problem with a novel multi-objective evolutionary reinforcement learning method.
- Author
-
Leite, G.M.C., Jiménez-Fernández, S., Salcedo-Sanz, S., Marcelino, C.G., and Pedreira, C.E.
- Abstract
Microgrids have become popular candidates for integrating diverse energy sources into the power grid as a means of reducing fossil fuel usage. Energy Resource Management (ERM) is a type of Unit Commitment problem in which a player operates a microgrid with diverse renewable generators integrated with an external supplier. Calculating the economic dispatch of each committed unit over a planning horizon is an NP-hard problem, and therefore finding an exact solution is difficult. This paper presents a multi-objective solution to the ERM problem from the perspective of battery operation and external supplier dispatch. First, a novel multi-objective decision problem modeling is proposed that considers three objectives: cost, greenhouse gas emissions, and battery degradation. This framework involves a learning agent that controls the depth of discharge of a Lithium-Ion battery. To address the proposed problem, a new multi-objective algorithm called Multi-Objective Evolutionary Policy Search (MEPS) is introduced. The proposed algorithm uses a NeuroEvolution of Augmenting Topologies (NEAT) structure to evolve artificial neural networks for estimating action-preference values considering multi-objective rewards. The MEPS performance is evaluated on both standard and newly proposed benchmark problems, using the hypervolume as the evaluation metric. When compared to standard deep reinforcement learning, the results showed that MEPS provides cost-effective, environmentally friendly, and efficient energy storage management solutions. Furthermore, MEPS effectively solves the proposed ERM problem by finding neural networks with a small number of nodes and connections, which are suitable for use in embedded control systems. Overall, MEPS proved to be a promising multi-objective approach in the transition to clean energy resources. • An energy resource management problem modeling is proposed for microgrid systems. • A novel neuroevolutionary multi-objective reinforcement learning algorithm is proposed. • A novel benchmark environment for multi-objective reinforcement learning is proposed. • Evolved neural networks can perform multi-objective energy resource management. [ABSTRACT FROM AUTHOR]
- Published
- 2023
29. Discussion and Conclusion
- Author
-
Barrett, Samuel
- Published
- 2015
30. Learning Trajectory Distributions for Assisted Teleoperation and Path Planning
- Author
-
Marco Ewerton, Oleg Arenz, Guilherme Maeda, Dorothea Koert, Zlatko Kolev, Masaki Takahashi, and Jan Peters
- Subjects
assisted teleoperation, path planning, movement primitives, reinforcement learning, policy search, Gaussian processes, Mechanical engineering and machinery, TJ1-1570, Electronic computers. Computer science, QA75.5-76.95
- Abstract
Several approaches have been proposed to assist humans in co-manipulation and teleoperation tasks given demonstrated trajectories. However, these approaches are not applicable when the demonstrations are suboptimal or when the generalization capabilities of the learned models cannot cope with the changes in the environment. Nevertheless, in real co-manipulation and teleoperation tasks, the original demonstrations will often be suboptimal and a learning system must be able to cope with new situations. This paper presents a reinforcement learning algorithm that can be applied to such problems. The proposed algorithm is initialized with a probability distribution of demonstrated trajectories and is based on the concept of relevance functions. We show in this paper how the relevance of trajectory parameters to optimization objectives is connected with the concept of Pearson correlation. First, we demonstrate the efficacy of our algorithm by addressing the assisted teleoperation of an object in a static virtual environment. Afterward, we extend this algorithm to deal with dynamic environments by utilizing Gaussian Process regression. The full framework is applied to make a point particle and a 7-DoF robot arm autonomously adapt their movements to changes in the environment as well as to assist the teleoperation of a 7-DoF robot arm in a dynamic environment.
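The relevance idea mentioned in this abstract, connecting trajectory parameters to optimization objectives via Pearson correlation, reduces to a short computation: sample parameter vectors, score the resulting trajectories, and correlate each parameter with the score. The toy objective below (only two of five parameters matter) is an assumption for illustration.

```python
# Illustrative sketch of relevance via Pearson correlation: parameters whose
# variation correlates strongly with the objective are treated as relevant.
import numpy as np

rng = np.random.default_rng(7)
n_samples, n_params = 200, 5
params = rng.standard_normal((n_samples, n_params))

# Toy objective where only parameters 0 and 2 matter (plus observation noise).
scores = 2.0 * params[:, 0] - 1.5 * params[:, 2] + 0.3 * rng.standard_normal(n_samples)

relevance = np.array([np.corrcoef(params[:, j], scores)[0, 1]
                      for j in range(n_params)])
print("per-parameter Pearson correlation with the objective:")
print(np.round(relevance, 2))   # irrelevant parameters hover near zero
```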
- Published
- 2019
31. Multi-Ship Collision Avoidance Method based on Markov Decision Process
- Author
-
Melhaoui, Yousra, Kamil, Abdelali, Mansouri, Khalifa, and Rachik, Mostafa
- Subjects
colreg rules, grid-world, policy search, safe trajectory, value iteration, markov property, ship motion, decision-process, collision avoidance, autonomous navigation, optimization
- Abstract
The maritime industry has been incorporating advanced technology to enhance mission planning and ensure safe navigation, including autonomous collision avoidance systems that follow the International Regulations for Preventing Collisions at Sea (COLREG). Ongoing research in this field encompasses a wide range of approaches, from optimal control analysis to heuristic and metaheuristic methods and solutions based on artificial intelligence. This work focuses on the development of a COLREG-compliant autonomous collision avoidance algorithm for ships based on a Markov decision process. The algorithm considers the subject ship's position and aims to resolve potential collision conflicts with target ships while keeping the vessel on its initial trajectory, in compliance with the regulations. The system is modeled as a Markov decision process using the ship's three-coordinate position as the state, actions generated from its degrees of freedom, and constraints such as a safe path, trip cost, and respect for the rules to design the reward. The proposed policy search algorithm is implemented in Python, and its convergence and efficiency are tested through multiple scenarios.
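The value-iteration core of such a grid-world formulation can be sketched generically; the grid size, goal, obstacle, and rewards below are illustrative stand-ins for the ship's discretized states and COLREG-derived constraints.

```python
# Illustrative value iteration on a small grid world; the goal cell stands in for
# the ship's resumed trajectory and the obstacle for a collision conflict.
import numpy as np

GRID, GOAL, OBSTACLE, GAMMA = 5, (4, 4), (2, 2), 0.95
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]          # N, S, W, E

def step(s, a):
    """Deterministic move, clipped at the grid border."""
    return (min(max(s[0] + a[0], 0), GRID - 1), min(max(s[1] + a[1], 0), GRID - 1))

def reward(nxt):
    return 10.0 if nxt == GOAL else (-10.0 if nxt == OBSTACLE else -1.0)

V = np.zeros((GRID, GRID))
for _ in range(100):                                   # value-iteration sweeps
    V_new = V.copy()
    for r in range(GRID):
        for c in range(GRID):
            if (r, c) == GOAL:
                continue
            V_new[r, c] = max(reward(step((r, c), a)) + GAMMA * V[step((r, c), a)]
                              for a in ACTIONS)
    V = V_new

# Greedy policy with respect to the converged values (indices into ACTIONS).
policy = np.array([[int(np.argmax([reward(step((r, c), a)) + GAMMA * V[step((r, c), a)]
                                   for a in ACTIONS])) for c in range(GRID)]
                   for r in range(GRID)])
print(np.round(V, 1))
print(policy)
```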
- Published
- 2023
32. Policy Search for Path Integral Control
- Author
-
Gómez, Vicenç, Kappen, Hilbert J., Peters, Jan, and Neumann, Gerhard
- Published
- 2014
33. On Diversity, Teaming, and Hierarchical Policies: Observations from the Keepaway Soccer Task
- Author
-
Kelly, Stephen and Heywood, Malcolm I.
- Published
- 2014
34. Safe Exploration Techniques for Reinforcement Learning – An Overview
- Author
-
Pecka, Martin and Svoboda, Tomas
- Published
- 2014
35. Reinforcement Learning
- Author
-
Du, Ke-Lin and Swamy, M. N. S.
- Published
- 2014
36. Bayesian Optimization for Policy Search via Online-Offline Experimentation.
- Author
-
Letham, Benjamin and Bakshy, Eytan
- Subjects
INTERACTIVE learning, MACHINE learning, GAUSSIAN processes, INSTRUCTIONAL systems, EXPERIMENTS, GENERALIZATION
- Abstract
Online field experiments are the gold-standard way of evaluating changes to real-world interactive machine learning systems. Yet our ability to explore complex, multi-dimensional policy spaces, such as those found in recommendation and ranking problems, is often constrained by the limited number of experiments that can be run simultaneously. To alleviate these constraints, we augment online experiments with an offline simulator and apply multi-task Bayesian optimization to tune live machine learning systems. We describe practical issues that arise in these types of applications, including biases that arise from using a simulator and assumptions for the multi-task kernel. We measure empirical learning curves which show substantial gains from including data from biased offline experiments, and show how these learning curves are consistent with theoretical results for multi-task Gaussian process generalization. We find that improved kernel inference is a significant driver of multi-task generalization. Finally, we show several examples of Bayesian optimization efficiently tuning a live machine learning system by combining offline and online experiments. [ABSTRACT FROM AUTHOR]
- Published
- 2019
37. Importance sampling policy gradient algorithms in reproducing kernel Hilbert space.
- Author
-
Le, Tuyen Pham, Ngo, Vien Anh, Jaramillo, P. Marlith, and Chung, TaeChoong
- Subjects
ALGORITHMS, FAMILY policy, MATHEMATICAL regularization, REINFORCEMENT learning
- Abstract
Modeling policies in reproducing kernel Hilbert space (RKHS) offers a very flexible and powerful new family of policy gradient algorithms, called RKHS policy gradient algorithms, which are designed to optimize over a space of very high- or infinite-dimensional policies. However, they are known to suffer from a large-variance problem. This critical issue comes from the fact that updating the current policy is based on a functional gradient that does not exploit the old episodes sampled by previous policies. In this paper, we introduce a generalized RKHS policy gradient algorithm that integrates the following important ideas: (i) policy modeling in RKHS; (ii) normalized importance sampling, which helps reduce the estimation variance by reusing previously sampled episodes in a principled way; and (iii) regularization terms, which avoid over-fitting the policy to sampled data. In the experiment section, we provide an analysis of the proposed algorithms on benchmark domains. The experimental results show that the proposed algorithm retains the powerful policy modeling of RKHS while achieving greater data efficiency. [ABSTRACT FROM AUTHOR]
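The normalized importance sampling ingredient is easy to isolate: episodes sampled under an old policy are reused with likelihood ratios normalized to sum to one, which lowers the variance of the reused-gradient estimate. The one-parameter Gaussian policy below is a stand-in for a policy modeled in an RKHS.

```python
# Illustrative sketch of normalized (self-normalized) importance sampling for
# policy gradients: old episodes are reused with normalized likelihood ratios.
import numpy as np

rng = np.random.default_rng(8)

def log_prob(a, theta):
    """Log-density (up to a constant) of a Gaussian policy N(theta, 1)."""
    return -0.5 * (a - theta) ** 2

theta_old, theta_new = 0.0, 0.4
actions = theta_old + rng.standard_normal(500)     # episodes from the old policy
returns = -(actions - 1.0) ** 2                    # toy per-episode return

ratios = np.exp(log_prob(actions, theta_new) - log_prob(actions, theta_old))
w = ratios / ratios.sum()                          # normalized importance weights

# d/dtheta of log N(a; theta, 1) is (a - theta); weighted gradient estimate.
grad = np.sum(w * returns * (actions - theta_new))
print(f"off-policy gradient estimate at theta_new: {grad:.3f}")
```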
- Published
- 2019
38. Compatible natural gradient policy search.
- Author
-
Pajarinen, Joni, Thai, Hong Linh, Akrour, Riad, Peters, Jan, and Neumann, Gerhard
- Subjects
REINFORCEMENT learning, ENTROPY (Information theory)
- Abstract
Trust-region methods have yielded state-of-the-art results in policy search. A common approach is to use KL-divergence to bound the region of trust, resulting in a natural gradient policy update. We show that the natural gradient and trust region optimization are equivalent if we use the natural parameterization of a standard exponential policy distribution in combination with compatible value function approximation. Moreover, we show that standard natural gradient updates may reduce the entropy of the policy according to a wrong schedule, leading to premature convergence. To control entropy reduction, we introduce a new policy search method, called compatible policy search (COPOS), which bounds the entropy loss. The experimental results show that COPOS yields state-of-the-art results in challenging continuous control tasks and in discrete partially observable tasks. [ABSTRACT FROM AUTHOR]
- Published
- 2019
39. A fast hybrid reinforcement learning framework with human corrective feedback.
- Author
-
Celemin, Carlos, Ruiz-del-Solar, Javier, and Kober, Jens
- Subjects
BLENDED learning, LEARNING, REINFORCEMENT learning, INTERACTIVE learning, PSYCHOLOGICAL feedback, LEARNING strategies
- Abstract
Reinforcement Learning agents can be supported by feedback from human teachers in the learning loop that guides the learning process. In this work, we propose two hybrid strategies of Policy Search Reinforcement Learning and Interactive Machine Learning that benefit from both sources of information, the cost function and the human corrective feedback, to accelerate convergence and improve the final performance of the learning process. Experiments with simulated and real systems of balancing tasks and a 3-DoF robot arm validate the advantages of the proposed learning strategies: (i) they speed up the convergence of the learning process between 3 and 30 times, saving considerable time during agent adaptation, and (ii) they allow the inclusion of non-expert feedback because they have low sensitivity to erroneous human advice. [ABSTRACT FROM AUTHOR]
- Published
- 2019
40. Policy search in continuous action domains: An overview.
- Author
-
Sigaud, Olivier and Stulp, Freek
- Subjects
MACHINE learning, DEEP learning, EVOLUTIONARY algorithms, GOVERNMENT policy, FAMILIES, REINFORCEMENT learning
- Abstract
Continuous action policy search is currently the focus of intensive research, driven both by the recent success of deep reinforcement learning algorithms and by the emergence of competitors based on evolutionary algorithms. In this paper, we present a broad survey of policy search methods, providing a unified perspective on very different approaches, including Bayesian Optimization and directed exploration methods. The main message of this overview lies in the relationships between the families of methods, but we also outline some of the factors underlying the sample efficiency properties of the various approaches. [ABSTRACT FROM AUTHOR]
- Published
- 2019
41. Directed Exploration in Black-Box Optimization for Multi-Objective Reinforcement Learning.
- Author
-
García, Javier, Iglesias, Roberto, Rodríguez, Miguel A., and Regueiro, Carlos V.
- Subjects
REINFORCEMENT learning, LEARNING, RESERVOIRS, SYSTEM safety
- Abstract
Usually, real-world problems involve the optimization of multiple, possibly conflicting, objectives. These problems may be addressed by multi-objective reinforcement learning (MORL) techniques. MORL is a generalization of standard reinforcement learning (RL) in which the single reward signal is extended to multiple signals, one for each objective. MORL is the process of learning policies that optimize multiple objectives simultaneously. In these problems, directional/gradient information can be helpful in guiding exploration toward better and better behaviors. However, traditional policy-gradient approaches have two main drawbacks: they require a batch of episodes to properly estimate the gradient information (thereby reducing learning speed), and they use stochastic policies, which can have a disastrous impact on the safety of the learning system. In this paper, we present a novel population-based MORL algorithm for problems in which the underlying objectives are reasonably smooth. It has two main characteristics: fast computation of the gradient information for each objective through the use of neighboring solutions, and the use of this information to carry out a geometric partition of the search space and thus direct exploration toward promising areas. Finally, the algorithm is evaluated and compared to policy-gradient MORL algorithms on different multi-objective problems: the water reservoir and the biped walking problem (the latter both in simulation and on a real robot). [ABSTRACT FROM AUTHOR]
- Published
- 2019
42. Learning Throttle Valve Control Using Policy Search
- Author
-
Bischoff, Bastian, Nguyen-Tuong, Duy, Koller, Torsten, Markert, Heiner, Knoll, Alois, Hutchison, David, editor, Kanade, Takeo, editor, Kittler, Josef, editor, Kleinberg, Jon M., editor, Mattern, Friedemann, editor, Mitchell, John C., editor, Naor, Moni, editor, Nierstrasz, Oscar, editor, Pandu Rangan, C., editor, Steffen, Bernhard, editor, Sudan, Madhu, editor, Terzopoulos, Demetri, editor, Tygar, Doug, editor, Vardi, Moshe Y., editor, Weikum, Gerhard, editor, Goebel, Randy, editor, Siekmann, Jörg, editor, Wahlster, Wolfgang, editor, Blockeel, Hendrik, editor, Kersting, Kristian, editor, Nijssen, Siegfried, editor, and Železný, Filip, editor
- Published
- 2013
- Full Text
- View/download PDF
43. Assisting Movement Training and Execution With Visual and Haptic Feedback
- Author
-
Marco Ewerton, David Rother, Jakob Weimar, Gerrit Kollegger, Josef Wiemeyer, Jan Peters, and Guilherme Maeda
- Subjects
shared autonomy ,HRI ,movement primitives ,reinforcement learning ,policy search ,cooperation ,Neurosciences. Biological psychiatry. Neuropsychiatry ,RC321-571 - Abstract
In the practice of motor skills in general, errors in the execution of movements may go unnoticed when a human instructor is not available. In this case, a computer system or robotic device able to detect movement errors and propose corrections would be of great help. This paper addresses the problem of how to detect such execution errors and how to provide feedback to the human to correct his/her motor skill using a general, principled methodology based on imitation learning. The core idea is to compare the observed skill with a probabilistic model learned from expert demonstrations. The intensity of the feedback is regulated by the likelihood of the model given the observed skill. Based on demonstrations, our system can, for example, detect errors in the writing of characters with multiple strokes. Moreover, by using a haptic device, the Haption Virtuose 6D, we demonstrate a method to generate haptic feedback based on a distribution over trajectories, which could be used as an auxiliary means of communication between an instructor and an apprentice. Additionally, given a performance measurement, the haptic device can help the human discover and perform better movements to solve a given task. In this case, the human first tries a few times to solve the task without assistance. Our framework, in turn, uses a reinforcement learning algorithm to compute haptic feedback, which guides the human toward better solutions.
- Published
- 2018
- Full Text
- View/download PDF
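A minimal sketch of the feedback rule described above, assuming a pointwise Gaussian model over the demonstrated trajectories: the corrective intensity grows as the observed movement becomes less likely under the model. The specific likelihood form and all names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def corrective_feedback(observed, demos, gain=1.0):
    """Per-timestep feedback pulling the user toward the demonstrated mean.

    observed -- executed trajectory, shape (T, d)
    demos    -- expert demonstrations, shape (N, T, d)
    """
    mean = demos.mean(axis=0)              # mean expert trajectory, (T, d)
    var = demos.var(axis=0) + 1e-6         # per-dimension variance, (T, d)
    # Squared Mahalanobis-style deviation of each timestep from the model.
    dev = ((observed - mean) ** 2 / var).sum(axis=1)
    likelihood = np.exp(-0.5 * dev)        # in (0, 1]; low when off-model
    # Stronger correction where the model assigns low likelihood.
    return gain * (1.0 - likelihood)[:, None] * (mean - observed)
```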
44. Monte-Carlo Swarm Policy Search
- Author
-
Fix, Jeremy, Geist, Matthieu, Hutchison, David, editor, Kanade, Takeo, editor, Kittler, Josef, editor, Kleinberg, Jon M., editor, Mattern, Friedemann, editor, Mitchell, John C., editor, Naor, Moni, editor, Nierstrasz, Oscar, editor, Pandu Rangan, C., editor, Steffen, Bernhard, editor, Sudan, Madhu, editor, Terzopoulos, Demetri, editor, Tygar, Doug, editor, Vardi, Moshe Y., editor, Weikum, Gerhard, editor, Rutkowski, Leszek, editor, Korytkowski, Marcin, editor, Scherer, Rafał, editor, Tadeusiewicz, Ryszard, editor, Zadeh, Lotfi A., editor, and Zurada, Jacek M., editor
- Published
- 2012
- Full Text
- View/download PDF
45. Evolving Equilibrium Policies for a Multiagent Reinforcement Learning Problem with State Attractors
- Author
-
Leon, Florin, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Nierstrasz, Oscar, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Sudan, Madhu, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Vardi, Moshe Y., Series editor, Weikum, Gerhard, Series editor, Goebel, Randy, editor, Siekmann, Jörg, editor, Wahlster, Wolfgang, editor, Jędrzejowicz, Piotr, editor, Nguyen, Ngoc Thanh, editor, and Hoang, Kiem, editor
- Published
- 2011
- Full Text
- View/download PDF
46. Conclusion
- Author
-
Whiteson, Shimon, Kacprzyk, Janusz, editor, and Whiteson, Shimon, editor
- Published
- 2010
- Full Text
- View/download PDF
47. Assisting Movement Training and Execution With Visual and Haptic Feedback.
- Author
-
Ewerton, Marco, Rother, David, Weimar, Jakob, Kollegger, Gerrit, Wiemeyer, Josef, Peters, Jan, and Maeda, Guilherme
- Subjects
MOTOR ability ,IMITATIVE behavior ,HAPTIC devices - Abstract
In the practice of motor skills in general, errors in the execution of movements may go unnoticed when a human instructor is not available. In this case, a computer system or robotic device able to detect movement errors and propose corrections would be of great help. This paper addresses the problem of how to detect such execution errors and how to provide feedback to the human to correct his/her motor skill using a general, principled methodology based on imitation learning. The core idea is to compare the observed skill with a probabilistic model learned from expert demonstrations. The intensity of the feedback is regulated by the likelihood of the model given the observed skill. Based on demonstrations, our system can, for example, detect errors in the writing of characters with multiple strokes. Moreover, by using a haptic device, the Haption Virtuose 6D, we demonstrate a method to generate haptic feedback based on a distribution over trajectories, which could be used as an auxiliary means of communication between an instructor and an apprentice. Additionally, given a performance measurement, the haptic device can help the human discover and perform better movements to solve a given task. In this case, the human first tries a few times to solve the task without assistance. Our framework, in turn, uses a reinforcement learning algorithm to compute haptic feedback, which guides the human toward better solutions. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
48. A Markov chain Monte Carlo algorithm for Bayesian policy search.
- Author
-
Tavakol Aghaei, Vahid, Onat, Ahmet, and Yıldırım, Sinan
- Subjects
SEARCH algorithms ,REINFORCEMENT learning ,DYNAMICAL systems ,MARKOV processes ,BAYESIAN analysis - Abstract
Policy search algorithms have facilitated the application of Reinforcement Learning (RL) to dynamic systems, such as the control of robots. Many policy search algorithms are based on the policy gradient and may thus suffer from slow convergence or local optima. In this paper, we take a Bayesian approach to policy search under the RL paradigm, for the problem of controlling a discrete-time Markov decision process with continuous state and action spaces and a multiplicative reward structure. For this purpose, we assume a prior over the policy parameters and aim for the 'posterior' distribution where the 'likelihood' is the expected reward. We propose a Markov chain Monte Carlo algorithm as a method of generating samples of policy parameters from this posterior. The proposed algorithm is compared with well-known policy-gradient-based RL methods and exhibits better performance in terms of time response and convergence rate when applied to a nonlinear Cart-Pole benchmark model. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
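The abstract's idea, treating the expected reward as a likelihood and sampling policy parameters from the resulting posterior, admits a compact random-walk Metropolis sketch. The Gaussian prior, the proposal scale, and the positivity assumption on the expected reward are our illustrative choices, not the paper's exact algorithm.

```python
import numpy as np

def mh_policy_search(expected_reward, dim, n_iters=1000, step=0.1):
    """Random-walk Metropolis over policy parameters.

    expected_reward -- estimate of the (positive) expected return of the
                       policy with the given parameters, e.g. from rollouts
    """
    log_prior = lambda th: -0.5 * np.sum(th ** 2)   # standard normal prior
    theta = np.zeros(dim)
    log_post = log_prior(theta) + np.log(expected_reward(theta) + 1e-12)
    samples = []
    for _ in range(n_iters):
        cand = theta + step * np.random.randn(dim)  # random-walk proposal
        cand_lp = log_prior(cand) + np.log(expected_reward(cand) + 1e-12)
        # Standard Metropolis accept/reject test on the log scale.
        if np.log(np.random.rand()) < cand_lp - log_post:
            theta, log_post = cand, cand_lp
        samples.append(theta.copy())
    return np.array(samples)
```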
49. Combining Policy Search with Planning in Multi-agent Cooperation
- Author
-
Ma, Jie, Cameron, Stephen, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Nierstrasz, Oscar, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Sudan, Madhu, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Vardi, Moshe Y., Series editor, Weikum, Gerhard, Series editor, Goebel, Randy, editor, Siekmann, Jörg, editor, Wahlster, Wolfgang, editor, Iocchi, Luca, editor, Matsubara, Hitoshi, editor, Weitzenfeld, Alfredo, editor, and Zhou, Changjiu, editor
- Published
- 2009
- Full Text
- View/download PDF
50. A Survey of Preference-Based Reinforcement Learning Methods.
- Author
-
Wirth, Christian, Akrour, Riad, Neumann, Gerhard, and Fürnkranz, Johannes
- Subjects
- *
REINFORCEMENT learning , *MACHINE learning , *REINFORCEMENT (Psychology) , *REWARD (Psychology) , *ALGORITHMS , *MATHEMATICAL optimization - Abstract
Reinforcement learning (RL) techniques optimize the accumulated long-term reward of a suitably chosen reward function. However, designing such a reward function often requires a lot of task-specific prior knowledge. The designer needs to consider different objectives that influence not only the learned behavior but also the learning progress. To alleviate these issues, preference-based reinforcement learning (PbRL) algorithms have been proposed that can learn directly from an expert's preferences instead of a hand-designed numeric reward. PbRL has gained traction in recent years due to its ability to resolve the reward-shaping problem, to learn from non-numeric rewards, and to reduce the dependence on expert knowledge. We provide a unified framework for PbRL that describes the task formally and points out the different design principles that affect the evaluation task for the human as well as the computational complexity. The design principles include the type of feedback that is assumed, the representation that is learned to capture the preferences, the optimization problem that has to be solved, and how the exploration/exploitation problem is tackled. Furthermore, we point out shortcomings of current algorithms, propose open research questions, and briefly survey practical tasks that have been solved using PbRL. [ABSTRACT FROM AUTHOR]
- Published
- 2017
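One building block this survey covers, learning a numeric reward from pairwise trajectory preferences, is commonly formulated with a Bradley-Terry model. A minimal sketch follows; the linear feature representation, the gradient-ascent loop, and all names are our assumptions rather than any specific algorithm from the survey.

```python
import numpy as np

def fit_reward(prefs, dim, lr=0.1, epochs=200):
    """Fit a linear reward w on trajectory features from preferences.

    prefs -- list of (phi_winner, phi_loser) pairs, each a feature
             vector of shape (dim,) summarizing a trajectory
    """
    w = np.zeros(dim)
    for _ in range(epochs):
        for phi_w, phi_l in prefs:
            # P(winner preferred) under Bradley-Terry with linear rewards.
            p = 1.0 / (1.0 + np.exp(-(w @ phi_w - w @ phi_l)))
            # Gradient ascent on the log-likelihood of the preference.
            w += lr * (1.0 - p) * (phi_w - phi_l)
    return w
```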