5 results
Search Results
2. State-of-the-Art Deep Learning Models in TensorFlow: Modern Machine Learning in the Google Colab Ecosystem
- Author
David Paper
- Subjects
- Machine learning, Reinforcement learning
- Abstract
Use TensorFlow 2.x in the Google Colab ecosystem to create state-of-the-art deep learning models guided by hands-on examples. The Colab ecosystem provides a free cloud service with easy access to on-demand GPU (and TPU) hardware acceleration for fast execution of the models you learn to build. This book teaches you state-of-the-art deep learning models in an applied manner, with the only requirement being an Internet connection. The Colab ecosystem provides everything else you need, including Python, TensorFlow 2.x, GPU and TPU support, and Jupyter Notebooks. The book begins with an example-driven approach to building the input pipelines that feed all machine learning models (a hedged tf.data sketch follows this entry). You will learn how to provision a workspace on the Colab ecosystem to enable construction of effective input pipelines in a step-by-step manner. From there, you will progress into data augmentation techniques and TensorFlow datasets to gain a deeper understanding of how to work with complex datasets. You will find coverage of Tensor Processing Units (TPUs) and transfer learning, followed by state-of-the-art deep learning models, including autoencoders, generative adversarial networks, fast style transfer, object detection, and reinforcement learning. Author Dr. Paper provides all the applied math, programming, and concepts you need to master the content. Examples range from relatively simple to very complex when necessary. Examples are carefully explained, concise, accurate, and complete. Care is taken to walk you through each topic with clear examples written in Python that you can try out and experiment with in the Google Colab ecosystem in the comfort of your own home or office.
What You Will Learn
- Take advantage of the built-in support of the Google Colab ecosystem
- Work with TensorFlow datasets
- Create input pipelines to feed state-of-the-art deep learning models
- Create pipelined state-of-the-art deep learning models with clean and reliable Python code
- Leverage pre-trained deep learning models to solve complex machine learning tasks
- Create a simple environment to teach an intelligent agent to make automated decisions
Who This Book Is For
Readers who want to learn the highly popular TensorFlow deep learning platform, those who wish to master the basics of state-of-the-art deep learning models, and those looking to build competency with a modern cloud service tool such as Google Colab.
- Published
- 2021
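Since the entry above centers on tf.data input pipelines with augmentation, a minimal sketch may help make the topic concrete. This is not code from the book; the dataset, augmentation choices, and batch size are illustrative assumptions.

```python
# Minimal tf.data input pipeline sketch (illustrative, not from the book).
import tensorflow as tf
import tensorflow_datasets as tfds

def augment(image, label):
    # Light data augmentation of the kind the book covers.
    image = tf.image.random_flip_left_right(image)
    image = tf.image.random_brightness(image, max_delta=0.1)
    return image, label

def make_pipeline(batch_size=32):
    # Assumed dataset; any tensorflow_datasets image dataset works similarly.
    ds = tfds.load("cifar10", split="train", as_supervised=True)
    ds = ds.map(lambda x, y: (tf.cast(x, tf.float32) / 255.0, y),
                num_parallel_calls=tf.data.AUTOTUNE)
    ds = ds.map(augment, num_parallel_calls=tf.data.AUTOTUNE)
    return ds.shuffle(10_000).batch(batch_size).prefetch(tf.data.AUTOTUNE)
```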
3. Adaptive Request Scheduling for the I/O Forwarding Layer using Reinforcement Learning
- Author
Philippe O. A. Navaux, Jean Luca Bez, Toni Cortes, Francieli Zanon Boito, Alberto Miranda, Ramon Nou
- Subjects
I/O forwarding, Parallel I/O, I/O scheduling, Computer Networks and Communications, Computer science, Distributed computing, Auto-tuning, Scheduling (computing), Machine learning, Reinforcement learning, Computer architecture, Input/output, High performance I/O, Workload, Hardware and Architecture, Supercomputers, High performance computing, Software
- Abstract
In this paper, we propose an approach to adapt the I/O forwarding layer of HPC systems to applications' access patterns. I/O optimization techniques can improve performance for the access patterns they were designed to target, but they often decrease performance for others. Furthermore, these techniques usually depend on precise tuning of their parameters, a task that commonly falls to the users. Instead, we propose to tune them dynamically at runtime based on the I/O workload observed by the system. Our approach uses a reinforcement learning technique, contextual bandits, to make the system capable of learning the best parameter value for each observed access pattern during its execution. This eliminates the need for a complicated and time-consuming prior training phase. Our case study is the TWINS scheduling algorithm, where performance improvements depend on the time window parameter, which in turn depends on the workload (a hedged contextual-bandit sketch follows this entry). We evaluate our proposal and demonstrate it can reach 88% precision in parameter selection within the first few hundred observations of an access pattern, achieving 99% of the optimal performance. We demonstrate that the system, which is expected to live for years, will be able to adapt to changes and optimize its performance after having observed an access pattern for a few (not necessarily contiguous) minutes. This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001. It also received support from the Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq), Brazil; the LICIA International Laboratory; the NCSA-Inria-ANL-BSC-JSC-Riken Joint-Laboratory on Extreme Scale Computing (JLESC); the Spanish Ministry of Science and Innovation under grant TIN2015-65316; and the Generalitat de Catalunya under contract 2014-SGR-1051. This research received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No. 800144.
- Published
- 2020
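The paper's core mechanism, a contextual bandit that learns the best TWINS time-window value for each observed access pattern, can be sketched in a few lines. This is a hypothetical epsilon-greedy illustration, not the authors' implementation; the candidate windows, context label, epsilon, and bandwidth reward are invented for the example.

```python
# Epsilon-greedy contextual bandit sketch (hypothetical, not the paper's code).
import random
from collections import defaultdict

WINDOWS_MS = [0.125, 0.25, 0.5, 1, 2, 4, 8]  # assumed candidate time windows

class WindowBandit:
    def __init__(self, epsilon=0.1):
        self.epsilon = epsilon
        self.value = defaultdict(float)  # (context, window) -> mean reward
        self.count = defaultdict(int)    # (context, window) -> #observations

    def choose(self, context):
        if random.random() < self.epsilon:
            return random.choice(WINDOWS_MS)  # explore
        # Exploit: window with the best estimated reward for this pattern.
        return max(WINDOWS_MS, key=lambda w: self.value[(context, w)])

    def update(self, context, window, reward):
        key = (context, window)
        self.count[key] += 1
        # Incremental running mean of the observed reward.
        self.value[key] += (reward - self.value[key]) / self.count[key]

# Usage: context is the detected access pattern, reward the measured bandwidth.
bandit = WindowBandit()
w = bandit.choose("contiguous-small-requests")
bandit.update("contiguous-small-requests", w, reward=950.0)  # MB/s, illustrative
```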
4. READYS: A Reinforcement Learning Based Strategy for Heterogeneous Dynamic Scheduling
- Author
Nathan Grinsztajn, Olivier Beaumont, Emmanuel Jeannot, Philippe Preux
- Subjects
Schedule, Theoretical computer science, Job shop scheduling, Computer science, Distributed, Parallel, and Cluster Computing (cs.DC), Reinforcement learning, Dynamic priority scheduling, Directed acyclic graph, Scheduling (computing), Cholesky decomposition, Task (project management)
- Abstract
In this paper, we propose READYS, a reinforcement learning algorithm for the dynamic scheduling of computations modeled as Directed Acyclic Graphs (DAGs). Our goal is to develop a scheduling algorithm in which allocation and scheduling decisions are made at runtime, based on the state of the system, as performed in runtime systems such as StarPU or ParSEC. Reinforcement learning is a natural candidate for this task, since its general principle is to build, step by step, a strategy that, given the state of the system (in our case, the state of the resources and a view of the ready tasks and their successors), makes a decision to optimize a global criterion. Moreover, the use of reinforcement learning is natural in a context where the durations of tasks (and communications) are stochastic. READYS combines Graph Convolutional Networks (GCN) with an actor-critic algorithm (A2C): it builds an adaptive representation of the scheduling problem on the fly and learns a scheduling strategy aimed at minimizing the makespan (a simplified scheduling-loop sketch follows this entry). A crucial point is that READYS builds a general scheduling strategy that is limited neither to one specific application or task graph nor to one particular problem size, and that can be used to schedule any DAG. We focus on task graphs originating from linear algebra factorization kernels (CHOLESKY, LU, QR) and consider heterogeneous platforms made of a few CPUs and GPUs. We first analyze the performance of READYS when learning is performed on a given (platform, kernel, problem size) combination. Using simulations, we show that the scheduling agent achieves performance very similar or even superior to that of algorithms from the literature, and that it is especially powerful when the scheduling environment contains a lot of uncertainty. We additionally demonstrate that our agent exhibits very promising generalization capabilities. To the best of our knowledge, this is the first paper to show that reinforcement learning can really be used for dynamic DAG scheduling on heterogeneous resources.
- Published
- 2021
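For context, here is a heavily simplified sketch of the dynamic list-scheduling loop into which a learned policy such as READYS plugs. The real system uses a GCN encoder with an A2C agent over heterogeneous CPU/GPU resources; this sketch substitutes a stub policy and identical resources, and all names are illustrative assumptions.

```python
# Simplified dynamic DAG scheduling loop (illustrative; not the READYS code).
import random

def simulate(dag_succ, durations, num_resources, choose=random.choice):
    """dag_succ: task -> list of successor tasks; durations: task -> duration."""
    preds = {t: [] for t in dag_succ}
    for t, succs in dag_succ.items():
        for s in succs:
            preds[s].append(t)
    ready = [t for t in dag_succ if not preds[t]]  # tasks with no predecessors
    free_at = [0.0] * num_resources  # time at which each resource is next free
    finish = {}                      # task -> finish time
    while ready:
        task = choose(ready)         # in READYS, the GCN + A2C policy decides
        ready.remove(task)
        r = min(range(num_resources), key=free_at.__getitem__)
        start = max(free_at[r],
                    max((finish[p] for p in preds[task]), default=0.0))
        finish[task] = start + durations[task]
        free_at[r] = finish[task]
        for s in dag_succ[task]:     # successors become ready once all preds end
            if all(p in finish for p in preds[s]):
                ready.append(s)
    return max(finish.values(), default=0.0)  # makespan

# Tiny usage example: a diamond DAG on two resources.
dag = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
print(simulate(dag, {"a": 1.0, "b": 2.0, "c": 2.0, "d": 1.0}, num_resources=2))
```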
5. Bootstrapping Q-Learning for Robotics from Neuro-Evolution Results
- Author
Matthieu Zimmer, Stéphane Doncieux
- Subjects
Machine learning, Learning classifier system, Computer science, Evolutionary robotics, Q-learning, Online machine learning, Multi-task learning, Generation of representation during development, Transfer learning, Robot learning, Robots with development and learning skills, Artificial intelligence, Reinforcement learning, Instance-based learning, Software
- Abstract
Reinforcement learning problems are hard to solve in a robotics context, as classical algorithms rely on discrete representations of actions and states, whereas in robotics both are continuous. A discrete set of actions and states can be defined, but doing so requires expertise that may not be available, in particular in open environments. We propose a process that lets a robot build its own representation for a reinforcement learning algorithm. The principle is to first use a direct policy search in the sensorimotor space, i.e. with no predefined discrete sets of states or actions, and then to extract discrete actions from the corresponding learning traces and identify the relevant dimensions of the state for estimating the value function (a hypothetical sketch of this bootstrapping follows this entry). Once this is done, the robot can apply reinforcement learning (1) to be more robust to new domains and, if required, (2) to learn faster than a direct policy search. This approach takes the best of both worlds: first learning in a continuous space to avoid the need for a specific representation, at the price of a long learning process and poor generalization, and then learning with an adapted representation to be faster and more robust.
- Published
- 2017
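The bootstrapping recipe described above, extracting a discrete action set from continuous policy-search traces and then running classical Q-learning over it, might look like the following. The paper reports using scikit-learn; everything else here (function names, KMeans settings, hyperparameters) is an assumption for illustration.

```python
# Bootstrapping Q-learning from policy-search traces (hypothetical sketch).
import numpy as np
from sklearn.cluster import KMeans

def discretize_actions(trace_actions, n_actions=8):
    """Cluster continuous actions from learning traces into discrete prototypes.
    trace_actions: array of shape (N, action_dim)."""
    km = KMeans(n_clusters=n_actions, n_init=10).fit(trace_actions)
    return km.cluster_centers_  # one prototype action per discrete action index

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """One tabular Q-learning step over the extracted discrete actions."""
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])

# Usage sketch: build the action set from traces, then learn a Q-table.
actions = discretize_actions(np.random.rand(1000, 4))  # fake traces for shape
Q = np.zeros((50, len(actions)))  # 50 discrete states, assumed
q_update(Q, s=0, a=3, r=1.0, s_next=1)
```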