Control with adaptive Q-learning: A comparison for two classical control problems.

Authors :
Araújo, João Pedro
Figueiredo, Mário A.T.
Ayala Botto, Miguel
Source :
Engineering Applications of Artificial Intelligence, Jun 2022, Vol. 112.
Publication Year :
2022

Abstract

This paper evaluates adaptive Q-learning (AQL) and single-partition adaptive Q-learning (SPAQL), two algorithms for efficient model-free episodic reinforcement learning (RL), on two classical control problems (Pendulum and CartPole). AQL adaptively partitions the state–action space of a Markov decision process (MDP) while learning the control policy, i.e., the mapping from states to actions. The main difference between AQL and SPAQL is that the latter learns time-invariant policies, in which the mapping from states to actions does not depend explicitly on the time step. This paper also proposes SPAQL with terminal state (SPAQL-TS), an improved version of SPAQL tailored to the design of regulators for control problems. Time-invariant policies are shown to yield better performance than time-variant ones in both problems studied. These algorithms are particularly suited to RL problems where the action space is finite, as is the case with the CartPole problem. SPAQL-TS solves the OpenAI Gym CartPole problem, while also displaying higher sample efficiency than trust region policy optimization (TRPO), a standard RL algorithm for solving control tasks. Moreover, the policies learned by SPAQL are interpretable, whereas TRPO policies are typically encoded as neural networks and are therefore hard to interpret. Interpretable policies and sample efficiency are the major advantages of SPAQL. The code for the experiments is available at https://github.com/jaraujo98/SinglePartitionAdaptiveQLearning.

• Two recent Q-learning algorithms, AQL and SPAQL, are evaluated on two classical control benchmarks.
• Based on insights from control theory, a new algorithm, SPAQL-TS, is introduced.
• It is shown that both SPAQL and SPAQL-TS outperform TRPO on the CartPole problem. [ABSTRACT FROM AUTHOR]
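The central distinction the abstract draws is between time-variant Q-functions, indexed by the time step, and the time-invariant Q-functions that SPAQL learns. A minimal sketch of this difference, using plain tabular Q-learning on a toy episodic chain MDP (this is an illustration of the concept only, not the paper's AQL/SPAQL implementation, which additionally partitions the state–action space adaptively):

```python
import random

# Toy episodic chain MDP: 5 states, horizon 10, reward 1 for being at the
# goal state. We train (a) a time-variant table Q[t][s][a], one per time
# step, and (b) a single time-invariant table Q[s][a].
random.seed(0)
N_STATES, H, GAMMA = 5, 10, 0.95
ACTIONS = (-1, +1)                      # move left / move right
GOAL = N_STATES - 1

def step(s, a):
    s2 = max(0, min(GOAL, s + a))
    return s2, (1.0 if s2 == GOAL else 0.0)

def train(time_variant, episodes=3000, alpha=0.2, eps=0.3):
    if time_variant:                    # one Q-table per time step
        Q = [[[0.0, 0.0] for _ in range(N_STATES)] for _ in range(H)]
    else:                               # a single, time-invariant Q-table
        Q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        s = random.randrange(N_STATES)  # random starts aid exploration
        for t in range(H):
            q = Q[t][s] if time_variant else Q[s]
            a = random.randrange(2) if random.random() < eps else q.index(max(q))
            s2, r = step(s, ACTIONS[a])
            if time_variant:            # bootstrap from the next time step
                nxt = Q[t + 1][s2] if t + 1 < H else [0.0, 0.0]
            else:                       # bootstrap from the same table
                nxt = Q[s2]
            q[a] += alpha * (r + GAMMA * max(nxt) - q[a])
            s = s2
    return Q

def greedy_return(Q, time_variant):
    s, total = 0, 0.0
    for t in range(H):
        q = Q[t][s] if time_variant else Q[s]
        s, r = step(s, ACTIONS[q.index(max(q))])
        total += r
    return total

ret_tv = greedy_return(train(True), True)    # time-variant policy
ret_ti = greedy_return(train(False), False)  # time-invariant policy
print(ret_tv, ret_ti)
```

Both variants learn to reach the goal, but the time-invariant table has H times fewer entries to estimate from the same data, which is one intuition for the sample-efficiency gains the abstract reports for SPAQL over AQL.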

Details

Language :
English
ISSN :
0952-1976
Volume :
112
Database :
Academic Search Index
Journal :
Engineering Applications of Artificial Intelligence
Publication Type :
Academic Journal
Accession number :
156811001
Full Text :
https://doi.org/10.1016/j.engappai.2022.104797