Adaptive network approach to exploration-exploitation trade-off in reinforcement learning.

Authors :: Moradi M; Zhai ZM; Panahi S; Lai YC
Source :: Chaos (Woodbury, N.Y.) [Chaos] 2024 Dec 01; Vol. 34 (12).
Publication Year :: 2024
Abstract: A foundational machine-learning architecture is reinforcement learning, where an outstanding problem is achieving an optimal balance between exploration and exploitation. Specifically, exploration enables the agents to discover optimal policies in unknown domains of the environment for gaining potentially large future rewards, while exploitation relies on the already acquired knowledge to maximize the immediate rewards. We articulate an approach to this problem, treating the dynamical process of reinforcement learning as a Markov decision process that can be modeled as a nondeterministic finite automaton and defining a subset of states in the automaton to represent the preference for exploring unknown domains of the environment. Exploration is prioritized by assigning higher transition probabilities to these states. We derive a mathematical framework to systematically balance exploration and exploitation by formulating it as a mixed integer programming (MIP) problem to optimize the agent's actions and maximize the discovery of novel preferential states. Solving the MIP problem provides a trade-off point between exploiting known states and exploring unexplored regions. We validate the framework computationally with a benchmark system and argue that the articulated automaton is effectively an adaptive network with a time-varying connection matrix, where the states in the automaton are nodes and the transitions among the states represent the edges. The network is adaptive because the transition probabilities evolve over time. The established connection between the adaptive automaton arising from reinforcement learning and the adaptive network opens the door to applying theories of complex dynamical networks to address frontier problems in machine learning and artificial intelligence. (© 2024 Author(s). Published under an exclusive license by AIP Publishing.)
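Illustrative note: the abstract's picture of an automaton whose transition probabilities favor preferential (unexplored) states, and which therefore behaves as a network with a time-varying connection matrix, can be sketched in a few lines. The following is only a toy sketch under stated assumptions, not the authors' method: the function names, the `boost` factor, the rule "unvisited states are preferential", and the 4-state ring are all hypothetical, and the MIP formulation from the paper is not implemented here.

```python
import numpy as np

def boosted_transition_matrix(adjacency, visited, boost=2.0):
    """Row-stochastic matrix in which edges into unvisited states are
    weighted by `boost` (an assumed stand-in for 'preferential' states)."""
    adjacency = np.asarray(adjacency, dtype=float)
    # One weight per target state: 1 if already visited, `boost` otherwise.
    weights = np.where(np.asarray(visited, dtype=bool), 1.0, boost)
    raw = adjacency * weights[np.newaxis, :]
    row_sums = raw.sum(axis=1, keepdims=True)
    # Normalize each row; rows without outgoing edges stay all-zero.
    return np.divide(raw, row_sums, out=np.zeros_like(raw), where=row_sums > 0)

def random_walk(adjacency, start, steps, boost=2.0, seed=0):
    """Walk on the automaton, recomputing the matrix after every step.
    Assumes every state has at least one outgoing edge."""
    rng = np.random.default_rng(seed)
    n = len(adjacency)
    visited = np.zeros(n, dtype=bool)
    state = start
    visited[state] = True
    matrices = []  # history of the time-varying connection matrix
    for _ in range(steps):
        P = boosted_transition_matrix(adjacency, visited, boost)
        matrices.append(P)
        state = rng.choice(n, p=P[state])
        visited[state] = True
    return visited, matrices

# Hypothetical 4-state ring: each state links to its two neighbors.
ring = np.array([[0, 1, 0, 1],
                 [1, 0, 1, 0],
                 [0, 1, 0, 1],
                 [1, 0, 1, 0]])
visited, matrices = random_walk(ring, start=0, steps=5)
```

In this toy setting the matrix changes as soon as a new state is visited, which is the sense in which the automaton can be read as an adaptive network; how the paper actually selects preferential states and their probabilities is determined by its MIP framework, which this sketch does not attempt to reproduce.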

Details

Language :: English
ISSN :: 1089-7682
Volume :: 34
Issue :: 12
Database :: MEDLINE
Journal :: Chaos (Woodbury, N.Y.)
Publication Type :: Academic Journal
Accession number :: 39625676
Full Text :: https://doi.org/10.1063/5.0221833
