Off-policy inverse Q-learning for discrete-time antagonistic unknown systems.

Authors :: Lian, Bosen; Xue, Wenqian; Xie, Yijing; Lewis, Frank L.; Davoudi, Ali
Source :: Automatica. Sep 2023, Vol. 155.
Publication Year :: 2023
Abstract :: This paper proposes a data-driven, model-free inverse reinforcement learning (RL) algorithm that reconstructs the unknown cost function of a demonstrated discrete-time (DT) dynamical system subject to antagonistic disturbances. We first propose an inverse RL policy iteration scheme that uses the system dynamics and the input policies. From it we derive our main result: a data-driven off-policy inverse Q-learning algorithm that uses only demonstrated trajectories of the antagonistic system and requires knowledge of neither the system dynamics nor the control policy gain. The data-driven algorithm consists of Q-function evaluation, state-penalty weight improvement, and action policy update. We guarantee that its estimates remain unbiased when exploration noise is injected for persistence of excitation. An example verifies the proposed algorithm.
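
Illustrative sketch (not from the paper): the abstract names Q-function evaluation and action policy update as two of the algorithm's steps. The Python below shows what those two steps look like in the simpler forward, model-based setting, assuming a linear system x_{k+1} = A x_k + B u_k + D w_k with stage cost x'Qx + u'Ru - gamma^2 w'w and linear policies u = -K x, w = L x. The linear-quadratic form, the matrices, and the names q_kernel and improve_policies are all assumptions made for illustration. The paper's data-driven inverse step (state-penalty weight improvement) is not reproduced, because the abstract does not specify it.

```python
import numpy as np

def q_kernel(A, B, D, Qx, R, gamma, K, L):
    """Q-function evaluation for fixed policies u = -K x, w = L x.

    Returns H and P such that Q(x, u, w) = z' H z with z = [x; u; w],
    and V(x) = x' P x is the value of following the fixed policies.
    """
    n, q = A.shape[0], D.shape[1]
    Ac = A - B @ K + D @ L                          # closed-loop dynamics
    M = Qx + K.T @ R @ K - gamma**2 * (L.T @ L)     # closed-loop stage cost
    # Solve the Lyapunov equation P = M + Ac' P Ac via vectorization.
    vecP = np.linalg.solve(np.eye(n * n) - np.kron(Ac.T, Ac.T), M.reshape(-1))
    P = vecP.reshape(n, n)
    G = np.hstack([A, B, D])                        # x_next = G z
    H = G.T @ P @ G                                 # cost-to-go part of Q
    m = B.shape[1]
    H[:n, :n] += Qx                                 # state penalty
    H[n:n + m, n:n + m] += R                        # control penalty
    H[n + m:, n + m:] -= gamma**2 * np.eye(q)       # disturbance attenuation
    return H, P

def improve_policies(H, n, m):
    """Policy update: stationary point of z' H z in (u, w) for given x."""
    F = -np.linalg.solve(H[n:, n:], H[n:, :n])      # [u; w] = F x
    return -F[:m], F[m:]                            # K (u = -K x), L (w = L x)

# Hypothetical example system (values chosen only for illustration).
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
D = np.array([[0.1], [0.0]])
Qx, R, gamma = np.eye(2), np.array([[1.0]]), 5.0
K0, L0 = np.zeros((1, 2)), np.zeros((1, 2))         # A itself is stable

H, P = q_kernel(A, B, D, Qx, R, gamma, K0, L0)
K1, L1 = improve_policies(H, n=2, m=1)
```

In a model-free variant, the kernel H would be estimated from measured (x, u, w) trajectories instead of being built from A, B, and D; that estimation step is where exploration noise and persistence of excitation become relevant.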