Deep Contextual Bandit and Reinforcement Learning for IRS-Assisted MU-MIMO Systems

Authors :: Pereira-Ruisánchez, Dariel
Fresnedo, Óscar
Pérez-Adán, Darian
Castedo, Luis
Source :: IEEE Transactions on Vehicular Technology, vol. 72, no. 7, pp. 9099-9114, Jul 2023
Publication Year :: 2024
Abstract: The combination of multiple-input multiple-output (MIMO) systems and intelligent reflecting surfaces (IRSs) is foreseen as a critical enabler of beyond 5G (B5G) and 6G. In this work, two different approaches are considered for the joint optimization of the IRS phase-shift matrix and MIMO precoders of an IRS-assisted multi-stream (MS) multi-user MIMO (MU-MIMO) system. Both approaches aim to maximize the system sum-rate for every channel realization. The first proposed solution is a novel contextual bandit (CB) framework with continuous state and action spaces called deep contextual bandit-oriented deep deterministic policy gradient (DCB-DDPG). The second is an innovative deep reinforcement learning (DRL) formulation where the states, actions, and rewards are selected such that the Markov decision process (MDP) property of reinforcement learning (RL) is appropriately met. Both proposals perform remarkably better than state-of-the-art heuristic methods in scenarios with high multi-user interference.

Subjects :: Computer Science - Information Theory
Electrical Engineering and Systems Science - Signal Processing

Database :: arXiv
Journal :: IEEE Transactions on Vehicular Technology, vol. 72, no. 7, pp. 9099-9114, Jul 2023
Publication Type :: Report
Accession number :: edsarx.2401.16901
Document Type :: Working Paper
Full Text :: https://doi.org/10.1109/TVT.2023.3249353
