Back to Search Start Over

Online Learning Schemes for Power Allocation in Energy Harvesting Communications.

Authors :
Sakulkar, Pranav
Krishnamachari, Bhaskar
Source :
IEEE Transactions on Information Theory; Jun2018, Vol. 64 Issue 6, p4610-4628, 19p
Publication Year :
2018

Abstract

We consider the problem of power allocation over one or more time-varying channels with unknown distributions in energy harvesting communications. In the single-channel case, the transmitter chooses the transmit power based on the amount of stored energy in its battery with the goal of maximizing the average rate over time. We model this problem as a Markov decision process (MDP) with transmitter as the agent, battery status as the state, transmits power as the action and rate as the reward. The average reward maximization problem can be modeled by a linear program (LP) that uses the transition probabilities for the state-action pairs and their reward values to select a power allocation policy. This problem is challenging because the uncertainty in channels implies that the mean rewards associated with the state-action pairs are unknown. We therefore propose two online learning algorithms: linear program of sample means (LPSM) and Epoch-LPSM that learn these rewards and adapt their policies over time. For both algorithms, we prove that their regret is upper-bounded by a constant. To our knowledge this is the first result showing constant regret learning algorithms for MDPs with unknown mean rewards. We also prove an even stronger result about LPSM: that its policy matches the optimal policy exactly in finite expected time. Epoch-LPSM incurs a higher regret compared with the LPSM, while reducing the computational requirements substantially. We further consider a multi-channel scenario, where the agent also chooses a channel in each slot, and present our multi-channel LPSM (MC-LPSM) algorithm that explores different channels and uses that information to solve the LP during exploitation. MC-LPSM incurs a regret that scales logarithmically in time and linearly in the number of channels. Through a matching lower bound on the regret of any algorithm, we also prove the asymptotic order optimality of MC-LPSM. [ABSTRACT FROM AUTHOR]

Details

Language :
English
ISSN :
00189448
Volume :
64
Issue :
6
Database :
Complementary Index
Journal :
IEEE Transactions on Information Theory
Publication Type :
Academic Journal
Accession number :
129840933
Full Text :
https://doi.org/10.1109/TIT.2017.2773526