Wireless body area networks (WBANs) have attracted considerable attention from both industry and academia as a promising technology for continuous monitoring of physiological signals of the human body. Because the sensors in WBANs are typically battery-driven and inconvenient to recharge, an energy-efficient resource allocation scheme is essential to prolong network lifetime while satisfying the stringent quality-of-service (QoS) requirements inherent to WBANs. Energy harvesting (EH) technology, which collects energy from ambient sources, is a possible way to address the energy efficiency problem because it can reduce dependence on the battery supply. In this paper, we therefore investigate the resource allocation problem for EH-powered WBANs (EH-WBANs). Our goal is to maximize the energy efficiency of EH-WBANs by jointly considering the transmission mode, relay selection, allocated time slot, transmission power, and the energy constraint of each sensor. In view of the characteristics of EH-WBANs, we formulate the energy efficiency problem as a discrete-time, finite-state Markov decision process (DFMDP), in which allocation decisions are made by a hub that does not have complete and global network information. Owing to the complexity of the problem, we propose a modified Q-learning (QL) algorithm to obtain the optimal allocation strategy. The numerical results validate the effectiveness of the proposed scheme as well as the low computational complexity of the proposed modified QL algorithm. [ABSTRACT FROM AUTHOR]
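
To make the learning setup concrete, the following is a minimal sketch of standard tabular Q-learning on a toy single-sensor energy-harvesting model. It is not the modified QL algorithm proposed in the paper, whose details the abstract does not give. The battery discretization, action set, energy costs, throughput values, harvesting probability, and reward are all hypothetical choices made for illustration only.

```python
import random

# All constants below are illustrative assumptions, not values from the paper.
N_LEVELS = 5                   # battery levels 0..4
ACTIONS = (0, 1, 2)            # 0 = idle, 1 = low-power transmit, 2 = high-power transmit
ENERGY_COST = (0, 1, 2)        # energy units consumed by each action
THROUGHPUT = (0.0, 1.0, 1.5)   # data units delivered by each action
HARVEST_PROB = 0.5             # chance of harvesting one energy unit per time slot


def step(battery, action, rng):
    """Simulate one time slot and return (reward, next_battery)."""
    cost = ENERGY_COST[action]
    if cost > battery:
        reward = -1.0          # penalize actions that violate the energy constraint
        cost = 0               # the infeasible transmission does not take place
    else:
        reward = THROUGHPUT[action]
    harvested = 1 if rng.random() < HARVEST_PROB else 0
    next_battery = min(N_LEVELS - 1, battery - cost + harvested)
    return reward, next_battery


def train(n_steps=50000, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Run epsilon-greedy tabular Q-learning and return the Q-table."""
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in range(N_LEVELS)]
    battery = N_LEVELS - 1
    for _ in range(n_steps):
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)                       # explore
        else:
            action = max(ACTIONS, key=lambda a: q[battery][a])  # exploit
        reward, nxt = step(battery, action, rng)
        # Standard Q-learning update toward the one-step bootstrapped target.
        target = reward + gamma * max(q[nxt])
        q[battery][action] += alpha * (target - q[battery][action])
        battery = nxt
    return q


def greedy_policy(q):
    """Return the greedy action for each battery level."""
    return [max(ACTIONS, key=lambda a: row[a]) for row in q]


if __name__ == "__main__":
    print(greedy_policy(train()))
```

In this sketch the state is only the battery level, so the hub's partial knowledge of the network, relay selection, and time-slot allocation described in the abstract are deliberately left out; a fuller model would enlarge the state and action spaces accordingly.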