RL-based Resource Allocation for Improving Throughput in Cellular D2D Communications

Document Type: Original Article


1 School of Computer Engineering, Iran University of Science and Technology, Tehran, Iran

2 Department of Computer Engineering, Yazd University, Yazd, Iran


With the increasing demand for bandwidth-intensive applications in cellular networks, the coexistence of Device-to-Device (D2D) communications with cellular subscribers is a promising solution for achieving high spectrum efficiency and network throughput. In cellular D2D communications, intelligent resource sharing among network subscribers and paired devices is of significant importance. Most state-of-the-art works rely on exact values of Channel State Information (CSI) and on subscribers' transmission rate feedback, which are not available in real deployments. In this paper, we propose a novel reinforcement-learning-based approach for mode selection and spectrum allocation, called RL-D2D, which efficiently shares resources among D2D users and cellular subscribers without the need for CSI while achieving high network throughput. Evaluation results show that RL-D2D achieves near-optimal performance and a low outage rate despite the lack of CSI and users' transmission rate feedback.
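To illustrate the general idea of learning a resource allocation without CSI, the following minimal sketch uses a UCB1 multi-armed bandit. This is an assumption made for illustration only: the abstract does not specify the learning algorithm behind RL-D2D, and the function name `ucb1_allocate`, the success probabilities, and the binary success/failure reward are all hypothetical. Each arm stands for one candidate (mode, channel) option, and the learner observes only whether a transmission succeeded, never the channel state itself.

```python
import math
import random


def ucb1_allocate(success_probs, rounds=5000, seed=0):
    """Choose one (mode, channel) option per round with UCB1.

    success_probs: hypothetical hidden success probability of each option.
    The learner never reads these values directly; it only sees the
    binary outcome of the option it picked in each round.
    Returns how many times each option was selected.
    """
    rng = random.Random(seed)
    n = len(success_probs)
    counts = [0] * n      # times each option was selected
    totals = [0.0] * n    # accumulated reward per option

    for t in range(1, rounds + 1):
        if t <= n:
            arm = t - 1   # initial phase: try every option once
        else:
            # Empirical mean plus an exploration bonus that shrinks
            # as an option is selected more often.
            arm = max(
                range(n),
                key=lambda i: totals[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        # Simulated feedback: 1 if the transmission succeeded, else 0.
        reward = 1.0 if rng.random() < success_probs[arm] else 0.0
        counts[arm] += 1
        totals[arm] += reward

    return counts


# Three hypothetical options; the third has the best hidden success rate.
counts = ucb1_allocate([0.2, 0.5, 0.8])
best = max(range(len(counts)), key=lambda i: counts[i])
print(counts, best)
```

In this toy setting the learner ends up selecting the option with the highest hidden success rate most often, using only success/failure observations. A full mode selection and spectrum allocation scheme would additionally have to handle interference between simultaneously scheduled users, which this single-learner sketch does not model.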


