Employing Chaos Theory for Exploration–Exploitation Balance in Deep Reinforcement Learning

Document Type: Original Article

Authors

1 Department of Computer Engineering, Yazd University, Yazd, Iran.

2 Department of Computer Engineering, Yazd University, Yazd, Iran.

Abstract

Deep reinforcement learning is widely applied to machine learning problems, so methods that improve its performance are of considerable importance. Balancing exploration and exploitation is one of the central issues in reinforcement learning, and exploratory action-selection methods such as ε-greedy and Softmax are commonly used for this purpose. In these methods, an action is selected by generating random numbers and evaluating action values, so that the balance between exploration and exploitation is maintained. Over time, with suitable exploration, the environment becomes better understood and the more valuable actions are identified. Chaos, with properties such as high sensitivity to initial conditions, non-periodicity, unpredictability, coverage of all reachable states of the search space, and pseudo-random behavior, has found many applications. In this article, numbers generated by chaotic systems are used in the ε-greedy action-selection method of deep reinforcement learning to improve the balance between exploration and exploitation; in addition, the impact of using chaos in the replay buffer is investigated. Experiments in the Lunar Lander environment show a significant increase in learning speed and higher rewards.
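To make the idea concrete, the following minimal Python sketch replaces the uniform random draws used in ε-greedy action selection and replay-buffer sampling with values produced by a chaotic map. The article does not fix a particular map or its parameters in this abstract, so the logistic map with r = 4, the seed value, and the class and function names below are illustrative assumptions rather than the authors' exact implementation.

```python
# A minimal sketch (not the authors' exact implementation): epsilon-greedy
# action selection and replay-buffer index sampling driven by a logistic
# chaotic map instead of a uniform pseudo-random generator. The logistic map
# with r = 4.0 and the interfaces below are illustrative assumptions.

import numpy as np


class LogisticChaos:
    """Generates numbers in (0, 1) via the logistic map x_{n+1} = r * x_n * (1 - x_n)."""

    def __init__(self, seed: float = 0.7, r: float = 4.0):
        assert 0.0 < seed < 1.0
        self.x = seed
        self.r = r

    def next(self) -> float:
        self.x = self.r * self.x * (1.0 - self.x)
        return self.x


def chaotic_epsilon_greedy(q_values: np.ndarray, epsilon: float, chaos: LogisticChaos) -> int:
    """Select an action: explore when the chaotic draw falls below epsilon."""
    if chaos.next() < epsilon:
        # Explore: map a second chaotic draw onto a discrete action index.
        return int(chaos.next() * len(q_values)) % len(q_values)
    # Exploit: greedy action with respect to the current Q estimates.
    return int(np.argmax(q_values))


def chaotic_batch_indices(buffer_size: int, batch_size: int, chaos: LogisticChaos) -> list:
    """Draw replay-buffer indices from the chaotic sequence instead of np.random."""
    return [int(chaos.next() * buffer_size) % buffer_size for _ in range(batch_size)]


if __name__ == "__main__":
    chaos = LogisticChaos(seed=0.7)
    q = np.array([0.1, 0.5, 0.2, 0.3])  # e.g. Q-values for Lunar Lander's 4 discrete actions
    action = chaotic_epsilon_greedy(q, epsilon=0.1, chaos=chaos)
    batch = chaotic_batch_indices(buffer_size=10_000, batch_size=8, chaos=chaos)
    print(action, batch)
```

Any other chaotic generator with suitable ergodic behavior could be substituted for LogisticChaos without changing the surrounding interfaces, since only the stream of values in (0, 1) is consumed by the agent.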
