Employing Chaos Theory for Exploration–Exploitation Balance in Deep Reinforcement Learning

Document Type: Original Article

Authors

1 Department of Computer Engineering, Yazd University, Yazd, Iran

2 Department of Computer Engineering, Yazd University, Yazd, Iran

Abstract

Deep reinforcement learning is widely applied to machine learning problems, so methods that improve its performance are of considerable interest. Balancing exploration and exploitation is one of the central issues in reinforcement learning, and action-selection methods that incorporate exploration, such as ε-greedy and softmax, are commonly used for this purpose. These methods select an action by generating random numbers and evaluating action values, thereby maintaining the balance. Over time, with appropriate exploration, the environment becomes better understood and more valuable actions are identified. Chaos, with properties such as high sensitivity to initial conditions, non-periodicity, unpredictability, coverage of all states of the search space, and pseudo-random behavior, has many applications. In this article, numbers generated by chaotic systems replace the usual random numbers in the ε-greedy action-selection method in deep reinforcement learning to improve the balance between exploration and exploitation; in addition, the impact of using chaos in the replay buffer is investigated. Experiments conducted in the Lunar Lander environment demonstrate a significant increase in learning speed and higher rewards.
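The abstract does not specify which chaotic system the authors use, so the following is a minimal sketch, not the paper's implementation: it assumes a logistic map (x_{n+1} = r·x_n·(1 − x_n), fully chaotic at r = 4.0) as the number generator and shows how its output can replace the uniform draws in ε-greedy action selection. The names LogisticMap and chaotic_epsilon_greedy, the seed value, and r = 4.0 are all illustrative choices.

```python
import numpy as np


class LogisticMap:
    """Chaotic pseudo-random number generator based on the logistic map.

    NOTE: an assumed example map; the paper may use a different chaotic system.
    At r = 4.0 the map is fully chaotic and its iterates stay in (0, 1).
    """

    def __init__(self, seed: float = 0.123, r: float = 4.0):
        # Avoid values that land on fixed points or absorbing orbits (0, 0.25, 0.5, 0.75).
        assert 0.0 < seed < 1.0 and seed not in (0.25, 0.5, 0.75)
        self.x = seed
        self.r = r

    def next(self) -> float:
        # One iteration of x_{n+1} = r * x_n * (1 - x_n).
        self.x = self.r * self.x * (1.0 - self.x)
        return self.x


def chaotic_epsilon_greedy(q_values: np.ndarray, epsilon: float, gen: LogisticMap) -> int:
    """epsilon-greedy action selection driven by a chaotic map.

    Both the explore/exploit decision and the exploratory action draw use
    chaotic iterates instead of a uniform RNG.
    """
    if gen.next() < epsilon:
        # Explore: map a chaotic number in (0, 1) onto an action index.
        return int(gen.next() * len(q_values)) % len(q_values)
    # Exploit: pick the currently greedy action.
    return int(np.argmax(q_values))


if __name__ == "__main__":
    gen = LogisticMap(seed=0.123)
    q = np.array([0.1, 0.5, 0.2, 0.4])
    print([chaotic_epsilon_greedy(q, epsilon=0.2, gen=gen) for _ in range(10)])
```

The same generator could, in principle, drive sampling indices for the replay buffer (the abstract's second use of chaos), e.g. `int(gen.next() * buffer_size)`. One caveat worth noting: the logistic map's invariant distribution is not uniform, so a practical implementation may prefer a different map or a transformation of the iterates.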
