An Actor-Critic Deep Reinforcement Learning Framework for Multi-objective Sequential Decision-making

Document Type : Original Article

Authors

1 Faculty, Tarbiat Modares University, Tehran, Iran

2 Graduate Student at Tarbiat Modares University

Abstract

Sequential decision-making describes a situation in which the decision maker makes successive observations of a process before a final decision is made. Multi-objective sequential decision-making problems are common in real-world scenarios and pose multiple challenges for decision-making researchers. Most studies in this area have traditionally focused on single-objective settings or converted multi-objective problems into single-objective ones by combining the objectives into a single goal. In this article, a multi-objective deep reinforcement learning framework called "MACA", based on the actor-critic method, is presented to optimize and balance multiple conflicting objectives in dynamic environments over time. The framework learns a separate policy for each objective and eventually converges them to a globally optimal policy. It is evaluated in the domain of recommender systems on two conflicting objectives: accuracy (the desirability of recommended items for users) and fairness (the selection of recommended items from all categories), and is compared with other recent multi-objective reinforcement learning methods. Experimental results on the benchmark problem (recommender systems) demonstrate that the framework outperforms previous work in terms of performance (92.5% accuracy with a fairness score of 96.5% on the Kiva dataset, and 93.1% accuracy with a fairness score of 97.6% on the MovieLens dataset), convergence time, and memory consumption. Moreover, the proposed framework is scalable with respect to the number of objectives and enables optimization of a variable number of objectives.
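
To make the per-objective policy idea concrete, the sketch below illustrates a generic multi-objective actor-critic update in the spirit described by the abstract: one actor and one critic per objective (e.g., accuracy and fairness), with the per-objective policies distilled toward a shared global policy. This is not the authors' MACA implementation; the network sizes, the distillation loss, and all names (`mlp`, `update`, `distill_w`, the dummy transition) are illustrative assumptions.

```python
# Minimal sketch (assumed, not the MACA code) of a multi-objective actor-critic:
# one actor/critic pair per objective, plus a global policy trained by distillation.
import torch
import torch.nn as nn
import torch.nn.functional as F

STATE_DIM, N_ACTIONS, N_OBJECTIVES = 8, 4, 2  # e.g. objectives: accuracy, fairness

def mlp(in_dim, out_dim):
    return nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, out_dim))

actors = [mlp(STATE_DIM, N_ACTIONS) for _ in range(N_OBJECTIVES)]   # one policy per objective
critics = [mlp(STATE_DIM, 1) for _ in range(N_OBJECTIVES)]          # one value head per objective
global_actor = mlp(STATE_DIM, N_ACTIONS)                            # shared global policy
opt = torch.optim.Adam(
    [p for m in actors + critics + [global_actor] for p in m.parameters()], lr=1e-3)

def update(state, action, rewards, next_state, gamma=0.99, distill_w=0.1):
    """One actor-critic step per objective, plus distillation into the global policy."""
    loss = torch.tensor(0.0)
    for k in range(N_OBJECTIVES):
        value = critics[k](state)
        with torch.no_grad():
            target = rewards[k] + gamma * critics[k](next_state)  # one-step TD target
        advantage = (target - value).detach().squeeze()
        log_prob = F.log_softmax(actors[k](state), dim=-1)[0, action]
        loss = loss + F.mse_loss(value, target) - advantage * log_prob
        # Assumed convergence mechanism: distill each per-objective policy
        # into the global policy via a KL term (global policy is updated here).
        loss = loss + distill_w * F.kl_div(
            F.log_softmax(global_actor(state), dim=-1),
            F.softmax(actors[k](state), dim=-1).detach(),
            reduction="batchmean")
    opt.zero_grad(); loss.backward(); opt.step()

# Example call with a dummy transition and a two-dimensional (vector) reward
s, s2 = torch.randn(1, STATE_DIM), torch.randn(1, STATE_DIM)
update(s, action=1, rewards=[1.0, 0.5], next_state=s2)
```

In a recommender-system instantiation, the two reward components would correspond to the accuracy and fairness signals mentioned above; how MACA actually combines or converges the per-objective policies is specified in the full article, not in this sketch.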

Keywords