An Actor-Critic Deep Reinforcement Learning Framework for Multi-objective Sequential Decision-making

Document Type: Original Article

Authors

1 Electrical and Computer Engineering Department, Tarbiat Modares University, Tehran, Iran

2 Electrical and Computer Engineering Department, Tarbiat Modares University, Tehran‎,‎ Iran

Abstract

Sequential decision-making describes a situation in which the decision maker makes successive observations of a process before a final decision is reached. In real-world scenarios, multi-objective sequential decision-making problems are common and pose multiple challenges for researchers. Most studies in this area have focused on single-objective settings, or have converted multi-objective problems into single-objective ones by combining the objectives into a single goal. In this article, a multi-objective deep reinforcement learning framework called MACA, based on the actor-critic method, is presented to optimize and balance multiple conflicting objectives in dynamic environments over time. The framework learns a separate policy for each objective and eventually converges these policies to a single globally optimal policy. It is evaluated in the domain of recommender systems on two conflicting objectives, accuracy (the desirability of recommended items for users) and fairness (the selection of recommended items from all categories), and is compared with other recent multi-objective reinforcement learning methods. Experimental results on this benchmark demonstrate that the framework outperforms previous works in performance (92.5% accuracy with a fairness score of 96.5% on the Kiva dataset, and 93.1% accuracy with a fairness score of 97.6% on the MovieLens dataset), convergence time, and memory consumption. Moreover, the proposed framework is scalable in the number of objectives and supports optimizing a variable number of objectives.
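For intuition, the per-objective actor-critic idea described above can be sketched in a few lines of Python/PyTorch. This is a minimal illustrative sketch, not the MACA implementation: it assumes a discrete action space, one actor-critic pair per objective, and logit averaging as a stand-in for the convergence-to-a-global-policy step; all names (ActorCritic, update, global_action) are hypothetical.

import torch
import torch.nn as nn

class ActorCritic(nn.Module):
    # One actor-critic pair; the framework keeps one such pair per objective.
    def __init__(self, state_dim, n_actions, hidden=64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.actor = nn.Linear(hidden, n_actions)   # policy logits
        self.critic = nn.Linear(hidden, 1)          # state-value estimate

    def forward(self, state):
        h = self.body(state)
        return self.actor(h), self.critic(h)

def update(ac, optimizer, state, action, reward, next_state, gamma=0.99):
    # One-step advantage actor-critic update driven by a single objective's
    # scalar reward (e.g., accuracy OR fairness in the recommender setting).
    logits, value = ac(state)
    with torch.no_grad():
        _, next_value = ac(next_state)
        target = reward + gamma * next_value          # TD target
    advantage = (target - value).detach()
    dist = torch.distributions.Categorical(logits=logits)
    actor_loss = -(dist.log_prob(action) * advantage).mean()
    critic_loss = (target - value).pow(2).mean()
    optimizer.zero_grad()
    (actor_loss + critic_loss).backward()
    optimizer.step()

def global_action(acs, state):
    # Hypothetical aggregation: average per-objective policy logits so that
    # actions are drawn from a single "global" policy over all objectives.
    logits = torch.stack([ac(state)[0] for ac in acs]).mean(dim=0)
    return torch.distributions.Categorical(logits=logits).sample()

# Usage: one ActorCritic (and optimizer) per objective; each pair is updated
# with its own reward signal, while actions come from the global policy.
n_objectives, state_dim, n_actions = 2, 8, 4
acs = [ActorCritic(state_dim, n_actions) for _ in range(n_objectives)]
opts = [torch.optim.Adam(ac.parameters(), lr=1e-3) for ac in acs]

Averaging logits is only a placeholder for how the per-objective policies might be merged; the paper's actual convergence mechanism may differ.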

Keywords

