Event-Triggered Inverse Reinforcement Learning for Optimal Adaptive Leader-Follower Consensus of Unknown Multi-Agent Systems

Document Type: Original Article

Authors

1 Electrical Engineering Department, Semnan University, Semnan, Iran

2 Electrical Engineering Department, Amirkabir University, Tehran, Iran

Abstract

This paper introduces an event-triggered inverse reinforcement learning (IRL) approach for multi-agent discrete-time graphical games with unknown dynamics. In the IRL problem for these games, the expert and the learner are both leader-follower multi-agent systems. The objective of the expert system is the optimal synchronization of the follower agents with the leader. The learner agents aim to imitate the control inputs and states of the expert agents, although the expert's value function is unknown to them. For the learner system, an IRL algorithm based on value-iteration adaptive dynamic programming is proposed to reconstruct the unknown value function of the expert and to solve the event-triggered coupled Hamilton-Jacobi-Bellman equations without requiring the dynamics of either the expert or the learner system. To implement the proposed algorithm, an actor-critic-state-penalty structure is used, and the unknown dynamics of the expert and learner multi-agent systems are approximated by neural-network identifiers. Unlike traditional adaptive dynamic programming, in which the control policies are updated periodically, in the proposed method the control policies and neural-network weights are updated only at the triggering instants, which reduces the computational burden. Finally, the efficiency of the proposed technique is demonstrated through simulation results.
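
As a rough illustration of the event-triggered update idea described above, the following minimal Python sketch (not the paper's actual IRL value-iteration algorithm) lets a single learner imitate an assumed expert feedback gain, recomputing its control and gain only when a state-gap triggering condition is violated. The dynamics, expert gain, threshold, and learning rate are all hypothetical placeholders chosen for illustration.

```python
import numpy as np

# Minimal hypothetical sketch of the event-triggered update idea (single agent,
# linear dynamics): the learner's control and gain are refreshed only at
# triggering instants, not at every time step. All numbers below are assumed
# placeholders; this is not the paper's IRL value-iteration algorithm.

np.random.seed(0)

A = np.array([[1.0, 0.1],
              [0.0, 0.95]])            # assumed (in the paper: unknown) dynamics
B = np.array([[0.0],
              [0.1]])
K_expert = np.array([[-0.8, -1.2]])    # assumed expert feedback gain, u = K x

K_learner = np.zeros((1, 2))           # learner gain to be adapted
threshold = 0.05                       # assumed triggering threshold
lr = 0.5                               # assumed learning rate
events, steps = 0, 0

for episode in range(20):
    x = np.random.randn(2, 1)          # fresh initial state for excitation
    x_sampled = x.copy()               # state held since the last event
    u = K_learner @ x_sampled          # control held between events
    for k in range(50):
        steps += 1
        # Event-triggering condition on the gap between current and sampled state
        if np.linalg.norm(x - x_sampled) > threshold:
            events += 1
            x_sampled = x.copy()
            # Imitation-style gain update toward the expert's action at the
            # sampled state (a crude stand-in for the learner's weight update)
            u_expert = K_expert @ x_sampled
            u_learner = K_learner @ x_sampled
            K_learner += (lr * (u_expert - u_learner) @ x_sampled.T
                          / (x_sampled.T @ x_sampled + 1e-6))
            u = K_learner @ x_sampled  # control is refreshed only at events
        x = A @ x + B @ u              # state evolves with the held control

print(f"updates at {events} of {steps} steps")
print("learned gain:", K_learner, " expert gain:", K_expert)
```

In this toy setting the learner's gain drifts toward the expert's only at the triggering instants, so the number of weight updates is typically far smaller than the number of time steps, which is the computational saving the event-triggered scheme targets.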
