Event-Triggered Inverse Reinforcement Learning for Optimal Adaptive Leader-Follower Consensus of Unknown Multi-Agent Systems

Article Type: Research Article

Authors

1 Ph.D. Student, Faculty of Electrical and Computer Engineering, Semnan University, Semnan, Iran

2 Associate Professor, Faculty of Electrical and Computer Engineering, Semnan University, Semnan, Iran

3 Associate Professor, Faculty of Electrical Engineering, Amirkabir University, Tehran, Iran

Abstract

In this paper, event-triggered inverse reinforcement learning is introduced for multi-agent discrete-time graphical games with unknown dynamics. In the inverse reinforcement learning problem for these games, the expert and the learner are both leader-follower multi-agent systems. The objective of the expert system is the optimal synchronization of the follower agents with the leader agent. The learner agents intend to imitate the states and control inputs of the expert agents, while the expert's value function is unknown to them. An inverse reinforcement learning algorithm based on adaptive dynamic programming is developed for the learner system to reconstruct the expert's unknown performance function and to solve the event-triggered Hamilton-Jacobi-Bellman equations without requiring any knowledge of the dynamics of the expert and learner systems. To implement the proposed algorithm, a critic-actor-state-reward neural network structure is used, and the unknown dynamics of the expert and learner multi-agent systems are approximated by identifier neural networks. Unlike traditional adaptive dynamic programming, in which the control policy is updated periodically, in the proposed method the control policy and the neural network weights are updated only at the event instants; therefore, the computational complexity is reduced. Finally, simulation results are presented to illustrate the effectiveness of the proposed method.
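To make the setting concrete, the following is a minimal sketch of the standard discrete-time graphical-game quantities from the leader-follower consensus literature; the symbols a_ij, g_i, Q_i, R_ij, and sigma_i are generic placeholders, and the exact formulation used in the paper may differ. For follower agent i with neighbor set N_i and leader state x_0, a local neighborhood tracking error and its associated value function are typically defined as

\[
\delta_i(k) = \sum_{j\in N_i} a_{ij}\bigl(x_i(k)-x_j(k)\bigr) + g_i\bigl(x_i(k)-x_0(k)\bigr),
\]
\[
V_i\bigl(\delta_i(k)\bigr) = \sum_{l=k}^{\infty}\Bigl(\delta_i^{\top}(l)\,Q_i\,\delta_i(l) + u_i^{\top}(l)\,R_{ii}\,u_i(l) + \sum_{j\in N_i} u_j^{\top}(l)\,R_{ij}\,u_j(l)\Bigr),
\]

and a typical event-triggering rule recomputes the control of agent i only when

\[
\bigl\|\delta_i(k)-\delta_i(k_s)\bigr\|^{2} > \sigma_i\,\bigl\|\delta_i(k)\bigr\|^{2},
\]

where k_s is the most recent triggering instant and sigma_i > 0 is a design threshold.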

Article Title [English]

Event-Triggered Inverse Reinforcement Learning for Optimal Adaptive Leader-Follower Consensus of Unknown Multi-Agent Systems

Authors [English]

  • Zahra Jahan 1
  • Abbas Dideban 2
  • Farzaneh Abdollahi 3
1 Electrical Engineering Department, Semnan University, Semnan, Iran
2 Electrical Engineering Department, Semnan University, Semnan, Iran
3 Electrical Engineering Department, Amirkabir University, Tehran, Iran
Abstract [English]

This paper introduces an event-triggered inverse reinforcement learning (IRL) approach for multi-agent discrete-time graphical games with unknown dynamics. In the IRL problem for these games, the expert and the learner systems are both leader-follower multi-agent systems. The objective of the expert system is the optimal synchronization of the follower agents with the leader. The learner agents intend to imitate the control inputs and states of the expert agents, while the expert's value function is unknown to them. For the learner system, an IRL algorithm based on value-iteration adaptive dynamic programming is proposed to reconstruct the unknown value function of the expert and to solve the event-triggered coupled Hamilton-Jacobi-Bellman equations without requiring the dynamics of either the expert or the learner systems. To implement the proposed algorithm, an actor-critic-state-penalty structure is used, and the unknown dynamics of the expert and learner multi-agent systems are approximated by neural network identifiers. Unlike traditional adaptive dynamic programming, in which the control policies are updated periodically, in the presented method the control policies and neural network weights are updated only at the triggering instants; therefore, the computational complexity is reduced. Finally, the efficiency of the proposed technique is demonstrated through simulation results.
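As an illustration of the event-based update mechanism described above (and not a reproduction of the paper's algorithm), the Python sketch below runs a single-agent loop with a quadratic critic and a linear actor in which the control and the parameter updates occur only when a triggering condition is violated. The dynamics matrices A and B, the weights Q and R, the threshold sigma, and the learning rates are all illustrative assumptions; in the paper the unknown dynamics would instead be supplied by a neural-network identifier.

import numpy as np

# Minimal sketch of an event-triggered actor-critic update loop (illustrative only).
# V(e) ~ e' P e is a quadratic critic and u = -K e a linear actor; A, B, Q, R, sigma,
# and the learning rates are assumed values, not taken from the paper.
rng = np.random.default_rng(0)
n, m = 2, 1
A = np.array([[0.95, 0.10], [0.00, 0.90]])   # placeholder agent dynamics (unknown in the paper)
B = np.array([[0.00], [0.10]])               # placeholder input matrix
Q, R = np.eye(n), np.eye(m)                  # assumed stage-cost weights
sigma = 0.05                                 # assumed event-triggering threshold
lr_c, lr_a = 1e-2, 1e-2                      # assumed critic/actor learning rates

P = np.eye(n)                                # critic parameters
K = np.zeros((m, n))                         # actor parameters

x, x_leader = rng.standard_normal(n), np.zeros(n)
e_event = x - x_leader                       # tracking error broadcast at the last event
u = -K @ e_event

for k in range(500):
    e = x - x_leader                         # local tracking error (static leader here)
    gap = e - e_event

    if gap @ gap > sigma * (e @ e):          # event-triggering test
        e_event = e.copy()
        u = -K @ e_event                     # control recomputed only at event instants

        x_next = A @ x + B @ u               # the paper would use an NN identifier here
        e_next = x_next - x_leader

        # One-step temporal-difference error of the quadratic critic.
        stage_cost = e @ Q @ e + u @ R @ u
        td = stage_cost + e_next @ P @ e_next - e @ P @ e

        # Event-based gradient-style updates of critic and actor parameters.
        P -= lr_c * td * np.outer(e, e)
        dH_du = 2.0 * (R @ u) + 2.0 * (B.T @ P @ e_next)   # Hamiltonian gradient w.r.t. u
        K += lr_a * np.outer(dH_du, e)       # since u = -K e, this is a descent step in u
    else:
        x_next = A @ x + B @ u               # between events the last control is held

    x = x_next

print("final tracking-error norm:", np.linalg.norm(x - x_leader))

Between events the previously computed control is simply held (a zero-order hold), which is what reduces the computational load relative to recomputing the policy and the network weights at every sampling instant.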

Keywords [English]

  • Inverse reinforcement learning
  • Adaptive optimal control
  • Event-triggering scheme
  • Optimal leader-follower consensus
  • Discrete-time graphical games
  • Neural networks