Title

Multi-agent Coordination Algorithms for Pursuit-Evasion

Author
孙立君
Name pinyin
SUN Lijun
Student number
11860004
Degree
Doctoral (PhD)
Discipline
Computer Science
Supervisor
SHI Yuhui (史玉回)
Supervisor's Affiliation
Department of Computer Science and Engineering
Publication Date
2023-07-28
Submission date
2023-11-10
University
University of Technology Sydney
Place of Publication
Sydney
Abstract

Multi-agent coordination, or swarm intelligence, is a paramount concern in multi-agent systems (MAS) and determines their distinctive advantage over single-agent systems. Although diverse swarm robotics tasks have been accomplished and complex multi-agent strategies have emerged, real-world applications of MAS, such as large-scale warehouse robots, autonomous traffic, and drone swarms, remain challenging and limited. Among the diverse MAS benchmarks, the pursuit-evasion game is a popular, general, and representative one that models practical coordination demands and has attracted sustained research effort. Therefore, based on pursuit-evasion variants, this research investigates the following five coordination aspects and proposes corresponding solutions.

First, the safe multi-agent coordination problem is investigated. Popular multi-agent benchmarks provide limited safety support for safe multi-agent reinforcement learning (MARL) research, since a negative reward for collisions cannot guarantee safety. Therefore, this research proposes MatrixWorld, a new safety-constrained multi-agent environment based on the general pursuit-evasion game. In particular, the multi-agent safety constraints are implemented along three dimensions that classify pursuit-evasion games: the multi-agent-environment interaction model, the collision resolution mechanism in the multi-agent action execution model, and the game termination condition. In addition, MatrixWorld serves as a lightweight co-evolution framework for learning pursuit tasks, evasion tasks, or both, in which further pursuit-evasion variants can be designed according to different practical meanings of safety.
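To make the collision-based safety constraint concrete, the following is a minimal sketch of a single grid-world step with a simple collision resolution rule and a capture-based termination check. It is an illustrative toy only, not the MatrixWorld implementation; the "refuse conflicting moves" rule and the adjacency capture condition are assumptions chosen for brevity.

```python
# Toy grid-world pursuit-evasion step with a simple collision rule and capture check.
# NOT the MatrixWorld implementation; the collision and termination rules are assumptions.
ACTIONS = {0: (0, 0), 1: (-1, 0), 2: (1, 0), 3: (0, -1), 4: (0, 1)}  # stay, up, down, left, right

def step(positions, actions, grid_size, evader_id):
    """positions/actions are dicts keyed by agent id; returns (new_positions, captured)."""
    proposed = {}
    for i, (r, c) in positions.items():
        dr, dc = ACTIONS[actions[i]]
        proposed[i] = (min(max(r + dr, 0), grid_size - 1),
                       min(max(c + dc, 0), grid_size - 1))
    # Collision resolution: any cell claimed by more than one agent is refused,
    # and the claimants keep their previous cells (chained conflicts are ignored here).
    new_positions = {}
    for i, cell in proposed.items():
        claimants = sum(1 for other in proposed.values() if other == cell)
        new_positions[i] = positions[i] if claimants > 1 else cell
    # Termination: the evader is captured once a pursuer occupies an adjacent cell.
    er, ec = new_positions[evader_id]
    captured = any(abs(pr - er) + abs(pc - ec) == 1
                   for i, (pr, pc) in new_positions.items() if i != evader_id)
    return new_positions, captured
```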

Second, the NP-hard distributed coordination problem is investigated throughout this research. For example, for the fully observable pursuit of a single evader, this research proposes the cooperative co-evolutionary particle swarm optimization algorithm for robots (CCPSO-R). It introduces the concept of virtual agents and uses a cooperative co-evolutionary evaluation mechanism for the decentralized cooperation of online-planning pursuers. Experiments are conducted on a scalable swarm of pursuers against four types of evaders, and the results show the reliability, generality, and scalability of the proposed CCPSO-R. A comparison with Multi-Agent Real-Time Pursuit (MAPS), a representative algorithm based on dynamic path planning, further demonstrates the effectiveness of CCPSO-R.
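The core idea of cooperative co-evolutionary evaluation can be sketched as follows: each pursuer searches only over its own next move, and every candidate move is scored jointly with the teammates' currently committed moves. The fitness function (distance to the evader plus a crowding penalty) and the random candidate sampling used in place of a full PSO update are assumptions made for brevity; they are not the CCPSO-R implementation.

```python
# Sketch of cooperative co-evolutionary evaluation in the spirit of CCPSO-R.
# The fitness and the random candidate sampling (instead of a full PSO update) are assumptions.
import numpy as np

def joint_fitness(candidate, idx, context, evader):
    """Score one agent's candidate move together with the teammates' committed moves."""
    team = [np.asarray(p, dtype=float) for p in context]
    team[idx] = np.asarray(candidate, dtype=float)
    closing = sum(np.linalg.norm(p - evader) for p in team)           # pursuers should close in
    crowding = sum(1.0 / (1e-6 + np.linalg.norm(a - b))               # ...without piling up
                   for i, a in enumerate(team) for b in team[i + 1:])
    return closing + 0.1 * crowding                                    # lower is better

def decide_moves(context, evader, n_candidates=20, radius=1.0, rng=np.random.default_rng(0)):
    """One decentralized decision round: each agent optimizes its own displacement in turn."""
    evader = np.asarray(evader, dtype=float)
    for i in range(len(context)):
        candidates = np.asarray(context[i], dtype=float) + rng.uniform(-radius, radius, (n_candidates, 2))
        context[i] = min(candidates, key=lambda c: joint_fitness(c, i, context, evader))
    return context

moves = decide_moves([(0.0, 0.0), (5.0, 0.0), (0.0, 5.0)], evader=(3.0, 3.0))
```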

Third, the NP-complete multi-agent task allocation problem is investigated in pursuit-evasion variants with more than one evader. For example, for the fully observable pursuit of multiple evaders, this research proposes a two-stage approach, BiPCCR, which solves the problem as a dynamic optimization. In particular, a multi-evader pursuit (MEP) fitness function is proposed for the involved bi-quadratic assignment problem (BiQAP), which significantly reduces the search cost. In addition, based on domain knowledge, an existing BiQAP solver is improved to perform statistically better. In this work, the safety of the CCPSO-R algorithm is also enhanced in the proposed PCCPSO-R algorithm, which supports simultaneous multi-agent decision-making and action execution.
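The allocation stage can be illustrated with a deliberately tiny example: enumerate every way of giving each evader a fixed-size group of pursuers and keep the cheapest assignment under a simple distance cost. Both the exhaustive enumeration and the cost function are illustrative assumptions; the thesis instead formulates the allocation as a BiQAP with the MEP fitness and uses improved BiQAP heuristics rather than exhaustive search.

```python
# Toy pursuer-to-evader allocation by exhaustive search (tiny instances only).
# Illustrative only: the thesis formulates this as a BiQAP with an MEP fitness and
# solves it with heuristic BiQAP solvers, not brute force.
import itertools
import numpy as np

def allocation_cost(assignment, pursuers, evaders):
    """Sum of distances from each pursuer to its assigned evader (a stand-in cost)."""
    return sum(np.linalg.norm(np.asarray(pursuers[i]) - np.asarray(evaders[e]))
               for i, e in enumerate(assignment))

def best_allocation(pursuers, evaders, group_size):
    """Try every assignment that gives each evader exactly `group_size` pursuers."""
    labels = [e for e in range(len(evaders)) for _ in range(group_size)]
    candidates = set(itertools.permutations(labels))          # de-duplicate repeated labels
    return min(candidates, key=lambda a: allocation_cost(a, pursuers, evaders))

alloc = best_allocation(pursuers=[(0, 0), (1, 5), (6, 1), (5, 6)],
                        evaders=[(2, 2), (5, 5)], group_size=2)
```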

Fourth, multi-agent observation uncertainty and interaction uncertainty are investigated in partially observable pursuit-evasion variants. Furthermore, to avoid coordination performance degradation caused by communication failures and to remain immune to communication costs, a more restricted self-organizing setup with only implicit coordination is considered. To address these challenges, this research proposes a distributed hierarchical framework, the fuzzy self-organizing cooperative co-evolution (FSC2) algorithm. The experimental results demonstrate that, by decomposing the task with FSC2, superior performance is achieved compared with other implicit coordination policies fully trained by general MARL algorithms. The scalability of FSC2 is demonstrated: up to 2048 FSC2 agents perform efficiently with nearly 100% capture rates. Empirical analyses and ablation studies verify the interpretability, rationality, and effectiveness of the component algorithms in FSC2.
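One way to picture the self-organizing decomposition is through fuzzy memberships: each pursuer, using only its own partial observation, computes a fuzzy c-means style membership over the evaders it currently perceives and commits to the target with the highest membership. The inverse-distance membership formula and the fuzzifier m = 2 below are assumptions for illustration; the actual FSC2 decomposition and its learned policies are more elaborate.

```python
# Fuzzy c-means style target selection from a single agent's partial observation.
# The membership formula and fuzzifier m=2 are illustrative assumptions, not FSC2 itself.
import numpy as np

def fuzzy_memberships(agent_pos, targets, m=2.0):
    """Memberships of one agent with respect to the observed targets (they sum to 1)."""
    d = np.array([np.linalg.norm(np.asarray(agent_pos) - np.asarray(t)) + 1e-9 for t in targets])
    weights = (1.0 / d) ** (2.0 / (m - 1.0))
    return weights / weights.sum()

def choose_target(agent_pos, observed_targets):
    """Commit to the target with the highest fuzzy membership."""
    memberships = fuzzy_memberships(agent_pos, observed_targets)
    return int(np.argmax(memberships)), memberships

target_idx, u = choose_target((0.0, 0.0), [(1.0, 1.0), (4.0, 0.0), (0.0, 6.0)])
```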

Last, open problems and intriguing phenomena in autocurriculum learning are explored in co-evolutionary pursuit-evasion variants. To better understand related research and to use similar terminologies more accurately, this research reviews and analyzes the co-evolution mechanism in the multi-agent setting, clearly revealing its relationships with autocurricula, self-play, arms races, and adversarial learning. Then, through adversarial learning, this research produces various arms race outcomes under different co-evolution mechanisms. Experiments show that arms races with steady and converging improvement are more practical for producing increasingly complex behaviors, whereas policy cycles between the two rival sides are useful for producing diverse policies. In particular, this research finds that, in an asymmetric adversarial game, passive (evasive) policy learning benefits more from co-evolution than active (pursuing) policy learning.
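A schematic alternating self-play loop of the kind discussed here is sketched below: the pursuer and evader sides take turns training against a frozen opponent drawn from the other side's pool of past checkpoints, which is one common way to shape the resulting arms race. The `train_one_side` and `evaluate` callables and the pool-sampling scheme are hypothetical placeholders, not an interface from the thesis.

```python
# Schematic alternating self-play / co-evolution loop; the callables are hypothetical placeholders.
import random

def coevolve(env, pursuer, evader, train_one_side, evaluate, generations=50, snapshot_prob=0.5):
    pursuer_pool, evader_pool, history = [pursuer], [evader], []
    for _ in range(generations):
        # Pursuer side trains against a frozen evader sampled from past checkpoints.
        pursuer = train_one_side(env, learner=pursuer, opponent=random.choice(evader_pool))
        # Evader side trains against a frozen pursuer sampled from past checkpoints.
        evader = train_one_side(env, learner=evader, opponent=random.choice(pursuer_pool))
        if random.random() < snapshot_prob:             # occasionally grow the opponent pools
            pursuer_pool.append(pursuer)
            evader_pool.append(evader)
        history.append(evaluate(env, pursuer, evader))  # e.g., capture rate per generation
    return pursuer, evader, history
```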

Keywords
Language
English
Training Category
Joint training
Enrollment Year
2019
Year of Degree Awarded
2023-08
Data Source
Manual submission
Document Type
Thesis
Identifier
http://kc.sustech.edu.cn/handle/2SGJ60CL/575894
Department
Department of Computer Science and Engineering
Recommended Citation
GB/T 7714
Sun LJ. Multi-agent Coordination Algorithms for Pursuit-Evasion[D]. Sydney: University of Technology Sydney, 2023.
Files in This Item:
File Name/Size: 11860004-孙立君-计算机科学与工 (10576 KB)
Access/License: Restricted Access -- Fulltext Requests