J. L. BA, J. R. KIROS, AND G. E. HINTON, Layer normalization, arXiv preprint arXiv:1607.06450, (2016).
 B. BAKER, I. KANITSCHEIDER, T. MARKOV, Y. WU, G. POWELL, B. MCGREW, AND I. MORDATCH, Emergent tool use from multi-agent autocurricula, in International Conference on Learning Representations, 2019.
N. BALAJI, S. KIEFER, P. NOVOTNÝ, G. A. PÉREZ, AND M. SHIRMOHAMMADI, On the complexity of value iteration, arXiv preprint arXiv:1807.04920, (2018).
S. BARRETT, A. ROSENFELD, S. KRAUS, AND P. STONE, Making friends on the fly: Cooperating with new teammates, Artificial Intelligence, 242 (2017), pp. 132–171.
S. BARRETT AND P. STONE, An analysis framework for ad hoc teamwork tasks, in Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems - Volume 1, AAMAS ’12, Richland, SC, 2012, International Foundation for Autonomous Agents and Multiagent Systems, pp. 357–364.
S. BARRETT AND P. STONE, Cooperating with unknown teammates in complex domains: A robot soccer case study of ad hoc teamwork, in Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, AAAI ’15, AAAI Press, 2015, pp. 2010–2016.
 S. BARRETT, P. STONE, AND S. KRAUS, Empirical evaluation of ad hoc teamwork in the pursuit domain, in The 10th International Conference on Autonomous Agents and Multiagent Systems - Volume 2, AAMAS ’11, Richland, SC, 2011, International Foundation for Autonomous Agents and Multiagent Systems, pp. 567–574.
S. BARRETT, P. STONE, S. KRAUS, AND A. ROSENFELD, Teamwork with limited knowledge of teammates, in Proceedings of the Twenty-Seventh AAAI Conference on Artificial Intelligence, AAAI ’13, AAAI Press, 2013, pp. 102–108.
M. BENDA, V. JAGANNATHAN, AND R. DODHIAWALA, On optimal cooperation of knowledge sources: An empirical investigation, tech. rep., BCS-G2010-28, Boeing Advanced Technology Center, Boeing Computing Services, Seattle, Washington, 1986.
Y. BENGIO, J. LOURADOUR, R. COLLOBERT, AND J. WESTON, Curriculum learning, in Proceedings of the 26th Annual International Conference on Machine Learning, 2009, pp. 41–48.
D. S. BERNSTEIN, R. GIVAN, N. IMMERMAN, AND S. ZILBERSTEIN, The complexity of decentralized control of Markov decision processes, Mathematics of Operations Research, 27 (2002), pp. 819–840.
 J. C. BEZDEK, Pattern recognition with fuzzy objective function algorithms, Springer, Boston, MA, 2013.
 A. BONATO, The game of cops and robbers on graphs, American Mathematical Soc., 2011.
C. BOUTILIER, Planning, learning and coordination in multiagent decision processes, in TARK, vol. 96, Citeseer, 1996, pp. 195–210.
C. BOUTILIER, Sequential optimality and coordination in multiagent systems, in IJCAI, vol. 99, 1999, pp. 478–485.
 R. BURKARD, M. DELL’AMICO, AND S. MARTELLO, Assignment Problems, Society for Industrial and Applied Mathematics, 2012.
 R. E. BURKARD AND E. ÇELA, Heuristics for biquadratic assignment problems and their computational comparison, European Journal of Operational Research, 83 (1995), pp. 283–300.
R. E. BURKARD, E. ÇELA, AND B. KLINZ, On the biquadratic assignment problem, in Quadratic Assignment and Related Problems: DIMACS Workshop, May 20–21, 1993, vol. 16, American Mathematical Soc., 1994, pp. 117–146.
S. CAMAZINE, J.-L. DENEUBOURG, N. R. FRANKS, J. SNEYD, G. THERAULAZ, AND E. BONABEAU, Self-organization in biological systems, Princeton University Press, 2001.
E. ÇELA, The quadratic assignment problem: Theory and algorithms, vol. 1, Springer Science & Business Media, 2013.
T. H. CHUNG, G. A. HOLLINGER, AND V. ISLER, Search and pursuit-evasion in mobile robotics, Autonomous Robots, 31 (2011), p. 299.
 M. S. COUCEIRO, R. P. ROCHA, AND N. M. F. FERREIRA, A novel multi-robot exploration approach based on particle swarm optimization algorithms, in 2011 IEEE International Symposium on Safety, Security, and Rescue Robotics, Nov 2011, pp. 327–332.
R. DAWKINS AND J. R. KREBS, Arms races between and within species, Proceedings of the Royal Society of London. Series B. Biological Sciences, 205 (1979), pp. 489–511.
C. DE SOUZA, R. NEWBURY, A. COSGUN, P. CASTILLO, B. VIDOLOV, AND D. KULIĆ, Decentralized multi-agent pursuit using deep reinforcement learning, IEEE Robotics and Automation Letters, 6 (2021), pp. 4552–4559.
S. L. DEVADOSS AND J. O’ROURKE, Discrete and computational geometry, Princeton University Press, 2011.
 M. EGOROV, Multi-agent deep reinforcement learning, CS231n: Convolutional Neural Networks for Visual Recognition, (2016).
 A. EIBEN AND J. SMITH, Introduction to evolutionary computing, Springer, 2015.
J. FOERSTER, G. FARQUHAR, T. AFOURAS, N. NARDELLI, AND S. WHITESON, Counterfactual multi-agent policy gradients, in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 32, 2018.
F. V. FOMIN, P. A. GOLOVACH, AND J. KRATOCHVÍL, On tractability of cops and robbers game, in Fifth IFIP International Conference on Theoretical Computer Science – TCS 2008, G. Ausiello, J. Karhumäki, G. Mauri, and L. Ong, eds., Boston, MA, 2008, Springer US, pp. 171–185.
 M. R. GAREY AND D. S. JOHNSON, Computers and Intractability: A Guide to the Theory of NP-Completeness, W. H. Freeman, USA, 1979.
 A. GLEAVE, M. DENNIS, C. WILD, N. KANT, S. LEVINE, AND S. RUSSELL, Adversarial policies: Attacking deep reinforcement learning, arXiv preprint arXiv:1905.10615, (2019).
I. GOODFELLOW, J. POUGET-ABADIE, M. MIRZA, B. XU, D. WARDE-FARLEY, S. OZAIR, A. COURVILLE, AND Y. BENGIO, Generative adversarial nets, in Advances in Neural Information Processing Systems, Z. Ghahramani, M. Welling, C. Cortes, N. Lawrence, and K. Weinberger, eds., vol. 27, Curran Associates, Inc., 2014.
 Y. GUAN, D. MAITY, C. M. KRONINGER, AND P. TSIOTRAS, Bounded-rational pursuit-evasion games, in 2021 American Control Conference (ACC), 2021, pp. 3216–3221.
J. K. GUPTA, M. EGOROV, AND M. KOCHENDERFER, Cooperative multi-agent control using deep reinforcement learning, in Autonomous Agents and Multiagent Systems, G. Sukthankar and J. A. Rodriguez-Aguilar, eds., Cham, 2017, Springer International Publishing, pp. 66–83.
 T. HAYNES AND S. SEN, Evolving behavioral strategies in predators and prey, in Adaption and Learning in Multi-Agent Systems, G. Weiß and S. Sen, eds., Berlin, Heidelberg, 1996, Springer Berlin Heidelberg, pp. 113–126.
T. HAYNES AND S. SEN, Learning cases to resolve conflicts and improve group behavior, International Journal of Human-Computer Studies, 48 (1998), pp. 31–49.
 T. HAYNES, R. L. WAINWRIGHT, AND S. SEN, Evolving cooperation strategies., in ICMAS, 1995, p. 450.
 T. HAYNES, R. L. WAINWRIGHT, S. SEN, AND D. A. SCHOENEFELD, Strongly typed genetic programming in evolving cooperation strategies., in ICGA, vol. 95, 1995, pp. 271–278.
 R. ISAACS, Differential games: a mathematical theory with applications to warfare and pursuit, control and optimization, New York: John Wiley and Sons, 1965.
Y. ISHIWAKA, T. SATO, AND Y. KAKAZU, An approach to the pursuit problem on a heterogeneous multiagent system using reinforcement learning, Robotics and Autonomous Systems, 43 (2003), pp. 245–256.
M. JADERBERG, W. M. CZARNECKI, I. DUNNING, L. MARRIS, G. LEVER, A. G. CASTANEDA, C. BEATTIE, N. C. RABINOWITZ, A. S. MORCOS, A. RUDERMAN, ET AL., Human-level performance in 3D multiplayer games with population-based reinforcement learning, Science, 364 (2019), pp. 859–865.
 J. JIANG, C. DUN, T. HUANG, AND Z. LU, Graph convolutional reinforcement learning, in International Conference on Learning Representations, 2019.
V. KONDA AND J. TSITSIKLIS, Actor-critic algorithms, Advances in Neural Information Processing Systems, 12 (1999).
R. E. KORF, A simple solution to pursuit games, in Working Papers of The 11th International Workshop on Distributed Artificial Intelligence, 1992, pp. 183–194.
 J. R. KOZA, Genetic programming, MIT Press, Cambridge, MA, 1992.
 J. Z. LEIBO, E. HUGHES, M. LANCTOT, AND T. GRAEPEL, Autocurricula and the emergence of innovation from social interaction: A manifesto for multi-agent intelligence research, arXiv preprint arXiv:1903.00742, (2019).
 R. LEVY AND J. S. ROSENSCHEIN, A game theoretic approach to distributed artificial intelligence and the pursuit problem, ACM SIGOIS Bulletin, 13 (1992), p. 11.
 J. E. LITTLEWOOD, A mathematician’s miscellany, Methuen & Co. Ltd., London, 1953.
 R. LOWE, Y. WU, A. TAMAR, J. HARB, P. ABBEEL, AND I. MORDATCH, Multi-agent actor-critic for mixed cooperative-competitive environments, Neural Information Processing Systems (NIPS), (2017).
X. MA, K. DRIGGS-CAMPBELL, AND M. J. KOCHENDERFER, Improved robustness and safety for autonomous vehicle control with adversarial reinforcement learning, in 2018 IEEE Intelligent Vehicles Symposium (IV), IEEE, 2018, pp. 1665–1671.
P. MAES, M. J. MATARIC, J.-A. MEYER, J. POLLACK, AND S. W. WILSON, Co-evolution of pursuit and evasion II: Simulation methods and results, (1996).
T. MAVRIDOU, P. PARDALOS, L. PITSOULIS, AND M. G. RESENDE, A GRASP for the biquadratic assignment problem, European Journal of Operational Research, 105 (1998), pp. 613–621.
 G. F. MILLER AND D. CLIFF, Co-evolution of pursuit and evasion I: Biological and game-theoretic foundations, School of Cognitive and Computing Sciences, University of Sussex Brighton, 1994.
G. F. MILLER AND D. CLIFF, Protean behavior in dynamic games: Arguments for the co-evolution of pursuit-evasion tactics, From animals to animats, 3 (1994), pp. 411–420.
 D. J. MONTANA, Strongly typed genetic programming, Evolutionary computation, 3 (1995), pp. 199–230.
M. J. KOCHENDERFER, T. A. WHEELER, AND K. H. WRAY, Algorithms for Decision Making, MIT Press, 2022.
G. NITSCHKE, Co-evolution of cooperation in a pursuit evasion game, in Proceedings 2003 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2003) (Cat. No. 03CH37453), vol. 2, IEEE, 2003, pp. 2037–2042.
 S. NOLFI AND D. FLOREANO, How co-evolution can enhance the adaptive power of artificial evolution: Implications for evolutionary robotics, in Evolutionary Robotics: First European Workshop, EvoRobot98 Paris, France, April 16–17, 1998 Proceedings 1, Springer, 1998, pp. 22–38.
R. NOWAKOWSKI AND P. WINKLER, Vertex-to-vertex pursuit in a graph, Discrete Mathematics, 43 (1983), pp. 235–239.
 A. OROOJLOOY AND D. HAJINEZHAD, A review of cooperative multi-agent deep reinforcement learning, Applied Intelligence, (2022), pp. 1–46.
 E. OSAWA, A metalevel coordination strategy for reactive cooperative planning., in ICMAS, vol. 95, 1995, pp. 297–303.
X. PAN, D. SEITA, Y. GAO, AND J. CANNY, Risk averse robust adversarial reinforcement learning, in 2019 International Conference on Robotics and Automation (ICRA), IEEE, 2019, pp. 8522–8528.
C. H. PAPADIMITRIOU AND J. N. TSITSIKLIS, The complexity of Markov decision processes, Mathematics of Operations Research, 12 (1987), pp. 441–450.
P. M. PARDALOS AND L. S. PITSOULIS, Nonlinear assignment problems: Algorithms and applications, vol. 7, Springer Science & Business Media, 2013.
 T. D. PARSONS, Pursuit-evasion in a graph, in Theory and Applications of Graphs, Berlin, Heidelberg, 1978, Springer Berlin Heidelberg, pp. 426–441.
 L. PINTO, J. DAVIDSON, R. SUKTHANKAR, AND A. GUPTA, Robust adversarial reinforcement learning, in International Conference on Machine Learning, PMLR, 2017, pp. 2817–2826.
 T. PITCHER, A. MAGURRAN, AND I. WINFIELD, Fish in larger shoals find food faster, Behavioral Ecology and Sociobiology, 10 (1982), pp. 149–151.
 A. QUILLIOT, Jeux et pointes fixes sur les graphes, PhD thesis, Université de Paris VI, 1978.
T. RASHID, M. SAMVELYAN, C. SCHROEDER, G. FARQUHAR, J. FOERSTER, AND S. WHITESON, QMIX: Monotonic value function factorisation for deep multi-agent reinforcement learning, in International Conference on Machine Learning, PMLR, 2018, pp. 4295–4304.
C. W. REYNOLDS, Flocks, herds and schools: A distributed behavioral model, in Proceedings of the 14th Annual Conference on Computer Graphics and Interactive Techniques, 1987, pp. 25–34.
C. W. REYNOLDS, Competition, coevolution and the game of tag, in Proceedings of the Fourth International Workshop on the Synthesis and Simulation of Living Systems, 1994, pp. 59–69.
 S. RUSSELL AND P. NORVIG, Artificial Intelligence: A Modern Approach, 4th Edition, Pearson Education, 2021.
M. SAMVELYAN, T. RASHID, C. SCHROEDER DE WITT, G. FARQUHAR, N. NARDELLI, T. G. RUDNER, C.-M. HUNG, P. H. TORR, J. FOERSTER, AND S. WHITESON, The StarCraft multi-agent challenge, in Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems, 2019, pp. 2186–2188.
J. SCHULMAN, P. MORITZ, S. LEVINE, M. JORDAN, AND P. ABBEEL, High-dimensional continuous control using generalized advantage estimation, arXiv preprint arXiv:1506.02438, (2015).
J. SCHULMAN, F. WOLSKI, P. DHARIWAL, A. RADFORD, AND O. KLIMOV, Proximal policy optimization algorithms, arXiv preprint arXiv:1707.06347, (2017).
 S. SEUKEN AND S. ZILBERSTEIN, Formal models and algorithms for decentralized control of multiple agents, tech. rep., Technical Report 05-68, Department of Computer Science, University of Massachusetts Amherst, 2005.
 Y. SHI AND R. EBERHART, A modified particle swarm optimizer, in 1998 IEEE International Conference on Evolutionary Computation Proceedings. IEEE World Congress on Computational Intelligence (Cat. No.98TH8360), May 1998, pp. 69–73.
L. STEPHENS, Agent organization as an effector of DAI system performance, in Proceedings of the 9th Workshop on Distributed Artificial Intelligence, 1989.
L. M. STEPHENS AND M. B. MERX, The effect of agent control strategy on the performance of a DAI pursuit problem, in Proceedings of the 10th International Workshop on Distributed Artificial Intelligence, 1990.
 P. STONE AND M. VELOSO, Multiagent systems: A survey from a machine learning perspective, Autonomous Robots, 8 (2000), pp. 345–383.
S. SUKHBAATAR, Z. LIN, I. KOSTRIKOV, G. SYNNAEVE, A. SZLAM, AND R. FERGUS, Intrinsic motivation and automatic curricula via asymmetric self-play, arXiv preprint arXiv:1703.05407, (2017).
L. SUN, Y.-C. CHANG, C. LYU, Y. SHI, Y. SHI, AND C.-T. LIN, Toward multi-target self-organizing pursuit in a partially observable Markov game, arXiv preprint arXiv:2206.12330, (2022).
 L. SUN, C. LYU, AND Y. SHI, Cooperative coevolution of real predator robots and virtual robots in the pursuit domain, Applied Soft Computing, 89 (2020), p. 106098.
L. SUN, C. LYU, Y. SHI, AND C.-T. LIN, Multiple-preys pursuit based on biquadratic assignment problem, in 2021 IEEE Congress on Evolutionary Computation (CEC), 2021, pp. 1585–1592.
R. S. SUTTON AND A. G. BARTO, Reinforcement learning: An introduction, MIT Press, 2018.
 M. TAN, Multi-agent reinforcement learning: Independent vs. cooperative agents, in Proceedings of the tenth international conference on machine learning, 1993, pp. 330–337.
 X. TANG, D. YE, L. HUANG, Z. SUN, AND J. SUN, Pursuit-evasion game switching strategies for spacecraft with incomplete-information, Aerospace Science and Technology, 119 (2021), p. 107112.
J. TERRY, B. BLACK, N. GRAMMEL, M. JAYAKUMAR, A. HARI, R. SULLIVAN, L. S. SANTOS, C. DIEFFENDAHL, C. HORSCH, R. PEREZ-VICENTE, ET AL., PettingZoo: Gym for multi-agent reinforcement learning, Advances in Neural Information Processing Systems, 34 (2021), pp. 15032–15043.
J. K. TERRY, B. BLACK, M. JAYAKUMAR, A. HARI, R. SULLIVAN, L. SANTOS, C. DIEFFENDAHL, N. L. WILLIAMS, Y. LOKESH, C. HORSCH, ET AL., PettingZoo: Gym for multi-agent reinforcement learning, arXiv preprint arXiv:2009.14471, (2020).
 J. K. TERRY, N. GRAMMEL, A. HARI, L. SANTOS, AND B. BLACK, Revisiting parameter sharing in multi-agent deep reinforcement learning, arXiv preprint arXiv:2005.13625, (2020).
 C. UNDEGER AND F. POLAT, Multi-agent real-time pursuit, Autonomous Agents and Multi-Agent Systems, 21 (2010), pp. 69–107.
O. VINYALS, I. BABUSCHKIN, W. M. CZARNECKI, M. MATHIEU, A. DUDZIK, J. CHUNG, D. H. CHOI, R. POWELL, T. EWALDS, P. GEORGIEV, ET AL., Grandmaster level in StarCraft II using multi-agent reinforcement learning, Nature, 575 (2019), pp. 350–354.
X. WANG, Y. CHEN, AND W. ZHU, A survey on curriculum learning, IEEE Transactions on Pattern Analysis and Machine Intelligence, 44 (2021), pp. 4555–4576.
Z. WANG, B. GONG, Y. YUAN, AND X. DING, Incomplete information pursuit-evasion game control for a space non-cooperative target, Aerospace, 8 (2021).
 C. WU, A. R. KREIDIEH, K. PARVATE, E. VINITSKY, AND A. M. BAYEN, Flow: A modular learning framework for mixed autonomy traffic, IEEE Transactions on Robotics, 38 (2021), pp. 1270–1286.
Y. YANG, R. LUO, M. LI, M. ZHOU, W. ZHANG, AND J. WANG, Mean field multi-agent reinforcement learning, in International Conference on Machine Learning, PMLR, 2018, pp. 5571–5580.
D. YE, M. SHI, AND Z. SUN, Satellite proximate pursuit-evasion game with different thrust configurations, Aerospace Science and Technology, 99 (2020), p. 105715.
C. YU, A. VELU, E. VINITSKY, J. GAO, Y. WANG, A. BAYEN, AND Y. WU, The surprising effectiveness of PPO in cooperative multi-agent games, Advances in Neural Information Processing Systems, 35 (2022), pp. 24611–24624.
K. ZHANG, Z. YANG, AND T. BAŞAR, Multi-agent reinforcement learning: A selective overview of theories and algorithms, Handbook of Reinforcement Learning and Control, (2021), pp. 321–384.
L. ZHENG, J. YANG, H. CAI, M. ZHOU, W. ZHANG, J. WANG, AND Y. YU, MAgent: A many-agent reinforcement learning platform for artificial collective intelligence, Proceedings of the AAAI Conference on Artificial Intelligence, 32 (2018).
Z. ZHOU AND H. XU, Decentralized optimal large scale multi-player pursuit-evasion strategies: A mean field game approach with reinforcement learning, Neurocomputing, (2021).