[1] J. L. BA, J. R. KIROS, AND G. E. HINTON, Layer normalization, arXiv preprint arXiv:1607.06450, (2016).
[2] B. BAKER, I. KANITSCHEIDER, T. MARKOV, Y. WU, G. POWELL, B. MCGREW, AND I. MORDATCH, Emergent tool use from multi-agent autocurricula, in International Conference on Learning Representations, 2019.
[3] N. BALAJI, S. KIEFER, P. NOVOTNÝ, G. A. PÉREZ, AND M. SHIRMOHAMMADI, On the complexity of value iteration, arXiv preprint arXiv:1807.04920, (2018).
[4] S. BARRETT, A. ROSENFELD, S. KRAUS, AND P. STONE, Making friends on the fly: Cooperating with new teammates, Artificial Intelligence, 242 (2017), pp. 132–171.
[5] S. BARRETT AND P. STONE, An analysis framework for ad hoc teamwork tasks, in Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems - Volume 1, AAMAS '12, Richland, SC, 2012, International Foundation for Autonomous Agents and Multiagent Systems, pp. 357–364.
[6] S. BARRETT AND P. STONE, Cooperating with unknown teammates in complex domains: A robot soccer case study of ad hoc teamwork, in Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, AAAI '15, AAAI Press, 2015, pp. 2010–2016.
[7] S. BARRETT, P. STONE, AND S. KRAUS, Empirical evaluation of ad hoc teamwork in the pursuit domain, in The 10th International Conference on Autonomous Agents and Multiagent Systems - Volume 2, AAMAS ’11, Richland, SC, 2011, International Foundation for Autonomous Agents and Multiagent Systems, pp. 567–574.
[8] S. BARRETT, P. STONE, S. KRAUS, AND A. ROSENFELD, Teamwork with limited knowledge of teammates, in Proceedings of the Twenty-Seventh AAAI Conference on Artificial Intelligence, AAAI '13, AAAI Press, 2013, pp. 102–108.
[9] M. BENDA, V. JAGANNATHAN, AND R. DODHIAWALA, On optimal cooperation of knowledge sources - an empirical investigation, tech. rep. BCS-G2010-28, Boeing Advanced Technology Center, Boeing Computing Services, Seattle, Washington, 1986.
[10] Y. BENGIO, J. LOURADOUR, R. COLLOBERT, AND J. WESTON, Curriculum learning, in Proceedings of the 26th annual international conference on machine learning, 2009, pp. 41–48.
[11] D. S. BERNSTEIN, R. GIVAN, N. IMMERMAN, AND S. ZILBERSTEIN, The complexity of decentralized control of Markov decision processes, Mathematics of Operations Research, 27 (2002), pp. 819–840.
[12] J. C. BEZDEK, Pattern recognition with fuzzy objective function algorithms, Springer, Boston, MA, 2013.
[13] A. BONATO, The game of cops and robbers on graphs, American Mathematical Soc., 2011.
[14] C. BOUTILIER, Planning, learning and coordination in multiagent decision processes, in TARK, vol. 96, Citeseer, 1996, pp. 195–210.
[15] C. BOUTILIER, Sequential optimality and coordination in multiagent systems, in IJCAI, vol. 99, 1999, pp. 478–485.
[16] R. BURKARD, M. DELL’AMICO, AND S. MARTELLO, Assignment Problems, Society for Industrial and Applied Mathematics, 2012.
[17] R. E. BURKARD AND E. ÇELA, Heuristics for biquadratic assignment problems and their computational comparison, European Journal of Operational Research, 83 (1995), pp. 283–300.
[18] R. E. BURKARD, E. ÇELA, AND B. KLINZ, On the biquadratic assignment problem, in Quadratic Assignment and Related Problems: DIMACS Workshop, May 20-21, 1993, vol. 16, American Mathematical Soc., 1994, pp. 117–146.
[19] S. CAMAZINE, J.-L. DENEUBOURG, N. R. FRANKS, J. SNEYD, G. THERAULA, AND E. BONABEAU, Self-organization in biological systems, Princeton University Press, 2001.
[20] E. ÇELA, The quadratic assignment problem: theory and algorithms, vol. 1, Springer Science & Business Media, 2013.
[21] T. H. CHUNG, G. A. HOLLINGER, AND V. ISLER, Search and pursuit-evasion in mobile robotics, Autonomous robots, 31 (2011), p. 299.
[22] M. S. COUCEIRO, R. P. ROCHA, AND N. M. F. FERREIRA, A novel multi-robot exploration approach based on particle swarm optimization algorithms, in 2011 IEEE International Symposium on Safety, Security, and Rescue Robotics, Nov 2011, pp. 327–332.
[23] R. DAWKINS AND J. R. KREBS, Arms races between and within species, Proceedings of the Royal Society of London. Series B. Biological Sciences, 205 (1979), pp. 489–511.
[24] C. DE SOUZA, R. NEWBURY, A. COSGUN, P. CASTILLO, B. VIDOLOV, AND D. KULIĆ, Decentralized multi-agent pursuit using deep reinforcement learning, IEEE Robotics and Automation Letters, 6 (2021), pp. 4552–4559.
[25] S. L. DEVADOSS AND J. O'ROURKE, Discrete and computational geometry, Princeton University Press, 2011.
[26] M. EGOROV, Multi-agent deep reinforcement learning, CS231n: Convolutional Neural Networks for Visual Recognition, (2016).
[27] A. EIBEN AND J. SMITH, Introduction to evolutionary computing, Springer, 2015.
[28] J. FOERSTER, G. FARQUHAR, T. AFOURAS, N. NARDELLI, AND S. WHITESON, Counterfactual multi-agent policy gradients, in Proceedings of the AAAI conference on artificial intelligence, vol. 32, 2018.
[29] F. V. FOMIN, P. A. GOLOVACH, AND J. KRATOCHVÍL, On tractability of cops and robbers game, in Fifth IFIP International Conference on Theoretical Computer Science – TCS 2008, G. Ausiello, J. Karhumäki, G. Mauri, and L. Ong, eds., Boston, MA, 2008, Springer US, pp. 171–185.
[30] M. R. GAREY AND D. S. JOHNSON, Computers and Intractability: A Guide to the Theory of NP-Completeness, W. H. Freeman, USA, 1979.
[31] A. GLEAVE, M. DENNIS, C. WILD, N. KANT, S. LEVINE, AND S. RUSSELL, Adversarial policies: Attacking deep reinforcement learning, arXiv preprint arXiv:1905.10615, (2019).
[32] I. GOODFELLOW, J. POUGET-ABADIE, M. MIRZA, B. XU, D. WARDE-FARLEY, S. OZAIR, A. COURVILLE, AND Y. BENGIO, Generative adversarial nets, in Advances in Neural Information Processing Systems, Z. Ghahramani, M. Welling, C. Cortes, N. Lawrence, and K. Weinberger, eds., vol. 27, Curran Associates, Inc., 2014.
[33] Y. GUAN, D. MAITY, C. M. KRONINGER, AND P. TSIOTRAS, Bounded-rational pursuit-evasion games, in 2021 American Control Conference (ACC), 2021, pp. 3216–3221.
[34] J. K. GUPTA, M. EGOROV, AND M. KOCHENDERFER, Cooperative multi-agent control using deep reinforcement learning, in Autonomous Agents and Multiagent Systems, G. Sukthankar and J. A. Rodriguez-Aguilar, eds., Cham, 2017, Springer International Publishing, pp. 66–83.
[35] T. HAYNES AND S. SEN, Evolving behavioral strategies in predators and prey, in Adaption and Learning in Multi-Agent Systems, G. Weiß and S. Sen, eds., Berlin, Heidelberg, 1996, Springer Berlin Heidelberg, pp. 113–126.
[36] T. HAYNES AND S. SEN, Learning cases to resolve conflicts and improve group behavior, International Journal of Human-Computer Studies, 48 (1998), pp. 31–49.
[37] T. HAYNES, R. L. WAINWRIGHT, AND S. SEN, Evolving cooperation strategies, in ICMAS, 1995, p. 450.
[38] T. HAYNES, R. L. WAINWRIGHT, S. SEN, AND D. A. SCHOENEFELD, Strongly typed genetic programming in evolving cooperation strategies, in ICGA, vol. 95, 1995, pp. 271–278.
[39] R. ISAACS, Differential games: a mathematical theory with applications to warfare and pursuit, control and optimization, John Wiley and Sons, New York, 1965.
[40] Y. ISHIWAKA, T. SATO, AND Y. KAKAZU, An approach to the pursuit problem on a heterogeneous multiagent system using reinforcement learning, Robotics and Autonomous Systems, 43 (2003), pp. 245–256.
[41] M. JADERBERG, W. M. CZARNECKI, I. DUNNING, L. MARRIS, G. LEVER, A. G. CASTANEDA, C. BEATTIE, N. C. RABINOWITZ, A. S. MORCOS, A. RUDERMAN, ET AL., Human-level performance in 3D multiplayer games with population-based reinforcement learning, Science, 364 (2019), pp. 859–865.
[42] J. JIANG, C. DUN, T. HUANG, AND Z. LU, Graph convolutional reinforcement learning, in International Conference on Learning Representations, 2019.
[43] V. KONDA AND J. TSITSIKLIS, Actor-critic algorithms, Advances in neural information processing systems, 12 (1999).
[44] R. E. KORF, A simple solution to pursuit games, in Working Papers of The 11th International Workshop on Distributed Artificial Intelligence, 1992, pp. 183–194.
[45] J. R. KOZA, Genetic programming, MIT Press, Cambridge, MA, 1992.
[46] J. Z. LEIBO, E. HUGHES, M. LANCTOT, AND T. GRAEPEL, Autocurricula and the emergence of innovation from social interaction: A manifesto for multi-agent intelligence research, arXiv preprint arXiv:1903.00742, (2019).
[47] R. LEVY AND J. S. ROSENSCHEIN, A game theoretic approach to distributed artificial intelligence and the pursuit problem, ACM SIGOIS Bulletin, 13 (1992), p. 11.
[48] J. E. LITTLEWOOD, A mathematician’s miscellany, Methuen & Co. Ltd., London, 1953.
[49] R. LOWE, Y. WU, A. TAMAR, J. HARB, P. ABBEEL, AND I. MORDATCH, Multi-agent actor-critic for mixed cooperative-competitive environments, Neural Information Processing Systems (NIPS), (2017).
[50] X. MA, K. DRIGGS-CAMPBELL, AND M. J. KOCHENDERFER, Improved robustness and safety for autonomous vehicle control with adversarial reinforcement learning, in 2018 IEEE Intelligent Vehicles Symposium (IV), IEEE, 2018, pp. 1665–1671.
[51] P. MAES, M. J. MATARIC, J.-A. MEYER, J. POLLACK, AND S. W. WILSON, Co-evolution of pursuit and evasion II: Simulation methods and results, (1996).
[52] T. MAVRIDOU, P. PARDALOS, L. PITSOULIS, AND M. G. RESENDE, A GRASP for the biquadratic assignment problem, European Journal of Operational Research, 105 (1998), pp. 613–621.
[53] G. F. MILLER AND D. CLIFF, Co-evolution of pursuit and evasion I: Biological and game-theoretic foundations, School of Cognitive and Computing Sciences, University of Sussex, Brighton, 1994.
[54] G. F. MILLER AND D. CLIFF, Protean behavior in dynamic games: Arguments for the co-evolution of pursuit-evasion tactics, From animals to animats, 3 (1994), pp. 411–420.
[55] D. J. MONTANA, Strongly typed genetic programming, Evolutionary computation, 3 (1995), pp. 199–230.
[56] M. J. KOCHENDERFER, T. A. WHEELER, AND K. H. WRAY, Algorithms for Decision Making, MIT Press, 2022.
[57] G. NITSCHKE, Co-evolution of cooperation in a pursuit evasion game, in Proceedings 2003 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2003) (Cat. No. 03CH37453), vol. 2, IEEE, 2003, pp. 2037–2042.
[58] S. NOLFI AND D. FLOREANO, How co-evolution can enhance the adaptive power of artificial evolution: Implications for evolutionary robotics, in Evolutionary Robotics: First European Workshop, EvoRobot98 Paris, France, April 16–17, 1998 Proceedings 1, Springer, 1998, pp. 22–38.
[59] R. NOWAKOWSKI AND P. WINKLER, Vertex-to-vertex pursuit in a graph, Discrete Mathematics, 43 (1983), pp. 235–239.
[60] A. OROOJLOOY AND D. HAJINEZHAD, A review of cooperative multi-agent deep reinforcement learning, Applied Intelligence, (2022), pp. 1–46.
[61] E. OSAWA, A metalevel coordination strategy for reactive cooperative planning, in ICMAS, vol. 95, 1995, pp. 297–303.
[62] X. PAN, D. SEITA, Y. GAO, AND J. CANNY, Risk averse robust adversarial reinforcement learning, in 2019 International Conference on Robotics and Automation (ICRA), IEEE, 2019, pp. 8522–8528.
[63] C. H. PAPADIMITRIOU AND J. N. TSITSIKLIS, The complexity of Markov decision processes, Mathematics of Operations Research, 12 (1987), pp. 441–450.
[64] P. M. PARDALOS AND L. S. PITSOULIS, Nonlinear assignment problems: algorithms and applications, vol. 7, Springer Science & Business Media, 2013.
[65] T. D. PARSONS, Pursuit-evasion in a graph, in Theory and Applications of Graphs, Berlin, Heidelberg, 1978, Springer Berlin Heidelberg, pp. 426–441.
[66] L. PINTO, J. DAVIDSON, R. SUKTHANKAR, AND A. GUPTA, Robust adversarial reinforcement learning, in International Conference on Machine Learning, PMLR, 2017, pp. 2817–2826.
[67] T. PITCHER, A. MAGURRAN, AND I. WINFIELD, Fish in larger shoals find food faster, Behavioral Ecology and Sociobiology, 10 (1982), pp. 149–151.
[68] A. QUILLIOT, Jeux et points fixes sur les graphes, PhD thesis, Université de Paris VI, 1978.
[69] T. RASHID, M. SAMVELYAN, C. SCHROEDER, G. FARQUHAR, J. FOERSTER, AND S. WHITESON, QMIX: Monotonic value function factorisation for deep multi-agent reinforcement learning, in International Conference on Machine Learning, PMLR, 2018, pp. 4295–4304.
[70] C. W. REYNOLDS, Flocks, herds and schools: A distributed behavioral model, in Proceedings of the 14th annual conference on Computer graphics and interactive techniques, 1987, pp. 25–34.
[71] C. W. REYNOLDS, Competition, coevolution and the game of tag, in Proceedings of the Fourth International Workshop on the Synthesis and Simulation of Living Systems, 1994, pp. 59–69.
[72] S. RUSSELL AND P. NORVIG, Artificial Intelligence: A Modern Approach, 4th Edition, Pearson Education, 2021.
[73] M. SAMVELYAN, T. RASHID, C. SCHROEDER DE WITT, G. FARQUHAR, N. NARDELLI, T. G. RUDNER, C.-M. HUNG, P. H. TORR, J. FOERSTER, AND S. WHITESON, The StarCraft multi-agent challenge, in Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems, 2019, pp. 2186–2188.
[74] J. SCHULMAN, P. MORITZ, S. LEVINE, M. JORDAN, AND P. ABBEEL, High-dimensional continuous control using generalized advantage estimation, arXiv preprint arXiv:1506.02438, (2015).
[75] J. SCHULMAN, F. WOLSKI, P. DHARIWAL, A. RADFORD, AND O. KLIMOV, Proximal policy optimization algorithms, arXiv preprint arXiv:1707.06347, (2017).
[76] S. SEUKEN AND S. ZILBERSTEIN, Formal models and algorithms for decentralized control of multiple agents, tech. rep. 05-68, Department of Computer Science, University of Massachusetts Amherst, 2005.
[77] Y. SHI AND R. EBERHART, A modified particle swarm optimizer, in 1998 IEEE International Conference on Evolutionary Computation Proceedings. IEEE World Congress on Computational Intelligence (Cat. No.98TH8360), May 1998, pp. 69–73.
[78] L. STEPHENS, Agent organization as an effector of DAI system performance, in Proceedings of the 9th Workshop on Distributed Artificial Intelligence, 1989.
[79] L. M. STEPHENS AND M. B. MERX, The effect of agent control strategy on the performance of a DAI pursuit problem, in Proceedings of the 10th International Workshop on Distributed Artificial Intelligence, 1990.
[80] P. STONE AND M. VELOSO, Multiagent systems: A survey from a machine learning perspective, Autonomous Robots, 8 (2000), pp. 345–383.
[81] S. SUKHBAATAR, Z. LIN, I. KOSTRIKOV, G. SYNNAEVE, A. SZLAM, AND R. FERGUS, Intrinsic motivation and automatic curricula via asymmetric self-play, arXiv preprint arXiv:1703.05407, (2017).
[82] L. SUN, Y.-C. CHANG, C. LYU, Y. SHI, Y. SHI, AND C.-T. LIN, Toward multi-target self-organizing pursuit in a partially observable Markov game, arXiv preprint arXiv:2206.12330, (2022).
[83] L. SUN, C. LYU, AND Y. SHI, Cooperative coevolution of real predator robots and virtual robots in the pursuit domain, Applied Soft Computing, 89 (2020), p. 106098.
[84] L. SUN, C. LYU, Y. SHI, AND C.-T. LIN, Multiple-preys pursuit based on biquadratic assignment problem, in 2021 IEEE Congress on Evolutionary Computation (CEC), 2021, pp. 1585–1592.
[85] R. S. SUTTON AND A. G. BARTO, Reinforcement learning: An introduction, MIT press, 2018.
[86] M. TAN, Multi-agent reinforcement learning: Independent vs. cooperative agents, in Proceedings of the tenth international conference on machine learning, 1993, pp. 330–337.
[87] X. TANG, D. YE, L. HUANG, Z. SUN, AND J. SUN, Pursuit-evasion game switching strategies for spacecraft with incomplete-information, Aerospace Science and Technology, 119 (2021), p. 107112.
[88] J. TERRY, B. BLACK, N. GRAMMEL, M. JAYAKUMAR, A. HARI, R. SULLIVAN, L. S. SANTOS, C. DIEFFENDAHL, C. HORSCH, R. PEREZ-VICENTE, ET AL., PettingZoo: Gym for multi-agent reinforcement learning, Advances in Neural Information Processing Systems, 34 (2021), pp. 15032–15043.
[89] J. K. TERRY, B. BLACK, M. JAYAKUMAR, A. HARI, R. SULLIVAN, L. SANTOS, C. DIEFFENDAHL, N. L. WILLIAMS, Y. LOKESH, C. HORSCH, ET AL., PettingZoo: Gym for multi-agent reinforcement learning, arXiv preprint arXiv:2009.14471, (2020).
[90] J. K. TERRY, N. GRAMMEL, A. HARI, L. SANTOS, AND B. BLACK, Revisiting parameter sharing in multi-agent deep reinforcement learning, arXiv preprint arXiv:2005.13625, (2020).
[91] C. UNDEGER AND F. POLAT, Multi-agent real-time pursuit, Autonomous Agents and Multi-Agent Systems, 21 (2010), pp. 69–107.
[92] O. VINYALS, I. BABUSCHKIN, W. M. CZARNECKI, M. MATHIEU, A. DUDZIK, J. CHUNG, D. H. CHOI, R. POWELL, T. EWALDS, P. GEORGIEV, ET AL., Grandmaster level in StarCraft II using multi-agent reinforcement learning, Nature, 575 (2019), pp. 350–354.
[93] X. WANG, Y. CHEN, AND W. ZHU, A survey on curriculum learning, IEEE Transactions on Pattern Analysis and Machine Intelligence, 44 (2021), pp. 4555–4576.
[94] Z. WANG, B. GONG, Y. YUAN, AND X. DING, Incomplete information pursuit- evasion game control for a space non-cooperative target, Aerospace, 8 (2021).
[95] C. WU, A. R. KREIDIEH, K. PARVATE, E. VINITSKY, AND A. M. BAYEN, Flow: A modular learning framework for mixed autonomy traffic, IEEE Transactions on Robotics, 38 (2021), pp. 1270–1286.
[96] Y. YANG, R. LUO, M. LI, M. ZHOU, W. ZHANG, AND J. WANG, Mean field multi-agent reinforcement learning, in International conference on machine learning, PMLR, 2018, pp. 5571–5580.
[97] D. YE, M. SHI, AND Z. SUN, Satellite proximate pursuit-evasion game with different thrust configurations, Aerospace Science and Technology, 99 (2020), p. 105715.
[98] C. YU, A. VELU, E. VINITSKY, J. GAO, Y. WANG, A. BAYEN, AND Y. WU, The surprising effectiveness of PPO in cooperative multi-agent games, Advances in Neural Information Processing Systems, 35 (2022), pp. 24611–24624.
[99] K. ZHANG, Z. YANG, AND T. BAŞAR, Multi-agent reinforcement learning: A selective overview of theories and algorithms, Handbook of reinforcement learning and control, (2021), pp. 321–384.
[100] L. ZHENG, J. YANG, H. CAI, M. ZHOU, W. ZHANG, J. WANG, AND Y. YU, MAgent: A many-agent reinforcement learning platform for artificial collective intelligence, Proceedings of the AAAI Conference on Artificial Intelligence, 32 (2018).
[101] Z. ZHOU AND H. XU, Decentralized optimal large scale multi-player pursuit-evasion strategies: A mean field game approach with reinforcement learning, Neurocomputing, (2021).