Pommerman & NeurIPS 2018 Multi-Agent Competition

Springer Series on Challenges in Machine Learning (2020)

Abstract
Pommerman is an exciting new environment for multi-agent research based on the classic game Bomberman. This publication covers its inaugural NeurIPS competition (and second overall), held at NeurIPS 2018 and featuring the 2v2 team environment. In the first chapter, the first section familiarizes the audience with the game and its nuances, and the second section describes the competition and its results. The remaining chapters present the competitors' descriptions in order of competition result.

Chapters two and four describe two agents built by colleagues at IBM. Chapter four's dynamic Pommerman (dypm) agent is a particular implementation of real-time tree search with pessimistic scenarios: standard tree search is limited to a specified depth, but each leaf is evaluated under a deterministic, pessimistic scenario. Unlike standard tree search, the evaluation under the deterministic scenario involves no branching, so it can efficiently take into account significant events that the agent may encounter far in the future. The pessimistic scenario is generated by assuming extremely strong enemies, and the level of pessimism is tuned via self-play. With these techniques, the dypm agent meets the real-time constraint even when implemented in Python. Chapter two's agent is similar, but uses a real-time search tree to evaluate moves, followed by self-play for tuning.

Chapter three's Eisenach agent finished second in the Pommerman team competition, matching the performance of its predecessor in the earlier free-for-all competition. The chosen framework was online minimax tree search with a fast C++ simulator, which enabled deeper search within the allowed 0.1 s per move. Several tactics were successfully applied to reduce the number of ties and avoid repeated situations; these made games denser and more exciting while increasing the measured difference between agents. Bayes-based cost optimization was also applied but did not prove useful. The resulting agent passed the first three rounds of the competition without a single tie or defeat and even won some matches against the overall winner.

Chapter five features the Navocado agent, trained with the Advantage Actor-Critic (A2C) algorithm and guided by the Continual Match Based Training (COMBAT) framework. The agent first transforms the original continuous state representations into discrete ones, which makes them easier for the deep model to learn. A new action space then lets the agent use its proposed destination as an action, enabling longer-term planning. Finally, the COMBAT framework allows it to define adaptive rewards for different game stages. Navocado was the top learning agent in the competition.

Finally, chapter six features the nn_team_skynet955_skynet955 agent, which ranked second among the learning agents and fifth overall. Equipped with an automatic module for action pruning, this agent was trained end-to-end by deep reinforcement learning in the partially observable team environment, against a curriculum of opponents and with reward shaping. A single trained neural-network model was selected to form the team that participated in the competition. The chapter discusses the difficulty of Pommerman as a benchmark for model-free reinforcement learning and describes the core elements upon which the agent was built.
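To make the dypm idea concrete, here is a minimal sketch of depth-limited tree search whose leaves are scored by a deterministic, pessimistic rollout rather than by further branching. The toy LineWorld game and every name in it are illustrative stand-ins for the real Pommerman state, not the chapter's actual code; the pessimism parameter mirrors the tunable pessimism level described above.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class LineWorld:
    """Toy state: the agent chases a goal on a line while an enemy chases the
    agent. A hypothetical stand-in that keeps the sketch runnable."""
    agent: int
    enemy: int
    goal: int
    size: int = 11

    def is_terminal(self):
        return self.agent == self.enemy or self.agent == self.goal

    def legal_actions(self):
        return [a for a in (-1, 0, 1) if 0 <= self.agent + a < self.size]

    def enemy_action(self, pessimism):
        # Pessimistic enemy: above a pessimism threshold it always closes in.
        chase = 1 if self.enemy < self.agent else -1 if self.enemy > self.agent else 0
        return chase if pessimism >= 0.5 else 0

    def step(self, my_action, pessimism):
        return replace(self, agent=self.agent + my_action,
                       enemy=self.enemy + self.enemy_action(pessimism))

    def score(self):
        # Reward staying away from the enemy and closing on the goal.
        return (abs(self.agent - self.enemy) - abs(self.agent - self.goal)
                - (100 if self.agent == self.enemy else 0))

def pessimistic_rollout(state, horizon, pessimism):
    """Deterministic leaf evaluation: one pessimistic line of play, no branching,
    so events far beyond the search depth still influence the leaf's score."""
    for _ in range(horizon):
        if state.is_terminal():
            break
        state = state.step(0, pessimism)  # conservative default: stand still
    return state.score()

def search(state, depth, horizon=8, pessimism=1.0):
    """Depth-limited tree search whose leaves are scored by the rollout above."""
    if depth == 0 or state.is_terminal():
        return pessimistic_rollout(state, horizon, pessimism), None
    best = (float("-inf"), None)
    for action in state.legal_actions():
        value, _ = search(state.step(action, pessimism), depth - 1, horizon, pessimism)
        best = max(best, (value, action))
    return best

value, action = search(LineWorld(agent=5, enemy=0, goal=10), depth=3)
print(action)  # -> 1: head for the goal, away from the pessimistic enemy
```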
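The Eisenach agent's time-budgeted search can be sketched in the same spirit: iterative-deepening minimax that keeps the best move from the deepest iteration completed within the 0.1 s per-move limit. The toy subtraction game below stands in for the C++-simulated Pommerman state, and all names here are illustrative, not the competitor's code.

```python
import time

BUDGET = 0.1  # seconds per move, matching the competition's time limit

def minimax(n, depth, maximizing, deadline):
    """Subtraction game: players alternately remove 1-3 items from a pile of n;
    whoever takes the last item wins. Value is from the maximizer's view."""
    if time.monotonic() > deadline:
        raise TimeoutError
    if n == 0:
        return -1 if maximizing else 1  # previous mover took the last item and won
    if depth == 0:
        return 0  # depth cut-off: score unresolved positions as a tie
    values = (minimax(n - k, depth - 1, not maximizing, deadline)
              for k in (1, 2, 3) if k <= n)
    return max(values) if maximizing else min(values)

def choose_move(n):
    """Iterative deepening: search deeper and deeper until the budget runs out,
    then fall back to the best move of the deepest finished iteration."""
    deadline = time.monotonic() + BUDGET
    best = 1  # safe default before any iteration completes
    try:
        for depth in range(1, 64):
            scored = [(minimax(n - k, depth, False, deadline), k)
                      for k in (1, 2, 3) if k <= n]
            best = max(scored)[1]
    except TimeoutError:
        pass  # budget exhausted mid-iteration; keep the previous best
    return best

print(choose_move(10))  # -> 2: leaves a multiple of 4, a losing pile, to the opponent
```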
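Navocado's destination-based action space can likewise be illustrated: the policy picks a target cell as its "action", and a low-level planner expands that single high-level choice into a sequence of primitive moves, which is what enables longer-term planning. The grid, BFS planner, and helper names below are assumptions for illustration only.

```python
from collections import deque

MOVES = {(-1, 0): "up", (1, 0): "down", (0, -1): "left", (0, 1): "right"}

def plan_route(grid, start, destination):
    """BFS over walkable cells (0 = free, 1 = wall); returns the primitive
    moves that realize the high-level 'go to destination' action."""
    rows, cols = len(grid), len(grid[0])
    parent = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == destination:
            moves = []
            while parent[cell] is not None:   # walk back to the start
                prev, move = parent[cell]
                moves.append(move)
                cell = prev
            return moves[::-1]
        r, c = cell
        for (dr, dc), name in MOVES.items():
            nxt = (r + dr, c + dc)
            if (0 <= nxt[0] < rows and 0 <= nxt[1] < cols
                    and grid[nxt[0]][nxt[1]] == 0 and nxt not in parent):
                parent[nxt] = (cell, name)
                queue.append(nxt)
    return []  # unreachable destination: no primitive actions to take

grid = [[0, 0, 0],
        [1, 1, 0],   # 1 = wall
        [0, 0, 0]]
print(plan_route(grid, (0, 0), (2, 0)))
# -> ['right', 'right', 'down', 'down', 'left', 'left']
```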
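Finally, the action-pruning module described for the skynet955 agent can be sketched as a mask applied before the learned policy samples: actions that are immediately fatal are zeroed out and the remaining probability mass is renormalized. The fatal-action rule and observation format here are hypothetical stand-ins; the chapter does not publish this exact interface.

```python
import random

ACTIONS = ["stop", "up", "down", "left", "right", "bomb"]

def is_immediately_fatal(obs, action):
    # Hypothetical rule: the observation carries a precomputed set of actions
    # that would, e.g., step into flames or a soon-to-explode bomb.
    return action in obs["fatal_actions"]

def pruned_sample(policy_probs, obs):
    """Zero out fatal actions, renormalize, then sample from the survivors."""
    masked = {a: p for a, p in policy_probs.items()
              if not is_immediately_fatal(obs, a)}
    if not masked:                    # everything is fatal: fall back to raw policy
        masked = dict(policy_probs)
    total = sum(masked.values())
    r, acc = random.random() * total, 0.0
    for action, p in masked.items():
        acc += p
        if r <= acc:
            return action
    return action  # floating-point edge case: return the last surviving action

obs = {"fatal_actions": {"left", "bomb"}}
policy_probs = {a: 1 / len(ACTIONS) for a in ACTIONS}
print(pruned_sample(policy_probs, obs))  # never 'left' or 'bomb' here
```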