- Professor Jakob Foerster from the University of Oxford receives an award from the GoodAI Grant program for his research on scalable simulation environments that support the development of cumulative culture.
- His grant project investigates the minimal environment complexity at which the interaction of multiple learning agents leads to the emergence of cumulative culture and open-ended skill discovery.
- The environment should enable improved interpretability to guide the development of Badger-like systems with zero-shot generalization capabilities.
Reinforcement learning (RL), which tackles sequential decision making, enables a computer system to learn how to make choices by being rewarded for its successes. While systems like AlphaGo and AlphaZero have showcased RL’s immense potential, few RL methods address large-scale multi-agent problems.
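To make the reward-driven learning loop concrete, here is a minimal tabular Q-learning sketch. It is generic, illustrative RL (not the methods from the grant project): a single agent learns, purely from a reward signal, to walk right along a short corridor. All names and parameters are illustrative choices.

```python
import random

# Illustrative tabular Q-learning on a 1-D corridor (generic RL,
# not the project's method). States 0..4; reaching state 4 pays reward 1.
N_STATES = 5
ACTIONS = [-1, 1]     # step left or step right
ALPHA, GAMMA, EPS = 0.5, 0.9, 0.3

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def env_step(state, action):
    """Deterministic dynamics: move, clip to bounds, reward at the goal."""
    nxt = max(0, min(N_STATES - 1, state + action))
    return nxt, (1.0 if nxt == N_STATES - 1 else 0.0), nxt == N_STATES - 1

def train(episodes=500, max_steps=100, seed=0):
    rng = random.Random(seed)
    for _ in range(episodes):
        s, done = 0, False
        for _ in range(max_steps):
            # Epsilon-greedy: mostly exploit current estimates, sometimes explore.
            if rng.random() < EPS:
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda a: Q[(s, a)])
            s2, r, done = env_step(s, a)
            # Temporal-difference update toward reward + discounted future value.
            Q[(s, a)] += ALPHA * (r + GAMMA * max(Q[(s2, b)] for b in ACTIONS) - Q[(s, a)])
            s = s2
            if done:
                break

train()
# The learned greedy policy moves right (+1) in every non-goal state.
policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES - 1)}
print(policy)  # {0: 1, 1: 1, 2: 1, 3: 1}
```

Even this toy example shows the core loop that scales up to systems like AlphaZero: act, observe reward, update value estimates, repeat.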
Most existing approaches struggle in multi-agent settings because they treat other agents as an uncontrollable part of the environment, rather than as entities with rich internal structure that can be reasoned about and communicated with. Foerster's past work investigates how cooperating agents learn and communicate. More recently, he has worked on the zero-shot coordination problem setting in cooperative game playing.
The grant project proposes to further this research direction by developing an environment capable of facilitating Turing-complete computation. As Foerster notes, the limitations of existing artificial environments impede complex multi-agent interactions. Exceptions can be found in games such as Space Engineers and Minecraft, but these games were designed neither as reinforcement learning environments nor with large-scale computational efficiency in mind.
Complexity through simplicity
The implementation of a basic Minimum Viable Environment (a simple but open-ended environment with multiple agents) will provide insights into the ways large collections of agents can diversify their behaviors through their interactions with each other and with the environment.
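A minimum viable environment of this kind can be sketched in a few lines. The class below is a hypothetical illustration, not the project's actual environment: several agents share one grid, move within the same timestep, and observe one another, so each agent's behavior can depend on what the others do.

```python
from dataclasses import dataclass

@dataclass
class MultiAgentGrid:
    """Hypothetical minimal multi-agent gridworld (illustrative only)."""
    size: int
    positions: dict  # agent id -> (x, y)

    def step(self, actions):
        """Apply one (dx, dy) move per agent within a single timestep."""
        for agent, (dx, dy) in actions.items():
            x, y = self.positions[agent]
            self.positions[agent] = (
                max(0, min(self.size - 1, x + dx)),  # clip to grid bounds
                max(0, min(self.size - 1, y + dy)),
            )
        return self.observe()

    def observe(self):
        """Every agent sees all agents' positions: interaction, not isolation."""
        return {agent: dict(self.positions) for agent in self.positions}

env = MultiAgentGrid(size=4, positions={"a": (0, 0), "b": (3, 3)})
obs = env.step({"a": (1, 0), "b": (0, -1)})
print(obs["a"])  # {'a': (1, 0), 'b': (3, 2)}
```

The point of starting this simple is that every added rule (resources, signaling, tool use) can then be studied for the interaction complexity it unlocks.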
Together with Ph.D. student Chris Lu, Foerster aims to demonstrate the emergence of complex multi-agent interactions from extremely simple, fundamental single-agent actions. Lu has been active in creating advanced simulations for AI, specifically building environments that demonstrate this type of compositional emergence in multi-agent reinforcement learning.
Open-ended environments are extremely valuable in AI research because, in theory, they can provide tasks of increasing and relevant complexity to the agents that populate them. If such agents are capable of improvement, the environment could automatically provide new, more complex tasks as the agents evolve and mature, ad infinitum.
The Badger architecture is based on the premise that modularity could allow for generalization beyond the level seen in existing methods. Foerster's project will test the utility and efficacy of the Badger architecture by introducing novel problems that require such compositionality to solve.
The eventual aim of the environment and the project is to explore the viability and importance of cultural evolution in the development of artificial intelligence (AI) systems. Many current AI systems are still developed in isolation, yet natural intelligence is inherently collective, with culture being the ultimate bootstrapping mechanism that got humankind to where it is today.
To date, most of what we consider general AI research is done in academia and inside big corporations. GoodAI strives to combine the best of both cultures, academic rigor and fast-paced innovation. We aim to create the right conditions to collaborate and cooperate across boundaries. Our goal is to accelerate the progress towards general AI in a safe manner by putting emphasis on community-driven research.
Hu, H., Lerer, A., Peysakhovich, A., & Foerster, J. (2021). "Other-Play" for Zero-Shot Coordination. arXiv:2003.02979.
Rosa, M., et al. (2019). BADGER: Learning to (Learn [Learning Algorithms] through Multi-Agent Communication). arXiv:1912.01513.
Pathak, D., Lu, C., Darrell, T., Isola, P., & Efros, A. A. (2019). Learning to control self-assembling morphologies: A study of generalization via modularity. Advances in Neural Information Processing Systems, 32.