Internal Badger Workshop – Summary

May 07, 2020

We recently organized an internal workshop with a number of external collaborators to make progress on various challenging topics related to the Badger architecture. In this post, we would like to share the posed questions and the outcomes of the nine intensive sessions. Despite the workshop’s virtual nature, it was a very productive three days, with three one-hour sessions per day and participants spanning three different time zones.

The identified questions and refined topics will continue to be discussed in a follow-up workshop in the next few weeks. This is a private and closed event, but if the topics are of interest to you, do not hesitate to contact us.

Workshop Topics

Each of the nine sessions posed interesting questions, and after each discussion we identified a number of areas to focus on and further questions to delve into. Both the questions and the outcomes can be found below, and in the more detailed notes linked in each of the corresponding session sections. The notes contain a lot of valuable information; however, they are in note form and not polished.

Detailed notes from sessions

Topics & Areas of Focus

Day 1

  • Types of Inner Loop Learning 
  • Targeting Communication
  • On the Importance of Topologies

Day 2

  • Targeting Scalability
  • Psychology of Badger
  • Dreaming and Deliberating in Badger 

Day 3

  • Principia Badgerica
  • Economies of Badger
  • Conclusions & Future Directions

Workshop Outcomes

Each of the sessions resulted in new questions as well as some suggestions on how to start tackling some of the areas of interest. These are all described in the next section as they are too numerous to list here. Below we outline two of the primary topics that were of most interest to the workshop participants, together with newly raised points that will be the topic of the follow-up workshop.

1. When is modularity, collectivity & multi-agentness beneficial?

  • Where is the transition from monolithic systems into modular/multi-agent ones?
  • What are the benefits of distributedness generally and in the Badger architecture?
  • How can we achieve the benefits of collective decision-making akin to the ones observed in the NASA experiment (Hall and Watson 1970)?
  • Why does communication/information transfer work well in neural networks, but not necessarily in Badger? 
  • How can hidden information games help?

2. Something out of nothing?

  • Internal thought process (computation), deliberation, feedback and System 2
  • Why are we able to do it?
  • How to correspondingly discover new knowledge and algorithms in the inner loop?
  • How does it relate to open-endedness, curiosity, generative processes, etc.?
  • The isolated AI scientist / mathematician analogy
  • Inference vs Learning of new processes

These topics are not only cornerstones of the Badger architecture but also, more generally, areas of wider research interest that the machine learning and AI community has not yet explored sufficiently, or that were forgotten and are only now beginning to come back into the spotlight.

In the follow-up workshop, we will focus on these topics and questions. We would like to end up with concrete testable hypotheses, tasks, experimental designs and architectures that would allow us, as far as possible, to show, prove or otherwise confirm or reject our beliefs, wishes and concerns about these topics.

Posed Questions and Outcomes of Discourse

Session 1 – Types of Inner Loop Learning


The first session looked at the type of learning that happens in Badger in each of the two learning loops and tried to investigate how to gain control over which skills, behaviours and other structures get learned in which loop. Currently, the outer loop is a ‘bottleneck’ that determines the asymptotic performance of Badger. We would like to move this to the inner loop, so that, after training on only a few inner loop steps during outer loop training, the inner loop would be able to run for a significantly larger number of steps.


  1. In systems with more than one loop of training, what gets learned in which loop?
  2. How do we control what gets learned where?
  3. How do we ensure that only general skills are learned in the outer loop? Why not memorize them?
  4. Where are we waiting for learning to complete? Currently, if we extend the Outer Loop, things improve, but if we extend the Inner Loop they don’t.

Discussion Summary

  • One approach would be to ask the Outer Loop to do no more than build a hollow brain (i.e. architecture search) – force it to only be able to decide some very indirect aspects of the solution that do not make contact with raw data. This could, in principle, force most of the learning to take place during the inner loop.
  • The important thing seems to be forbidding the Outer Loop from touching any parameters that directly interact with the task data, akin to MetaGenRL’s approach.
  • Locality of inner loop steps – what if we limit the number of inner loop steps per expert but compose longer inner loops from shorter inner loops by different experts? Also achievable by inter-expert knowledge transfer?
  • Inner loop strategies in meta-learning, in general, are more akin to inference with learned priors. How do we learn more ‘interesting’ sequential strategies in the inner loop?
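
To make the two-loop split discussed above concrete, here is a minimal toy sketch. It is not Badger's training procedure: the quadratic tasks, the function names (`inner_loop`, `meta_loss`, `outer_loop`) and the choice of a single learning rate as the only outer loop parameter are all assumptions made for illustration. The outer loop tunes one indirect knob while seeing only 5 inner steps, and the task-specific parameter `w` is only ever changed by the inner loop.

```python
import random


def inner_loop(lr, target, steps):
    """Inner loop: gradient descent on one task with loss (w - target)^2."""
    w = 0.0
    for _ in range(steps):
        grad = 2.0 * (w - target)  # the task-specific parameter w is updated only here
        w -= lr * grad
    return (w - target) ** 2


def meta_loss(lr, targets, steps):
    """Average final inner-loop loss over a batch of tasks."""
    return sum(inner_loop(lr, t, steps) for t in targets) / len(targets)


def outer_loop(targets, inner_steps, outer_steps=200, meta_lr=0.01, eps=1e-4):
    """Outer loop: tune only the inner learning rate, via finite differences."""
    lr = 0.01
    for _ in range(outer_steps):
        up = meta_loss(lr + eps, targets, inner_steps)
        down = meta_loss(lr - eps, targets, inner_steps)
        lr -= meta_lr * (up - down) / (2.0 * eps)
    return lr


rng = random.Random(0)  # fixed seed so the toy run is repeatable
train_targets = [rng.uniform(-1.0, 1.0) for _ in range(20)]

# The outer loop only ever sees inner loops of 5 steps.
learned_lr = outer_loop(train_targets, inner_steps=5)

# At test time the same inner loop is run for 10x longer.
short_run = meta_loss(learned_lr, train_targets, steps=5)
long_run = meta_loss(learned_lr, train_targets, steps=50)
print(f"learned lr={learned_lr:.3f}  loss@5={short_run:.2e}  loss@50={long_run:.2e}")
```

In this toy the longer inner loop keeps improving because the learned rule is a stable gradient step. The open question raised in the session is how to get the same behaviour when the inner loop rule is itself a learned policy.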


Session 2 – Targeting Communication


This session tried to shed some light on particular requirements on the communication between Badger experts. If we clarify and specify more precisely what the role of communication in Badger should be, we should be able to derive better architectural biases and incorporate them into our experiments in order to get closer to the desired properties of Badger networks.


  1. What is the role of communication in Badger-like networks?
  2. Do we have any example of what the communication should do?
  3. What should be its properties?
  4. How do our tasks encourage the use of communication?

Discussion Summary

  • Potential roles of communication. Communication should:
    • allow for dynamic scalability
      • theoretical: adding more Experts helps with performance
      • practical: distributed coordination, no centralized point
    • enable Badger to solve tasks that are unsolvable by standard DNNs
      • e.g. inputs/outputs of variable dimensions
    • serve as a bottleneck in the system:
      • communication should be general: e.g. experts should be sharing (learning) algorithms rather than specific task solutions (?)
      • Can we encourage the formation of a domain specific language for the experts to use to transfer these? See also – Nicholas’ skill transfer experiments.
  • What should the communication do?
    • Different types of communication, we probably need to support all of them:
      • coordination
      • control
      • information exchange
  • Our current tasks don’t rely enough on communication (i.e. they can be solved without communication between experts by a handwritten policy)
    • But what should happen is that communication speeds up the convergence process.
      • Ablation study for this?
    • What should tasks that better test for communication look like?


Session 3 – On the Importance of Topologies


The Badger architecture assumes a network of policy-sharing experts that are able to adapt to and eventually solve many problems at hand. We can’t reasonably expect the network of Badger experts to be without structure (e.g. for scaling reasons). It is therefore not unreasonable to expect that a successful topology will comprise a number of densely connected clusters of experts, with sparse connectivity between them. A further assumption might be that this connectivity pattern is hierarchical and scale-free. Assuming this is indeed the case, there were two principal questions posed during this session.


  1. Assume limited experts that alone cannot solve a problem, but after connecting to other experts, they can. What are the key factors that allow them to solve more? (The explanation should be independent of the complexity of the expert: for most problems, it holds that one computational unit will solve less, or take longer, than two units, regardless of the unit’s complexity.)
  2. What is the right (scale-free) interface for communication between experts/clusters?

Discussion Summary

  • What gives rise to intelligence:
    • Additional experts have additional information necessary to solve the problem (e.g. an XOR task on a vector – a missing input from any single expert degrades the performance of others to random).
    • Additional time to compute – with too little time for computation (or depth of the network), successful computation may not be possible and the network is forced to be “dumb” despite even theoretically optimal efforts.
    • Creativity / deliberation – considering counterfactuals and different possible futures may be the defining factor of “consciousness” for some, as well as a marker of intelligence. It scales well – in parallel, many different futures and counterfactuals can be considered. No input data is necessary – deliberation can progress completely within one’s mind.
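
The XOR example in the first point above can be checked by brute force. The sketch below is illustrative only: it assumes four experts holding one uniformly random input bit each, and the helper names (`parity`, `best_accuracy`) are hypothetical. It computes the best accuracy any deterministic predictor can reach given only a subset of the bits.

```python
from collections import defaultdict
from itertools import product


def parity(bits):
    """XOR of all bits: the joint answer the experts must produce."""
    result = 0
    for b in bits:
        result ^= b
    return result


def best_accuracy(visible, n_bits):
    """Best accuracy of any deterministic predictor that sees only `visible` bits."""
    labels_by_view = defaultdict(list)
    for bits in product((0, 1), repeat=n_bits):
        view = tuple(bits[i] for i in visible)
        labels_by_view[view].append(parity(bits))
    # For each view, the best guess is the majority label among consistent inputs.
    correct = sum(max(labels.count(0), labels.count(1))
                  for labels in labels_by_view.values())
    return correct / 2 ** n_bits


n = 4  # hypothetical: four experts, one input bit each
print(best_accuracy(range(n), n))                          # all inputs shared
print([best_accuracy([i for i in range(n) if i != m], n)   # one input missing
       for m in range(n)])
```

With every bit available the task is solved exactly, while dropping any single bit leaves each view with one input of each label, so no predictor can do better than chance.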


  • Inspiration for connectivity between experts – The Internet:
    • Local broadcasts are possible and cheap, while at the same time any unit can, although with some effort, directly connect to any other unit
    • Requires dynamic routing. This will be challenging in the learning setting.
    • Inspiration for dynamic routing: mesh networks
    • Routing protocol has a built-in robustness, but is hard to learn. If we want an internet-like connectivity, we may need to hardcode the topology.


Session 4 – Targeting Scalability


This session focused on techniques and tasks upon which we can build scalable agents. There are multiple types of scalability, each with its own challenges and potential benefits. Badger agents are specifically intended to scale along many of these dimensions.


  1. Problem scalability – what techniques can we look into for training for this directly?
  2. Computational scalability – As each expert is a computational resource, how can we effectively train them so that they can be distributed over a large (physical or virtual) area? Can we train Badger to handle the information flow of a large system?
  3. For both: What kinds of tasks showcase these scalability types? We want examples of tasks where one expert can solve it, but 100 are, for example, much faster or arrive at a more interesting solution/strategy.
  4. ‘Symbiotic’ experts – how do we train things which improve architectures they’re added to?

Discussion Summary

  • There are more types of scalability than just adding more experts and larger problems, such as scalability over time, where experts accumulate experiences and apply them.
  • This is a holistic kind of scalability with deep ties to the inner-outer loop, and dreaming and deliberating workshop topics.
  • A distributed holographic memory or database could be an interesting task in which to see reliability at scale.
  • For the scalability of experts in the guessing game – Hypernetworks could be an avenue of investigation.
  • Antifragility, rather than just robustness, is worth thinking about: the agent considers its failings during a shock and works out how to mitigate further shocks.
  • How can we think formally about scalability?
  • Links to renormalization group – whenever we train a model, we’re learning some approximation local to the dataset (e.g. we only evaluate the loss at certain ‘points’ – # of experts, length of rollout, etc.). If we have some variable which we’re moving far outside of the training distribution, some ways of parameterizing the approximation may have errors which are arbitrarily small within the training region, but which grow without bound with respect to those variables.
  • Can we find bounds to guarantee we don’t have any of those divergent terms? One practical example of this is the difference between something like an inference/triangulation strategy for the guessing game, versus a method which works more akin to gradient descent.
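
The point about divergent terms can be illustrated with a small numerical sketch. Everything here is an assumption chosen for illustration: `tanh` stands in for some quantity of interest, `x` stands in for a variable such as the number of experts or the rollout length, and the two approximations are textbook ones rather than anything trained. Both fit well inside a narrow "training region", but only one has an error that stays bounded far outside it.

```python
import math


def taylor_tanh(x):
    """Degree-5 Taylor polynomial of tanh around 0: diverges for large |x|."""
    return x - x ** 3 / 3.0 + 2.0 * x ** 5 / 15.0


def rational_tanh(x):
    """Rational form x / (1 + x^2/3): stays bounded for all x."""
    return x / (1.0 + x * x / 3.0)


def max_error(approx, xs):
    """Largest absolute deviation from tanh over the given points."""
    return max(abs(approx(x) - math.tanh(x)) for x in xs)


train_region = [i / 100.0 - 0.5 for i in range(101)]  # x in [-0.5, 0.5]
far_outside = [10.0, 100.0, 1000.0]

for name, f in [("taylor", taylor_tanh), ("rational", rational_tanh)]:
    inside = max_error(f, train_region)
    outside = [abs(f(x) - math.tanh(x)) for x in far_outside]
    print(name, inside, outside)
```

Inside the region both errors are below 1e-3, so a loss evaluated only there cannot tell the two parameterizations apart. Outside it, the polynomial error grows without bound while the rational form never errs by more than 1.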


Session 5 – Psychology of Badger


This session was all about the benefits and drawbacks of collective vs. solitary learning & computation. We see and argue a great many benefits in learning and computing collectively, but this is not necessarily backed up yet by results or theories and formulations that could guide us on how to achieve those in practice in Badger. How can we correct this?


  1. What is the true benefit of the collective nature (multi-agentness) of learning and computation?
  2. How and where can we show that collective learning and computation within our agents brings benefits over learning and computation in a single monolithic agent?
  3. What kind of learning procedures and loss functions would foster and encourage learning to exploit the multi-agent and collective nature of badger agents?
  4. Collective learning vs. collective computation
  5. How does a Badger agent/expert differ from an artificial neuron/architecture?
  6. Can Psychology and Social Psychology help and inspire?

Discussion Summary

  • How can we achieve benefits of collective decision-making akin to the ones observed in the NASA survival experiment of Hall and Watson?
    • There is an explicit deliberation, refinement stage and feedback loop that is different from ensembling and diversification
  • Where is the transition from a monolithic into a modular/multi-agent system?
  • In order to be able to experimentally compare Badger agents with deep learning baselines, we need to be able to compare them on a level playing field
    • Mapping Badger onto existing deep learning architectures could allow this
  • There might be a benefit in setting up the learning procedure as a GAN-like game
  • Relevance of novelty search, ensembles, and niches
  • It is difficult to constrain what is and what is not learned. How can we control this?
  • What is the difference between multi-agentness, distributedness, collectivity and modularity in this setting and how are their differences relevant?


Session 6 – Dreaming and Deliberating in Badger


What would it take to have an architecture that benefits from being able to sit and think, make a plan for how to solve a task, etc.? What do inputs and outputs look like in this setting? What kind of internal structure? What is the ‘outer loop policy’ here?


  1. What are the unifying aspects of cognitive circumstances in which having extra time to think, without receiving more external information during that period, provides a benefit?
  2. Can we integrate those cases into existing formalisms for information processing by considering a case in which ‘thinking’ can be an information generating process rather than information-neutral or filtering?

Discussion Summary

  • There is a recent article on ‘usable’ information which suggests that the information available to part of a cognitive system with respect to some target variable could be increased by processing, if the cognitive system is constrained in the kinds of operations it allows. So in this framework, ‘thinking’ would be like decoding an encrypted message.
  • Is it possible to think of mutual information (MI) with respect to counterfactual variables? E.g. can I create a variable that has MI with ‘what would happen if I did X’ even if I didn’t do X? In that case, the information being generated is essentially virtual (doesn’t refer to anything outside of the system)
  • What’s the information theoretic trace of things like neural architecture search, where the process discovers better architectures over time but doesn’t have any external information input? Might not formally be an increase in information, but it’s a strange case since the prior distribution is malformed, so maybe there’s something mathematically useful there.


Session 7 – Principia Badgerica


This session was about the foundational principles of the Badger architecture, how it evolved over time and what we have learned during Badger research and development thus far. There was a lot of discussion already at the first few principles, namely modularity, collectivity, multi-agentness and meta-reasoning. This meant that there was insufficient time to discuss in detail what we have learned thus far, a topic warranting further discussion and amalgamation into a separate document. 


  1. What are the Badger principles (foundations) that we have discovered already?
  2. What have we learned during Badger development so far?

Discussion Summary

  • Badger has streamlined our thinking about AGI agents: it has more structure, individual properties should be more testable, we ask better questions than before Badger, and it perhaps even enables incremental R&D
  • One has to be careful in choosing an architectural property that is interesting vs. one that is good. Multi-agentness is interesting, not necessarily good, but very likely necessary for creating a system that goes beyond what is currently achievable
    • Extensibility is a strong benefit of the multi-agent nature of Badger, but we might have to train for it explicitly to reap benefits
    • Modularity helps reduce interference
  • Meta-reasoning is an unexplored area that has links to the dreaming and deliberation stage of Badger and the notion of a Badger economy
  • Various pressures exist to make use of meta-reasoning, which in turn affect the way the entire system operates and learns
  • Learning invariances in the inner loop might be necessary and potentially exponentially hard
  • There is still a lot of work to be done on finding tasks and architectural properties that allow us to show the benefit of the multi-agent nature and communication capabilities of Badger


Session 8 – Economies of Badger


Economics in Badger is the study of information exchange in the network of experts where the information can be interpreted as a flow of credit, from the environment to the computational resources. It is not unreasonable to expect that a complex, collaborative multi-agent system, such as Badger, might eventually benefit from, as well as learn to develop, the concept of an economy. In this session, the aim was to discuss the benefits and drawbacks of such an internal system, how it could emerge and what purpose it would have.


  1. Economy as a dissipative system?
  2. What level of economic complexity is appropriate? Neurons in the brain are not very complex but perhaps neural columns could enable more complex economic structures and credit interchange?
  3. Which of these economic phenomena can be expected in future Badger prototypes?
    • Different currencies (different types of credit)
    • Specialization
    • Inflation
    • Monopoly on intellectual property versus copyleft intellectual property. What would be the impact on an agent’s growth?
    • Auctions
    • Stock markets, money lending, startups, VC funds, Private Equity
  4. How to test for them?
  5. Which phenomena are certainly going to be present (e.g. different currencies) and which are unlikely (e.g. stock market, money lending, money laundering)?
  6. Is it even useful for our research to try to seek economic analogies in Badger?
  7. Perhaps external credit and payment for computational resources is not necessary. What if the credit should be an internal property, an estimation of how different experts contribute to the agent’s survival and the payment to the computational resources is just a necessity?
  8. Is Badger an open or closed thermodynamic system?

Discussion Summary

  • Economy shows how multi-agentness can be useful – instead of a single loss, you can have multiple independent (incompatible, not shared) losses
  • Dynamics of the system can be much richer with multiple losses than with a single loss
  • There seems to be a link between the concept of an economy and the meta-reasoning stage in Badger, especially when the concept of currency is used for “paying” for compute
  • A global substrate – Economy acting as a link between the local and the global behaviour of the system
  • Economy can also be viewed as a diversifier, avoiding homogeneity and issues associated with it
  • Different reward schemes and strategies will naturally result in very different system behaviour. Which ones are important for Badger, selfish, cooperative, local, global, others?


Session 9 – Conclusions and Future Directions

In this session we converged on the most interesting outcomes of the preceding discussions, as outlined at the beginning of this post. Possible tangible outcomes were discussed, such as this post, as well as a research roadmap with concrete goals corresponding to the questions and topics found most pressing and identified throughout the discourse. Below we provide some more detail on these topics.

Discussion Summary

  • When is modularity, collectivity & multi-agentness beneficial?
  • When can we make something out of nothing?
  • Why do people have an internal thought process/deliberation?
  • The concept of an economy within an agent is intriguing
  • An economy is an amplifier of the learning in the underlying system
  • For an economy to work, agents need to understand the benefits of one another
  • How and in what way do agents need to be smart in order to have and benefit from an economy?
  • We should continue to be critical of the multi-agentness and truly discover and exploit its benefits
  • Tasks are very important, especially if we want to learn complex behaviour and show the various benefits of the Badger architecture
  • Back propagation always finds the simplest solution – what is the simplest task?
  • Where is the transition between a monolithic system and a multi-agent one?


GoodAI: Olga Afanasjeva, Simon Andersson, Joe Davidson, Jan Feyereisl, Nicholas Guttenberg, Petr Hlubucek, Martin Poliak, Marek Rosa, Jaroslav Vitku

External Collaborators: Kai Arulkumaran (Imperial College London), Martin Biehl (Araya), Kevin Corder (University of Delaware), Guillem Duran Ballester (Fragile Tech), Petr Simanek (DataLab, CVUT), Olaf Witkowski (Cross Labs)

