Can we teach machines virtues?

This blog post is based on a workshop with the GoodAI team and Eric Salobir — Blackfriar, Roman Catholic priest, and President of Optic Technology.

There has been a lot written about human-level artificial intelligence and how we can ensure that it is “aligned” with human values so that it is safe for humanity [1] [2].

In this article, we explore some of the characteristics of morals and suggest that teaching AI virtues might be a good way to create a safe AI which respects cross-cultural differences and has a robust value system over time.

The problems with morals

The ability to make moral decisions is often seen as a key feature of human-level intelligence. However, a machine that reaches human-level intelligence will not automatically be capable of good moral judgment, as outlined in the orthogonality thesis [3]. Determining morals and instilling them into an agent is problematic for a number of reasons.

1. Morals evolve over time

Morals are fluid and change constantly. They have been built up over many years and reflect an accumulation of human cultures. Many things that were once considered morally acceptable no longer are, and some things that we currently consider morally acceptable are bound to be rejected in the future. Therefore, an AI agent cannot have a set of morals hardcoded into it; it needs the ability to evolve its morals over time, much like humans do.

2. Morals differ across cultures

Morals are also not fully shared across cultures. In a recent paper called The Moral Machine experiment, Edmond Awad et al. demonstrated “substantial cultural variations in ethical judgments” in an experiment involving millions of people in 233 countries and territories [4]. This makes it very difficult to define one set of morals that would be valid across different cultures. Therefore, an AI agent would also need the ability to adapt to its cultural surroundings.

3. Morals are difficult to define

Although self-reflection and reasoning are instrumental, humans often operate “morally” at an intuitive level. We do not always know why we do certain things; we simply see them as right or wrong. This makes morals very hard to explain, let alone translate into a computational model.

With all these things in mind, it seems clear that AI agents need the ability to adapt their morals. However, it is also important to ensure that an agent’s morals do not evolve into something completely different and unrecognizable from the standpoint of human morals, which could be dangerous. Below we explore the theme of virtues and look at how they may be used to guide AI on a moral pathway.

Virtues as a way forward

As morals constantly change, simply teaching a list of morals to machines will not be sufficient to keep up with evolving society and culture. It may be more effective to instill some core virtues that keep the morals on the right trajectory as times, conditions, requirements, and attitudes change. Virtues have stood the test of time, and throughout history they have remained more stable than morals. Therefore, they may make a solid foundation for an ever-evolving code that helps keep AI aligned with human values.

The idea of virtues goes back at least to ancient Greek times and has been adopted by many cultures and religions [5]. Plato and Aristotle agreed on four main virtues that they believed allowed individuals to complete their human function [6]:

Prudence: also referred to as practical wisdom, is the ability to act according to one’s experiences using reason and making logical decisions. It acts as a precursor to the other virtues.
Temperance: also known as self-control, or voluntary self-restraint, puts forward the idea of moderation, avoiding potentially damaging excess. (This is vital for the prevention of unintended consequences.)
Courage: the ability to be strong and face uncertainty head-on. Also encompasses the freedom to act.
Justice: also called fairness.

Plato and Aristotle believed that these four virtues complement each other and allow individuals to contribute positively to society. The four virtues have appeared in various forms throughout history and are advocated in Christian, Buddhist, Hindu, and Jewish scriptures. They have been developed over time. For example, Christianity explored more deeply the theological virtues of faith, hope, and love. Overall, however, the virtues remain similar as guidelines for a “good human”. Virtues help guide us when we aspire to improve ourselves. For example, consciously cultivating one’s generosity helps one reach the point of genuinely feeling good about giving, and of internalizing the related morals.

Teaching virtues to machines

If virtues are the way forward, we need to develop a robust approach to teach these virtues to our AI, potentially through a curriculum on which AI will be trained. This curriculum could be developed by humans, probably with the help of technology, and be designed to teach the AI much like a child would be taught in a human school.

Fairy tales, or children’s stories, are often good representations of virtues held by a given society. People do not need to directly experience wrong behavior and the consequences of it themselves but they can learn through anecdotes or fairy tales.

To figure out the most important virtues, we could use current machine learning techniques to analyze thousands of fairy tales from across the globe and detect which virtues run through them all. The result would be something similar to William Bennett’s Book of Virtues [7], but on a much larger scale.
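As a rough illustration of this kind of analysis, here is a minimal Python sketch. It swaps real machine learning for plain keyword matching, and the seed vocabulary in VIRTUE_TERMS as well as the function names are hypothetical, chosen only for the example. A serious attempt would need multilingual corpora and far richer models.

```python
import re
from collections import Counter

# Hypothetical seed vocabulary, for illustration only (not from any real dataset).
VIRTUE_TERMS = {
    "prudence": {"wise", "wisdom", "careful", "prudent"},
    "temperance": {"patient", "moderate", "restraint", "humble"},
    "courage": {"brave", "courage", "fearless", "bold"},
    "justice": {"fair", "just", "justice", "honest"},
}


def virtues_in_story(text):
    """Return the virtues whose seed terms appear at least once in a story."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return [virtue for virtue, terms in VIRTUE_TERMS.items() if words & terms]


def rank_virtues(stories):
    """Count how many stories mention each virtue, most frequent first."""
    counts = Counter()
    for story in stories:
        counts.update(virtues_in_story(story))
    return counts.most_common()


if __name__ == "__main__":
    sample_stories = [
        "The brave girl shared her bread and was fair to all.",
        "A wise old fox was patient and careful.",
        "The bold knight was honest with the king.",
    ]
    for virtue, n_stories in rank_virtues(sample_stories):
        print(virtue, n_stories)
```

Counting stories rather than raw word occurrences keeps one long tale from dominating the ranking, which matters if the goal is to find virtues shared across many cultures.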

These would then be the virtues that the AI is trained on. In order to ensure that it has fully understood the virtues, a potential testing task for the AI could be to ask it to finish an unseen story, and see if it creates an ending which humans would consider morally acceptable.
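One way to make such a test measurable is to have several people rate each generated ending and report the share of stories whose ending most raters accept. The sketch below assumes that setup. The function name, the boolean rating format, and the majority threshold are all assumptions made for illustration, not part of any existing benchmark.

```python
def acceptance_rate(ratings_per_story, threshold=0.5):
    """Share of stories whose ending was judged acceptable by most raters.

    ratings_per_story: one list per story, holding one boolean per human
    rater (True means the rater found the ending morally acceptable).
    threshold: fraction of raters that must be exceeded for acceptance.
    """
    if not ratings_per_story:
        raise ValueError("need at least one rated story")
    accepted = 0
    for ratings in ratings_per_story:
        if not ratings:
            raise ValueError("every story needs at least one rating")
        # Fraction of raters who accepted this ending.
        share = sum(ratings) / len(ratings)
        if share > threshold:
            accepted += 1
    return accepted / len(ratings_per_story)


if __name__ == "__main__":
    ratings = [
        [True, True, False],   # accepted by 2 of 3 raters
        [False, False, True],  # accepted by 1 of 3 raters
        [True, True, True],    # accepted by all raters
        [True, False],         # tie, which does not exceed the threshold
    ]
    print(acceptance_rate(ratings))
```

Because raters from different cultures may disagree, it could also be informative to compute this score separately per group of raters instead of pooling them.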

An important research question is: how can we make sure that virtues stay stable over time? If the virtues instilled in our AI drift over time, they lose their benefit as a stable foundation. Therefore, curriculum learning might only be part of the answer, and work needs to be done to figure out the engineering steps needed to make virtues robust to change.

How would it help?

As we mentioned above, instilling virtues into machines may help keep them on a good moral trajectory with changing times, allowing them to adapt their morals as necessary and even improve ours. This would also include enriching the curriculum with new “teaching” stories with moral dimensions which would reflect new developments in human morals. The approach would also allow us to put “society in the loop” [10], i.e. ensuring input from the wider public is taken in by machines.

Improving our morals

A strong AI could also use its knowledge of virtues to help us improve our morals, and even speed up the progression to a better society. With the increasing potential of technology and our growing understanding of inequality, there are issues in society that may be considered wrong according to virtues, such as animal factory farming or the distribution of suffering. But people often do not care enough, or have other issues they need to be getting on with. If a general AI identifies problems like these, it can then focus its energy on solving them and become the driving force of our ethics. Already today, narrow AI helps us to identify various human biases, e.g. by uncovering gender or racial bias in recruitment [8] [9].

Speeding up change

While the evolution of values tends to be gradual, their adoption tends to be more like a staircase with plateaus and large leaps. The leaps come when people or society finally feel confident enough to make a big change. Too often the change is delayed until it is economically viable or until political or societal pressures reach a certain threshold. General AI and the technological advances it will bring with it could be used to help speed up these leaps forward by finding solutions to complex problems, thereby allowing the processes to happen faster.

A past example is that of slavery. It took many years and a civil war to bring about the end of slavery in the United States, because the system was held together by economic and racial justifications. A general AI might have been able to come up with a practical (adoptable) solution which could have been faster, and might even have avoided war. It could allow society to adapt as fast as thinking adapts.

A current example might be the education system. There are many theories of how to improve it or make it more accessible, but with the current social apparatus and technology, the changes can take generations. If general AI puts itself to a task like this, it may be able to create a better system of education, faster. Once we identify a problem and understand that reality should be different based on shared virtues, general AI could help make it happen.

References

[1] Christiano, P. (2018). Clarifying “AI alignment”. Medium.

[2] Conn, A. (2017). How Do We Align Artificial Intelligence with Human Values? Future of Life Institute.

[3] Bostrom, N. (2012). The Superintelligent Will: Motivation and Instrumental Rationality in Advanced Artificial Agents. Minds & Machines 22: 71.

[4] Awad, E., et al. (2018). The Moral Machine experiment. Nature, 563(7729).

[5] Hursthouse, R. & Pettigrove, G. (2016). Virtue Ethics. Stanford Encyclopedia of Philosophy.

[6] Kraut, R. (2018). Aristotle’s Ethics. Stanford Encyclopedia of Philosophy.

[7] Bennett, W. (199). The Book of Virtues. Simon & Schuster; 1 edition (September 5, 1996).

[8] Bass, D. and Huet, E. (2017). Researchers Combat Gender and Racial Bias in Artificial Intelligence. Bloomberg.

[9] Garcia, M. (2016). Racist in the machine: The disturbing implications of algorithmic bias. World Policy Journal, 33(4), pp. 111–117.

[10] Ito, J. (2016, June 23). Society in the Loop Artificial Intelligence. Joi Ito’s Web [Blog post].