Unbounded Goals

Note: This is a long one and it jumps around a little bit, but if you stick with it to the end, hopefully it will all make sense. If it doesn't, let me know and maybe I can take another stab at it.

I’ve been thinking a lot lately about instrumental convergence and how our society misinterprets it in a way that results in our toxic relationship with money. Instrumental convergence includes the idea that if a sufficiently intelligent agent (e.g. a human, a corporation, or a theoretical artificial general intelligence) has an unbounded intrinsic goal, it will likely pursue unbounded convergent instrumental goals.

Let’s look at some concrete examples of intrinsic and instrumental goals. An intrinsic goal is a goal that is an end in itself. Intrinsic goals vary from person to person but some common intrinsic goals among humans might be happiness, fulfillment, the well-being of your children, excitement, etc. These are things that you want for no other reason than that you want them.

Let’s take happiness as an example since I think it’s an intrinsic goal that is relatively easy to understand and that most people seem to share. People seek to achieve the intrinsic goal of happiness by pursuing instrumental goals (i.e. goals that help them make progress towards their intrinsic goal). These instrumental goals might be extremely different from person to person.

One person might seek to achieve happiness by travelling more. For another person, an instrumental goal might be owning a BMW. A third person might seek to achieve happiness by having more time to spend with their children. If, however, you take one step back, you find that all of those goals have at least one common instrumental goal of their own that makes them much easier to achieve.

In this case (and in many cases for humans) one of the things that makes all of those goals easier is money. Whether you want to travel more, own a BMW, or spend more time with your kids, having money helps. For this reason, money can be thought of as a convergent instrumental goal. For a great many different intermediate and end goals that humans might have, having money makes those goals easier to achieve.

That’s why even if money isn’t one of your intrinsic goals, making money is probably a super helpful instrumental goal to have.

There’s a famous thought experiment that examines the dangers of unbounded goals and instrumental convergence called The Paperclip Maximizer. Imagine for a moment that you are a paperclip company and someone gives you a computer with artificial general superintelligence. You might tell that superintelligence to figure out how to maximize the number of paperclips that it can produce. This is an example of an unbounded goal. You’re not telling the computer to figure out how to make a thousand paperclips, or a million, or a billion. You’re telling it to figure out how to make as many as possible.

Assuming this superintelligence has some ability to communicate with or otherwise manipulate the outside world, it will set about trying to achieve its goal. If you take this thought experiment to its logical conclusion, you quickly realize that the computer will need to acquire as many resources as possible in order to achieve its goal. It may start to try to manipulate people, companies, and governments in order to amass more raw material with which to make paperclips. When it runs out of metal to mine from the ground it might start disassembling buildings. When it has exhausted all of the buildings, cars, cans, and other easy sources of metal, it may realize that the human body contains a small amount of metal. It might even figure out how to convert the other elements that make up the body into materials that are suitable for making paperclips. Ultimately, if the superintelligence is smart enough, it will convert everything in the universe into either paperclips or machines for making paperclips. This is obviously not a desirable outcome.

Someday I’ll write another post about the very complicated issue of AI safety (I believe that the problems raised in the field of AI safety can go a long way towards helping us understand how to properly regulate corporations), but for the purposes of this post, one of the things that could have prevented the problem above is simply not giving the system an unbounded goal.

In our society, publicly traded companies operate under a legally reinforced, effectively unbounded goal of maximizing shareholder value (i.e. money).

The paperclip company makes money by selling paperclips, so it would be pretty straightforward to assume that making more paperclips equals making more money. While this may be true up to a point, it’s obvious that this formula does not scale to an arbitrarily large number of paperclips. There is a limited global demand for paperclips. If you make more paperclips than people are willing to buy, then you end up losing money on each additional paperclip instead of making more money. In other words, there is a point at which the instrumental goal of making more paperclips no longer serves the intrinsic goal of making more money. This means that, in addition to destroying the universe, the paperclip maximizer was ultimately not aligned with the intrinsic goals of the company, even though making paperclips was one of the company’s main instrumental goals and was the maximizer’s only goal.
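If you want to see that break-even point concretely, here’s a minimal sketch in Python (the `paperclip_profit` function, the prices, and the demand figure are all made up for illustration): once production overshoots demand, each additional paperclip subtracts from profit instead of adding to it.

```python
# A minimal sketch with made-up numbers: profit rises with production only
# until demand is saturated, then falls, because every unsold paperclip
# still costs money to make.

def paperclip_profit(clips_made, demand=1_000_000, price=0.05, unit_cost=0.02):
    """Toy model: only the clips people actually want (up to `demand`) sell."""
    clips_sold = min(clips_made, demand)
    return clips_sold * price - clips_made * unit_cost

for n in (100_000, 1_000_000, 10_000_000):
    print(f"{n:,} clips -> profit ${paperclip_profit(n):,.0f}")
# 100,000 clips -> profit $3,000
# 1,000,000 clips -> profit $30,000
# 10,000,000 clips -> profit $-150,000
```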

Let’s jump back to the human example of happiness. If travelling is an instrumental goal on the way to a person’s happiness, perhaps travelling more makes them happier. Again, this may be true up to a point. Maybe travelling once per year results in more happiness than not travelling at all. Maybe travelling 5 times per year results in more happiness than once, but I think it’s reasonable to assume that travelling 50 times per year would not make them happier than travelling 5 times did.

Owning a BMW may be an instrumental goal to a person’s happiness, but owning 100 BMWs will almost certainly cause more problems than happiness. Even if the intrinsic goal (in this case maximizing happiness) is unbounded, typically the instrumental goals (e.g. travel or BMWs) have bounds beyond which they are no longer effective or even become counterproductive.
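Here’s the same idea as a toy curve (the `toy_travel_payoff` function and its numbers are invented, not measured): the payoff of an instrumental goal rises up to some sweet spot and then actually turns negative.

```python
# A toy illustration (invented curve, not data): the happiness payoff of an
# instrumental goal like travel rises up to a sweet spot, then turns negative
# once the cost and exhaustion of constant travel outweigh the fun.

def toy_travel_payoff(trips_per_year, sweet_spot=5):
    """Inverted-U: peaks at `sweet_spot` trips, goes negative well past it."""
    return trips_per_year * (2 * sweet_spot - trips_per_year)

for trips in (0, 1, 5, 10, 50):
    print(f"{trips} trips/year -> payoff {toy_travel_payoff(trips)}")
# 0 trips/year -> payoff 0
# 1 trips/year -> payoff 9
# 5 trips/year -> payoff 25
# 10 trips/year -> payoff 0
# 50 trips/year -> payoff -2000
```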

This fact seems to apply to the convergent instrumental goal of money as well. Study after study suggests that after a certain point (typically somewhere between $70,000 and $150,000 per year in the United States), more money does not result in more happiness. This seems completely counterintuitive to most people who were raised in our society. It “feels” like the more money you get, the happier you will be, but if we just take a moment to think about the people we know, that notion breaks down pretty quickly. Anecdotally, my friends and acquaintances who make a million dollars a year are no happier than those who make $100,000 a year. In fact, the opposite seems to be true. I know far more deeply unhappy rich people than unhappy middle-class people.
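The studies themselves are more nuanced than any one formula, but the shape they describe is roughly a saturating curve, something like this toy model (the `toy_happiness` function and the $100,000 saturation point are purely illustrative assumptions, not the studies’ actual findings):

```python
# Purely illustrative: model reported happiness as a saturating function of
# income, so each extra dollar buys less additional happiness and the curve
# flattens out well before the million-dollar mark.

import math

def toy_happiness(income, saturation=100_000):
    """Diminishing returns: steep at first, nearly flat past `saturation`."""
    return 1 - math.exp(-income / saturation)

for income in (30_000, 70_000, 150_000, 1_000_000):
    print(f"${income:,} -> happiness {toy_happiness(income):.2f}")
# $30,000 -> happiness 0.26
# $70,000 -> happiness 0.50
# $150,000 -> happiness 0.78
# $1,000,000 -> happiness 1.00
```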

There are many reasons for this, but the one that I’ve been thinking about recently involves this idea of unbounded goals. Because we have defined the success of companies based solely on the unbounded intrinsic goal of making more money, we tend to mistake money for something that should be an unbounded intrinsic goal in our personal lives as well. There is no question that money is a very useful convergent instrumental goal for achieving all kinds of worthwhile things, but when we lose sight of what our real intrinsic goals are and try to maximize money at the expense of everything else, we are doing ourselves (and often the world around us) a great disservice.

I want to include a little bonus thought for those of you who might think that the paperclip maximizer is a ridiculous example that may not be applicable in the real world. Companies can be thought of in many ways as analogous to (or even as a form of) artificial general superintelligence. A company made up of many specialized people will likely be far more intelligent than even the smartest individuals. That’s one of the reasons that companies have become ridiculously efficient at achieving the goal of making money.

There are companies that make paperclips. I think most of us agree that paperclips are a useful thing to have around and, in some situations, they help you to achieve the instrumental goals that you are pursuing in order to achieve happiness. The problem is that the paperclip company’s intrinsic goal is not to maximize human happiness (or even to make paperclips). The company’s intrinsic goal is to maximize money and it’s even easier for the distributed intelligence of a company to ignore moral issues than it is for individual humans.

I think that most of us would agree that the benefit to happiness that we, as a species, derive from paperclips is smaller than the cost of strip-mining large swaths of continents in order to produce them. And yet, because of the unbounded and relatively unchecked goal of making money, we end up strip-mining entire landscapes in order to make (among other things) paperclips. A benevolent rational actor with real intrinsic human values in mind and the power to steer our system would not permit such a thing to occur.

We place the bulk of the power in the hands of entities whose goals do not align with our own, and then we wonder why we face problems like climate change.

Climate change is the real-life embodiment of the paperclip maximizer problem. We are consuming all of our resources in the pursuit of goals that aren’t even the ones that we meant to be pursuing in the first place.
