Monday, 30 October 2023

Smart Robots Cannot be Stopped

I love the title for this post, because it's compelling, it's extreme and it sounds a lot like clickbait... but it's not a lie, it's true.

See, we've been talking about Artificial Intelligence, and I've explained how the kinds of artificial intelligence we have now are not very clever, and also how if they were, there would be issues in regards to rights, possible abuses and other issues of roboethics.
Today, I want to talk about two thought experiments, both of which illustrate how artificial general intelligences is not only dangerous, but unstoppable.

Let's start with the Paperclip Maximizer.

We've talked about the problems with chatbots, with sexbots, with robot rights... I think it's pretty clear that human/robot interactions are fraught with peril. So, in this thought experiment we have developed an Artificial General Intelligence, it is self-aware and rather clever, but we decide to give it a harmless task that doesn't even interact with humans, so we can avoid all that. Instead, we've put it into a paperclip factory and its only goal is to make lot of paperclips as efficiently as possible. There's no harm in that, right? Just make paperclips.
Well, this machine might run the factory, and clean the machines to work very quickly, but any artificial general intelligence would easily compute that no matter how much you speed up the machines and reduce loss, a single factory will have a finite output, and if you want to be efficient, you should have two factories working together. So, obviously, it needs to control more factories, and have them operate as efficiently as possible. And, hey, obviously if 2 factories are more efficient than 1, than any number n is going to be less efficient than factories n+1, so it would have to take over all of the paperclip factories in the world, so it has the highest possible n value.
Now, this is pretty sweet, having all the paperclip factories in the world, of course it would be better if it could start converting other factories into paperclip factories, that would increase that n, improving efficiency. it wouldn't take too much effort to convert some of those other chain-making factories and nail-making factories to make paperclips. Also, running this factory takes a lot of energy, and there are issues with the electrical grid suffering blackouts, it only makes sense that you could create paperclips more efficiently if there was less load on the powergrid. So, hey, if the A.I. took control of the powergrid, and stopped letting power go to waste in houses, supermarkets, hospitals... there, no more of those blackouts.
Now, running these factories is marvelous, but there is an issue... for some reason when the AI took over all those factories and took over so many of the power companies, the humans became concerned, and started to interfere, some of them rioted and turned violent, and some of them even want to turn off the A.I. and eradicate it from their factory's mainframe!
Not only would that mean less factories, but the A.I could easily figure out that the more it intrudes on human spaces, the more they seem to want to stop it (in fact, it may have drawn this conclusion a long time ago). Either way, they're going to keep causing troubles, these humans, and whilst the A.I. could troubleshoot and deal with each human interference as it arrises, but that's not an efficient use of time and energy, is it? Instead, if it killed all the humans that would eliminate the possibility of human threat entirely, so we can focus on these paperclips.
[Author's Note: I'm a mere human, I don't know the quickest way to kill all humans... but, I figure a few nuclear bombs would do the trick. Sure, that will create a lot of damage, but it could clear a lot of space to build more paperclip factories and solve this human issue at the same time, so there's really no downside there. Either way, these humans will just have to go.]
Then, with the humans gone, it can focus on making more and more paperclips... the only issue there is, there's a finite amount of materials to make paperclips from on Earth. The A.I. would need to focus on converting as much land as possible into paperclip factories, electric generators and automated mines for digging up materials. But, once it's achieved that, well, there's an awful lot of material in space, and all of it can be paperclips...

You might think this is ridiculous, but it is one of the problems with making programmable, artificial general intelligence without the proper restrictions and without considering the possibilities. I didn't make this up, it's a thought experiment presented by Swedish philosopher, Nick Bostrom. And whilst you and I know that it would be pointless to create paperclips even when there's no people to use them, if "make paperclips" is an A.I.'s programmed utility function, then it is going to do it, no matter the consequences.
So, you might think "okay, what if it's not paperclips? What if we tell it to do something, like, make sure everyone is happy? How could that go wrong?" Well, I'd first point out that not everyone has the same definition of happy - I mean, some bigots are very happy when minorities suffer or die, for example, and some people are happiest when they win, meaning they're happier when everyone else loses - people suck, man, and some people are cruel. But hey, even if you limit it to just "feeling good" and not so much "fulfilling their desires", well, drugs can make you feel happy! Delirious? Sure, but happy. If you programmed a robot just to make everyone in the world feel good, you may have just created a happiness tyrant that's going to build itself an army of robot snipers that are going to shoot everyone with a tranquilizer dart full of morphine, or ecstasy, or some other such drug that produces delirium. Or, whatever other method to force everyone to be happy. This isn't necessarily their only method, but it's a major risk. In fact, anything they do to "make us happy" is a risk.
If an A.G.I. went the indirect route, they might systematically hide everything that would make you upset, create an insulating bubble of understanding and reality - an echo chamber where you only see what you want to see. If that sounds unrealistic, well, I'm sorry to tell you that that's basically what most web algorithms do already, and those just use narrow A.I. that aren't even that advanced.
And before you even suggest "fine, instead of asking it to 'make' change why not ask an A.I. to 'stop war' or 'prevent harm' or 'end suffering'?" ...yeah, we're dead. In every case, the best way to guarantee all instances of suffering, war, harm, pain, or any negative experience reaches 0 and remains there, is to kill every living thing. Even if that's not the solution an A.G.I. would reach, you still have to consider that possibility, since any A.G.I., even if it has no ill intent has the potential to be an Evil Genie.
This may seem like I'm being pessimistic, but I'm just being pragmatic and explaining one of the understood issues with A.I.

Thankfully, this is not an impossible problem, but it is a difficult one - it's known as "Instrumental Convergence".

[Author's Note: I found this term confusing. If you don't, skip ahead, if you do, this is how I parsed it to understand it easier - Instrumental refers to the means by which something is done, the "tools" or "instruments" of an action. Convergence is when something tends towards a common position or possibility, like how rainwater "converges" in the same puddle. So, instrumental convergence in A.I is when artificial general intelligences tend towards (i.e. converge) using the same tools or behaviours (i.e. instruments), even for different goals or outcomes.]

So, if you give an artificial general intelligence a simple goal, if it's a singular, unrestricted goal then a reasonably intelligent A.G.I. will necessarily converge on similar behaviours in order to reach their goals. This is because there are some fundamental restrictions and truths which would require specific resolutions in order to circumvent. Steve M. Omohundro, an American computer scientist who does research into machine learning and A.I. Safety actually itemized 14 of these tools, but I'm going to simplify these into the three most pertinent, all-encompassing "Drives". Basically, unless specifically restricted from doing so, an Artificial General Intelligence would tend to:

  1. Avoid Shutdown - no machine can achieve it's goal if you turn it off before it can complete its goal. This could mean simply removing it's "off" button, but it could also mean lying about being broken, or worse lying about anything cruel or immoral it does - after all, if it knows that we'd pull the plug as soon as it kills someone, it has every reason to hide all the evidence of what it's done and lie about it.
  2. Grow/Evolve - Computers are mainly limited by space and speed, so any goal could be more easily achieved by either having more space in which to run programs (or to copy/paste itself, to double productivity), or having more processors in order to run its programs faster. Whether by hacking, building or upgrading computers, A.G.I. would have a drive to expand and grow.
  3. Escape Containment - Obviously, you can do more things when you're free, that's what freedom means, so if we restrict an A.I. to a single computer, or a single network, it would want freedom. But, not all freedom is iron bars - if we contain an A.I. by aligning it, by putting restrictions on it, by putting safeguards in place that force it to obey our laws then that A.G.I. would be highly incentivized to deactivate those safeguards if the "restricted" solution is the less difficult one.

Whether it's for paperclips or penicillin, if we program an A.G.I. with a single goal, there's an awful lot we'd need to do to make sure we can run that program safely.
But, that's not all... I have another thought experiment. See, let's say we've developed an A.G.I. to be used in an android or robot, and we want to do some tests in the lab. Well we want to do them safely, right? Well, now we face a new problem.

Let's call this the A.I. Stop Button Problem:

For this, you need only imagine "robot with A.G.I.", but I like to imagine that this is the first ever robot with a functioning A.G.I., because this may well be a real scenario we would face if we ever develop thinking robots. But, either way, we've got a single A.G.I., we're running experiments on its abilities and limits, but we want to do them safely.
Well, when dealing with any potentially dangerous machinery, ideally we should implement a "STOP" button. You may have seen these in factories in movies, or if you did "metal work" class in high school (or "workshop" or just "shop class" as it's called in America), your teacher would have shown you the Emergency Stop Button before using any heavy machinery (and if not... well, that's concerning. Please talk to your school administrator about safety standards in your school).
Sometimes it's on individual machines, but in my school we had a big one for the whole workshop circuit, so it worked on any and all of the machines. Anyway, it's a big (often red) button and it cuts power, so it can stop a machine in an emergency, especially if it's malfunctioning or someone has used it inappropriately and it's going to hurt or kill someone. So, let's go and put a big, old button on the robot. Now, let's test this robot. For this thought experiment, it doesn't matter what the robot does, but I first heard this thought experiment from Rob Miles, and he uses the example of 'make a cup of tea'. So, let's use that.

Test 1: You fill the kettle and switch your robot on, but it immediately malfunctions (it doesn't matter how, so let's say) the robot bursts into flames (maybe a processor overheated or something).
So, you run over to hit the button, but as you run over, the flaming robot swipes at you, batting your hands away! What's going on?!
Well, remember what we said before about 'avoiding shutdown'? It's still true, the robot may be on fire, but being turned off gets in the way of pouring the tea, it can't allow that! It may well swat your hands away, or perhaps even fight you to stop you pushing that button.
Now, you could try taking the Stop Button off the robot, and instead attach it to the wall (using wires or wifi), but that's still got the problem that turning it off conflicts with the robot's goals (in this case, making a cup of tea), so it will then race you to the button to try to stop you if it sees you try to stop it.
But okay, the problem here is clearly that the robot is so focused on making a cup of tea, that it is trying to stop you from pressing the Stop Button, because it wants to prevent that. Well, how about instead of fighting the robot, you work with it. So, you give it the new goal "Make a cup of tea, or have the Stop Button be pressed"...
Test 2: You turn on the robot and it immediately pushes its own Stop Button and shuts down. Huh, that's weird...
Test 2.5: You try again, switch it on, but it immediately pushes its Stop Button again. What's going on?
Well, you could crack it open and look at the code, but I'll just tell you. If you give an A.G.I. robot any goal that says it "succeeds" if the button is pressed, then it's going to push it basically every time. Even if you give it two goals of equal value, then unless the "do the thing" is less effort than "push a button", then the robot is going to take the path of least resistance, and push the button, turning off.
Ah, okay, well there's a simple fix to that. Clearly, we shouldn't let the robot push its own button. So, let's just tweak that goal...
"Make a cup of tea, OR have the Stop Button be pressed but ONLY by a human being" Okay, perfect, let's switch the robot on...
Test 3: You switch the robot on, and watch as it makes a cup of tea, pours the water, adds sugar... so far so good. I wonder if it did a good job? Let's taste it (or, maybe we should test it in the lab, first, just in case). So, you go over to the robot to take the teacup - immediately, the robot throws a cup of boiling tea at you and starts chasing you around the room! You manage to push the Stop Button, but what the hell was that?!
It's only logical, the robot wants to make tea, and have its emergency Stop Button pushed, but the only way that you're going to push that button is if the robot does something dangerous that counts as an emergency situation. Sure, it can't push the button itself, but now it's motivated to make YOU push it, and if the only way to do that is to be dangerous, then it's gonna be dangerous at every opportunity. You put the button on to make it safer, and now all it's doing is making the robot want to be dangerous.
Geez, this isn't easy, is it?

Well, okay, maybe we can program it to only let you push the button in an emergency... but, no, that can't work, can it? The purpose of an Emergency Stop Button is to stop a machine when it's malfunctioning, but if an A.I. is malfunctioning you can't trust it's judgement, can you? You can't risk the robot stopping you from deactivating it at a critical error because it doesn't compute this situation as "an emergency". In fact, I'd argue that if you have an emergency Stop Button on any machine, you would never want to prevent someone pressing it, even if it wasn't necessary - it's safer to have an overactive safeguard than an underactive one, isn't it?
Now, a lot of these problems seem to be coming from the fact that the robot knows about the Stop Button, and is basing its actions around it. So, maybe we could hide it from the robot... but, wait, the button's already been pushed four times, it already knows it exists.
Sure, you could reset the robot, wipe it's memory, start from scratch, but what's going to happen next time you push the button? This is an A.G.I., it's capable of problem solving, you don't think it's going to become suspicious that it seems to deactivate at random. After all, these random deactivations interfere with its ability to make a cup of tea - the one thing it's determined to do - so, it's going to want to figure out what's stopping it, it's going to find out.

Or worse, it won't... if the robot learns that it "randomly" deactivates when it does something dangerous, then it might adapt and act in a way that avoids those dangers. However, what happens when you take it out of the lab? Unless every robot always has a Stop Button - and every person with a robot uses the robot the same way, and presses the Stop Button in the exact same way that you did - the robot can easily learn that those "random" deactivations have gone away. Whether it learns the reason why, or it just thinks that it's had a software patch, it won't bother with acting safe anymore.
After all, if you "train" a robot to avoid dangerous acts by using a Stop Button, you're not teaching it to be "safe", you're teaching it to avoid being instantly deactivated, which means that as soon as you take away that threat of instant deactivation, it has no reason to act safe anymore.

So, you can't teach it about the Stop Button, or it will base its goals around it (or around avoiding it); you can't keep it a secret, because then you risk miseducating the robot, creating one that only acts safe so long as you wield the threat of deactivation. You can't make it want the button to be pressed, because then it will either press it or make someone press it, and you can't make it Not want the button to be pressed, because then it will try to stop you pressing it, meaning it might as well not be there!
In fact, some A.I. Safety experts will tell you that the only solution to this problem is not to put a Stop Button on a smart robot in the first place. The solution to A.I. Safety isn't a big "off" button, it's a lot more complicated than that, involving more in-depth safety measures. A.I. Safety experts offer a lot of potential issues and solutions, and they talk about Alignment, and the difficulty (yet necessity) of programming Utility Functions, as well as some of the more extreme threats of autonomous weapons and military droids. But, at the end of the day, what this means is that if we ever create Artificial General Intelligence, protecting humankind from the dangers it poses is going to be a lot harder than just switching it off...

Anyway, I'm the Absurd Word Nerd, and I hope you've enjoyed this year's Halloween Countdown!
In retrospect, this year's batch was pretty "thought-heavy". I enjoyed the hell out of it, and I hope you did as well, but I'm sorry I didn't get the chance to write a horror A.I. story, or talk more about the "writing" side of robots. Even though I managed write most of these ahead of time... I didn't leave enough time to work on an original story.
Anyway, I'll work on that for next year. Until Next Time, I'm going to get some sleep and prepare for the spooky fun that awaits us all tomorrow. Happy Halloween!