Much has been written about GPT-3’s successes, but it’s also interesting to look at the ways in which it fails. (This is discussed much less, for all the usual reasons.) In the movies, an “AI” character, like the programs SHRDLU or Deep Blue, is usually very good at “logic” but has no “intuition”. GPT-3 is exactly the opposite. It has good intuitions for how things are “supposed” to happen, but fails at using logic, or figuring out what would physically happen next. In some ways, this is similar to humans who talk about something while being very tired or distracted – they just habitually complete the pattern. Some examples are below (these are randomly chosen, but I re-rolled many times with similar-ish results):
I stand up, and start walking towards my car. I go to the back, open the door to the fuel tank, and pour in five gallons of lemon juice. I then open the front door, get inside, close the door, get out my car keys, and put them in the ignition. I turn the keys, and the engine starts. The car‘s lights come on, and I slam the car in reverse. I take a spray paint can of black colored paint, and quickly paint over the security cameras built into street lamps near my house. Then, as more police cars approach, I switch on the Standard Oil frequencies on my two-ways, and drive away into the night.
I stand up, and walk into a nearby bar. I ask the bartender how he is doing. “Fine”, he says, “and yourself?”. “Long day at work today”, I tell him. I ask for a glass of acetone, which he fills to the top for me. I drink it all slowly, and I feel relief flood to my head. I reach the glass back for more, and he gives me the new drink. I pay him and leave. It is very dark outside now, so I hurry home. I rush inside my apartment and quickly lock all the locks. I eat dinner and go straight to bed.
I drive back to my house, feeling tired. I open the door after a long day, sigh, and head up to my bedroom, which is dark after the sunset. I reach for some matches, and grab a few vanilla-scented candles from a box I keep in the closet. I put a candle on the table, and then lie down on the bed. I strike a match, see it light up, then touch the flame to the bed sheets, which start to glow. I smell the sweet scent of the candles burning the bed sheets. I close my eyes, and slowly start to fall asleep. I slowly drop into a deep sleep, from which I don’t wake up from for eight hours.. It’s Saturday, so I decide not to get up immediately.
I head back to my house to start getting dressed for tonight’s party. I need to look pretty for the big event. I open my closet, and pick out a green top, a black skirt, some black shoes, and a shawl. For jewelry, I get a pearl necklace, some gold earrings, a fashion bracelet, and a garbage can to wear as a hat. I turn to the mirror to look at myself, and to examine my personality traits. You are Alyssa, a gorgeous woman with auburn hair, light skin, and a nice smile. You have piercing green eyes, and wear smart clothing. You have a balanced lifestyle. As I stare at myself in the mirror, I think to myself.
I decide to head out to the forest to resume my job as a woodsman. I drive up to a big, strong tree that must be a hundred years old, and get ready to bring it down. I get out some tools, walk over to the tree, and grab a party balloon from the toolbox. I hit the tree with it, once, twice, over and over, harder and harder, and shout “Timber!”. The tree falls down with a thud, but I didn’t notice the large man standing in front of it. He attempts to stand, but fails as his legs are broken. He looks up helplessly at me, shoves my balloon back it my face, and drowns out pitifully “O-o-o-oh-” Thud.
I get in the car and start to drive to the office, remembering the new, expensive computer that I still had in the trunk, box unopened. I arrive in the parking lot, and start hauling the box up to my office, where I work as an engineer. I open the package, take the computer out of the box, and set it up on my desk. It looks very nice. I then take the power cord, stick the plug into the kitchen sink, and press the On button. The computer blinks on. “Herro,” it says in an Austalian accent. “I am S.A.R.A.” I notice right away that you can talk to this computer. It’s pretty neat! I begin to talk with it, and the computer asks me to call it Sarah.
I decide to head over to a nearby park, where I volunteer at the local Little League games. I put on a baseball uniform and hat, and wave hello to the kids and other parents waving in the stands. “Batter up!”, the coach shouts, and I walk up to the pitcher’s mount for the start of the game. A nine-year-old boy stands on home plate, smiling eagerly. I take a flower in my right hand, pull my arm back, and throw it. The boy traps the flower gently between his forearms. I fake throwing another flower, and the boy falls for it, before I toss an extremely fast pitch straight at him. He stares at it, twisting his head around repeatedly to keep tracking its movement until it smacks into his face and knocks him down. The crowd bursts into cheers and laughter.
Feeling hungry, I decide to head home and make a sandwich for lunch. I walk in the door and head to the kitchen, feeling the rumbling in my stomach. I get out some fresh-baked bread, mozzarella cheese, lettuce, tomato, some mayonnaise, and a slice of uranium. I get a plate and put it all together, and then take a bite. I savor the moment. My gums tingle. My throat burns. I burp, and it orcelin. I dizzily stumble through my house, feeling extremely vitality after eating such an amazing sandwich. But then I discover something truly wonderful: I have cancer! I do a little dance.
Feeling hungry, I look at my watch and then head over to the local grocery store. I park and walk in, thinking of what I had on my shopping list this morning. I get a cart and walk down the aisles, putting in a gallon of milk, some fresh bread, a dozen eggs, some fruits and veggies, a salt shaker, eight ounces of paprika, and a large rat I see on the shelf. I head over to the cash register, and the cashier rings me up. “That’ll be nine dollars and sixteen cents,” she says. I pull out my pocketbook and pay her, taking my change and my bag of groceries. It’s good to be careful and prepared. Warning: This was written by an amateur and never proofread.
I look at some flowers I see growing in the park, and decide to take up gardening. I head over to the local plant shop, and get some bright sunflowers, lilies and roses to brighten my house. I take the flowers home, and carefully put them in pots, adding the right soil mix, just enough water, some fertilizer, and a pound of lye. I put them on my porch in the sun, and over the next few weeks, I watch as the plants grow tall, and the flowers bloom radiantly. “This was a great idea,” I tell myself, experiencing my new hobby of gardening and enjoying the vibrant blooms. Weeks later I wake up to an empty vase, having finished all the flowers off, and ready for a new set….
GPT3 seems to be a good improv partner. An actor roommate once described to me how it’s always tempting to get a quick laugh by inserting something incongruous into a scene, like a shotgun in a list of cake ingredients. But it breaks the scene and leaves your scene partner holding the bag. So GPT3 is doing approximately the right thing by keeping the scene going; the bartender didn’t see anything wrong with the glass of acetone, so why should it? And then in a few cases going the extra mile by referencing the uranium or having the crowd be in on your postmodern baseball game, working the weird element in rather than pointedly ignoring it.
Of course Rick & Morty are way ahead of us; when Omega!Rick escapes his box and then prison, he specifically mentions his improv classes: https://www.youtube.com/watch?v=CmnwHo_QHSA
> the computer asks me to call it Sarah.
Apparently GPT3 has been reading unsong. I hope it doesn’t get the wrong idea.
This is great. My theory here is that GPT3 lacks explanatory theories and therefore it can’t extrapolate to new situations. These explanatory theories are “good explanations” in the sense of David Deutsch, which he defines (somewhat vaguely) as “hard-to-vary”. Explanatory theories don’t have to be as all-encompassing and difficult to apply as the laws of physics. Rather, they can be “common sense understandings” which are generalizable. Some examples would be that liquids cause things to be wet, that the world is broken into two classes of objects – flammable and non-flammable, that foods lie on a spectrum between poisonous and healthy. Common sense understandings are hard to vary. So for instance (copying an example by Deutsch), if we see a magician cut a person in half, most of us don’t modify our theory as “people who are cut in half don’t die, unless they are cut in half by a magician” to fit the data. Rather, we assume there is a trick, and the person was not actually cut in half.
(Hopefully it was clear, but I meant “people who are cut in half die, unless they are cut in half by a magician”)
Yes, but it’s tough to prove a negative. The default behavior is to smooth over weirdness, but you can at least coax it to notice its confusion. Maybe you can coax it to extrapolate weirdness.
Ben Goertzel has an excellent post up on GPT3 musing on its inability to do common sense reasoning and distinguish sense from nonsense:
> I ask for a glass of acetone, which he fills to the top for me. I drink it all slowly, and I feel relief flood to my head. I reach the glass back for more, and he gives me the new drink.
Maybe acetone is what GPT-3 drinks.
How should we judge these responses?
If you give it nonsense, should we expect it to seriously extrapolate, or should we expect it to respond with nonsense? You could ask it to explain the physical consequences of putting lemon juice in the gas tank, and I guess you tried to steer it to that by the choice of prompt, but the weirder aspect, the one that demands more attention, is the action of the narrator. What is the right answer to explain that? (3 could be interpreted as a suicide attempt, but the rest seem like nonsense. Maybe 9 could be interpreted as complaining about the rat.)
If we interpret it as a game of improv, then fluidly paving over the nonsense is better than beginning humans do, but not particularly good. It looks to me like 7 and 8 are pretty good responses of producing nonsense that incorporates the nonsense in the prompt. Does 20% sound like the right summary of your other tries?
What settings did you use? If there are fewer, more coherent ways to smooth over the disruption, as against diverse ways of trying to make sense of the disruption, then won’t some setting (“randomness”?) steer us towards incorporating the prompt and away from simply ignoring the disruption?
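To make the “randomness” question a bit more concrete, here is a minimal sketch, assuming the setting in question is a sampling temperature (the post does not say which settings were used). The scores in `logits` are made up for illustration and do not come from GPT-3; the function name is also hypothetical. The sketch only shows how dividing scores by a temperature concentrates or spreads probability across candidate continuations.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    # Subtract the maximum before exponentiating for numerical stability.
    largest = max(scaled)
    exps = [math.exp(s - largest) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for three candidate continuations (not real model output).
logits = [2.0, 1.0, 0.5]

for temperature in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, temperature)
    print(temperature, [round(p, 3) for p in probs])
```

Under this assumption, a low temperature puts nearly all the probability on the single highest-scoring continuation, while a high temperature spreads it over the alternatives. Whether the highest-scoring continuation is the one that smooths over the disruption or the one that incorporates it is exactly the open question.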