Academic Support for MIRI

This is a response to su3su2u1‘s critique of the Machine Intelligence Research Institute (MIRI).

“MIRI bills itself as a research institute, so I judge them on their produced research. The accountability measure of a research institute is academic citations.”

The author is obviously smart, but there are really two distinct claims here, and the critique confuses the issue by equivocating between them:

Claim 1: Number of academic citations is in fact a perfect or near-perfect indicator of the quality/importance of a body of research. Hence, if a body of work has few citations, we can safely ignore it as low-quality or unimportant.

Claim 2: Number of academic citations is treated by certain institutions as such an indicator. Hence, to obtain status within these institutions, it is instrumentally useful to get more citations.

I think claim #1 can easily be shown to be false. There are many strong arguments against it, but one obvious one is that the absolute number of citations equals the fraction of people in a field who cite something, times the total number of people working in that field. And field sizes vary wildly.

Consider, for example, the paper “The entropy formula for the Ricci flow and its geometric applications“. This paper, which proved the century-old Poincaré conjecture, was hailed as one of the most important mathematical advances of the 21st century. It was the first paper to win a million-dollar Millennium Prize, and Science named it its “Breakthrough of the Year”, the only time it has ever done so for a mathematical result (as opposed to a discovery in the physical world). According to Google Scholar, it has been cited 1,382 times.

Now, contrast Perelman’s famous paper with the medical research paper “Familial Alzheimer’s disease in kindreds with missense mutations in a gene on chromosome 1 related to the Alzheimer’s disease type 3 gene“. This paper has 1,760 citations. I wouldn’t call it “unimportant”, but I doubt even Rogaev (the lead author) would claim it’s more important than proving the Poincaré conjecture. Medical research is simply a much larger field than mathematics, and so medical papers will get many more citations than math papers of equal importance. Even MIRI’s toughest critics would be hard-pressed to argue that MIRI’s research is less important or high-quality than a doctor rediscovering freshman calculus, a paper which got 75 citations.
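The arithmetic behind this argument can be made concrete with a toy model (all numbers below are hypothetical, chosen only for illustration):

```python
# Toy model: citations = (fraction of the field that cites a paper) * (field size).
# Every number here is hypothetical, purely to illustrate the field-size effect.

def expected_citations(cite_fraction: float, field_size: int) -> int:
    """Expected citation count for a paper cited by `cite_fraction` of a field."""
    return round(cite_fraction * field_size)

# A landmark math paper, cited by a large share of a small field:
math_paper = expected_citations(cite_fraction=0.30, field_size=5_000)

# A routine medical paper, cited by a tiny share of a huge field:
medical_paper = expected_citations(cite_fraction=0.01, field_size=500_000)

print(math_paper)     # 1500
print(medical_paper)  # 5000
```

Under these (made-up) numbers, the routine paper in the large field collects over three times the citations of the landmark paper in the small field, despite being cited by a thirty-fold smaller fraction of its peers.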

By definition, the foundational work in any field is done while that field is still new and small. And a new and small field will always have fewer citations than an established one, partly because of the issue above (fewer researchers = fewer citations), and partly because there’s been less time for citations to accumulate. So, if we are to believe Claim #1, foundational work is lower quality and less important than work in a mature field where the low-hanging fruit is already picked. I think everyone can recall prominent counterexamples.

More on Claim #2 in a bit.

“You can measure how much influence they [MIRI] have on researchers by seeing who those researchers cite and what they work on. You could have every famous cosmologist in the world writing op-eds about AI risk, but its worthless if AI researchers don’t pay attention, and judging by citations, they aren’t. (…) This isn’t because I’m amazing, its because no one in academia is paying attention to MIRI.”

This is a separate, third, claim: that MIRI’s number of citations is a good measure of how many researchers are paying attention to it. This claim is not justified; it’s simply assumed. And if one directly asks the question “how many prominent academics are paying attention to MIRI?” – rather than simply assuming citations are a good proxy, and measuring the proxy – even the most cursory Googling shows the answer is “quite a lot”. A far-from-complete list:

… and one could go on for a while, but I think the point is made. When data and theory contradict, one must throw out the theory; you don’t keep the theory and throw out the data.

“And yes, I agree this one result looks interesting, but most mathematicians won’t pay attention to it unless they get it reviewed.”

This is an argument from claim #2: that regardless of whether citations to peer-reviewed papers are a good measure or not, you need them to get credibility. Claim #2 is, in fact, largely true within American research universities. However, I think it’s not true for many individual scientists, a number of whom have published scathing critiques of the current academic publication system. I’m pretty sure many, possibly most, younger researchers in math and computer science think of publishing in Elsevier and other for-profit journals as a necessary evil to get ahead within the current system. Since MIRI isn’t part of that system, why should they?

The author later suggests that MIRI should post their math papers on arXiv, one alternative to typical journals. This is a great idea, and I support it 100%. However, the original claim was not that MIRI should post to arXiv, but that (to quote) “Based on their output over the last decade, MIRI is primarily a fanfic and blog-post producing organization. That seems like spending money on personal entertainment.” This is simply not supported by the evidence.

“If they are making a “strategic decision” to not submit their self-published findings to peer review, they are making a terrible strategic decision, and they aren’t going to get most academics to pay attention that way.”

This is another argument from claim #2, and it flies in the face of all the evidence mentioned previously. Moreover, MIRI’s main goal (unlike labs that need government grants) is not to maximize academic attention, but just to get math done as quickly as possible. Some attention is probably good, but too much would be actively harmful: being a celebrity is really distracting and a huge time sink.

Moreover, any academic will tell you that peer review is not simply “submitting” a research paper, the way one submits an essay in undergrad. It is typically a months-long process that demands large amounts of time and mental capacity. This cost becomes obvious when you consider that MIRI once had seven writeups from a single week-long workshop. Even if a few of these writeups were combined into larger papers, how many weeks would it take to get them all peer-reviewed? Twenty? Forty?

“I didn’t know Russell was in any way affiliated with MIRI, he is nowhere mentioned on their staff page, and has never published a technical result with them.”

Russell and Norvig on Friendly AI

And while this other interview doesn’t explicitly mention MIRI, it’s pretty obvious that the ideas derive from Yudkowsky, Bostrom, and other MIRI-sphere folks:

“It’s very difficult to say what we would want a super intelligent machine to do so that we can be absolutely sure that the outcome is what we really want as opposed to what we say. That’s the issue. I think we, as a field, are changing, going through a process of realization that more intelligent is not necessarily better. We have to be more intelligent and controlled and safe, just like the nuclear physicist when they figured out chain reaction they suddenly realized, “Oh, if we make too much of a chain reaction, then we have a nuclear explosion.” So we need controlled chain reaction just like we need controlled artificial intelligence.”

“If he [Russell] is interested in helping MIRI, the best thing he could do is publish a well received technical result in a good journal with Yudkowsky. That would help get researchers to pay actual attention.”

I don’t doubt that this would be a good thing, but it’s at least worth noting that MIRI has a long history of being advised to do various things to get more academic credibility, and of this advice failing more often than not:

“If the one is called upon to explain the rejection, not uncommonly the one says, “Why should I believe anything Yudkowsky says? He doesn’t have a PhD!” And occasionally someone else, hearing, says, “Oh, you should get a PhD, so that people will listen to you.” Or this advice may even be offered by the same one who disbelieved, saying, “Come back when you have a PhD.” (…)

And more to the point, if I had a PhD, people would not treat this as a decisive factor indicating that they ought to believe everything I say. Rather, the same initial rejection would occur, for the same reasons; and the search for justification, afterward, would terminate at a different stopping point. They would say, “Why should I believe you? You’re just some guy with a PhD! There are lots of those. Come back when you’re well-known in your field and tenured at a major university.” (…)

It has similarly been a general rule with the Singularity Institute [now MIRI] that, whatever it is we’re supposed to do to be more credible, when we actually do it, nothing much changes. “Do you do any sort of code development? I’m not interested in supporting an organization that doesn’t develop code” —> OpenCog —> nothing changes. “Eliezer Yudkowsky lacks academic credentials” —> Professor Ben Goertzel installed as Director of Research —> nothing changes. The one thing that actually has seemed to raise credibility, is famous people associating with the organization, like Peter Thiel funding us, or Ray Kurzweil on the Board.”

Moreover, it’s not at all obvious that publishing with Russell or other famous professors (in and of itself) gets people that much attention. Over Russell’s lengthy career, how many Berkeley grad students have co-authored with him? And of those, how many got anywhere near as much academic attention as MIRI already has (as demonstrated by the above links) as a direct result of co-authoring, rather than becoming famous for something else years later? I haven’t counted, but I know which way I’d bet.

(Disclaimer: I am not a MIRI employee, and do not speak for MIRI.)

Typology of Conflict

In an idealized far future, would there be conflict? I think so. Competition is one of the thousand shards of human desire, and a lot of people would be sad if there were no more football games or chess or Team Fortress 2.

But such conflicts are not driven by universal goals. Here I must handwave a bit as to what “universal goal” means, but it is something in the neighborhood of being a utilitarian-style drive, rather than a biological-style drive. A human (or other animal) who wants a cheeseburger won’t, even if given the chance, obsessively optimize the atoms of Alpha Centauri to maximize cheeseburger probability. A naively-constructed AI would, giving rise to the problem Nick Bostrom calls “infrastructure profusion”. A “universal goal” is, roughly, one that you would optimize everything in the Universe to meet, not a chess match where you forget about losing the week after.

In a scenario high on the coordination axis, there would be no meaningful conflict over universal goals. Everything with the power to affect the entire universe would agree about non-trivial aspects of how to do so. This is what Bostrom calls the “singleton” scenario, and it’s likely to obey Stein’s Principle: such a system would have both a strong motivation to prevent goal drift or competing systems, and the ability to implement such motivations to enforce long-term stability. Call this null case Type 0 conflict.

Go a bit lower on coordination, and you might encounter a universe with several different systems of comparable ability, which agree about basics like existential risk but disagree on other goals. For example, you could have AIs A, B, and C, where each of them thinks the universe should be blue, red, or green. The simplest scenario is one where the conflict between them is static: each AI gets roughly one third of the universe, and this stays fixed over time, with all AIs having strong reason to expect it to remain fixed. This might be made to obey Stein’s Principle, but it is more of a risky bet. One would need strong reason for believing that any AI getting a “bigger share” was impossible. If, for example, one AI could hack the others, in a manner similar to modern-day computer or social hacking, this would allow for “victory” and introduce instability. Ruling this out in the general case might be computationally intractable: if the systems take any sensory input, one must verify that no point in an exponentially large input space will cause serious failure. But, a solution might happen. Call this Type 1 conflict.

Going down more on coordination, one finds scenarios where there are still a fixed number of agents, but their relative positions change over time; call this Type 2 conflict. By Stein’s Principle, the only way to make this work is through negative feedback loops: if there is any case where winning a bit causes you to win more, this will spiral on itself, until one agent ceases to exist. And (handwave) maintaining universal negative feedback seems quite hard. In the human world, advantage in conflict is a combination of many different factors; you would have to maintain negative feedback on all of them, or else the balance would collapse rather than oscillate.

If we dare to go down even farther, to worlds with stable long-term conflict in which it is still possible to “win”, one must also allow for the emergence of new players, or else the number of players will monotonically decrease to one (or zero, in an x-risk scenario). And all players should have a basic drive to prevent the creation of new players with differing goals. For this scenario (Type 3 conflict) to work, Stein’s Principle requires that the existing players be unable to prevent the creation of new players (at comparable levels of ability), and simultaneously be able to ensure with extremely high accuracy that all new players pose no threat of existential risk. This seems, a priori, extremely implausible.

And of course, we have Type 4 conflict, the sort present among humanity today, which is obviously not long-term stable. The strange thing is that almost nobody seems to realize it.

Polarity as Flawed Categorization

“Indeed, the more choices you have, the worse off you are. The worst situation of all would be somebody coming up to you and offering you a choice between two identical packages of M&Ms. Since choosing one package (which you value at $.75) means giving up the other package (which you also value at $.75), your economic profit is exactly zero! So being offered a choice between two identical packages of M&Ms is in fact equivalent to being offered nothing.

Now, a lay person might be forgiven for thinking that being offered a choice between two identical packages of M&Ms is in fact equivalent to being offered a single package of M&Ms. But economists know better.” – Improbable Research

In his excellent book Superintelligence, Nick Bostrom divides the future into two groups of scenarios. In one set, the “singleton scenarios”, one agent has overwhelming power over all others. In the other, “multipolar scenarios”, there are many different agents at the same level of ability, with no one in charge overall.

This dichotomy is simple, but it may be flawed. Consider, on one extreme, a very old universe where human civilization has spread beyond Earth’s cosmological horizon. Even in a singleton scenario, large portions of the Universe now can’t communicate with Earth. The AI controlling those portions and the AI controlling the Earth may be identical, but they are causally distinct agents. Is this a “multipolar scenario”? I think not. It’s a choice between a bag of M&Ms, and an identical bag of M&Ms.

On the other extreme, one can imagine a multipolar scenario with a huge variety of agents, each of which may stay the same, change, or be replaced by an entirely different agent. However, this scenario violates Stein’s Principle. At a minimum, to remain stable, each multipolar agent must have a set of common instrumental goals, derived from the common terminal goal of avoiding x-risk. Moreover, Stein’s Principle will likely ensure other similarities. For example, each agent will desire to keep its utility function stable, as the ones that don’t will rapidly get replaced.

Hence, rather than two distinct and widely-separated categories, we have a spectrum of possible futures. On the one end, agents at the top level are identical; on the other, they have just enough in common to ensure stability. Using Bostrom’s terminology, one can visualize these as being at different points along the “coordination axis”.

1950s America is a Special Case

“Advances invented either solely or partly by government institutions include, as mentioned before, the computer, mouse, Internet, digital camera, and email. Not to mention radar, the jet engine, satellites, fiber optics, artificial limbs, and nuclear energy. (…) Even those inventions that come from corporations often come not from startups exposed to the free market, but from de facto state-owned monopolies. For example, during its fifty years as a state-sanctioned monopoly, the infamous Ma Bell invented (via its Bell Labs division) transistors, modern cryptography, solar cells, the laser, the C programming language, and mobile phones…” – “Competence of Government

I think it’s worth paying attention to the fact that, of this apparently arbitrary list of inventions, none of them came from the current US political system (I’ll abbreviate CUSPS). Every one of them was developed many decades ago. And more specifically, every one of them was developed between about 1930 and 1975. Electricity, automobiles, telephones, telegraphs, airplanes, movies, radios, and other much older inventions aren’t included either. Rather than general examples of innovative government, across different cultures and time periods, these are all from one specific political system (the immediate predecessor to CUSPS).

If we were merely using this time period as an example, to show government could innovate, one data point suffices. If we were economic historians, dispassionately debating how large the space of possible civilizations was, we could stop there. However, in what Scott calls the motte-and-bailey defense, that is never how this argument is used in practice. For one example, if you Google (e.g.) “government” “arpanet”, every one of the first ten results is in the context of a policy debate about what CUSPS should do and what our attitude should be towards it. ARPANET and things in its category are invariably used, de facto, as justifications for CUSPS, despite not having been created by CUSPS.

Scott’s model of how the world operates here is (to quote) “de novo invention seems to come mostly from very large organizations that can afford basic research”; this is a much stronger claim than that the evidence shows “examples exist of large organizations which did well at de novo invention”, or (a quote from the introduction to the document) “we [can’t] be absolutely certain free market always works better than government before we even investigate the issue”. Given more detailed historical data, I would suggest an alternative model.

In the US, there was a great deal of innovation long before the federal government funded it en masse, and even before the federal government had much power at all. Railroads and steamships and telegraphs came from a time when Washington D.C. could not prevent half the states from raising their own armies and fighting a bloody civil war, much like the government of 2014 Iraq.

Later, in the 1930s, the immensely more powerful federal government created policies (taxation, fixed regulatory costs, the SEC, etc.) that strongly favored large organizations over smaller ones. The pre-existing base of inventive individuals, like everyone else, then simply got sucked into large institutions for lack of anywhere else to go. This neatly explains the entire historical trajectory. There were smart guys who invented stuff; the government then hired most of them, so most inventions started coming out with a government logo stamped on them; and when CUSPS was created, its incompetence started driving the smart guys away, again increasing private innovation at the expense of public.

Stein’s Principle

The economist Herbert Stein once said “whatever cannot continue forever must stop”, now called Stein’s Law. We can generalize this to “Stein’s Principle”.

The universe will almost certainly last for many billions of years. In addition, let’s assume that the utility of a mind’s life doesn’t depend on the absolute time period in which that life occurs.

Logically, either human-derived civilization must exist for most of the universe’s lifespan, or not. If it does not, this falls into what Nick Bostrom calls an existential risk scenario. But if it does, and if we (very reasonably) assume that the population is steady or increasing, then this implies the vast majority of future utility is in time periods over a million years from now. This is Bostrom’s conclusion in ‘Astronomical Waste‘.

However, we can break it down further. Let X be the set of possible states of future civilization. We know that there is at least one x in X which is stable over very long time periods – once humans and their progeny go extinct, we will stay extinct. We also know there is at least one x which is unstable. (For example, the world where governments have just started a nuclear war will rapidly become very different, with very high probability.) Hence, we can create a partition P over X, with each x in X falling into one and only one of P_1, P_2, P_3… P_n. Some of the P_i are stable, like extinction, in that a state within P_i will predictably evolve only into other states in P_i. Other P_j are unstable, and may evolve outside of their boundaries, with nontrivial per-year probabilities.

One can quickly see that, after a million years, human civilization will wind up in a stable bucket with probability exponentially close to 1. (Formally, one can prove this with absorbing Markov chains.) But we already know that the vast majority of human utility occurs more than a million years from now. Hence, Stein’s Principle tells us that any unstable bucket must have very little intrinsic utility; its utility lies almost entirely in which stable bucket might come after it.
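The Markov-chain claim can be illustrated with a minimal three-state sketch; the transition probabilities below are invented solely for demonstration:

```python
# Toy Markov chain: one unstable bucket and two absorbing ("stable") buckets.
# Transition probabilities per time step are made up for illustration.
# States: 0 = unstable, 1 = stable bucket A, 2 = stable bucket B.

P = [
    [0.90, 0.06, 0.04],  # unstable: usually persists, sometimes gets absorbed
    [0.00, 1.00, 0.00],  # stable A: absorbing, never leaves
    [0.00, 0.00, 1.00],  # stable B: absorbing, never leaves
]

def step(dist, P):
    """One step of the chain: new_dist[j] = sum_i dist[i] * P[i][j]."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

dist = [1.0, 0.0, 0.0]  # start with certainty in the unstable bucket
for t in range(200):
    dist = step(dist, P)

# The probability of remaining unstable after t steps is 0.9**t, which decays
# exponentially; after 200 steps nearly all mass sits in the stable buckets.
print(dist[0])            # ~7e-10
print(dist[1] + dist[2])  # ~1.0
```

The split between the two stable buckets (here roughly 60/40, mirroring the 0.06/0.04 absorption rates) is exactly the sense in which an unstable bucket's utility lies in which stable bucket follows it.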

Of course, one obvious consequence is Bostrom’s original argument: any bucket with a significant level of x-risk must be unstable, and so its intrinsic utility is relatively unimportant, compared to the utility of reducing x-risk. But even excluding x-risk, there are other consequences too. For example, for a multipolar scenario to be stable, it must include some extremely reliable mechanism for preventing both one agent from conquering the others, and the emergence of a new agent more powerful than the existing ones. Without such a mechanism, the utility of any such world will be dominated by that of the stable scenario which inevitably succeeds it.

And further, each stable bucket might itself contain stable and unstable sub-buckets, where a stable sub-bucket locks the world into it but an unstable one allows movement to elsewhere in the enclosing bucket. Hence, in a singleton scenario, buckets where the singleton might replace itself with dissimilar entities are unstable; buckets where the replacements are always similar in important respects are stable.

How To Detect Fictional Evidence

Based On: The Logical Fallacy of Generalization from Fictional Evidence

Some fictional evidence is explicit – “you mean the future will be like Terminator?”. But it can also be subtle. Predicting the future can be done “in storytelling mode”, using the tropes and methods of storytelling, without referring to a specific fictional universe like the Matrix; one obvious example is Hugo de Garis. How can we tell when “predictions” are just sci-fi, dressed up as nonfiction?

1. The author doesn’t use probability distributions.

Any interesting prediction is uncertain, to a greater or lesser extent. Even when we don’t have exact numbers, we still have degrees of confidence, ranging from “impossible under current physics” through to “extremely likely”. And many important predictions can be done as conditionals. We may have no idea how likely event B is, but we might be able to say it’s almost certain to follow event A.

But stories aren’t like that. An author creates an “alternate world”, and any given fact (“Snape kills Dumbledore!”) is either true or false within that world. There’s no room for the shades of uncertainty one sees in technology forecasting, or for that matter in military planning; it would just leave readers confused. Hence, any author who presents “the future” as a single block of statements all treated as fact, rather than a set of possibilities and conditionals of varying likelihood (see Bostrom’s Superintelligence for an excellent example), is probably in ‘storytelling mode’.

Sometimes, an author will realize this, and tack “but of course, this is uncertain” onto the end. The author can then deflect any questions about relative odds by mentioning this disclaimer, and immediately resume sounding very certain as soon as the questions are over. But, as Eliezer discusses in his original post, this biases the playing field. If X is very complex, asking “X: Yes Or No?” ignores the hundreds or thousands of questions about which parts of X are more likely relative to other parts. An honest analysis will have uncertainty woven through it, with each burdensome detail matched by a diminishment of certainty.

2. The author doesn’t change their mind.

In fiction, inconsistency is bad. Every part of the “alternate world” must match every other part. Therefore, an author writing a sequel must carefully track the original, lest she introduce “plot holes”.

However, a realistic prediction must be continually updated in response to new information. From time to time, other people will give you ideas you hadn’t thought of yourself. And even if you were a supergenius who needed no advice, there is no way a single human mind can hold all the information which might help one make a prediction. For a general, for example, there are always new things to learn about the enemy’s forces.

Hence, if almost nothing has changed between someone’s old predictions and new predictions, they are probably being a ‘storyteller’. This is especially true for anyone predicting the next century, as the events of 2012 give us much more incremental evidence about 2032 than about 200 Billion AD.

3. The author creates and describes ‘characters’.

Characters are central to storytelling. Almost all sci-fi stories, at least in part, use characters who the reader can empathize with – the passion of a lover, the struggle of a worker, the fury of a warrior. The reasons for this lie deep in human evolution and psychology.

However, when making predictions, ‘characters’ are of very little relevance. When describing the futures of billions, one must aggregate to make the problem remotely tractable; “military strength” rather than soldiers, “economic conditions” rather than rich and poor, “transportation demand” rather than kids going on vacation. And of course, while certain individuals can have great influence over society, except for the very near term we can have no clue who they will be.

Therefore, when an author describes in great detail the lives of individual people – their emotions, their personalities, their hardships, their relationships and wants and needs – we should get suspicious. This can be great fun, but it isn’t always good for you, like riding a motorcycle at 200 MPH.

4. The author focuses exclusively on a single dynamic or trend.

The world is very big, and many important trends all happen simultaneously. Predicting how they interact is extremely difficult, like solving a many-variable differential equation. By contrast, it’s easier for a story to focus on an overarching ‘principle’ or ‘theme’, which drives the main events and actions of the key characters. This theme can be very specific (“revenge”), but it can also be a complex of different memes, like Lewis and Tolkien’s literary explorations of Catholicism.

An author in “storytelling” mode may observe trend X, and from X make predictions A, B, and C; and these predictions might be quite reasonable. However, it’s still fallacious to not account for the other things (at least the big ones) influencing A, B, and C. Y might cancel out X’s effect on A; Z might reverse X, and so cause B’s opposite; and Q might have the same effect on C, but a hundred times as strong, so X’s contribution is negligible.

Be extra suspicious if the chosen dynamic is one the author happens to be an expert in, and they don’t rely on experts in other fields to help fill in the blanks. Odds are, they’re missing something very important; in your own field you know when you’re lost, but in others there are many more unknown unknowns. And be extra extra suspicious if the chosen trend is a pet political cause (“Islam”, “taxes”, “global warming”, “government surveillance”, “inequality”… ). That subset is probably worth ignoring entirely.

5. The author predicts rapid change, but doesn’t discuss specific things changing.

Michael Vassar describes this as “everything should stay the same, including the derivatives”. In a story, whether it’s Star Wars or Game of Thrones, it’s usually good to fix a static “backdrop” of culture and technology and norms, as it’s less work for the audience to track unchanging scenery. But in real life, changing fundamental traits like military ability, economic ability, communication, intelligence or transportation has sweeping consequences through nearly every aspect of society. To name one example, the Chinese Empire was old as the hills, but the changes of the 20th century caused it to collapse, followed by a republic, a civil war, a military occupation by Japan, a brutal Stalinist regime, and finally the authoritarian capitalism of today. And needless to say, no historical example will capture the changes caused by going beyond normal biological humans.

What Is A Copy?

Much reasoning in anthropics, like the Sleeping Beauty problem or Eliezer’s lottery, relies on the notion of ‘copies’. However, to my (limited) knowledge, no one has seriously investigated what counts as a ‘copy’.

Consider a normal human undergoing an anthropic experiment. I instantly cool him/her to near absolute zero, so the atoms stop wiggling. I make one perfect copy, but then move one of the atoms a tiny amount, say a nanometer. I then make another copy, moving a second atom, make a third copy, and so on, moving one atom each step of the way, until I wind up at (his father/her mother) as they existed at the subject’s age. At every step, I maintain a physically healthy, functioning human being. The number of copies needed should be on the order of 10^30. Large, but probably doable for a galactic superintelligence.

But then I repeat the process, creating another series of copies going back to (his grandfather/her grandmother). I then go back another generation, and so on, all the way back to the origin of Earth-based life. As the organisms become smaller, the generations grow shorter, but the number of copies per generation also becomes less. Let’s wave our hands, and say the total number of copies is about 10^40.

Now, before putting you to sleep, I entered you into a deterministic lottery drawing. I then warm up all the modified copies, and walk into their rooms with the lottery results. For the N copies closest to you, I tell them they have won, and hand them the pile of prize money. For the others, I apologize and say they have lost (and most of them don’t understand me). All copies exist in all universes, so SIA vs. SSA shouldn’t matter here.

Before you go to sleep, what should your subjective probability of winning the lottery be after waking up, as a function of N? When doing your calculations, it seems there are only two possibilities:

1. The cutoff between ‘copy’ and ‘not a copy’ is sharp, and you assign each organism a weighting of zero or one. That is, it’s possible to move an atom one nanometer, and thereby make an organism go from “not you” to “you”.

2. There exist ‘partial copies’ out in mind space, and you assign some organisms partial weightings. That is, there exist hypothetical entities which are two-thirds you.

Both seem problematic, for different reasons. Is there a third option?
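For concreteness, option 2's partial-weighting scheme might be sketched as follows; the similarity function is entirely made up, and the copy counts are scaled down from 10^40 to something runnable:

```python
# Sketch of option 2: assign each organism a "degree of being you" in [0, 1],
# and take your subjective probability of winning to be the weight-share of
# the copies who are told they won. Both the decay constant and the copy
# counts below are purely hypothetical.
import math

def weight(atoms_moved: int, scale: float = 250.0) -> float:
    """Hypothetical similarity weight, decaying with the number of atoms moved."""
    return math.exp(-atoms_moved / scale)

def p_win(n_winners: int, n_copies: int) -> float:
    """Subjective probability of waking up a winner: winning weight / total weight."""
    total = sum(weight(k) for k in range(n_copies))
    won = sum(weight(k) for k in range(n_winners))
    return won / total
```

Under option 1, `weight` would instead be an indicator function (exactly 0 or 1), so `p_win` would jump discontinuously as the cutoff crosses a single moved atom; the smooth version above is what makes the "two-thirds you" entities of option 2 unavoidable.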

