What Is A Copy?

Much reasoning in anthropics, like the Sleeping Beauty problem or Eliezer’s lottery, relies on the notion of ‘copies’. However, to my (limited) knowledge, no one has seriously investigated what counts as a ‘copy’.

Consider a normal human undergoing an anthropic experiment. I instantly cool the subject to near absolute zero, so the atoms stop wiggling. I make one perfect copy, but then move one of the atoms a tiny amount, say a nanometer. I then make another copy, moving a second atom, make a third copy, and so on, moving one atom each step of the way, until I wind up at the subject’s same-sex parent (his father or her mother) as that parent existed at the subject’s current age. At every step, I maintain a physically healthy, functioning human being. The number of copies needed should be on the order of 10^30. Large, but probably doable for a galactic superintelligence.

But then I repeat the process, creating another series of copies going back to the grandparent (his grandfather or her grandmother). I then go back another generation, and so on, all the way back to the origin of Earth-based life. As the organisms become smaller, the generations grow shorter, but the number of copies per generation also shrinks. Let’s wave our hands, and say the total number of copies is about 10^40.

Now suppose the subject is you. Before putting you to sleep, I enter you into a deterministic lottery drawing. Afterwards, I warm up all the modified copies, and walk into their rooms with the lottery results. For the N copies closest to you, I tell them they have won, and hand them the pile of prize money. For the others, I apologize and say they have lost (and most of them don’t understand me). All copies exist in all universes, so SIA vs. SSA shouldn’t matter here.

Before you go to sleep, what should your subjective probability of winning the lottery be after waking up, as a function of N? When doing your calculations, it seems there are only two possibilities:

1. The cutoff between ‘copy’ and ‘not a copy’ is sharp, and you assign each organism a weighting of zero or one. That is, it’s possible to move an atom one nanometer, and thereby make an organism go from “not you” to “you”.

2. There exist ‘partial copies’ out in mind space, and you assign some organisms partial weightings. That is, there exist hypothetical entities which are two-thirds you.

Both seem problematic, for different reasons. Is there a third option?
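
To make the two options concrete, here is a minimal sketch, not a resolution. It assumes the copies are ordered by distance from you and that each one gets a weight saying how much it counts as ‘you’. The function names, the cutoff value, and the exponential decay are all hypothetical choices made for illustration; nothing in the thought experiment fixes them.

```python
import math

def win_probability(weights, n_winners):
    """Weighted chance of waking up as a winner when the n_winners closest copies win.

    weights[i] is how much the i-th closest copy counts as 'you'.
    """
    total = sum(weights)
    if total == 0:
        raise ValueError("at least one copy must have nonzero weight")
    return sum(weights[:n_winners]) / total

def sharp_weights(num_copies, cutoff):
    # Option 1: the first `cutoff` copies are fully you, the rest not at all.
    return [1.0 if i < cutoff else 0.0 for i in range(num_copies)]

def graded_weights(num_copies, scale):
    # Option 2: the weight decays smoothly with distance.
    # Exponential decay is an arbitrary assumption, not part of the argument.
    return [math.exp(-i / scale) for i in range(num_copies)]

sharp = sharp_weights(1000, cutoff=100)
graded = graded_weights(1000, scale=100)
for n in (10, 50, 100, 500):
    print(n, win_probability(sharp, n), round(win_probability(graded, n), 3))
```

Under the sharp scheme the probability climbs linearly and hits one exactly when N reaches the cutoff; under the graded scheme it creeps toward one without a kink. Either way, the answer depends on a weighting the thought experiment does not supply, which is the problem.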

Four Types of Simulation

(Followup To: Simulations and the Epicurean Paradox)

Logically, one can divide ancestor simulations into two types: Those which are “perfectly realistic”, simulating the full motions of every electron, and those which are not.

Call the first Type 1 simulations. It’s hard to bound the capabilities of future superintelligences, but it seems implausible that they would run large numbers of such simulations, because they are extremely computationally expensive. For Gödelian reasons, simulating a component perfectly is always more expensive than building it “for real”; if that were not the case, one could get infinite computing power by recursively simulating both yourself and a computer next to you. (For example, a perfect whole brain emulation will always require more clock cycles than a brain actually computes.) Hence, simulating a galaxy requires more than a galaxy’s worth of matter, which is a rather large amount.

The second possibility involves imperfect simulations: those which “lower the resolution” on unimportant parts, like not simulating the motions of every atom in the Sun. These can be further subdivided into simulations where the ‘code’ just runs passively, and simulations where an active effort is made to avoid simulated science discovering the “resolution lowering” phenomenon. Call these Type 2 and Type 3. (Logically, a Type 3b exists, where the simulators deliberately make noticeable interventions like blotting out the Sun at semi-regular intervals. Though noted for completeness, it seems clear we don’t live in one of these.)

In a Type 2 simulation, as science advances far enough, it will almost certainly discover the “resolution lowering”, and conclude the world was being simulated. This possibility is hard to rule out completely, as there will always be one more decimal place to run experiments on. However, the weight of existing science, and the lack of evidence suggesting simulations as a conclusion or even a reasonable hypothesis, is fairly strong evidence against us living in a Type 2.

A Type 3 simulation is one where the simulators actively seek to avoid detection by simulated intelligences. Bostrom’s paper focuses mainly on these, noting that superintelligences could easily fool simulated beings if they so chose. However, this appears to be a form of Giant Cheesecake Fallacy; a superintelligence surely could fool simulated beings, but must also have some motive for doing so. What this motive might be has not, to my knowledge, so far been suggested anywhere (www.simulation-argument.com does not appear to address it).

One might suppose that such a discovery would “disrupt the simulation”, by causing the simulatees to act differently and thereby “ruining the results”. However, any form of resolution lowering would impact the results in some way, and these impacts are not generally predictable, by virtue of the halting problem/grandfather paradox. All one can say is that a given approximation will probably not have a significant impact, for some values of “probably” and “significant”. Hence, we already know the simulators are okay with such disruptions (perhaps below some significance bound).

Would “discovering the simulation” count as a “significant” impact? Here some handwaving is necessary, but it’s worth noting that virtually all civilizations have had some concept of god or gods, and that these concepts varied wildly. Apparently, large variations in civilizational religion did not produce that large a change in civilizational behavior; e.g. medieval Europe was still much more similar to Rome than to imperial China, despite the introduction of Christianity. In addition, the discovery of science which appeared to suggest atheism was in a real sense more disruptive than scientific validation of (some type of) religion would have been.

One might think that, to obtain as much knowledge as possible, simulators would want to try at least some scenarios of Type 3 in addition to Type 2. (This would also apply to the previous post’s arguments, in that the first simulation to allow genocide might be much more informative than the marginal Type 3b simulation which did not allow it.) However, if one accepts that, say, one in one billion simulations is of Type 3 and the rest are of Type 2 or Type 3b, one is faced with an anthropic-like dilemma: why are we so special as to live in the only Type 3? Such an observation smacks of Sagan’s invisible dragon.

Finally, one might suppose that simulators live in a universe which allows hyper-computation, beyond ordinary Turing machines, or some other type of exotic physics. Such beings could likely create perfect simulations at trivial cost to themselves; call these Type 4. While theoretically interesting, Bostrom’s Simulation Argument does not extend to cover Type 4s, stating that “Unless we are now living in a simulation, our descendants will almost certainly never run an ancestor-simulation”, which only applies if the simulators and simulatees have comparable laws of physics. Such unobservable ‘super-universes’ might remain forever speculation.

Simulations and the Epicurean Paradox

In a famous paper, Nick Bostrom outlines what he calls the Simulation Argument:

“A technologically mature “posthuman” civilization would have enormous computing power. Based on this empirical fact, the simulation argument shows that at least one of the following propositions is true: (1) The fraction of human-level civilizations that reach a posthuman stage is very close to zero; (2) The fraction of posthuman civilizations that are interested in running ancestor-simulations is very close to zero; (3) The fraction of all people with our kind of experiences that are living in a simulation is very close to one.”

#1 and #2 seem unlikely. #1, because we haven’t found any strong reason to think existential risks are nearly impossible to avoid (see Scott’s Great Filter post). #2, because independent convergence across many possible worlds generally requires world trajectories to be very predictable, and we don’t observe that on Earth. (For example, small timeline changes might have created sentient dolphins, and dolphins have very different drives and moral systems from humans.) Therefore, most attention has focused on option #3.

As Bostrom says:

“In some ways, the posthumans running a simulation are like gods in relation to the people inhabiting the simulation: the posthumans created the world we see; they are of superior intelligence; they are “omnipotent” in the sense that they can interfere in the workings of our world even in ways that violate its physical laws; and they are “omniscient” in the sense that they can monitor everything that happens.”

However, like the gods of mythology, these gods run into what is called the Epicurean paradox, after the Greek philosopher who invented it. The paradox runs:

“Is God willing to prevent evil, but not able? Then he is not omnipotent.
Is he able, but not willing? Then he is malevolent.
Is he both able and willing? Then whence cometh evil?
Is he neither able nor willing? Then why call him God?”

For convenience, let’s number each of these possibilities 1 through 4. For a race of posthuman simulators, we can essentially rule out #1 and #4; they can probably do whatever they please.

The cynically minded might jump to #2 – the “gods” (simulators) are real but malevolent, willingly allowing disease, famine, the Holocaust, and all of humanity’s ills. But this presents another problem. A truly malevolent god could create much more suffering than we actually see. For example, it is widely agreed that electricity makes human life more pleasant. A malevolent simulator could then cause all power plants to be economically impractical, which worsens human life without affecting most other goals it might also be pursuing.

One might then postulate a simulator as merely indifferent to suffering, with goals that are entirely orthogonal. But this too would likely wipe out all the good things in human life, not just a few of them. A full explanation of why would be very lengthy, but Bostrom has described the various issues in his book Superintelligence, coining the terms “perverse instantiation” and “infrastructure profusion” for two of the most serious. In essence, almost all apparently “neutral” goals would, if carried to completion, create a universe morally indistinguishable from one where humanity is extinct. Humanity is not extinct, so we can likely also rule out #2.

For #3, one can try to invent various explanations for why the evil we see is an illusion. Perhaps only you are ‘fully’ simulated, and starving children are merely ‘zombies’ without moral value. Perhaps the simulated world was created recently, and so the world wars and other disasters never really happened. Perhaps the simulator “switches off” consciousness when people are suffering too much. However, all of these suffer from issues of Occam’s Razor: they postulate additional complexity which is inherently unobservable. The problems here are those which cause us to disbelieve the theory of Last Thursdayism, which postulates the universe was created last Thursday, but with memories and other signs of older age already in place.

In fact, observing a large number of ancestor simulations places extremely strong constraints on the goals of the simulator – essentially an even stronger version of the Epicurean paradox, or for that matter of the FAI problem. Solving the FAI problem requires formally specifying a utility function which doesn’t wipe out humanity, a tiny target in a vast space. Creating a simulator requires specifying a utility function which literally never intervenes across a vast variety of simulated situations, a much smaller target still. (One can of course speculate that the simulators intervene and wipe our memories afterwards, or some such, but this shares the problems of Last Thursdayism.)

(Another possibility, more fun to think about, has occurred to me. It seems likely that the space of human values is not large enough to fully satisfy our novelty desires over the next eleven trillion years. Since evolution is the ultimate source of our values, I have wondered if future civilizations might simulate new species evolving to sentience, so as to acquire a richer set of values than they started with. However, on reflection, it is extremely unlikely that an ancestor simulation is the best way to achieve this. Some form of directed evolution, or possibly an even more complex optimization process not yet known to us, would almost certainly be more efficient.)

By itself, this seems to be an argument for Bostrom’s scenario #2. A perfect ancestor simulation, with no intervention by the simulators, requires hitting an extraordinarily small target in utility function space. Hence, it’s not surprising that many different dissimilar worlds failed to hit it, any more than it’s surprising if a thousand gangsters shooting at random fail to hit an acorn seven hundred meters off.

However, believing in Bostrom’s scenario #2 presents a different challenge, outlined by Jaan Tallinn in his talk at the Singularity Summit. If we are not being simulated, then we are some of the very first beings to ever exist, part of the tiny fraction to live before the creation of self-modifying intelligence. The number of beings which might ever exist is truly vast, possibly on the order of 10^70. This creates another conundrum. Why should we be living now? What makes us so privileged?

I have a speculation which addresses this question. Suppose there is a shortage of bread in your city of two million people. The city government creates a giant queue to buy bread, and assigns each resident a place in it at random. If you are placed at the very head of the line, you would think this demanded explanation; it is very unusual. Perhaps your brother is the mayor, and rigged the lottery. Perhaps you are religious, and prayed very fervently. Something must be going on; conditioning on all your other life experiences, you having this one experience is still very unlikely.

On the other hand, suppose you are graduating from college. Proudly, you walk across the stage, shake the dean’s hand, and receive your diploma. By itself, this is just as unusual as the first scenario. A college education has about two million minutes, of which only one is the one when you receive your degree. Yet, even though you may be very emotional, you don’t see the fact of living out this minute as something that demands explanation. You don’t postulate divine intervention, or an unknown friend in the administration. (Unless, of course, you are a very poor student!)

Even though it is extremely improbable, that one minute where you get your diploma is made logically necessary by the other four years of your education. Conditional on all your other experiences happening, it is extremely likely that you experience this one too; every four year project must have a first minute and a last minute. Therefore, you are not surprised.

Our standard, ancestral view of life sees people as discrete entities. A person is born, lives for a while, and then dies, with inter-human bandwidth of about 300 baud being negligible compared to intra-human bandwidth. One might envision each life as strands of spaghetti, strewn throughout a football field of time. Each strand is distinct, each has a beginning and end, and if you select one at random it is very surprising to pick the first.

However, there is no reason for posthuman civilization to be like this. When humans have a very complex computer program, it is already rare to just throw it out entirely. More likely, one creates a new version, ports it to new hardware, or adapts it for a new purpose (Windows 8’s ancestry goes back thirty-five years, and Linux’s over twenty), because code is easy to copy. In addition, when we do throw code out, almost always the motivation comes from the extremely rapid changes in computers produced by Moore’s Law; it might, for example, be easier to rewrite from scratch than to alter code to handle 100x the previous number of requests. In a world of static computers, such things would become rarer still, and even rarer if one assigned code moral value and the code did not want to ‘die’. Posthuman life would look like a single, continuous river, twisting and branching and growing through the eons.

If we suppose this, then living in the first few years of the river does not seem so surprising; every stream of consciousness must have its beginning, just as every college degree or career or sea voyage must have its first ten seconds, and the existence of the first year is necessarily implied by all of the others. Moreover, if one supposes the posthuman transition (or aging escape velocity) is likely to occur soon, this appears to solve another paradox: why we exist in the year 2014, rather than as one of the hundred billion primitive humans who lived millennia ago. If we are the first generation to be uploaded, then the stream starts with us, rather than all our ancestors who were unlucky enough to have their brains rot in the ground.

Global Warming Numbers

“Across the Narrow Sea, your books are filled with words like “usurper”, and “mad man”, and “blood right”. Here our books are filled with numbers, we prefer the stories they tell. More plain. Less… open to interpretation.” – The Iron Bank of Braavos

(Graphs from Tol, R.S.J. (2005). ‘The marginal damage costs of carbon dioxide emissions: an assessment of the uncertainties’, Energy Policy vol. 33(16), pp. 2064-2074.)

On Privilege

(The author has donned a flameproof asbestos bodysuit.)

I think many objections to the word ‘privilege’ aren’t about the idea itself – that some types of people, all else being equal, are treated better by society – but about it being applied selectively.

For instance, it’s been very well documented that tall people earn more money and have other social advantages. The same is true for the physically attractive, older siblings, the left handed (and the right handed), Ivy League graduates, married people, and even women with blonde hair; one could go on and on. In fact, there are so many possible types of privilege that everyone is privileged, in some sense of the word.

Thus, I think we can say:

1. Certain types of privilege, like race and gender, are talked about endlessly precisely because they are hot-button flashpoints. This leads to other types of privilege being unfairly ignored, and it also discredits the word by causing it to invariably provoke flamewars.

2. Everyone has some type of privilege, but that doesn’t mean their lives are great, or even that they don’t totally suck. For example, tall privilege is very real, but being a tall subsistence farmer in Ethiopia still totally sucks.

Salary Statistics

My friend Sarah Constantin just wrote a post about wage statistics. These are very important, and she deserves applause for bringing them up. However, they must also be used carefully when deciding what job you should pursue. Let’s explore why, taking doctors as an example.

“What Jobs Pay Best? Doctors. Definitely doctors. The top ten highest mean annual wage occupations are all medical specialties. Anesthesiologists top the list, with an average salary of $235,070…. Bottom line: if you want a high-EV profession, be a doctor.”

Statistically, this is totally true. But if you want to become a doctor, and make that high salary, there’s a catch. In fact, there are a lot of catches. (Wages for any job should be ‘adjusted’ up or down for these factors, but doctors make a good case study.)

The first catch is, of course, medical school. Since you must go there to be a doctor, and it’s very expensive, a doctor doesn’t ‘really’ make the salary listed – part of the money must be used for repaying med school tuition. The same is true for college tuition.

The second catch is usually called ‘opportunity cost’. Consider a cashier vs. a doctor. The cashier makes less, but can start making money at age 16. The doctor doesn’t start until 30, on average – he does a lot of work (in school) for which he isn’t compensated. If people start ‘working’ at age 16 and retire at 65, the cashier is paid for all 49 of those years, while the doctor is only paid for 35 of them, or about 70%. To make it a fair comparison, the doctor’s wages must be ‘spread out’, to cover the years he isn’t paid anything.
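
As a rough sketch of that ‘spreading out’, using only the ages given above and the anesthesiologist figure from the quote (the function names are made up for illustration, and tuition, taxes, and residency pay are ignored):

```python
def paid_fraction(start_age, first_paid_age, retire_age):
    """Fraction of the working life (start_age to retire_age) that is actually paid."""
    return (retire_age - first_paid_age) / (retire_age - start_age)

def spread_out_wage(annual_wage, start_age, first_paid_age, retire_age):
    """Annual wage averaged over the whole working life, unpaid years included."""
    return annual_wage * paid_fraction(start_age, first_paid_age, retire_age)

# Ages from the text: work starts at 16, the doctor is first paid at 30, everyone retires at 65.
fraction = paid_fraction(16, 30, 65)
# $235,070 is the mean anesthesiologist salary quoted above.
adjusted = spread_out_wage(235_070, 16, 30, 65)
print(f"paid fraction: {fraction:.2f}, spread-out wage: ${adjusted:,.0f}")
```

This is only the crudest adjustment: it treats a dollar earned at 60 as equal to a dollar earned at 16.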

The third catch makes the first and second catches a much bigger deal than they seem. That’s discount rates. If I offer you $1 in 2040 in exchange for $1 now, you probably say no. Money in the present is worth more than money in the future, which is why loans charge interest.

For example, $250,000 in med school tuition isn’t that much, spread over a doctor’s career. However, the tuition is due now, while the career earnings may not arrive for thirty or forty years. How much this matters depends on what your ‘effective interest rate’ is. If it’s 5%, then each dollar of med school tuition must be ‘repaid’ by a bit more than four dollars of additional salary thirty years out (1.05^30 ≈ 4.3), in order to make it a ‘fair’ deal.
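
The arithmetic behind that kind of statement is ordinary compound interest. A minimal sketch, assuming annual compounding at a constant rate (the function names are illustrative only):

```python
def future_value(amount, rate, years):
    """What `amount` today must grow to after `years` of annual compounding at `rate`."""
    return amount * (1 + rate) ** years

def present_value(amount, rate, years):
    """What `amount` received `years` from now is worth today at discount rate `rate`."""
    return amount / (1 + rate) ** years

# How many future dollars it takes to 'repay' one dollar of tuition paid today.
for rate in (0.03, 0.05, 0.07):
    print(f"{rate:.0%}: ${future_value(1, rate, 30):.2f} after 30 years, "
          f"${future_value(1, rate, 40):.2f} after 40 years")
```

Small changes in the rate move the answer a lot over these horizons, which is why the choice of ‘effective interest rate’ dominates the comparison.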

There’s also the simple number of work hours. If doctors make a lot, but must work eighty hours a week, then a ‘fair’ comparison is to someone who is (e.g.) working two forty-hour-a-week cashier jobs.

And finally, we get to more ‘subjective’ (but still very important) factors. Consider IQ. Getting into med school is hard; suppose, picking a number out of a hat, that you must have at least a 115 IQ to enter. If the average manager has a 100 IQ, then if you go into management, you’ll probably make more than the statistical median – the ‘real’ comparison, between doctor and the management job you’d actually have, is less favorable to doctor than it seems. Likewise, if the average physicist has 130 IQ, a doctor going into physics will most likely do worse than median, so the ‘real’ comparison is more favorable to doctor. The same is true for work ethic, charisma, initiative and all the other intangibles.

Summing it all up, one gets a net present value calculation: the EV of a job option is equal to the sum of discounted future cash flows, both positive (like salary) and negative (like college tuition). NPV is how good companies make business decisions, and I think we can do well if we also use it for career decisions.
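
A minimal sketch of such a calculation follows. The doctor’s salary and the tuition total are the figures used earlier in this post; the cashier’s wage, the timing of tuition (spread evenly over ages 22 to 25), and the discount rates are hypothetical numbers chosen for illustration. College tuition, taxes, residency pay, work hours, and the ‘subjective’ factors are all left out.

```python
def npv(cash_flows, rate):
    """Net present value of yearly cash flows; cash_flows[0] arrives now, undiscounted."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))

START_AGE, RETIRE_AGE = 16, 65  # ages from the opportunity-cost example

def career(flow_at_age):
    """List of yearly cash flows from START_AGE up to (not including) RETIRE_AGE."""
    return [flow_at_age(age) for age in range(START_AGE, RETIRE_AGE)]

def cashier_flow(age):
    return 25_000  # hypothetical annual wage, assumed constant

def doctor_flow(age):
    if 22 <= age < 26:
        return -250_000 / 4  # tuition total from the text; four-year timing is assumed
    if age >= 30:
        return 235_070  # mean anesthesiologist salary quoted above
    return 0  # unpaid years of schooling

cashier = career(cashier_flow)
doctor = career(doctor_flow)
for rate in (0.0, 0.05, 0.20):
    print(f"rate {rate:.0%}: cashier {npv(cashier, rate):,.0f}, doctor {npv(doctor, rate):,.0f}")
```

With these made-up inputs the doctor path still comes out well ahead at a 5% rate, but a sufficiently high discount rate (20% here) reverses the ordering, because the cashier’s money arrives so much earlier.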

The Problem With Connection Theory

(For those unfamiliar with Connection Theory, or ‘CT’, check out the Leverage Research website. My link points to the older version, which is somewhat out of date but much more detailed than the current website, which only spends a few paragraphs describing it.)

Science is a difficult project, and even many professional scientists get it wrong. One can make lots of mistakes, while still driving a field of knowledge forward. However, some problems are so serious that they call into question the credibility of the authors as researchers. Much to my regret, Connection Theory appears to have several problems in this class.

Several years ago, an experiment was performed to test Connection Theory. This experiment was done using an “RP test”, a new form of experiment devised by Leverage Research. RP tests are conducted as follows:

“To run an RP test, select a test participant. Make a comprehensive and accurate CT chart of the participant’s mind. Ask the participant which elements of her mind she would like to change. Then use CT, the participant’s chart and facts about the participant’s environment to derive a set of predictions. Each prediction should be of the form “if the participant performs actions A1, A2, A3, etc., in that order, then at time T the participant’s mind will change in way W”, where way W is one of the ways the participant would like her mind to change.

Next, have the participant perform the actions in the order prescribed. Keep track of the results, either by questioning the participant or by having the participant keep a diary. Once the participant stops performing the actions stated in the antecedents of the predictions, either because all of the prescribed actions have been taken or for some other reason, the RP test is complete. Collect all of the results. Be sure to determine which actions were taken at which times and which results occurred or failed to occur at which times. For results that occurred, determine how long results lasted.”

This experimental protocol lacks many common features of scientific research, like randomization, blinding, and control groups. It even lacks a ‘self-control’, common in self-experimentation, where before an experiment one tries something similar without the “key element” (in this case, CT) that one is testing. In fact, it ranks at the bottom of the usual hierarchy of study credibility:

- Randomized controlled double-blind study
- Randomized controlled study, not double-blind
- Case-control study (including twin or sibling)
- Longitudinal or correlational study
- Case series
- Case study (emphasis added)

Of course, in the real world, it isn’t practical to do a fully randomized, controlled, double-blind study every time one does research. There are many hypotheses to test, limited resources, and sometimes one should take ‘shortcuts’ that reduce a study’s power but make it easier to run. However, the paper then goes on to say:

“All of the preceding applies to RP tests conducted with the highest standard of care and rigor. For RP tests conducted less carefully, the resultant evidence for or against CT will be proportionally less strong.”

(Emphasis added.)

So, we are not describing a ‘preliminary study’, to be followed up by a more rigorous test, but an experiment conducted with the “highest standard of care and rigor”.

The paper then continues:

“Determine how unusual it was that the results occurred. If a result, for instance, was the production of a particular action, it is important to note how frequently the participant performed that action prior to the RP test. Was this the first time in five years the participant got herself to perform the action in question? Or did the participant perform the action sporadically in the past?”

I.e., instead of a control group, the test must rely on the experimenter’s subjective impression of ‘unusualness’. Again, this is a test to be ‘conducted with the highest standard of care and rigor’.

The paper continues:

“For each prediction recorded as coming true, one should look to see which of these explanations are more elegant and which are less so. The more unusual the effect, the less plausible it will be that it was merely a result of random chance. The more concrete the predicted effect, the less plausible it will be that there was an error in recording the results. Any prediction that actually came true provides some evidence for CT. The more clearly the prediction came true, the better. The less plausible it is that some other factor caused the result, the better. Results that clearly occur, just as predicted, and are not plausibly explained by other factors provide very strong evidence for CT or for a very similar theory.”

(Emphasis added.) Note that this does not (and indeed the full paper does not) describe what criteria a prediction must meet, other than that it must describe a participant’s mind changing in some way. Furthermore, no protocol, method, or control is provided for differentiating a “CT prediction” from a “non-CT prediction”; any prediction made by a “CT researcher” is assumed to be a “CT prediction”. Hence, even an obvious prediction, such as “if Alice spends a day describing the mistakes her co-worker Bob has made during his employment, then Alice will have fewer positive emotions towards Bob”, would be treated as “very strong evidence for CT” under the proposed protocol, despite not being related to CT at all.

The next section describes the real experiment which was conducted. The participants were chosen as follows:

“I have conducted two full RP tests. The first was performed by me on myself. The second was performed by me on a friend and colleague of mine.”

I.e., the experiment was conducted with N = 2, and the test subjects were not only not randomly selected, but were active proponents of the theory in question.

Self-experimentation has a long history in medicine, but in such cases, the prediction is usually an objective physical measurement over which the subject has no control (e.g. cancer, or stomach inflammation). Attempting to predict one’s own behavior suffers from the obvious confounder that one might act differently in response to predictions. For example, a Muslim trying to prove the existence of God might predict that he works harder on days when he conducts his daily prayers. Of course, he can then make the prediction come true simply by changing how hard he works.

The subjectiveness of the predictions in question, and the assessment thereof, is well demonstrated in the following example:

- “Background: Before doing the RP test, the participant would switch back and forth between feeling like he could be himself while still being accepted and feeling like he needed to be someone he wasn’t. This was a feature of his personality that had been present for ten or more years.
- The required action: The participant must determine a way to achieve nearly universal world virtue that is more effective and feasible than any of the participant’s current ways and does not involve the participant being an approachable example for others to emulate. The participant must also determine a feasible way to search for friends the participant won’t need to change to be friends with. The participant needs to do this while still being virtuous and while retaining the ability to walk away from people.
- The predicted effect: As soon as the participant completes the actions just stated, the participant will at all times feel like he can be himself while still being accepted.
- The result: At some point after the participant completed the above tasks, the participant started to feel at all times like he could be himself while still being accepted. The time the tasks were completed was not recorded carefully. Some time after the tasks were completed, the participant started to feel at all times like he could be himself and still be accepted. The time this effect occurred was also not recorded carefully. Based on the participant’s guesses, the effect occurred less than one month after the above tasks were completed. As of nine weeks later, it still appeared that the effect was persisting. After that week, the participant noticed that the effect has stopped. A CT-compliant explanation of the effect ending was later determined.”

Even given these remarkably fluid and vague methods of prediction assessment, one notes that the test still essentially failed. The participant, by his own admission, did not “at all times feel like he can be himself”, and there is no indication that his behavior after the intervention differed from his prior behavior of “switch[ing] back and forth”, which necessarily implies significant time spent in both states. However, this test was still recorded as a success, as an (unspecified) “CT-compliant explanation” was “later determined”.

The paper goes through another, similar case:

“The participant had been smoking and drinking for approximately 6 years. The participant tried to quit smoking in little pockets here and there, but was not typically not successful. The most successful times were two times when the participant quit smoking for a period of two months each time. During those times, the participant did not want to smoke and did not have cravings. (…)

The predicted effect: As soon as the participant completes all three of these actions, the participant will stop drinking and stop smoking.

The result: The participant did not note when exactly the effects began to occur, but at some point after the completion of the required actions, the participant stopped drinking and smoking. After the participant stopped drinking and smoking, the participant also stopped having a desire to drink or smoke. The participant did not have to struggle with cravings at all and did not have to struggle with the difficulty of getting out of an old routine. The effect persisted for eight weeks. The effects abruptly came to an end when a particular negative event occurred and substantially altered the participant’s life plans. The day after this event, the participant began smoking and drinking again.”

Once again, there is no indication that the CT intervention affected behavior at all. Nevertheless, the paper counts this as a success, offering the following justification:

“One might think that the fact that the participant in the third case started drinking and smoking again weighs against the predictions having been correct. This is not true. The changes caused by RP plans are not supposed to be permanent. Whether they persist or not depends on what other things happen in the person’s mind. Effects are frequently long-lasting since people’s minds do not change so frequently. But it is perfectly possible that some external event would change a person’s mind and reintroduce a mental phenomena we had eliminated.”

Of course, when the phenomenon in question (smoking) was known to be intermittent to begin with, it is not very impressive to predict that it will at some point stop and then at some later point start again. The paper continues:

“In fact, the negative event that immediately preceded the relapse was exactly the sort of event that CT predicts would cause a relapse – it was an event which cut off the new paths to the participant’s IGs that the participant had created through the actions recommended by the recommendation plan. Had I had the prescience to write down the conditions under which the participant would start smoking and drinking again, there would be an additional successful prediction from that RP test.”

Indeed, making predictions is difficult, especially about the future. However, a theory must be “prescient” in order to be scientifically valid. One cannot claim that a theory is valid on the basis of a post-hoc assertion, made after a prediction has been falsified, that the theory ‘would’ have predicted the true result under some ill-defined counterfactual. Such behavior is the domain of numerology and horoscopes, not science.
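
To make the earlier point about intermittent behavior concrete, here is a minimal sketch of how often a prediction like “the participant will stop smoking at some point” would come true with no intervention at all. The baseline rate is purely an illustrative assumption (the paper reports no such figure), and the function name is hypothetical. The sketch treats each day as an independent chance of starting a quit attempt.

```python
def prob_effect_observed(daily_quit_prob: float, window_days: int) -> float:
    """Chance that at least one quit attempt starts within the window,
    assuming each day is an independent trial and no intervention occurs."""
    if not 0.0 <= daily_quit_prob <= 1.0:
        raise ValueError("daily_quit_prob must be between 0 and 1")
    if window_days < 0:
        raise ValueError("window_days must be non-negative")
    # P(at least one success) = 1 - P(no success on any day)
    return 1.0 - (1.0 - daily_quit_prob) ** window_days


# ASSUMPTION (not from the paper): an intermittent smoker who starts a quit
# attempt about once every six months on average.
DAILY_QUIT_PROB = 1 / 180

for window in (7, 30, 180, 365):
    chance = prob_effect_observed(DAILY_QUIT_PROB, window)
    print(f"{window:>3}-day window: {chance:.0%} chance of a 'success' by luck alone")
```

Under these assumptions, a prediction tied to a narrow window is rarely satisfied by chance, while an open-ended “at some point” is satisfied almost regardless of whether the intervention did anything. This is why the unrecorded timing matters so much.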

Again, I am in favor of “seat-of-the-pants” experiments, and even of using data from such experiments when no better option is available. However, any reasonable researcher must be very mindful of the weaknesses of such methods, and be careful to spell them out.

The paper’s assessment finishes (emphasis added):

“The RP tests provide very strong evidence in favor of CT or a variant of CT. I successfully predicted 16 conceptually independent effects beforehand. In each case, I specified a narrow window of time in which the effect was supposed to occur. In many cases, the predicted effect would break a trend of months or years. And yet I observed the predictions coming true and did not observe any predictions failing. No other existing psychological theory has anything close to this degree of predictive power.”

No citations, data, reasoning, or examples of alternative theories are used to justify this extraordinarily sweeping claim, which is founded on two case studies evidently lacking in even the most basic rigor, such as systematic recording of data. To quote Cosma Shalizi:

“Normally, scientific work is full of references to previous works, if only to say things like “the outmoded theory of Jones [1], unable to accommodate stubborn experimental facts [2–25], has generally fallen out of favor”. This is how you indicate what’s new, what you’re relying on, how you let readers immerse themselves in the web of ideas that is an particular field of research. [The author] has deliberately omitted references. Now, this is sometimes done: Darwin did it in The Origin of Species, for instance, to try to get it to press quickly. But [the author] has written 1100 pages over about a decade; what would it have hurt to have included citations? (…) To acknowledge that he had predecessors who were not universally blinkered fools would, however, conflict with the persona he tries to project to others, and perhaps to himself.”

Connection Theory is not quite as well-researched as the book Shalizi discusses. Nevertheless, in the 80-page essay “Connection Theory: Theory and Practice”, the author neglects to include any citations or bibliography. Despite being an experimental paper on a theory of psychology, it displays no understanding of the existing psychological literature or of established research methodology. As many psychologists will admit, that literature and methodology are often highly flawed, but (as Shalizi points out) the correct response is to explain why they are flawed and how they can be improved. A middle school science student is not excused from citing Newton in a paper on gravity, merely because the ‘wrong’ Newton was later overturned by Einstein.

If Leverage Research had merely reported what they had done, without any commentary, it would just have been another mildly interesting self-report on exploring psychology, like the tens of thousands of others which have been collected online. And of course, any good scientist will spend some time thinking over speculative theories with little evidence. But to make the extraordinary claims described above, and to use Connection Theory as one of the primary foundations of an organization with twenty-six team members as of 2012 (edit: Leverage has clarified that they have 15 full-time employees, and that the majority of the 26 mentioned here were volunteers), is to take on an obligation to meet a higher, not lower, standard than is commonly employed in sixth grade science fairs.

Science is a difficult project. Even when conducted in university laboratories with large grants, 80% of non-randomized medical studies turn out to be false, and even 25% of “gold-standard randomized trials” turn out to be false. And this is in the field of ‘hard’ medical studies, with concrete observables such as cancer and heart disease. In psychology, one often finds that:

“Social psychology experiments in the laboratory tend to throw up spectacular mind-boggling effects. Many of these fail to replicate and are later discredited. The ones that do replicate are not always generalizable – sometimes an even slightly different situation will remove the effect or create exactly the opposite effect. The effects that remain robust in the laboratory may be too short-lasting or too specific to have any importance in real life. And the ones that do matter in real life may respond unpredictably or even paradoxically to attempts to control them.”

Some of this difficulty is unavoidable. But we can at least be aware of these problems, and do what we can to not make them worse.

