(For those unfamiliar with Connection Theory, or ‘CT’, check out the Leverage Research website. My link points to the older version, which is somewhat out of date but much more detailed than the current website, which only spends a few paragraphs describing it.)
Science is a difficult project, and even many professional scientists get it wrong. One can make lots of mistakes, while still driving a field of knowledge forward. However, some problems are so serious that they call into question the credibility of the authors as researchers. Much to my regret, Connection Theory appears to have several problems in this class.
Several years ago, an experiment was performed to test Connection Theory. This experiment was done using an “RP test”, a new form of experiment devised by Leverage Research. RP tests are conducted as follows:
“To run an RP test, select a test participant. Make a comprehensive and accurate CT chart of the participant’s mind. Ask the participant which elements of her mind she would like to change. Then use CT, the participant’s chart and facts about the participant’s environment to derive a set of predictions. Each prediction should be of the form “if the participant performs actions A1, A2, A3, etc., in that order, then at time T the participant’s mind will change in way W”, where way W is one of the ways the participant would like her mind to change.
Next, have the participant perform the actions in the order prescribed. Keep track of the results, either by questioning the participant or by having the participant keep a diary. Once the participant stops performing the actions stated in the antecedents of the predictions, either because all of the prescribed actions have been taken or for some other reason, the RP test is complete. Collect all of the results. Be sure to determine which actions were taken at which times and which results occurred or failed to occur at which times. For results that occurred, determine how long results lasted.”
This experimental protocol lacks many common features of scientific research, like randomization, blinding, and control groups. It even lacks a ‘self-control’, common in self-experimentation, where before an experiment one tries something similar without the “key element” (in this case, CT) that one is testing. In fact, it ranks at the bottom of the usual hierarchy of study credibility:
- Randomized controlled double-blind study
- Randomized controlled study, not double-blind
- Case-control study (including twin or sibling)
- Longitudinal or correlational study
- Case series
- Case study (emphasis added)
Of course, in the real world, it isn’t practical to do a fully randomized, controlled, double-blind study every time one does research. There are many hypotheses to test, limited resources, and sometimes one should take ‘shortcuts’ that reduce a study’s power but make it easier to run. However, the paper then goes on to say:
“All of the preceding applies to RP tests conducted with the highest standard of care and rigor. For RP tests conducted less carefully, the resultant evidence for or against CT will be proportionally less strong.”
So, we are not describing a ‘preliminary study’, to be followed up by a more rigorous test, but an experiment conducted with the “highest standard of care and rigor”.
The paper then continues:
“Determine how unusual it was that the results occurred. If a result, for instance, was the production of a particular action, it is important to note how frequently the participant performed that action prior to the RP test. Was this the first time in five years the participant got herself to perform the action in question? Or did the participant perform the action sporadically in the past?”
That is, instead of a control group, the test must rely on the experimenter’s subjective impression of ‘unusualness’. Again, this is a test to be ‘conducted with the highest standard of care and rigor’.
The paper continues:
“For each prediction recorded as coming true, one should look to see which of these explanations are more elegant and which are less so. The more unusual the effect, the less plausible it will be that it was merely a result of random chance. The more concrete the predicted effect, the less plausible it will be that there was an error in recording the results. Any prediction that actually came true provides some evidence for CT. The more clearly the prediction came true, the better. The less plausible it is that some other factor caused the result, the better. Results that clearly occur, just as predicted, and are not plausibly explained by other factors provide very strong evidence for CT or for a very similar theory.”
(Emphasis added.) Note that this does not (and indeed the full paper does not) describe what criteria a prediction must meet, other than that it must describe a participant’s mind changing in some way. Furthermore, no protocol, method, or control is provided for differentiating a “CT prediction” from a “non-CT prediction”; any prediction made by a “CT researcher” is assumed to be a “CT prediction”. Hence, even an obvious prediction, such as “if Alice spends a day describing the mistakes her co-worker Bob has made during his employment, then Alice will have fewer positive emotions towards Bob”, would be treated as “very strong evidence for CT” under the proposed protocol, despite not being related to CT at all.
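To make the point about obvious predictions concrete, here is a minimal sketch using Bayes' rule. It is my own illustration, not anything from the paper, and every probability in it is an assumed, made-up number. It only shows that a prediction which would probably come true with or without the theory barely shifts one's confidence in the theory, whereas a prediction that would be surprising without the theory shifts it a great deal.

```python
def posterior(prior, p_outcome_if_theory, p_outcome_if_not):
    """Probability of the theory after a predicted outcome is observed (Bayes' rule)."""
    numerator = prior * p_outcome_if_theory
    return numerator / (numerator + (1 - prior) * p_outcome_if_not)


# All numbers below are illustrative assumptions, not measurements.
prior = 0.10  # assumed starting credence in the theory

# An "obvious" prediction (like the Alice and Bob example): it is likely
# to come true whether or not the theory is correct.
obvious = posterior(prior, p_outcome_if_theory=0.95, p_outcome_if_not=0.90)

# A "surprising" prediction: unlikely to come true unless the theory is correct.
surprising = posterior(prior, p_outcome_if_theory=0.95, p_outcome_if_not=0.05)

print(f"after obvious prediction:    {obvious:.3f}")
print(f"after surprising prediction: {surprising:.3f}")
```

Under these assumed numbers, the obvious prediction leaves the credence almost where it started, while the surprising one raises it substantially. A protocol that does not distinguish the two cases cannot tell how much evidence a "successful prediction" actually provides.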
The next section describes the real experiment which was conducted. The participants were chosen as follows:
“I have conducted two full RP tests. The first was performed by me on myself. The second was performed by me on a friend and colleague of mine.”
That is, the experiment was conducted with N = 2, and the test subjects were not only not randomly selected, but were active proponents of the theory in question.
Self-experimentation has a long history in medicine, but in such cases, the prediction is usually an objective physical measurement over which the subject has no control (e.g., cancer or stomach inflammation). Attempting to predict one’s own behavior suffers from the obvious confounder that one might act differently in response to predictions. For example, a Muslim trying to prove the existence of God might predict that he works harder on days when he conducts his daily prayers. Of course, he can then make the prediction come true simply by changing how hard he works.
The subjectiveness of the predictions in question, and the assessment thereof, is well demonstrated in the following example:
- “Background: Before doing the RP test, the participant would switch back and forth between feeling like he could be himself while still being accepted and feeling like he needed to be someone he wasn’t. This was a feature of his personality that had been present for ten or more years.
- The required action: The participant must determine a way to achieve nearly universal world virtue that is more effective and feasible than any of the participant’s current ways and does not involve the participant being an approachable example for others to emulate. The participant must also determine a feasible way to search for friends the participant won’t need to change to be friends with. The participant needs to do this while still being virtuous and while retaining the ability to walk away from people.
- The predicted effect: As soon as the participant completes the actions just stated, the participant will at all times feel like he can be himself while still being accepted.
- The result: At some point after the participant completed the above tasks, the participant started to feel at all times like he could be himself while still being accepted. The time the tasks were completed was not recorded carefully. Some time after the tasks were completed, the participant started to feel at all times like he could be himself and still be accepted. The time this effect occurred was also not recorded carefully. Based on the participant’s guesses, the effect occurred less than one month after the above tasks were completed. As of nine weeks later, it still appeared that the effect was persisting. After that week, the participant noticed that the effect has stopped. A CT-compliant explanation of the effect ending was later determined.”
Even given these remarkably fluid and vague methods of prediction assessment, one notes that the test still essentially failed. The participant, by his own admission, did not “at all times feel like he can be himself”, and there is no indication that his behavior after the intervention differed from his prior behavior of “switch[ing] back and forth”, which necessarily implies significant time spent in both states. However, this test was still recorded as a success, as an (unspecified) “CT-compliant explanation” was “later determined”.
The paper goes through another, similar case:
“The participant had been smoking and drinking for approximately 6 years. The participant tried to quit smoking in little pockets here and there, but was not typically not successful. The most successful times were two times when the participant quit smoking for a period of two months each time. During those times, the participant did not want to smoke and did not have cravings. (…)
The predicted effect: As soon as the participant completes all three of these actions, the participant will stop drinking and stop smoking.
The result: The participant did not note when exactly the effects began to occur, but at some point after the completion of the required actions, the participant stopped drinking and smoking. After the participant stopped drinking and smoking, the participant also stopped having a desire to drink or smoke. The participant did not have to struggle with cravings at all and did not have to struggle with the difficulty of getting out of an old routine. The effect persisted for eight weeks. The effects abruptly came to an end when a particular negative event occurred and substantially altered the participant’s life plans. The day after this event, the participant began smoking and drinking again.”
Once again, there is no indication that the CT intervention affected behavior at all. Nevertheless, the paper counts this as a success, offering the following justification:
“One might think that the fact that the participant in the third case started drinking and smoking again weighs against the predictions having been correct. This is not true. The changes caused by RP plans are not supposed to be permanent. Whether they persist or not depends on what other things happen in the person’s mind. Effects are frequently long-lasting since people’s minds do not change so frequently. But it is perfectly possible that some external event would change a person’s mind and reintroduce a mental phenomena we had eliminated.”
Of course, when the phenomenon in question (smoking) was known to be intermittent to begin with, it is not very impressive to predict that it will at some point stop and then at some later point start again. The paper continues:
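A rough way to see why this is unimpressive is to ask how often an intermittent smoker would stop within some window purely by chance. The sketch below is my own toy model, not anything from the paper: it assumes a fixed, independent weekly chance of spontaneously quitting, and the value `weekly_quit_prob = 0.05` is a hypothetical number chosen only for illustration.

```python
import random


def chance_of_spontaneous_quit(weekly_quit_prob, window_weeks, trials=20_000, seed=0):
    """Estimate, by simulation, the chance that at least one spontaneous
    quit begins within `window_weeks`, with no intervention at all."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    hits = 0
    for _ in range(trials):
        # Each week the person independently quits with the given probability.
        if any(rng.random() < weekly_quit_prob for _ in range(window_weeks)):
            hits += 1
    return hits / trials


# Hypothetical baseline: a 5% chance per week of quitting on one's own.
weekly_quit_prob = 0.05

for window in (1, 4, 26):
    simulated = chance_of_spontaneous_quit(weekly_quit_prob, window)
    exact = 1 - (1 - weekly_quit_prob) ** window  # closed form for comparison
    print(f"{window:>2} weeks: simulated {simulated:.3f}, exact {exact:.3f}")
```

The wider the window, the closer the chance of a "hit" gets to certainty. Since the paper did not record when the effect began, the prediction effectively had an open-ended window, which is exactly the case where a baseline of intermittent quitting explains the result on its own.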
“In fact, the negative event that immediately preceded the relapse was exactly the sort of event that CT predicts would cause a relapse – it was an event which cut off the new paths to the participant’s IGs that the participant had created through the actions recommended by the recommendation plan. Had I had the prescience to write down the conditions under which the participant would start smoking and drinking again, there would be an additional successful prediction from that RP test.”
Indeed, making predictions is difficult, especially about the future. However, a theory must be “prescient” in order to be scientifically valid. One cannot establish the validity of a theory by claiming, after a prediction has been falsified, that the theory ‘would’ have predicted the true result given some ill-defined counterfactual. Such behavior is the domain of numerology and horoscopes, not science.
Again, I am in favor of “seat-of-the-pants” experiments, and even of using data from such experiments when no better option is available. However, any reasonable researcher must be very mindful of the weaknesses of such methods, and be careful to spell them out.
The paper’s assessment finishes (emphasis added):
“The RP tests provide very strong evidence in favor of CT or a variant of CT. I successfully predicted 16 conceptually independent effects beforehand. In each case, I specified a narrow window of time in which the effect was supposed to occur. In many cases, the predicted effect would break a trend of months or years. And yet I observed the predictions coming true and did not observe any predictions failing. No other existing psychological theory has anything close to this degree of predictive power.”
No citations, data, reasoning, or examples of alternative theories are used to justify this extraordinarily sweeping claim, which is founded on two case studies evidently lacking in even the most basic rigor, such as systematic recording of data. To quote Cosma Shalizi:
“Normally, scientific work is full of references to previous works, if only to say things like “the outmoded theory of Jones , unable to accommodate stubborn experimental facts [2–25], has generally fallen out of favor”. This is how you indicate what’s new, what you’re relying on, how you let readers immerse themselves in the web of ideas that is an particular field of research. [The author] has deliberately omitted references. Now, this is sometimes done: Darwin did it in The Origin of Species, for instance, to try to get it to press quickly. But [the author] has written 1100 pages over about a decade; what would it have hurt to have included citations? (…) To acknowledge that he had predecessors who were not universally blinkered fools would, however, conflict with the persona he tries to project to others, and perhaps to himself.”
Connection Theory is not quite as well-researched as the book Shalizi discusses. Nevertheless, in the 80-page essay “Connection Theory: Theory and Practice”, the author neglects to include any citations or bibliography; despite being an experimental paper on a theory of psychology, the paper displays no understanding of the existing psychological literature or of established research methodology. As many psychologists will admit, both the literature and the methodology are often highly flawed, but (as Shalizi points out) the correct response is to explain why they are flawed and how they can be improved. A middle school science student is not excused from citing Newton in a paper on gravity, merely because the ‘wrong’ Newton was later overturned by Einstein.
If Leverage Research had merely reported what they had done, without any commentary, it would just have been another mildly interesting self-report on exploring psychology, like the tens of thousands of others which have been collected online. And of course, any good scientist will spend some time thinking over speculative theories with little evidence. But to make the extraordinary claims described above, and to use Connection Theory as one of the primary foundations of an organization with twenty-six team members as of 2012 (edit: Leverage has clarified that they have 15 full-time employees, and that the majority of the 26 mentioned here were volunteers), obliges one to meet a higher, not lower, standard than the one commonly employed in sixth grade science fairs.
Science is a difficult project. Even when conducted in university laboratories with large grants, 80% of non-randomized medical studies turn out to be false, and even 25% of “gold-standard randomized trials” turn out to be false. And this is in the field of ‘hard’ medical studies, with concrete observables such as cancer and heart disease. In psychology, one often finds that:
“Social psychology experiments in the laboratory tend to throw up spectacular mind-boggling effects. Many of these fail to replicate and are later discredited. The ones that do replicate are not always generalizable – sometimes an even slightly different situation will remove the effect or create exactly the opposite effect. The effects that remain robust in the laboratory may be too short-lasting or too specific to have any importance in real life. And the ones that do matter in real life may respond unpredictably or even paradoxically to attempts to control them.”
Some of this difficulty is unavoidable. But we can at least be aware of these problems, and do what we can to not make them worse.