Fascinating YouTube channels

Thought Emporium: New open-source science every week

Applied Science: Interesting applications of science and technology

Foresight Institute: Guiding technological development towards an abundant future

Numberphile: Videos about numbers

NileRed: Showing the natural beauty of chemistry in fun ways

TIK: World War 2 history documentaries

TierZoo: Analyzing biology and ecology as the world’s biggest, oldest video game

Scott Manley: Orbital mechanics and rocket science

3Blue1Brown: Some combination of math and entertainment

A Comprehensive Reboot of Law Enforcement

Eliezer Yudkowsky’s proposal for police and law enforcement reform. I haven’t analyzed every bit of this, but it seems like the best overall plan discussed so far.


Fun With Pyro: Negative Evidence

Previously: Fun With Pyro

In a typical clinical trial, the difference between the treatment group and the control group is measured with p-values. If the p-value for the “primary endpoint” is less than 0.05, the effect is said to be “statistically significant”; if not, no effect has been found. There are lots of issues with this, but one of them is conflating not finding any evidence with finding negative evidence (evidence against the theory being tested). A small, poorly controlled, or badly measured study can’t tell us anything either way, but a large, well-done study can provide strong negative evidence – “no evidence for X” is not the same thing as “evidence for no X”. With Pyro, this can be measured in a clean, quantitative way.
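To make the distinction concrete before bringing in Pyro, here’s a stdlib-only toy (all numbers hypothetical, independent of the model below): two hypotheses, “no effect” versus a single fixed alternative effect size, where only the trial’s precision changes how much a null result shifts an 80% prior toward “no effect”.

```python
import math

# Toy two-hypothesis version of the idea (hypothetical numbers, not the
# Pyro model): "no effect" (mean difference 0) vs. one fixed alternative
# effect size, with the trial's standard error as the noise scale.
def posterior_no_effect(obs_diff, se, prior_no=0.8, effect=0.1):
    def normal_pdf(x, mu, sigma):
        z = (x - mu) / sigma
        return math.exp(-z * z / 2) / (sigma * math.sqrt(2 * math.pi))
    num = prior_no * normal_pdf(obs_diff, 0.0, se)
    alt = (1 - prior_no) * normal_pdf(obs_diff, effect, se)
    return num / (num + alt)

# A null result from a small, noisy trial barely moves the 80% prior...
small = posterior_no_effect(0.0, se=0.2)   # ~0.82
# ...while the same null result from a large, precise trial is strong
# negative evidence.
large = posterior_no_effect(0.0, se=0.02)
```

The same null observation is weak negative evidence in the noisy trial and nearly decisive in the precise one.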

To illustrate, we can run simulations using the last post’s model of the Pikachu virus, only this time, the simulated treatment and control data will all be randomly drawn from the same distribution. Since there is no difference in the data, on average, the probability we assign to “no effect” should increase from the baseline 80% after a trial is run, and it should increase more with larger trials that provide stronger negative evidence. However, this is only an average. Sometimes, by luck, the data drawn will make it look like there is an effect, and p(no effect) will actually go down. To smooth this out, we can simulate a large number of trials and average them, but running lots and lots of simulations would take a long time. Luckily, PyTorch’s batch_shape feature and the efficiency of modern SIMD architectures make it (relatively) easy to run them all in parallel:

def mixture_n(which_world, dists):
  """Generalizes MaskedMixture to n > 2 components."""
  assert dists
  if len(dists) == 1: return dists[0]
  assert which_world.max() < len(dists)

  # Fold the list into nested two-component mixtures: each step masks in
  # component b[1] wherever which_world equals its index b[0].
  return reduce(lambda a, b: dist.MaskedMixture(
      which_world == b[0], a[1] if isinstance(a, tuple) else a, b[1]),
    enumerate(dists))

def model(patients):
  patients = patients.clone()
  scaling = common.make_tensor(100.0)
  patients /= scaling
  n_trials = patients.size()[2]

  p_effect_sizes = common.make_tensor(PRIORS['p_effect_sizes'], n_trials)
  mode_effect_size = common.make_tensor(PRIORS['mode_effect_size'], n_trials)
  effect_k = common.make_tensor(PRIORS['effect_k'], n_trials)
  small_effect_k = common.make_tensor(PRIORS['small_effect_k'], n_trials)
  scale = common.make_tensor(PRIORS['scale']) / scaling

  distributions = make_distributions(mode_effect_size, effect_k, small_effect_k)
  with pyro.plate("world"):
    which_world = pyro.sample("which_world", dist.Categorical(p_effect_sizes))

    effect_size_raw = pyro.sample(
      "effect_size", common.mixture_n(which_world, distributions))
  effect_size = effect_size_raw * scale

  mode0 = common.make_tensor(0.5, n_trials)
  k0 = common.make_tensor(8.1269, n_trials)

  # beta reparameterized
  # https://bit.ly/2ZqjILG
  original_dist = common.beta(mode0, k0)
  new_dist = common.beta(mode0 - effect_size, k0)

  with pyro.plate("trial"):
    with pyro.plate("trial2"):
      pyro.sample("severity_trial", new_dist, obs=patients[1])

  with pyro.plate("test1"):
    with pyro.plate("test2"):
      pyro.sample("severity_test", original_dist, obs=patients[0])

We can then graph out the strength of negative evidence, averaged over a thousand simulated trials, for a range of different possible sample sizes:

(Note that the prior is a bit under 83% here, as this graph includes all probability density with effect size < 1. This is an average of probabilities rather than an average of log-odds, which might have been more representative.)

This is also a good scalability test; my desktop got up to one million data points before the simulation even started to slow down. At that point, I switched to CUDA, which got up to ten million data points before fully stressing the GPU.
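The batching idea itself is easy to see in a minimal standalone sketch (plain torch.distributions, with made-up sizes; Beta(4.06, 4.06) is roughly the mode-0.5 severity distribution used above): giving the distribution a batch dimension draws every simulated trial in a single vectorized call.

```python
import torch
import torch.distributions as td

# 1000 simulated trials of 50 patients each, drawn in one call instead of
# a Python loop over trials.
n_trials, n_patients = 1000, 50
severity = td.Beta(torch.full((n_trials,), 4.06),
                   torch.full((n_trials,), 4.06))  # batch_shape == (1000,)
samples = severity.sample((n_patients,))           # shape (50, 1000)
```

Every column is an independent trial, and the whole tensor can be handed to a batched model (or moved to CUDA) unchanged.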

Fun With Pyro

A few weeks ago, a randomized trial of remdesivir showed that it seemed to reduce mortality risk for coronavirus patients, from 11.6% to 8%. The p-value for this comparison was p=0.059. But what does that really mean? Does remdesivir work, or not? How likely is it to work? 94.1%?

Pyro is a powerful probabilistic programming library based on PyTorch that can answer questions like this, and much, much more. To start off, consider a hypothetical disease, the Pikachu virus. We can create a disease severity scale to measure how bad each case of Pikachu is, from 0 being no symptoms at all, to 100 being severely ill. Let’s assume the patients we get follow a beta distribution (rescaled), with the center at 50 and a standard deviation of 20. We can then simulate groups of patients getting sick:

def pikachu_patients(n):
  mode0 = common.make_tensor(0.5)
  k0 = common.make_tensor(8.1269)
  scaling = common.make_tensor(100.0)

  # beta reparameterized
  # https://bit.ly/2ZqjILG
  patient_dist = common.beta(mode0, k0)

  patients = []
  for i in range(n):
    patient = pyro.sample("severity", patient_dist)
    patients.append(patient * scaling)

  return torch.stack(patients)

(This is, of course, a very simplified model! More elaborate and realistic models will be explored in later posts.)
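The `# beta reparameterized` comment refers to describing a Beta distribution by its mode and a concentration k instead of the usual (a, b). Assuming `common.beta(mode, k)` uses the standard mode/concentration form (a guess; the helper isn’t shown), the conversion is:

```python
def beta_params_from_mode_k(mode, k):
    # Mode/concentration parameterization: for k > 2,
    #   a = mode * (k - 2) + 1,  b = (1 - mode) * (k - 2) + 1,
    # which makes the distribution's mode (a - 1) / (a + b - 2) equal `mode`.
    return mode * (k - 2) + 1, (1 - mode) * (k - 2) + 1

# The values used in pikachu_patients: symmetric, a == b.
a, b = beta_params_from_mode_k(0.5, 8.1269)
```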

This gives us a handy distribution of data points to play with (n = 50):

Suppose someone runs a two-arm, n=100, randomized controlled trial for the use of rubber gloves in treating Pikachu. The data shown above is the non-treatment group, with a nice mean severity of 49.052, while the treatment group looks like this:

The mean severity of this group is 42.6573, p=0.0423, which is less than the standard 0.05. The treatment works! (In fact, xkcd-style, this was made by just running the sampler a bunch of times until something suitable came out.) But how likely is it to work, really? And with what effect size?
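For readers who want a p-value without distributional assumptions, a one-sided permutation test on the difference in group means gives essentially the same kind of number. This is a hypothetical stdlib-only sketch; `control` and `treated` are stand-ins for the two arms’ severity scores, not the data above.

```python
import random
import statistics

def permutation_p_value(control, treated, n_perm=2000, seed=0):
    # One-sided test: how often does a random relabeling of the pooled
    # data produce a mean difference at least as large as observed?
    rng = random.Random(seed)
    observed = statistics.mean(control) - statistics.mean(treated)
    pooled = list(control) + list(treated)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = (statistics.mean(pooled[:len(control)])
                - statistics.mean(pooled[len(control):]))
        if diff >= observed:
            count += 1
    return count / n_perm
```

Like the standard test, though, it only answers “how surprising is this under no effect?”, not “how likely is the treatment to work?”.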

Before the trial starts, we already know that most treatments don’t work. Let’s assume that there’s an 80% chance of no effect (zero distribution), a 10% chance of a small effect (beta distribution, mode = 0, a = 1, b = 9, rescaled to range 0-30), and a 10% chance of a significant effect of unknown size (beta distribution, mode = 0.5, a = 1, b = 1, rescaled):

PRIORS = {
  'p_effect_sizes': [0.8, 0.1, 0.1],
  'mode_effect_size': 0.5,  # before scaling
  'effect_k': 2.0,  # uniform distribution
  'small_effect_k': 10.0,
  'scale': 30.0,
}
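This prior is easy to sample directly. A hypothetical stdlib-only sketch (same numbers as the priors above, with `random.betavariate` standing in for the model’s distributions):

```python
import random

def sample_effect_size():
    # Pick a "world" with probabilities [0.8, 0.1, 0.1], then draw an
    # effect size from that world's distribution, rescaled to 0-30.
    world = random.choices([0, 1, 2], weights=[0.8, 0.1, 0.1])[0]
    if world == 0:
        return 0.0                             # no effect
    if world == 1:
        return random.betavariate(1, 9) * 30   # small effect, mode 0
    return random.betavariate(1, 1) * 30       # significant effect, uniform
```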

Our prior distribution looks like this:

Now, we make a simulation of what might happen to patients under different possible outcomes, and match that against the observed “trial data”:

def make_distributions(mode_effect_size, effect_k, small_effect_k):
  return [common.zero_dist(),
          common.beta(0, small_effect_k),
          common.beta(mode_effect_size, effect_k)]

def model(patients):
  patients = patients.clone()
  scaling = common.make_tensor(100.0)
  patients /= scaling

  p_effect_sizes = common.make_tensor(PRIORS['p_effect_sizes'])
  mode_effect_size = common.make_tensor(PRIORS['mode_effect_size'])
  effect_k = common.make_tensor(PRIORS['effect_k'])
  small_effect_k = common.make_tensor(PRIORS['small_effect_k'])
  scale = common.make_tensor(PRIORS['scale']) / scaling

  distributions = make_distributions(mode_effect_size, effect_k, small_effect_k)
  which_world = pyro.sample("which_world", dist.Categorical(p_effect_sizes))
  effect_size_raw = pyro.sample("effect_size", distributions[which_world])
  effect_size = effect_size_raw * scale

  mode0 = common.make_tensor(0.5)
  k0 = common.make_tensor(8.1269)

  # beta reparameterized
  # https://bit.ly/2ZqjILG
  original_dist = common.beta(mode0, k0)
  new_dist = common.beta(mode0 - effect_size, k0)

  with pyro.plate("trial", len(patients[1])):
    pyro.sample("severity_trial", new_dist, obs=patients[1])

  with pyro.plate("test", len(patients[0])):
    pyro.sample("severity_test", original_dist, obs=patients[0])

(Note that in this simple model, since we already know the underlying distribution given no intervention, the “control” data has no effect on the findings. In real life, of course, control data would be used to find other parameters.)

We then create a “variational distribution”, or “guide”, to approximate the posterior:

def guide(patients):
  p_effect_sizes = pyro.param(
    "p_effect_sizes", common.make_tensor(PRIORS['p_effect_sizes']),
    constraint=constraints.simplex)

  mode_effect_size = pyro.param(
    "mode_effect_size", common.make_tensor(PRIORS['mode_effect_size']),
    constraint=constraints.interval(0, 1))

  effect_k = pyro.param(
    "effect_k", common.make_tensor(PRIORS['effect_k']),
    constraint=constraints.positive)

  small_effect_k = pyro.param(
    "small_effect_k", common.make_tensor(PRIORS['small_effect_k']),
    constraint=constraints.positive)

  distributions = make_distributions(mode_effect_size, effect_k, small_effect_k)
  which_world = pyro.sample("which_world", dist.Categorical(p_effect_sizes))
  effect_size_raw = pyro.sample("effect_size", distributions[which_world])

And run some inference:

def find_params(model, guide, data, steps, callable_log):
  # setup the optimizer
  adam_params = {"lr": 0.002, "betas": (0.90, 0.999)}
  scheduler = MultiStepLR(
    {'optimizer': Adam, 'optim_args': adam_params,
     'milestones': [100, 200], 'gamma': 0.2})

  # setup the inference algorithm
  svi = SVI(model, guide, scheduler, loss=Trace_ELBO())

  logger.info("Begin gradient descent")
  # do gradient steps
  for step in progressbar.progressbar(range(steps)):
    loss = svi.step(data)
    if step % 1000 == 0:
      callable_log(step, loss)  # report progress via the supplied callback

We get an 18.74% chance of no effect, a 26.45% chance of a small effect, and a 54.82% chance of a significant effect. We also have a much clearer idea of how large a “significant” effect would be. Here’s the graph:

So we have a reasonable guess as to what the effect is, but it could still be a coincidence. This matches intuitions from eyeballing the data pretty well.

It’ll be cool to extend this simplified model, but one issue to be aware of with Pyro is that if the prior doesn’t include an outcome, it’s effectively assumed to be impossible, no matter what the data shows. The prior here has no term for adverse effects, for example. So even if the data was very negative, the posterior would never tell you that the treatment did harm (although it would have a very low probability for beneficial effects). On the flip side, PyTorch is so scalable that it should be straightforward to create more complex models with very large numbers of variables.
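That caveat is just Bayes’ rule at work: a hypothesis with zero prior mass gets zero posterior mass no matter how strong the likelihood in its favor. A toy illustration with made-up likelihood numbers:

```python
def bayes_update(priors, likelihoods):
    # Posterior proportional to prior times likelihood, then normalized.
    unnorm = [p * l for p, l in zip(priors, likelihoods)]
    z = sum(unnorm)
    return [u / z for u in unnorm]

# Three worlds: harmful, no effect, beneficial - but the prior rules out harm.
posterior = bayes_update([0.0, 0.8, 0.2], [5.0, 1.0, 0.1])
# posterior[0] stays exactly 0 even though the data favored "harmful" most.
```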

Lists of Coronavirus Efforts

Tests being developed: https://www.360dx.com/coronavirus-test-tracker-launched-covid-19-tests

Vaccines and therapies being developed: https://www.biocentury.com/article/304515

Clinical trials, 736 in the current database as of Apr. 8th: https://www.transparimed.org/single-post/2020/03/27/COVID-19-clinical-trials-information-sources

PPE projects (masks, face shields, etc.): https://srconstantin.github.io/2020/04/03/

Coronavirus is Here

A guide by my friend Zvi Mowshowitz:


Kelsey Piper on preparing for coronavirus


Peace Deals Signed in Afghanistan, South Sudan



Professional Kidnappers

“One night, they’re sleeping in their beds, and a group of strangers – muscle-bound, big bouncer-type strangers with handcuffs – burst into their bedroom, physically pull them out of their bed, and put them in an unmarked van. Unmarked because they’re not cops, they’re just professional kidnappers, basically. Like the European guys from Taken, if those European kidnappers had a website. Because yeah, there’s an actual website that you can check out: safeyouth.com, where it essentially says: hey, we’ll take your kids against their will, just don’t tell them we’re coming. Because that makes our jobs easier. We like to use the elements of surprise, and disorientation.” http://www.cracked.com/podcast/the-horrifying-reality-teen-rehab-centers/

Treating Transgender Youth

Interesting comments from Dr. Will Powers, who specializes in trans medicine:

“Approximately 20% of my transgender women (Male to Female) and 85% of my transgender men (Female to Male) have considerable hormone abnormalities which deviate more than 95% from “average” values.

20% MtF (Elevated Estrone/Estradiol/Free Estradiol/SHBG, Low FSH/LH, very low T/DHT)

85% FtM (Massively elevated androgens in one or more categories, Androstenedione, Androstanediol, Androsterone, DHEA-S, Testosterone or DHT (Or a normal testosterone with a massively elevated free T due to a near lack of SHBG)”


“I have lots of kids come to me that have hormonal disruption that I fix and then they no longer feel trans. This is most commonly done with my patients in FtMs. They come in with some insane androgen level, I fix it with drugs, and then they no longer feel like they want to transition.

I have never seen this work in someone over the age of twenty-five, but the younger they are, the more likely they seem to change their position on taking hormones once the underlying dysfunction is corrected.”