We are likely at the beginning of the biggest world health crisis of my lifetime. A new coronavirus, the cause of Covid-19, is spreading throughout the world. Accurate, good information is critical. In this morning’s NYT, there was an article by a researcher from the University of Oregon. The writing style was not bad for the kind of article routinely featured in the NYT: it starts out with a question put as a non-expert would grasp it, then comes paragraph 2 or 3 with ‘Our research shows…’, a nearly universal feature in these articles and, for some reason, one that annoys me. Setting that aside, she describes a study she (or her group) conducted looking at 1263 individuals and asking what traits distinguish those who ‘stalk statistics’ from those who don’t.

I have multiple issues with the article, some of them addressed by the author, but only at the tail end and in a very ‘hand-wavy’ way. Given the context of the Covid-19 pandemic, I’m more than a little troubled by the article, as it clearly promotes viewing people who pay attention to ‘statistics’ about the evolution of the pandemic as pathological in some way. It’s not too far a stretch from saying “Don’t Worry, Be Happy”.

The first minor issue: the first observation is that ‘males are more likely to be stalkers than females’, by 55% vs 43% (presumably a small number were ambiguous). When you say ‘more likely’ you are making a probabilistic statement, implying a level of generalization to the population based on your sample. If she had said men in the sample were observed to be stalkers more often, that would be fine. But that’s not what was said. Accuracy in words matters, especially for people who supposedly have expertise.

Now let’s see if she can even back up the ‘more likely’ claim. If the sample was roughly 50/50 male/female, then that’s about 630 each. If there’s no a priori predilection to stalk or not, then you expect about 315 stalkers in each group. For a binomial distribution with p = 0.5, the standard deviation of that count is 0.5*sqrt(630), or around 12.5, which is about 2% of the group. Thus, you expect 50 ± 2%. For males you observe 55%, about 2.5 sigma high, so taken on its own it is not wildly inconsistent with 50/50. Now, is 43% consistent? Writing this, it’s clear I need to do my own work here too. The relevant question is whether 43% in one sample of 630 is statistically different from 55% in another sample of 630. My guess is yes. Let’s say the null hypothesis is the mean (or 49%). So we are comparing two measurements: one is a 3 sigma fluctuation up (49->55) and one is a 3 sigma fluctuation down (49->43). Each has a probability of around 0.1%, so the overall probability is likely around 0.0001%. So I have to retract my initial suspicion that this is not significant.
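A quick way to check this kind of back-of-the-envelope estimate is a two-proportion z-test. The sketch below is only that, a sketch, and it rests on assumptions that are mine rather than the article's: that the 1263 respondents split evenly by sex, and that 55% and 43% are the stalking rates within each group. The variable and function names are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist

# Assumptions (not from the article): the 1263 respondents split evenly by
# sex, and 55% / 43% are the stalking rates within each group.
n_total = 1263
n_group = n_total / 2            # about 631 per group
p_male, p_female = 0.55, 0.43


def binomial_se(p, n):
    """Standard error of an observed proportion from n binomial trials."""
    return sqrt(p * (1 - p) / n)


# Spread expected around 50% if there were no predilection either way.
se_null = binomial_se(0.5, n_group)

# Two-proportion z-test, using the pooled rate as the null hypothesis.
p_pooled = (p_male + p_female) / 2   # simple average is valid for equal group sizes
se_diff = sqrt(2 * p_pooled * (1 - p_pooled) / n_group)
z = (p_male - p_female) / se_diff
p_value = 2 * (1 - NormalDist().cdf(z))   # two-sided

print(f"SE of one group's rate under 50/50: {se_null:.3f}")
print(f"z for the 55% vs 43% difference:    {z:.2f}")
print(f"two-sided p-value:                  {p_value:.1e}")
```

Under these assumptions the per-group standard error comes out near 2 percentage points and the gap is a little over four standard errors, so the conclusion that the difference is significant holds. If the real male/female split is uneven, the group sizes would need to be entered separately.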