Sunday, 6 November 2011

Experiments and Expectations


Performing an experiment amounts to asking the Universe a question. For the answer, the experimental results, to be of any use, you have to be absolutely sure you've phrased the question correctly. When searching for elusive effects among a sea of random events by statistical means, whether in particle physics or parapsychology, one must take care to apply statistics properly to the events being studied. Misinterpreting genuine experimental results yields errors just as serious as those due to faults in the design of the experiment.

Evidence for the existence of a phenomenon must be significant, persistent, and consistent. Statistical analysis can never entirely rule out the possibility that the results of an experiment were entirely due to chance—it can only calculate the probability of occurrence by chance. Only as more and more experiments are performed, which reproduce the supposed effect and, by doing so, further decrease the probability of chance, does the evidence for the effect become persuasive.

To show how essential it is to ask the right question, consider an experiment in which the subject attempts to influence a device which generates random digits from 0 to 9 so that more nines are generated than expected by chance. Each experiment involves generation of one thousand random digits. We run the first experiment and get the following result:

51866599999944246273322297520235159670265786865762
83920280286669261956417760577150505375232340788171
46551582885930322064856497482202377988394190796683
36456436836261696793837370744918197880364326675561
63557778635028561881986962010589509079505189756148
71722255387675577937985730026325400514696800042830
49134963200862681703176774115437941755363670637799
08279963556956436572800286835535562483733337524409
90735067709628443287363500729444640394058938260556
35615446832321914949835991535024593960198026143550
34915341561413975080553492042984685869042671369729
59432799270157302860632198198519187171162147204313
26736371990032510981560378617615838239495314260376
28555369005714414623002367202494786935979014596272
75647327983564900896013913125375709712947237682165
84273385694198868267789456099371827798546039550481
93966363733020953807261965658687028741391908959254
79109139065222171490342469937003707021339710682734
97173738046984452113756225260095828324586288486644
14887777251716547950457638477301077505585332159232


The digit frequencies from this run are:


Digit Occurrences
0 94
1 81
2 96
3 111
4 84
5 111
6 111
7 112
8 91
9 109
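
As a rough numerical check on these tallies, the sketch below computes two conventional statistics from the table: a chi-square statistic for the ten digit counts against a uniform expectation of 100 each, and a z-score for the count of nines alone. The choice of these two tests and the names `chi_square_uniform` and `z_score_for_digit` are assumptions made for illustration, not necessarily the calculation the author goes on to use.

```python
import math

# Digit tallies copied from the table above.
observed = {0: 94, 1: 81, 2: 96, 3: 111, 4: 84,
            5: 111, 6: 111, 7: 112, 8: 91, 9: 109}


def chi_square_uniform(counts):
    """Chi-square statistic against equal expected counts for every digit."""
    total = sum(counts.values())
    expected = total / len(counts)
    return sum((n - expected) ** 2 / expected for n in counts.values())


def z_score_for_digit(counts, digit, p=0.1):
    """Standard deviations by which one digit's count departs from chance."""
    total = sum(counts.values())
    mean = total * p
    sd = math.sqrt(total * p * (1 - p))
    return (counts[digit] - mean) / sd


chi2 = chi_square_uniform(observed)
z_nines = z_score_for_digit(observed, 9)
print(f"chi-square = {chi2:.2f} on {len(observed) - 1} degrees of freedom")
print(f"z-score for nines = {z_nines:.2f}")
```

With these tallies the statistic comes to 13.38, below the value of about 16.92 that a chi-square distribution with 9 degrees of freedom exceeds only 5% of the time, and the nines sit less than one standard deviation above expectation.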

There's no obvious evidence for a significant excess of nines here (we'll see how to calculate this numerically before long). There was an excess of nines over the chance expectation, 100, but greater excesses occurred for the digits 3, 5, 6, and 7. But take a look at the first line of the results!

51866599999944246273322297520235159670265786865762
.
.
.
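These digits were supposed to be random, yet in the first thousand, the first dozen for that matter, we found a pattern as striking as “999999”. What's the probability of that happening? Just the number of possible numbers of d digits which contain one or more sequences of p or more consecutive nines (counted as the d − p + 1 places the run can start, times the ways to fill the remaining d − p digits), divided by the number of possible d digit numbers:

$$P(d, p) = \frac{(d - p + 1) \times 10^{d - p}}{10^{d}} = \frac{d - p + 1}{10^{p}}$$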

Plugging in 1000 for d and 6 for p yields:

$$\frac{1000 - 6 + 1}{10^{6}} = \frac{995}{1\,000\,000} = 0.000995$$

So the probability of finding “999999” in a set of 1000 random digits is less than one in a thousand! So then, are the digits not random, after all? Might our subject, while failing to influence the outcome of the experiment in the way we've requested, have somehow marked the results with a signature of a thousand-to-one probability of appearing by chance? Or have we simply asked the wrong question and gotten a perfectly accurate answer that doesn't mean what we think it does at first glance?
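
The figure used in the text, 0.000995, equals (d - p + 1) / 10^p with d = 1000 and p = 6: it counts the d - p + 1 positions at which six nines could start. As a hedged cross-check, the sketch below compares that figure with an exact value from a small dynamic program that tracks the length of the trailing run of nines. The function names are illustrative assumptions, and the dynamic program is a swapped-in technique rather than anything taken from the original article.

```python
from fractions import Fraction


def position_count_estimate(d, p):
    """(d - p + 1) * 10**(d - p) / 10**d, simplified to (d - p + 1) / 10**p."""
    return Fraction(d - p + 1, 10 ** p)


def exact_probability(d, p):
    """Chance that d uniform random digits contain at least p nines in a row."""
    # state[k]: probability that no run of p nines has appeared so far and
    # the digits seen so far end in exactly k consecutive nines (k < p).
    state = [1.0] + [0.0] * (p - 1)
    for _ in range(d):
        following = [0.0] * p
        following[0] = 0.9 * sum(state)        # any non-nine resets the run
        for k in range(p - 1):
            following[k + 1] = 0.1 * state[k]  # a nine extends the run
        state = following                      # runs reaching length p are dropped
    return 1.0 - sum(state)


estimate = position_count_estimate(1000, 6)
exact = exact_probability(1000, 6)
print(f"position count: {float(estimate):.6f}")
print(f"exact:          {exact:.6f}")
```

Because position counting tallies a sequence once for every place the pattern fits, it slightly overstates the probability. The exact value is a little lower, so "less than one in a thousand" holds either way.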

The latter turns out to be the case. The data are right before our eyes, and the probability we calculated is correct, but we asked the wrong question, and in doing so fell into a trap littered with the bones of many a naïve researcher. Note the order in which we did things. We ran the experiment, examined the data, found something seemingly odd in it, then calculated the probability of that particular oddity appearing by chance. We asked the question, “What is the probability of ‘999999’ appearing in a 1000 digit random sequence?” and got the answer “less than one in a thousand”, a result most people would consider significant. But since we calculated the probability after seeing the data, in fact we were asking the question “What is the chance that ‘999999’ appears in a 1000 digit random sequence which contains one occurrence of ‘999999’?”. The answer to that question is, of course, “certainty”.

In the original examination of the data, we were really asking “What is the probability we'll find some striking sequence of six digits in a random 1000 digit number?”. We can't precisely quantify that without defining what “striking” means to the observer, but it is clearly quite high. Consider that I could have made the case just as strongly for “000000”, “777777” or any other six-digit repeat. With ten such repeats to choose from, that alone raises the probability of occurrence by chance to about one in a hundred. Or, perhaps I might have pointed out a run of digits like “123456”, “012345”, “987654”, and so on; or the first five or six digits of a mathematical constant such as Pi, e, or the square root of two; regular patterns like “101010”, “123321”, or a multitude of others; or maybe my telephone or license plate number, or the subject's! It is, in fact, very likely you'll find some pattern you consider striking in a random 1000-digit number.
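
To put a rough number on this, the sketch below builds one hypothetical catalogue of six-digit patterns an observer might call striking (repeats, ascending and descending runs, two-digit alternations, and mirror patterns like 123321) and multiplies its size by the chance of any one specific six-digit pattern starting at a given position, summed over the 995 possible starting positions. The catalogue and the name `striking_patterns` are assumptions chosen for illustration; telephone numbers and mathematical constants would enlarge it further.

```python
def striking_patterns():
    """One hypothetical catalogue of 'striking' six-digit patterns."""
    digits = "0123456789"
    patterns = set()
    for a in digits:
        patterns.add(a * 6)                          # repeats such as 999999
    for start in range(5):
        run = digits[start:start + 6]                # 012345 ... 456789
        patterns.add(run)
        patterns.add(run[::-1])                      # 543210 ... 987654
    for a in digits:
        for b in digits:
            if a != b:
                patterns.add((a + b) * 3)            # alternations such as 101010
            for c in digits:
                patterns.add(a + b + c + c + b + a)  # mirrors such as 123321
    return patterns


patterns = striking_patterns()
positions = 1000 - 6 + 1                 # places a six-digit pattern can start
expected_per_pattern = positions / 10 ** 6
expected_total = len(patterns) * expected_per_pattern
print(f"{len(patterns)} patterns, {expected_total:.4f} expected per 1000 digits")
```

With this catalogue of 1100 patterns, a 1000-digit sequence is expected to contain about 1.09 of them, so finding at least one is unremarkable.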

But, of course, if you don't examine the data from an experiment, how are you going to notice if there's something odd about it? So, let's pursue this a bit further, exploring how we frame a hypothesis based on an observation, run experiments to test it, and then analyse the results to determine whether they confirm or deny the hypothesis, and to what degree of certainty.

Our observation, based on examining the first thousand random digits, is that “999999” appears once, while the probability of “999999” appearing in a randomly chosen 1000 digit number is less than one in a thousand. Based on this observation we then suggest:


Hypothesis: The sequence “999999” appears more frequently in 1000-digit sequences with the subject attempting to influence the generator than would be expected by chance.


We can now proceed to test this experimentally. If the sequence “999999” has a probability of occurring in a 1000 digit sequence of 0.000995, then for a thousand consecutive 1000 digit sequences (a million digits in all), the expected number of appearances of “999999” will be 0.995, almost one. (To be correct, it's important to test each 1000 digit sequence separately, then sum the results for 1000 consecutive sequences. If we were to scan all million digits as one sequence, we would also count cases where the sequence “999999” begins in one 1000 digit sequence and ends in the next. The corresponding figure for a single million digit sequence (which you can calculate from the equation above) is 0.999995, somewhat higher than the 0.995 obtained when the million digits are treated as separate 1000 digit experiments.)
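
The difference between the two ways of counting can be seen in a small sketch. It assumes overlapping occurrences are counted (so seven nines in a row count twice), which the text does not specify, and the function names are hypothetical.

```python
def count_occurrences(digits, pattern="999999"):
    """Count occurrences of pattern in digits, including overlapping ones."""
    count = 0
    start = digits.find(pattern)
    while start != -1:
        count += 1
        start = digits.find(pattern, start + 1)
    return count


def count_per_sequence(stream, length=1000, pattern="999999"):
    """Split stream into consecutive sequences and sum the per-sequence counts."""
    return sum(count_occurrences(stream[i:i + length], pattern)
               for i in range(0, len(stream), length))


# Two 1000-digit sequences with six nines straddling the boundary between them.
stream = "0" * 997 + "999999" + "0" * 997
print(count_occurrences(stream))    # scanning all 2000 digits as one sequence
print(count_per_sequence(stream))   # testing each 1000-digit sequence separately
```

Scanned as a single stream, the constructed example contains the pattern once; tested as two separate sequences, it contains none, because neither half holds more than three nines.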

We will perform, then, the following experiment. With our ever-patient subject continuing to attempt to influence the output of the generator, we will produce a million more sequences of 1000 digits and, in each, count occurrences of “999999”. Every 1000 sequences, we'll record the number of occurrences, repeating the process until we've generated a thousand runs of a million digits (10^9 digits in all). With that data in hand, we'll see whether the “999999 effect” is genuine or a fluke attributable to chance.

Here is a plot of the number of occurrences of the sequence “999999” per block of 1000 sequences over the thousand repetitions of the thousand sequence experiment. The number of occurrences expected by chance, 0.995, is marked by the green line.



At the outset, the results diverged substantially from chance, as is frequently the case for small sample sizes. But as the number of experiments increased, the results converged toward the chance expectation, ending up in a decreasing magnitude random walk around it. This is precisely what is expected from probability theory, and hence we conclude no “999999 effect” exists.
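
A scaled-down version of this experiment can be sketched in software. The sketch below substitutes Python's seeded pseudorandom generator for the article's random digit generator and reports the running mean of occurrences per block of 1000 sequences. Both the substitution and the reading of the plot as a running mean are assumptions, as is the name `running_mean_of_counts`.

```python
import random


def running_mean_of_counts(blocks, sequences_per_block=1000, digits=1000,
                           pattern="999999", seed=2011):
    """Running mean of pattern occurrences per block of sequences."""
    rng = random.Random(seed)  # pseudorandom stand-in for the real generator
    means = []
    total = 0
    for block in range(1, blocks + 1):
        for _ in range(sequences_per_block):
            sequence = "".join(rng.choices("0123456789", k=digits))
            # Count occurrences within this sequence only, overlaps included.
            start = sequence.find(pattern)
            while start != -1:
                total += 1
                start = sequence.find(pattern, start + 1)
        means.append(total / block)
    return means


if __name__ == "__main__":
    for block, mean in enumerate(running_mean_of_counts(blocks=20), start=1):
        print(f"after {block:2d} blocks: {mean:.3f} occurrences per block")
```

Twenty blocks cover only 2 × 10^7 digits, a small fraction of the full experiment, so the running mean should still wander noticeably around 0.995. Raising `blocks` toward 1000 reproduces the full scale at the cost of a much longer run time.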

fourmilab.ch
