In the picture above, several Poisson distributions are portrayed simultaneously. When the rate of occurrence of some event, r (in this chart called lambda, λ), is small, the range of likely possibilities lies near the zero line; that is, when the rate r is small, zero is a very likely number to get. As the rate becomes higher (as the occurrence of the thing we are watching becomes commoner), the center of the curve moves toward the right, and eventually, somewhere around r = 7, zero occurrences actually become unlikely. This is how the Poisson world looks graphically, and all of it is intuitively obvious. Now we will back up a little and start over, with you and your mailbox.
Suppose you typically get 4 pieces of mail per day. That becomes your expectation, but there will be a certain spread: sometimes a little more, sometimes a little less, once in a while nothing at all. Given only the average rate for a certain period of observation (pieces of mail per day, phone calls per hour, whatever), and assuming that the process, or mix of processes, that produces the event flow is essentially random, the Poisson Distribution tells you how likely it is that you will get 3, or 5, or 11, or any other number during one period of observation. That is, it predicts the degree of spread around a known average rate of occurrence. (The likeliest actual count, which sits at or just below the average, is the hump on each of the Poisson curves shown above.) When the number of trials is large and the probability p on each trial is small, the Poisson Distribution can approximate the Binomial Distribution (the pattern of Heads and Tails in coin tosses), and it is much easier to compute.
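A minimal sketch in Python, assuming the standard Poisson formula P(k) = r^k e^(-r) / k!; the helper name `poisson_pmf` is our own. It prints how likely 0, 3, 4, 5 and 11 pieces of mail are on a day when the average is 4:

```python
import math


def poisson_pmf(k: int, r: float) -> float:
    """Probability of exactly k occurrences when the average rate is r."""
    return r ** k * math.exp(-r) / math.factorial(k)


# Mailbox example: on average r = 4 pieces of mail per day.
RATE = 4
for k in (0, 3, 4, 5, 11):
    print(f"P({k:2d} pieces) = {poisson_pmf(k, RATE):.4f}")
```

For a whole-number rate such as 4, the counts 3 and 4 come out exactly tied for likeliest, since r^r / r! = r^(r-1) / (r-1)!.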
The Poisson distribution applies when: (1) the event is something that can be counted in whole numbers; (2) occurrences are independent, so that one occurrence neither diminishes nor increases the chance of another; (3) the average frequency of occurrence for the time period in question is known; and (4) it is possible to count how many events have occurred, such as the number of times a firefly lights up in my garden in a given 5 seconds, some evening, but meaningless to ask how many such events have not occurred. This last point sums up the contrast with the Binomial situation, where the probability of each of two mutually exclusive events (p and q) is known. The Poisson Distribution, so to speak, is the Binomial Distribution Without Q. In those circumstances, and they are surprisingly common, the Poisson Distribution gives the expected frequency profile for events. It may be used in reverse, to test whether a given data set was generated by a random process. If the data fit the Poisson Expectation closely, then there is no strong reason to believe that something other than random occurrence is at work. On the other hand, if the data are lumpy, we look for what might be causing the lump.
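One informal way to use the distribution "in reverse" is sketched below: estimate r from the data as the average count, then set the observed frequency of each count beside the frequency a Poisson model with that r would predict. The firefly counts are made up for illustration, and the side-by-side comparison (plus the ratio of variance to mean, which is about 1 for Poisson data) is only a rough screen; a chi-square goodness-of-fit test is the usual formal version.

```python
import math
from collections import Counter
from statistics import mean, variance


def poisson_pmf(k: int, r: float) -> float:
    return r ** k * math.exp(-r) / math.factorial(k)


def compare_to_poisson(counts):
    """Return the estimated rate and (k, observed, expected) rows."""
    n = len(counts)
    r = mean(counts)  # estimate the rate from the data themselves
    observed = Counter(counts)
    rows = [(k, observed.get(k, 0), n * poisson_pmf(k, r))
            for k in range(max(counts) + 1)]
    return r, rows


# Hypothetical data: firefly flashes counted in twenty 5-second windows.
flashes = [2, 1, 0, 3, 2, 2, 1, 4, 0, 2, 3, 1, 2, 1, 3, 2, 0, 1, 2, 2]
r, rows = compare_to_poisson(flashes)
print(f"estimated r = {r:.2f}")
for k, obs, expected in rows:
    print(f"k = {k}: observed {obs:2d}, Poisson expects {expected:5.2f}")
# Variance/mean near 1 is consistent with Poisson; well above 1 suggests lumps.
print(f"variance / mean = {variance(flashes) / r:.2f}")
```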
The Poisson situation is most often invoked for rare events, and it is only with rare events that it can successfully mimic the Binomial Distribution (for larger values of p, the Normal Distribution gives a better approximation to the Binomial). But the Poisson rate may actually be any positive number. The real contrast is that the Poisson Distribution is asymmetrical: given a rate r = 3, the range of variation ends with zero on one side (you will never find "minus one" letter in your mailbox), but is unlimited on the other side (if the label machine gets stuck, you may find yourself some Tuesday with 4,573 copies of some magazine spilling all over your front yard; it's not likely, but you can't call it impossible). The Poisson Distribution, as a data set or as the corresponding curve, is always skewed toward the right, hemmed in by the zero-occurrence barrier on the left. The degree of skew diminishes as r becomes larger, and at some point the Poisson Distribution becomes, to the eye, about as symmetrical as the Normal Distribution. But however closely it may come to resemble the Normal Distribution to someone looking at a graph for, say, r = 35, the Poisson still comes from a different kind of world: whole-number counts of events, with a floor at zero.
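To see the "rare events" condition at work, the sketch below keeps the mean fixed at r = np = 3 and measures the largest gap between Binomial and Poisson probabilities, once for many rare trials (n = 300, p = 0.01) and once for a few common ones (n = 6, p = 0.5). The particular n and p values are our own choices, and the comparison stops at k = 40 (or at n, if smaller), beyond which the probabilities are negligible here:

```python
import math


def poisson_pmf(k: int, r: float) -> float:
    return r ** k * math.exp(-r) / math.factorial(k)


def binomial_pmf(k: int, n: int, p: float) -> float:
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


def max_gap(n: int, p: float) -> float:
    """Largest |Binomial(n, p) - Poisson(np)| over k = 0 .. min(n, 40)."""
    r = n * p
    return max(abs(binomial_pmf(k, n, p) - poisson_pmf(k, r))
               for k in range(min(n, 40) + 1))


print(f"rare events   (n=300, p=0.01): max gap {max_gap(300, 0.01):.4f}")
print(f"common events (n=6,   p=0.5):  max gap {max_gap(6, 0.5):.4f}")
```

With rare events the two distributions nearly coincide; with p = 0.5 they visibly part company, even though both have mean 3.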
History. The Poisson Distribution is named for its discoverer, Siméon-Denis Poisson, who first applied it to the deliberations of juries; in that form it did not attract wide attention. More suggestive was Poisson's application to the science of artillery. The distribution was later, and independently, discovered by von Bortkiewicz, Rutherford, and Gosset. It was von Bortkiewicz who called it the Law of Small Numbers, but as noted above, though it has a special usefulness at the small end of the range, a Poisson Distribution may also be computed for larger r. The fundamental trait of the Poisson is its asymmetry, and it preserves this trait at every value of r.
Derivation. The Poisson Distribution has a close connection with the Binomial, Hypergeometric, and Exponential Distributions, and can be derived as an extreme case of any of them. The Poisson can also be derived from first principles, which involve the growth constant e. That derivation is given on a separate page, for those who like to see the inner workings of the universe up close. Other readers may proceed directly to the how-to-do-it instructions in the next section.
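For orientation, here is the standard Binomial route in outline (not the first-principles derivation on the separate page): hold the average r = np fixed while the number of trials n grows, so that p = r/n shrinks.

$$
\binom{n}{k}\left(\frac{r}{n}\right)^{k}\left(1-\frac{r}{n}\right)^{n-k}
= \frac{r^{k}}{k!}\cdot\frac{n(n-1)\cdots(n-k+1)}{n^{k}}\cdot\left(1-\frac{r}{n}\right)^{n}\cdot\left(1-\frac{r}{n}\right)^{-k}
\;\xrightarrow[n\to\infty]{}\; \frac{r^{k}\,e^{-r}}{k!}
$$

As n grows, the middle fraction tends to 1, the factor (1 - r/n)^n tends to e^{-r}, and (1 - r/n)^{-k} tends to 1, leaving the Poisson probability.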
We found, on the Derivation page, that when the average rate of occurrence of some event per unit of observation is r, we can calculate the probability of any given number of actually observed occurrences, k, by substituting in the formula

$$P(k) = \frac{r^{k}\, e^{-r}}{k!} \qquad (5)$$
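The formula P(k) = r^k e^(-r) / k! translates directly into code. A minimal Python sketch (the function name is ours) that also checks two standard properties of the distribution: the probabilities sum to 1, and both the mean and the variance of k equal r. The rate 2.5 is an arbitrary test value, and the sums stop at k = 59, past which the remaining probability is negligible for this r:

```python
import math


def poisson_pmf(k: int, r: float) -> float:
    """Probability of exactly k occurrences at average rate r."""
    if k < 0:
        return 0.0  # a negative count is impossible
    return r ** k * math.exp(-r) / math.factorial(k)


r = 2.5
probs = [poisson_pmf(k, r) for k in range(60)]
total = sum(probs)
mean = sum(k * p for k, p in enumerate(probs))
var = sum((k - mean) ** 2 * p for k, p in enumerate(probs))
print(f"sum = {total:.6f}, mean = {mean:.6f}, variance = {var:.6f}")
```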
It will be noticed that in our formula, apart from k (the count whose probability we are asking about), the only quantity that changes from one problem to the next is the rate r. That number is the only way in which one Poisson situation differs from another, and it is the only determining variable (parameter) of the Poisson equation. Nothing else enters in.
Each number r defines a different Poisson distribution. We cannot multiply by 10 the values for the distribution whose rate is r = 1 and get the values for r = 10. The latter must be calculated separately, and will be found to have a different shape. Specifically, the larger the r for any given unit of occurrence, the more symmetrical is the resulting frequency profile. This we already noticed in the picture at the top of this page.
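The changing shape can be measured. A standard result is that the skewness of a Poisson distribution is 1/√r, so it shrinks toward 0 (perfect symmetry) as r grows but never reaches it. The sketch below computes the skewness numerically from the probabilities for a few rates and prints the formula value beside it; it works in logarithms (via `math.lgamma`) only so that large r and k do not overflow a float:

```python
import math


def poisson_pmf(k: int, r: float) -> float:
    # exp(k ln r - r - ln k!) equals r^k e^-r / k!, but stays finite for large k.
    return math.exp(k * math.log(r) - r - math.lgamma(k + 1))


def skewness(r: float, kmax: int = 150) -> float:
    """Third standardized moment, computed from P(0) .. P(kmax - 1)."""
    probs = [poisson_pmf(k, r) for k in range(kmax)]
    mean = sum(k * p for k, p in enumerate(probs))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(probs))
    third = sum((k - mean) ** 3 * p for k, p in enumerate(probs))
    return third / var ** 1.5


for r in (1, 3, 10, 35):
    print(f"r = {r:2d}: skewness {skewness(r):.4f}, 1/sqrt(r) {1 / math.sqrt(r):.4f}")
```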
What we here call the rate of occurrence, r, is conventionally called lambda (λ). Remember to make that adjustment when consulting other textbooks or tables.
Calculating Poisson probabilities ideally requires a statistical calculator with x^y and e^x keys (remember that e is the constant 2.71828). Absent such a calculator, certain individual probabilities may be computed with the aid of the e^x Table. For selected simple values of r, problems may be solved using the Tables here provided.
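For working with a plain calculator, one convenient shortcut (a standard identity, not a method the text prescribes) is that each probability follows from the one before: dividing P(k) = r^k e^(-r) / k! by P(k - 1) leaves r/k. So only one e^x lookup is needed, for P(0) = e^(-r), and everything after that is a multiply and a divide. A Python sketch of that bookkeeping, with a hypothetical function name:

```python
import math


def poisson_row(r: float, kmax: int) -> list[float]:
    """P(0) .. P(kmax) via the step P(k) = P(k - 1) * r / k."""
    p = math.exp(-r)  # the single e^x step, with x = -r
    row = [p]
    for k in range(1, kmax + 1):
        p = p * r / k  # one multiply and one divide per step
        row.append(p)
    return row


for k, p in enumerate(poisson_row(2, 8)):
    print(f"k = {k}: {p:.4f}")
```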
Example. Let us suppose that some event, say the arrival of a weird particle from outer space at a counter on some farm outside Topeka, occurs on average 2 times per hour. But there are variations from that average. What is the probability that in a given hour exactly three weird particles will be recorded? Substituting in formula (5) the empirical rate r = 2 and the count k = 3, we get:

$$P(3) = \frac{2^{3}\, e^{-2}}{3!} = \frac{8 \times 0.135335}{6} = \frac{1.082682}{6} \approx 0.1804$$
This answer may be checked against the one given in the Poisson Table, and will be found to match. This sort of calculation was in fact how the Table was constructed.
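A table like the Poisson Table can be regenerated the same way. The sketch below prints P(k) for several rates side by side, with k down the rows and r across the columns; the choice of rates, the cutoff at k = 10, and the four-decimal rounding are ours and need not match the layout of the Table itself:

```python
import math


def poisson_pmf(k: int, r: float) -> float:
    return r ** k * math.exp(-r) / math.factorial(k)


def poisson_grid(rates, kmax):
    """Rows of (k, [P(k) for each rate]), rounded to four decimals."""
    return [(k, [round(poisson_pmf(k, r), 4) for r in rates])
            for k in range(kmax + 1)]


RATES = (0.5, 1, 2, 3, 4)
print("k    " + "".join(f"r={r:<8}" for r in RATES))
for k, values in poisson_grid(RATES, 10):
    print(f"{k:<5}" + "".join(f"{v:<10.4f}" for v in values))
```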
In rough terms, then, if our weird particles average 2 per hour but vary randomly around that average, and thus fit the random Poisson model, we would expect to record exactly 3 particles, at the counter over by the silo, in about 0.1804 of the hours observed. If we only watch for one hour, our reading will most likely be 1 or 2 particles (for r = 2 these two outcomes are exactly tied, each with probability about 0.2707). But there are 24 hours in a day, and in an average day there should thus be (24)(0.1804) ≈ 4.3, or roughly 4, hours during which exactly 3 particles are registered. Of course, actual days can vary from that expectation; that is the way the universe works. But now we know what to expect on average, and it is such expectations that the Poisson formula gives us.
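The day-to-day variation can be watched in a small simulation. The sketch below draws hourly particle counts at r = 2 with Knuth's multiplication method, a standard way to generate Poisson counts that we have swapped in for illustration (it is not described in the text), and records, for each simulated day, how many of its 24 hours registered exactly 3 particles. The function names, the seed, and the 10,000-day run are arbitrary choices:

```python
import math
import random


def poisson_draw(r: float, rng: random.Random) -> int:
    """Knuth's method: multiply uniforms until the product drops to e^-r or below."""
    limit = math.exp(-r)
    k, product = 0, rng.random()
    while product > limit:
        k += 1
        product *= rng.random()
    return k


def hours_with_exactly(target: int, r: float = 2, days: int = 10_000,
                       seed: int = 1) -> list[int]:
    """For each simulated day, the number of its 24 hours with exactly `target` events."""
    rng = random.Random(seed)
    return [sum(1 for _ in range(24) if poisson_draw(r, rng) == target)
            for _ in range(days)]


per_day = hours_with_exactly(3)
expected = 24 * 2 ** 3 * math.exp(-2) / math.factorial(3)
print(f"formula: {expected:.2f} hours per day")
print(f"simulation: {sum(per_day) / len(per_day):.2f} hours per day on average, "
      f"ranging from {min(per_day)} to {max(per_day)}")
```

Individual simulated days scatter well above and below the average, just as the text says real days will.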
Just to show how the whole situation looks, here, from the Table, is the frequency profile for r = 2, omitting the extremely rare possibilities: