An Introduction to the Poisson Distribution

Let's look at an introduction to the Poisson distribution, an important discrete probability distribution.

Suppose we are counting the number of occurrences of an event in a given unit of time, distance, area, or volume. For example, we might be counting up the number of car accidents in a day, in a city like Toronto perhaps, or the number of dandelions in a square metre plot of land. The number of events is then going to be a random variable that may or may not have the Poisson distribution, depending on the specifics of the situation. But a Poisson random variable is a count of the number of occurrences of an event.

I'm going to phrase the following in terms of time, but the same ideas hold if we are discussing distance, area, volume, etc.
Suppose events are occurring independently; in other words, knowing when one event happens gives absolutely no information about when another event will occur. And suppose the probability that an event occurs in a given length of time does not change through time; in other words, the theoretical rate at which the events are occurring does not change through time. A little more loosely, we might say that the events are occurring randomly and independently.

If these conditions hold, then the random variable X, which represents the number of events in a fixed unit of time, has the Poisson distribution.
Here's the probability mass function for the Poisson distribution, which is what we'll use to calculate probabilities. The probability that the random variable X takes on the value little x, which you'll sometimes see written as p(x), is equal to lambda^x times e^(-lambda), divided by x!:

    P(X = x) = lambda^x * e^(-lambda) / x!

e, like pi, is an important mathematical constant, the base of natural logarithms. It is approximately 2.71828, but it is an irrational number whose decimal expansion is infinite and non-repeating. We've discussed factorials previously, but as a specific example of this x!, 5! would be 5 times 4 times 3 times 2 times 1, which is 120.

It's not a probability distribution until we say what values X can take on. Here the random variable is a count of the number of events in a given unit of time, and so it can take on any non-negative whole number value. So this is the probability mass function that we use to calculate probabilities for any value of x that's 0, 1, 2, and so on off to infinity. There is no upper bound on the value that X can take on, but depending on the situation the probabilities eventually get tiny for large values of X.
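As an aside that isn't in the video: a minimal Python sketch of this probability mass function might look like the following (the function name poisson_pmf and the rate lambda = 2 are just illustrative choices).

    from math import exp, factorial

    def poisson_pmf(x: int, lam: float) -> float:
        """P(X = x) = lam^x * e^(-lam) / x!  for x = 0, 1, 2, ..."""
        return lam ** x * exp(-lam) / factorial(x)

    # The probabilities shrink quickly in the tail
    for x in range(16):
        print(x, round(poisson_pmf(x, 2.0), 5))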
The mean of the Poisson distribution is lambda. So mu, the mean of the random variable X, is equal to lambda. We could have used mu as our parameter, and some sources do that, but we often use lambda for the Poisson distribution. The variance of the Poisson distribution, which we'll label sigma squared, is also equal to lambda. So for the Poisson distribution, the mean and the variance are equal.
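As a quick check (not from the video), we can approximate the mean and variance numerically by summing over the probability mass function and see that both come out equal to lambda; the rate lambda = 2 below is an arbitrary choice.

    from math import exp, factorial

    lam = 2.0  # arbitrary illustrative rate
    xs = range(60)  # far enough into the tail that the leftover probability is negligible
    pmf = [lam ** x * exp(-lam) / factorial(x) for x in xs]

    mean = sum(x * p for x, p in zip(xs, pmf))
    variance = sum((x - mean) ** 2 * p for x, p in zip(xs, pmf))
    print(mean, variance)  # both are essentially 2.0, i.e. equal to lambda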
Let's look at an example. Plutonium-239 is an isotope of plutonium that is used in nuclear weapons and reactors. One nanogram, or one billionth of a gram, of plutonium-239 will have an average of 2.3 radioactive decays per second, and the number of decays in a given period will follow, to a very close approximation, a Poisson distribution. Here we'd like to know: what is the probability that in a randomly selected two-second period there are exactly 3 radioactive decays?

We'll let the random variable X represent the number of decays in a two-second period, and lambda is the mean number of decays in that period. Here we have an average of 2.3 radioactive decays per second, but we're talking about a two-second period, and so in that period the mean number of occurrences, which is going to equal lambda, is 2.3 times 2, which is 4.6. So X has a Poisson distribution with lambda equal to 4.6.
We want to find the probability that the random variable X takes on the value 3. The Poisson probability mass function is lambda^x times e^(-lambda), divided by x!, and here that's going to be lambda, 4.6, raised to the third power, times e^(-4.6), divided by 3!:

    P(X = 3) = 4.6^3 * e^(-4.6) / 3!

If we worked that out on a calculator or computer, we'd see that's equal to 0.163 when rounded to three decimal places. So that is the probability of getting exactly three radioactive decays in a two-second period.
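That value is easy to verify in a few lines of Python (my addition; the cross-check assumes SciPy is installed).

    from math import exp, factorial

    lam = 4.6
    p3 = lam ** 3 * exp(-lam) / factorial(3)
    print(round(p3, 3))  # 0.163

    # Optional cross-check against SciPy's built-in Poisson pmf
    # from scipy.stats import poisson
    # print(poisson.pmf(3, lam))  # same value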
If we were to calculate the probabilities for the different possible values of X and plot them, we'd get this. This plot shows the probability distribution of the random variable X, a Poisson distribution with lambda equal to 4.6. The number we calculated, the probability that X takes on the value 3, is here; that's what we just calculated to be 0.163. We can see here that X takes on the possible values 0, 1, 2, on up. I've truncated the plot at 15, since the possible values go off to infinity but the probabilities start getting very, very small. For a Poisson distribution, though, there is no upper bound on the values the random variable X can take on.

For this distribution, the mean mu is equal to lambda, and that's 4.6 here. The variance is also equal to lambda, so that's also 4.6, and if we wanted the standard deviation sigma, we'd simply take the square root of 4.6.
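The plot itself isn't reproduced in this transcript, but a rough version can be generated with a short script (a sketch assuming matplotlib and SciPy are available).

    import matplotlib.pyplot as plt
    from scipy.stats import poisson

    lam = 4.6
    xs = range(16)  # truncate the display at 15, as in the video
    plt.bar(xs, [poisson.pmf(x, lam) for x in xs])
    plt.xlabel("x, the number of decays in a two-second period")
    plt.ylabel("P(X = x)")
    plt.title("Poisson distribution with lambda = 4.6")
    plt.show()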
If we look closely, we can see that there's a hint of right skewness in this distribution. The Poisson distribution has some right skewness, but how much depends on the value of lambda: when lambda is large the distribution will be close to symmetric, and when lambda is close to 0 the right skewness can be pretty strong.
Suppose we wanted a different probability: the probability that there are no more than three radioactive decays. Here that's the red bit on the plot. We'd need to work out the probability of 0, of 1, of 2, and of 3 using the Poisson probability mass function, and add them together. So let's go ahead and do that.

Here we need to find the probability that the random variable X takes on a value less than or equal to 3, which is the sum of the probabilities of 0, 1, 2, and 3. We put these values of x into the Poisson probability mass function with a lambda of 4.6, and, rounded to three decimal places, these probabilities work out to 0.010, 0.046, 0.106, and 0.163, and they sum to 0.326. Working out probabilities like this can be a bit of a pain if there are a lot of values, so we often rely on software to carry out the calculations.
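For example (my addition, assuming SciPy is available), the whole calculation takes only a line or two with a cumulative distribution function:

    from scipy.stats import poisson

    lam = 4.6
    # Sum the pmf at 0, 1, 2, 3 ...
    print(sum(poisson.pmf(x, lam) for x in range(4)))  # about 0.326
    # ... or, equivalently, evaluate the cdf at 3
    print(poisson.cdf(3, lam))                         # about 0.326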
There is an important relationship that sometimes helps us determine whether a random variable has a Poisson distribution: the binomial distribution tends toward the Poisson distribution as n tends to infinity, p tends to 0, and np stays constant. For us, at the moment, the important bit is that the Poisson distribution with lambda equal to np from the binomial distribution closely approximates the binomial distribution when n is large and p is small.

In fact, this is why the number of radioactive decays of plutonium has a Poisson distribution. Even in a tiny bit of plutonium there are a very large number of atoms, and each one has a tiny probability of experiencing a radioactive decay in a two-second period. So the example we just worked through was, in its underlying nature, a binomial problem with a very large n and a very small p, and that's why the number of radioactive decays is very well approximated by the Poisson distribution.
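To make that concrete (my own illustration, with n and p chosen arbitrarily so that np = 4.6), here is a small comparison of the two distributions:

    from scipy.stats import binom, poisson

    n, p = 1_000_000, 4.6e-6   # large n, small p, with np = 4.6
    lam = n * p
    for x in range(6):
        print(x, round(binom.pmf(x, n, p), 5), round(poisson.pmf(x, lam), 5))
    # The binomial and Poisson probabilities agree to several decimal places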
I have videos that explore this relationship in greater detail.

Like many models in probability and statistics, the Poisson distribution is typically used as an approximation to the true underlying reality. In most situations where we use the Poisson, we know that it doesn't fit the scenario precisely, but we use it as an approximation, possibly a very good approximation. It can be difficult to determine whether a random variable has a Poisson distribution to a reasonable approximation, so I'm going to look at a few examples and discuss some considerations in another video.