Alternative to SIR: Modelling coronavirus (COVID-19) with stochastic process [PART I]

Channel: Mathemaniac

A lot of channels have talked about the SIR model, including my own, but there is a very different way to model coronavirus, one that boils down to how diseases really spread from person to person. This is generally known as a branching process, because every individual here branches out to some random number of other individuals. Because this branching out is random, since an individual can produce 2 branches, 3 branches, or no branches at all, we say this is a kind of stochastic process, "stochastic" being a fancier name for random. This video will focus on the simplest type of branching process, known as the Bienaymé-Galton-Watson process, BGW for short. So what does this BGW process say about the spread of coronavirus?
Before we move on, we need to introduce a bit of terminology. All people on the same vertical line are in the same generation. We call this orange column the first generation, with the number of individuals denoted X_1. Similarly, we denote X_2, X_3 and so on. Next up, we call these two individuals on the same branch the offspring of the individual in the previous generation that spread the disease to them. The reason we use these terms is that this process was originally studied in a different context related to reproduction, but that's a story for another time.
Let's focus on the first branching here. This is a random process, so actually, it could stop branching right away, or branch to 1 individual, or 2, or 3 and so on. Because the number of branches is just X_1, the number of offspring of this first patient, we say that X_1 follows a distribution. The distribution can be more conveniently written as a table: the left column lists all the possible values of X_1, and the right column the corresponding probabilities. So the probability of X_1 being 0 is p_0, and so on. Because these are probabilities, they have to be nonnegative, and they must sum to 1.
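In code, that table is just a mapping from values to probabilities. A minimal sketch, with made-up numbers (the video does not specify the p_k's):

```python
# A hypothetical offspring distribution for X_1, written as a table
# {value: probability}. The numbers are illustrative, not from the video.
offspring = {0: 0.3, 1: 0.4, 2: 0.2, 3: 0.1}

# Probabilities must be nonnegative...
assert all(p >= 0 for p in offspring.values())
# ...and sum to 1.
assert abs(sum(offspring.values()) - 1.0) < 1e-12
```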
The reason this BGW process is the simplest branching process is that it assumes two things. First, all of the branchings you see here are the same, which means each follows the same distribution as the first branching; they are said to have the same offspring distribution. The second simplification is that each individual in the same generation spreads the disease independently. Independence has a specific meaning in probability theory, summarised in the formula P(A and B) = P(A)P(B). As a very quick detour, let's see why this is the formula for independence.
To see what this formula has to do with independence of two events, consider this rectangular box as the entire sample space, with the green and yellow ovals representing the events A and B respectively. The overlapping region then represents A and B happening together. For the intuitive sense of independence, the probability of B should be the same regardless of whether A happens. So if we focus just on the event A, the proportion taken up by the overlapping region should be the same as the proportion of the yellow oval within the rectangular box itself. This means the ratio of probabilities on the left side, P(A and B)/P(A), equals the probability of B. Just by rearranging, we get this definition of independence. More generally, if we have n independent events, the probability of them all happening together is just the product of the individual probabilities.
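The product rule is easy to verify by brute force on a small sample space. A sketch with two fair dice (my own example, not from the video), where A = "first die is even" and B = "second die shows 5 or 6":

```python
from itertools import product

# Sample space: two fair dice, every outcome equally likely.
space = list(product(range(1, 7), repeat=2))

A = {(a, b) for (a, b) in space if a % 2 == 0}  # first die even
B = {(a, b) for (a, b) in space if b > 4}       # second die shows 5 or 6

def P(event):
    """Probability = favourable outcomes / total outcomes."""
    return len(event) / len(space)

# Independence: P(A and B) equals P(A) * P(B).
assert abs(P(A & B) - P(A) * P(B)) < 1e-12   # 1/6 == 1/2 * 1/3
```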
Going back to the BGW model, there are two questions we want to ask. The first is: what is the number of individuals in the nth generation? Because this is a random process, that number can be 0, 1 and so on, so what we are really looking for is the distribution of X_n, detailing the probabilities of the different values of X_n. The second question is what's called the extinction probability: the probability that at some point, an entire generation of patients just stops spreading coronavirus entirely. But how do we get started on these two problems?
The main issue here is that this offspring distribution seems to need a lot of parameters to characterise the entire process: all of these probabilities can change. So we want to turn this table into just one single object. We do this via something called a generating function, which is a way to encode an infinite amount of data into just one thing. The encoding works like this: the final product will be a function of a variable, say z, written as a power series in z. The constant term is p_0, the coefficient of the z term is p_1, and so on, giving G(z) = p_0 + p_1 z + p_2 z^2 + ... As a very simple example of why this is useful, suppose the probabilities are just negative powers of 2; then using the formula for geometric series, the generating function collapses to a succinct closed formula.
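That collapse is easy to check numerically. A sketch assuming the distribution p_k = 1/2^(k+1) for k = 0, 1, 2, ... (the natural choice of negative powers of 2 summing to 1), whose geometric series sums to 1/(2 - z):

```python
# Offspring probabilities p_k = 1/2^(k+1), k = 0, 1, 2, ...
# Geometric series: G(z) = sum over k of z^k / 2^(k+1) = 1 / (2 - z), for |z| < 2.

def G_series(z, terms=200):
    # Truncated power series; 200 terms is plenty for |z| well below 2.
    return sum((z ** k) / 2 ** (k + 1) for k in range(terms))

def G_closed(z):
    return 1 / (2 - z)

# The series and the closed form agree at several test points.
for z in (0.0, 0.3, 0.9, -0.5):
    assert abs(G_series(z) - G_closed(z)) < 1e-10
```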
This hopefully explains why generating functions are actually very useful. Basically, given all the probabilities in the distribution, you can encode them pretty easily into this generating function. What's not obvious is the decoding, but it can be done. If you are a bit concerned about the rigour of all this, pause and read this.
Anyway, another useful way to see this generating function is as the weighted average of z^X_1. This is because p_0 of the time, X_1 is 0, so z^X_1 is just 1 and you get the constant term; p_1 of the time, X_1 is 1, so z^X_1 is just z and you get the z term with weight p_1; and so on. This weighted average is usually called the expected value, denoted E[z^X_1]. All of this discussion is also valid for a general X_n, the number of individuals in the nth generation, except that the probabilities, and hence the generating function, or the weighted average, are all different, and those are exactly what we are after. So basically, given the offspring distribution, we can encode it into the generating function, which can be written as a weighted average; then we will find some way to obtain the generating function for X_n, which we then decode to find the distribution of X_n, which is exactly what we want. This is a much easier approach than going directly from the offspring distribution.
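The "weighted average" reading of G(z) = E[z^X_1] can be sanity-checked by simulation: compute the weighted sum directly, then estimate the same expected value by sampling. The offspring distribution below is hypothetical:

```python
import random

# Hypothetical offspring distribution (illustrative numbers).
values = [0, 1, 2, 3]
probs = [0.3, 0.4, 0.2, 0.1]

def G(z):
    # Generating function as a weighted average: G(z) = E[z^X_1].
    return sum(p * z ** x for x, p in zip(values, probs))

random.seed(0)
z = 0.5
# Monte Carlo estimate of the same expected value.
samples = random.choices(values, weights=probs, k=200_000)
estimate = sum(z ** x for x in samples) / len(samples)

# The two agree to a couple of decimal places.
assert abs(estimate - G(z)) < 0.01
```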
In this BGW process, X_n is the number of individuals in the nth generation. Because the entire generation comes from the branchings of the previous generation, X_n is a sum of copies of X_1, since each branching is just a copy of X_1. The number of copies is X_(n-1), the number of people in the previous generation.
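This branching step is straightforward to simulate: each of the X_(n-1) individuals draws an independent copy of X_1, and X_n is the sum. A minimal sketch, again with a made-up offspring distribution:

```python
import random

random.seed(1)
values, probs = [0, 1, 2, 3], [0.3, 0.4, 0.2, 0.1]  # hypothetical

def next_generation(x_prev):
    # Each of the x_prev individuals branches independently, drawing
    # its offspring count from the same offspring distribution.
    if x_prev == 0:
        return 0
    return sum(random.choices(values, weights=probs, k=x_prev))

# One sample path: start from a single patient and follow the branchings.
path = [1]
for _ in range(10):
    path.append(next_generation(path[-1]))
print(path)
```

Note that once a generation hits 0, every later generation is 0 as well: that is the extinction event the second question asks about.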
So the generating function for X_n, written as a weighted average, can be expressed by rewriting X_n as a sum of copies of X_1. This is pretty complicated, because X_(n-1) is not a fixed value; it can vary. So let's first suppose X_(n-1) is a fixed number m, and see what we can do afterwards. Because we are calculating weighted averages, we have to sum up terms of this form, with the weight being the probability that the copies of X_1 take specific values. Because we have assumed all the spreading here is independent, we can apply the more general definition of independence, so this weight becomes a product of probabilities, as shown here. Similarly, we can rewrite the value here into a product of these powers.
Rearranging, we get a product of terms of the form of weighted values. If we first sum over all the values of a_1, only these two terms are affected, and summing terms of this form is precisely the weighted average of z^X_1, which is the generating function for X_1. The same generating function appears for the other pairs. Since there are m pairs, on the condition that X_(n-1) really is m, the weighted average is G(z) to the power m. However, X_(n-1) is really a variable, so this temporary result only holds some of the time; more precisely, G(z)^m occurs with exactly this probability. This is again in the form of weighted values, and so it can be expressed as a weighted average of these powers of G(z). But that is just the generating function for X_(n-1), applied to G(z) as its argument. In other words, G_n(z) is G_(n-1) applied to G(z). This is true for all n larger than 1, so we can apply the identity again to get G_(n-2) of G of G of z. Applying it repeatedly, the generating function for X_n is just the generating function for X_1 iterated n times.
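The iteration result G_n(z) = G(G(...G(z)...)) can be checked numerically: iterate G n times and compare against a Monte Carlo estimate of E[z^X_n] from simulated branching paths. The offspring distribution is hypothetical, as before:

```python
import random

random.seed(2)
values, probs = [0, 1, 2, 3], [0.3, 0.4, 0.2, 0.1]  # hypothetical

def G(z):
    # Generating function of X_1: the weighted average E[z^X_1].
    return sum(p * z ** x for x, p in zip(values, probs))

def G_n(z, n):
    # G iterated n times: G(G(...G(z)...)).
    for _ in range(n):
        z = G(z)
    return z

def sample_X(n):
    # Simulate X_n by branching generation by generation from X_0 = 1.
    x = 1
    for _ in range(n):
        x = sum(random.choices(values, weights=probs, k=x)) if x else 0
    return x

n, z, trials = 3, 0.5, 100_000
estimate = sum(z ** sample_X(n) for _ in range(trials)) / trials

# Simulated E[z^X_n] matches the n-fold iterate of G.
assert abs(estimate - G_n(z, n)) < 0.01
```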
This means we have finally cracked the first question, because given the offspring distribution, we can theoretically work out the distribution of X_n by decoding that generating function. As a sanity check that this is correct, let's suppose there will always be exactly 2 branches, which means the offspring distribution looks like this: the probability that X_1 equals 2 is 1, and all other probabilities are 0. The generating function is then z^2. Iterating it n times, the generating function for the nth generation is z^(2^n), meaning the number of individuals in the nth generation is 2^n, which is what we expected!
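This sanity check takes only a few lines: with G(z) = z^2, the n-fold iterate is z^(2^n), so the only possible population size in generation n is 2^n.

```python
# Deterministic case: every individual has exactly 2 offspring,
# so the offspring generating function is G(z) = z^2.
def G(z):
    return z ** 2

def G_n(z, n):
    # Iterate G n times; repeated squaring gives z^(2^n).
    for _ in range(n):
        z = G(z)
    return z

n, z = 5, 0.9
assert abs(G_n(z, n) - z ** (2 ** n)) < 1e-12

# The generating function z^(2^n) has all its weight on one value,
# so X_n is deterministically 2^n individuals.
print(2 ** n)  # 32
```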
However, the second question will be discussed in another video, which will be released in a few days. What about limitations? Or the original context that prompted the three mathematicians to study this process? Don't worry, I will cover all of these in the future.
If you enjoyed this video, be sure to give it a like and subscribe to the channel with notifications on! See you next time!