Alternative to SIR: Modelling coronavirus (COVID-19) with stochastic process [PART I] - YouTube
Channel: Mathemaniac
[0]
A lot of channels have talked about the SIR
model, including my own, but there is a very
[5]
different way to model coronavirus, which
boils down to how diseases really spread from
[11]
person to person. This is generally known
as a branching process, because every individual
[18]
here branches out to some random number of
other individuals. Because this branching
[24]
out is random, as it can have 2 branches,
3 branches, or even not branching out at all,
[30]
we say this is a kind of stochastic process.
Stochastic being a fancier name for random.
[38]
This video will focus on the simplest type
of branching process, known as the Bienaymé-Galton-Watson
[43]
process, BGW for short. So what does this
BGW say about the spread of coronavirus?
[53]
Before we move on, we need to introduce a
few terms. All people on the same
[58]
vertical line would be in the same generation.
We call this orange column the first generation,
[65]
with the number of individuals denoted as
X_1. Similarly, we denote X_2, X_3 and so
[72]
on. Next up, we call these two individuals
from the same branch the offspring of the
[78]
individual in the previous generation that
spreads the disease to them. The reason we
[84]
use these terms is that this process was originally
studied in a different context related to
[90]
reproduction, but that's a story for another
time.
[94]
Let's focus on the first branching here.
We say that this is a random process, so actually,
[100]
it could stop branching right away, or branch
to 1 individual, or 2, or 3 and so on. Because
[108]
the number of branches is just X_1, the number
of offspring of this first patient, we say
[114]
that X_1 follows a distribution. The distribution
can be more conveniently written as a table.
[122]
The left column represents all the possible
values of X_1, and the right column represents
[128]
the corresponding probability. So the probability
of X_1 being 0 is p_0 and so on. Because these
[137]
are the probabilities, they have to be nonnegative,
and the sum of these probabilities would be
[143]
1.
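As a minimal sketch of the table just described, the offspring distribution can be stored as a list whose k-th entry is p_k. The function name and the example numbers are illustrative assumptions, not values from the video.

```python
import math

def is_valid_distribution(probs, tol=1e-9):
    """Return True if probs is a valid offspring distribution.

    probs[k] is read as p_k, the probability that X_1 equals k.
    """
    nonnegative = all(p >= 0 for p in probs)
    sums_to_one = math.isclose(sum(probs), 1.0, abs_tol=tol)
    return nonnegative and sums_to_one

# Hypothetical example: p_0 = 0.2, p_1 = 0.5, p_2 = 0.3.
example_offspring = [0.2, 0.5, 0.3]
```

Both conditions from the transcript are checked: every entry is nonnegative, and the entries sum to 1 up to a small floating-point tolerance.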
[145]
The reason why this BGW process is the simplest
branching process is that it assumes two things:
[152]
every branching that you see here is the
same, which means they all follow the same
[157]
distribution as the first branching. So they
are said to have the same offspring distribution.
[164]
The second simplification is that it assumes
each individual in the same generation spreads
[171]
the disease independently. Independence has
a specific meaning in probability theory,
[177]
summarised in this formula. As a very quick
detour, let's see why we have this formula
[183]
for independence.
[185]
To see why this formula has anything to do
with independence of two events, let's consider
[191]
this rectangular box as the entire sample
space. And the green and yellow ovals represent
[197]
the events A and B respectively. Then the
overlapping region represents A and B happening
[204]
together. For the intuitive sense of independence,
the probability of B should be the same regardless
[212]
of whether A happens. So if we just focus
on the event A, the proportion of the overlapping
[219]
region should be the same as the proportion
of the yellow bubble within the rectangular
[225]
box itself. This means the ratio of probabilities
on the left side equals the probability of
[232]
B. Just by rearranging, we get this definition
of independence. More generally, if we have
[240]
n independent events, we will have the probabilities
of them happening together to be just the
[245]
product of the individual probabilities.
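The formula shown on screen is not reproduced in the transcript. Written out in standard notation, and assuming P(A) > 0 so that the ratio makes sense, the argument above reads:

```latex
\frac{P(A \cap B)}{P(A)} = P(B)
\quad\Longrightarrow\quad
P(A \cap B) = P(A)\,P(B),
\qquad
P(A_1 \cap A_2 \cap \dots \cap A_n) = P(A_1)\,P(A_2)\cdots P(A_n).
```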
[248]
Going back to the BGW model, there are two questions
that we want to ask. The first is the number
[255]
of individuals in the nth generation. But
because this is a random process, it can be
[261]
0, 1 and so on. So actually what we are looking
for should be the distribution of X_n, detailing
[268]
the probabilities of the different values
of X_n. The second question is what's called
[274]
the extinction probability. This is the probability
that at some point, the entire generation
[280]
of patients just stops spreading coronavirus
entirely. But how do we get started on these
[287]
two problems?
[289]
The main issue here is that there seem to
be a lot of parameters in this offspring distribution
[294]
to characterise the entire process. These
probabilities here can all change. So we want
[300]
to turn this table to just one single thing.
We do this via something called a generating
[307]
function, which is a way to encode an infinite
amount of data into just one thing. It does
[313]
the encoding like this: the final product
will be a function of a variable, say z. This
[320]
will be a power series in z. The constant
term would be p_0, the coefficient of the
[327]
z term would be p_1, and so on. As a very
simple example to illustrate why this is useful,
[334]
suppose the probabilities are just negative
powers of 2, then using the formula for geometric
[340]
series, we can obtain the generating function
to be just a succinct formula.
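The formulas themselves are only shown on screen, so here is a sketch of what the encoding and the example look like. The indexing p_k = 2^{-(k+1)} is an assumption made here so that the probabilities sum to 1; the video may index them differently.

```latex
G(z) = \sum_{k=0}^{\infty} p_k z^k = p_0 + p_1 z + p_2 z^2 + \cdots,
\qquad
p_k = \frac{1}{2^{k+1}}
\;\Longrightarrow\;
G(z) = \frac{1}{2}\sum_{k=0}^{\infty} \left(\frac{z}{2}\right)^{k}
     = \frac{1}{2}\cdot\frac{1}{1 - z/2}
     = \frac{1}{2 - z}
\quad (|z| < 2).
```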
[346]
This hopefully explains why generating functions
are actually very useful. Basically, if you
[352]
have all these probabilities in the distribution,
then you can encode it pretty easily to this
[358]
generating function. What's not obvious
is the decoding, but it can be done. If you
[364]
are a bit concerned about the rigour of all
these, pause and read this.
[369]
Anyway, another useful way to see this generating
function is that it is the weighted average
[375]
of z^X_1. This is because p_0 of the time,
X_1 is 0, and so z^X_1 is just 1, and you
[386]
get the constant term, and p_1 of the time,
X_1 is 1, and so z^X_1 is just z, and you
[395]
get the z term to have the weight p_1, and
so on. This weighted average is usually called
[402]
the expected value, denoted as E of z^X_1.
All these discussions would also be valid
[411]
for a general X_n, the number of individuals
in the nth generation, except that the probabilities,
[418]
and hence the generating function and the weighted
average, are all different, and they are exactly
[424]
what we are after. So basically, if we are
given the offspring distribution, then we
[430]
can encode it in the generating function,
which can be written in terms of weighted
[435]
averages, then we will find some way to find
the generating function for X_n, which we
[443]
then decode to find the distribution of X_n,
which is exactly what we want. This is a much
[450]
easier approach than just directly going from
the offspring distribution.
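The "weighted average of z^X_1" reading can be sketched in a few lines. The function name and the example distribution below are assumptions for illustration only.

```python
def generating_function(probs, z):
    """Evaluate G(z) = sum over k of p_k * z**k.

    probs[k] is p_k, so each term is the value z**k weighted by
    the probability that X_1 equals k.
    """
    return sum(p * z**k for k, p in enumerate(probs))

# Hypothetical offspring distribution: p_0 = 0.2, p_1 = 0.5, p_2 = 0.3.
offspring = [0.2, 0.5, 0.3]
value_at_one = generating_function(offspring, 1.0)   # all weights added up
value_at_zero = generating_function(offspring, 0.0)  # only the constant term survives
```

Two general facts follow directly from the definition: G(1) is the sum of all the probabilities, so it equals 1, and G(0) is the constant term p_0.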
[456]
In this BGW process, we have X_n to be the
number of individuals in the nth generation.
[463]
Because the entire generation comes from the
branchings of the previous generation, X_n
[468]
is the sum of the different copies of X_1,
because each branching is just a copy of X_1.
[474]
The number of copies is X_(n-1), the number
of people in the previous generation.
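The recursion just described, where X_n is a sum of X_(n-1) independent copies of X_1, can be simulated directly. This is a rough sketch: the function name, the `seed` parameter, and the choice to start from a single initial patient are assumptions of the example.

```python
import random

def simulate_bgw(probs, generations, seed=None):
    """Simulate one BGW tree and return [X_1, X_2, ..., X_generations].

    probs[k] is p_k, the probability of having k offspring.
    """
    rng = random.Random(seed)
    values = list(range(len(probs)))
    sizes = []
    previous = 1  # assumed: the process starts from one initial patient
    for _ in range(generations):
        # Draw one independent copy of X_1 per individual in the
        # previous generation and add them up to get the next size.
        current = sum(rng.choices(values, weights=probs, k=previous))
        sizes.append(current)
        previous = current
    return sizes
```

Once a generation has size 0, every later generation is 0 as well, which is the extinction event mentioned earlier.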
[481]
So the generating function for X_n, written
in terms of the weighted average, can be expressed
[486]
like this, by rewriting X_n as the
sum of copies of X_1. This is pretty complicated
[494]
because X_{n-1} is not a fixed value; it is itself
random. So let's suppose X_{n-1} is
[501]
a fixed number m, and then we will see what
we can do afterwards. Because we are calculating
[508]
weighted averages, we have to sum up terms
of this form, with the weight being the probability
[516]
that the copies of X_1 take specific values.
Because we have assumed all the spreading
[521]
here is independent, we can apply this more
general definition of independence. So we
[528]
can change this weight into a product of probabilities
as shown here. Similarly, we can rewrite the
[535]
value here, into a product of these powers.
[539]
Rearranging, we get a product of terms of
the form of weighted values. So if we first
[545]
sum up all the values of a_1, then only these
two terms will be affected. Summing terms
[552]
of this form is precisely the weighted average
of z^(X_1), which is the generating function
[558]
for X_1. Similarly, the same generating function
appears for the other pairs. Since there are
[566]
m pairs, on the condition that X_(n-1) is
really m, the weighted average would be G(z)
[573]
to the m. However, X_(n-1) is really a random variable,
so this temporary result only happens sometimes.
[581]
More precisely, this G(z) to the m will happen
with exactly this probability. This is again
[590]
in the form of weighted values. And so, this
can be expressed as a weighted average of
[597]
these powers of G(z). This is just the generating
function for X_(n-1), but applied to G(z)
[605]
as an argument. In other words, G_n(z) is
G_(n-1) applied to G(z). This is true for
[614]
all n larger than 1, and so we can apply this
identity again to get G_(n-2) of G of G of
[623]
z. Repeatedly applying this we get the generating
function for X_n is just the generating function
[630]
for X_1 iterated n times.
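Since the on-screen algebra is missing from the transcript, the derivation above can be written out roughly as follows. The symbols a_1, ..., a_m, G and G_n follow the narration; writing the offspring probabilities as p_{a_i} is a notational assumption.

```latex
\begin{align*}
E\!\left[z^{X_n} \mid X_{n-1} = m\right]
  &= \sum_{a_1, \dots, a_m} z^{a_1 + \cdots + a_m}\, p_{a_1} \cdots p_{a_m}
   = \prod_{i=1}^{m} \left( \sum_{a_i} p_{a_i} z^{a_i} \right)
   = G(z)^m, \\
G_n(z) = E\!\left[z^{X_n}\right]
  &= \sum_{m=0}^{\infty} P(X_{n-1} = m)\, G(z)^m
   = G_{n-1}\bigl(G(z)\bigr), \\
G_n(z)
  &= \underbrace{G(G(\cdots G}_{n \text{ times}}(z) \cdots)).
\end{align*}
```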
[633]
This means that we have finally cracked the
first question, because given the offspring
[638]
distribution, we can theoretically work out
the distribution of X_n by decoding that generating
[645]
function. As a sanity check that this is correct,
let's suppose there will always be 2 branches,
[651]
which means that the offspring distribution
looks something like this, where the probability
[657]
that X_1 equals 2 is 1, and all other probabilities
are 0. This means the generating function
[666]
is z^2. By iterating it n times, the generating
function for the nth generation is z^(2^n),
[675]
meaning that the number of individuals in
the nth generation is 2^n, which is what we
[680]
expected!
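When each individual has a bounded number of offspring, the generating function is a polynomial, so the iteration can be carried out on coefficient lists and "decoding" is just reading off the coefficients. The sketch below assumes that finite case; the function names are made up for this example.

```python
def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists (index = power of z)."""
    result = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            result[i + j] += x * y
    return result

def poly_compose(outer, inner):
    """Coefficients of outer(inner(z)), built up with Horner's rule."""
    result = [0.0]
    for coeff in reversed(outer):
        result = poly_mul(result, inner)
        result[0] += coeff
    return result

def generation_distribution(probs, n):
    """Distribution of X_n: the coefficients of G iterated n times.

    probs[k] is p_k. The returned list may carry trailing zeros.
    """
    result = list(probs)
    for _ in range(n - 1):
        # G_n(z) = G_(n-1)(G(z)), as derived in the transcript.
        result = poly_compose(result, probs)
    return result

# Sanity check from the transcript: always exactly 2 branches, so G(z) = z^2.
third_generation = generation_distribution([0.0, 0.0, 1.0], 3)
```

Here `third_generation` has a single nonzero entry, at index 8 = 2^3, matching the z^(2^n) result above.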
[682]
However, the second question will be discussed
in another video, which will be released in
[687]
a few days. What about limitations? Or the
original context that prompted the three mathematicians
[694]
to study this process? Don't worry, I will
cover all these in the future.
[699]
If you enjoyed this video, be sure to give
it a like and subscribe to the channel with
[703]
notifications on! See you next time!





